Welcome to the penultimate (look it up!) lecture for COM/SOC 375: the Social Networks class of the University of Maine at Augusta. This week, we continue to consider the practical implications of social networks, turning from political, business and online uses of networks to the intrusive and manipulative use of social network analysis as a means of surveillance, and to the threats to privacy that emerge when we connect to others online. Even when we take pains to hide our names, the choices we make in our connections to others threaten to reveal us. When data miners can sift through many thousands of accounts to look for patterns in our associations, revelations made by our peers can tag us as secret deviants, even when we keep our own deviation a tightly guarded secret. The more that we reveal our contacts online, the more our personal privacy fades away. Those who can track our associations can influence our fate. Over the past decade, a great deal of information about the newly vast nature and extent of government surveillance over communications and associations (the two building blocks of networks) has been revealed. At the highest levels of government, the tools of social network analysis are being used, but to what end? Given the power of social network analysis to uncover and predict behavior, what are the ethical limits for its use?
In addition to reading this lecture, be sure to read the following materials on social network analysis and surveillance:
- Ewen MacAskill and Gabriel Dance: NSA Files Decoded: What the Revelations Mean for You
- Washington Post: CO-TRAVELER: How the NSA is Tracking People Right Now
- Kieran Healy: Using Metadata to Find Paul Revere
- Kristan J. Wheaton and Melonie K. Richey: The Potential of Social Network Analysis in Intelligence
In this lecture, we’ll consider the following subjects in a combination of text and images (be sure to review them all):
- Back to the Fortune 500: The Importance of More Complete Samples
- Social Network Surveillance #1: Big Brother is Watching
- Social Network Surveillance #2: Mining for Identity
- So What Now?
Finally, but very importantly, it’s time for you to start studying for the second and final exam. This exam will be comprehensive, covering the entire semester’s worth of material. It is also open-book, meaning that you will be able to refer to all readings, lectures, and notes when completing your work. You may complete the exam at home. The final exam will be made available on Blackboard on December 10, and is due by 11:59 pm Eastern Time, December 16. Your final exam answers should be posted to Blackboard in the section entitled “Exams.”
Back to the Fortune 500: The Importance of More Complete Samples
In your homework of a few weeks ago, I asked you to obtain membership lists of boards of directors for 6 of the big Fortune 500 companies, and then to show a membership overlap network based on those membership lists. Most of you obtained networks in which very few (and often no) membership overlaps were visible.
Does that mean that there are very few to no membership overlaps in the Fortune 500? No, not at all! It only reveals that among your set of just slightly over 1% of nodes in the network, no ties existed. We get a very different picture when all of the memberships compiled by all students — crossing some 69 of the Fortune 500 — are graphed in one network:
It is still the case that not all of these 69 corporations are tied directly or indirectly to one another (forming a connected component), but many of them are, and some corporations (such as Caterpillar and Procter & Gamble) are more central in this network than others (such as Aflac or Starbucks).
This difference highlights the importance of observing as many members of a network as possible to gain an accurate view of the actual network structure as experienced in the real world outside of your list, matrix or graph. If this many more connections are apparent when we increase the subset of corporations observed from 6 to 69, how much better-connected do you think our network might be if we increased the set of included corporations in the full network to the entire 500 corporations of the Fortune 500?
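The effect of sample size on visible overlaps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the firm names, director names and the `overlap_network` helper are all invented for the example and are not the class data.

```python
from itertools import combinations

def overlap_network(boards):
    """Return {(company_a, company_b): number of shared directors}
    for every pair of companies that shares at least one director."""
    ties = {}
    for a, b in combinations(sorted(boards), 2):
        shared = boards[a] & boards[b]  # directors sitting on both boards
        if shared:
            ties[(a, b)] = len(shared)
    return ties

# Invented board lists (hypothetical firms and directors).
all_boards = {
    "Firm A": {"director 1", "director 2"},
    "Firm B": {"director 3", "director 4"},
    "Firm C": {"director 2", "director 3"},
    "Firm D": {"director 5"},
}

# A small sample that happens to miss the firm tying the others together.
small_sample = {name: all_boards[name] for name in ("Firm A", "Firm B")}

print(overlap_network(small_sample))  # no ties are visible in the sample
print(overlap_network(all_boards))    # ties appear once Firm C is observed
```

In this toy case, Firm A and Firm B look unconnected until Firm C enters the sample, at which point both turn out to be tied to it. The same thing can happen when a handful of corporations is drawn from 500.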
Social Network Surveillance #1: Big Brother is Watching
While discussion in previous weeks of this course focused on the practice of for-profit businesses in mining people’s online information, it has been revealed over the last three years that the governments of the world are among the primary seekers of your social media identity, with the U.S. government leading the way. Declaring that “You need the haystack to find the needle,” National Security Agency (NSA) chief and U.S. military General Keith Alexander has confirmed that the NSA has been running a massive but secret program to collect and store a variety of information about our social network contacts over electronic media, including text-message contact lists, e-mail contacts, discussion board usage, Facebook friendship, visits to major social networking sites and sexually-explicit video chats. The targets are hundreds of millions of people who have no criminal record and who are not suspected of wrongdoing. The information is obtained without warrants. The goal of this surveillance is to construct a model of everyday citizens’ social contacts, favorite activities and habits. In an indication of the value of this information, the NSA has paid Facebook well for its trouble.
The massive data mining activity is being justified as a response to the threat “of valid terrorist targets” and as a means of predicting the likelihood of insurgent or terrorist activity. But surveillance data is also being shared with the Drug Enforcement Administration, the Justice Department, the Secret Service and the Department of Homeland Security, not for catching terrorists but for finding everyday criminals in the “haystack” of everyday Americans, even though such warrantless techniques for collecting evidence violate the Fourth Amendment to the Constitution. In the wake of revelations regarding the NSA’s activity, concerns about being exploited for profit are being overshadowed by concerns about abuses of power by an irresponsible government. Indeed, in the fall of 2013 it was revealed that the NSA considered making public revelations of embarrassing online activity by nonviolent critics of the U.S. government in order to discredit those critics and their ideas. If those who can data mine our social media activities can predict and then reveal our most personal secrets, our independence as free individuals is threatened. When we go online, we are exposed.
Although evidence has emerged that the National Security Agency is collecting the content of our communications, for some time the U.S. government insisted that only the “metadata” regarding communications was being collected. The implicit assurance was that “metadata” (information about who communicates with whom, when and how) is relatively harmless. Kieran Healy’s “Using Metadata to Find Paul Revere” reveals the power of metadata in its satirical Colonial-era writing about real Colonial-era data, using the very same methods of 2-mode network analysis that we covered earlier in this course. If the British government had obtained access to this information, and if they’d had the simplest of 2-mode network analysis technology at their disposal, they could have stopped the American Revolution in its tracks.
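As a reminder of how little machinery 2-mode analysis needs, here is a minimal sketch in plain Python. The people, clubs and membership table are invented for illustration and are not the Paul Revere data. The helper names `co_membership` and `tie_counts` are likewise assumptions. The core step is multiplying the person-by-club membership matrix by its own transpose, which gives a person-by-person matrix of shared memberships.

```python
# Hypothetical 2-mode data: rows are people, columns are clubs.
people = ["P1", "P2", "P3", "P4"]
clubs = ["Club X", "Club Y", "Club Z"]
membership = [
    [1, 0, 0],  # P1 belongs to Club X only
    [1, 1, 1],  # P2 belongs to all three clubs
    [0, 1, 0],  # P3 belongs to Club Y only
    [0, 0, 1],  # P4 belongs to Club Z only
]

def co_membership(matrix):
    """Multiply the membership matrix by its transpose.
    Entry [i][j] is the number of clubs that persons i and j share."""
    n = len(matrix)
    return [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]

def tie_counts(shared):
    """For each person, count how many other people share at least one club."""
    return [
        sum(1 for j, value in enumerate(row) if j != i and value > 0)
        for i, row in enumerate(shared)
    ]

shared = co_membership(membership)
for name, count in zip(people, tie_counts(shared)):
    print(name, "is tied to", count, "other people")
```

In this toy table P2 is the only person linked to everyone else, and that falls out of membership lists alone, with no knowledge of what anyone said or wrote.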
Keep in mind, however, that 2-mode network analysis is not the only technology available to governments today. As Kristan Wheaton and Melonie Richey point out in “The Potential of Social Network Analysis in Intelligence,” the calculation of betweenness centrality and identification of cut points are two other basic social network methods that can reveal a great deal about the influencers in a network, all without anything more than metadata at an analyst’s disposal. Consider also the vast difference in scale between metadata regarding pubgoers in Boston and metadata regarding all people placing phone calls, or sending letters, or using social media, or sending text messages. Many, many, many more people are involved in the latter than go to a pub. As noted in Ewen MacAskill and Gabriel Dance’s multimedia presentation of “NSA Files Decoded: What the Revelations Mean for You,” the decision in the NSA surveillance program to pursue contacts of suspected troublemakers to within “three hops” is an implementation of snowball sampling within a prescribed social network distance. As you know from our earlier discussion of “ripple effects” across network distance, large numbers of people can be involved at what seem to be only a few “hops” of network distance. The Guardian’s helpful ripple effect calculator for Facebook networks with an average degree of 190 shows that for every primary subject of surveillance, the NSA “3 hops” standard puts more than 5 million other people under warrantless surveillance.
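A rough sketch of the “three hops” idea follows, in Python. It is not the Guardian’s calculator, and both function names are invented for this example. `within_hops` does a breadth-first search over a small invented friendship list. `tree_upper_bound` gives a naive ceiling that assumes every person has exactly the same number of contacts and that friend circles never overlap, which real networks violate, so true counts are lower.

```python
from collections import deque

def within_hops(adjacency, start, max_hops):
    """Breadth-first search: return {person: hop distance} for everyone
    reachable from start in at most max_hops steps (start excluded)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance

def tree_upper_bound(degree, hops):
    """Naive ceiling on people reached, assuming no overlapping contacts:
    degree at hop 1, then (degree - 1) new contacts per person after that."""
    total = 0
    frontier = degree
    for _ in range(hops):
        total += frontier
        frontier *= degree - 1
    return total

# Invented chain of acquaintances: a - b - c - d - e
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(within_hops(toy, "a", 3))   # e lies four hops away and is left out
print(tree_upper_bound(190, 3))   # no-overlap ceiling for degree 190
```

With a degree of 190 the no-overlap ceiling comes to 6,823,090 people at three hops. Because friends of friends overlap heavily in practice, a realistic estimate is smaller, but it remains in the millions.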
It’s not only the biggest of “Big Brothers” — the federal government — that is engaged in surveillance regarding your behavior. Just a few days ago, investigative reporting by the Bangor Daily News revealed that various police agencies at the state and local level within Maine have been using software to monitor citizens’ social media activity. The Geofeedia surveillance software allows police agencies to monitor people according to the content of what they say, the characteristics of who they are, and the structure of the connections they make to other Mainers. This recent Bangor Daily News article reports assurances by a South Portland police officer: “Some people misunderstand and think that we’re prying into people’s personal accounts, but it has nothing to do with that.” Actually, Geofeedia software has a great deal to do with that, as screen shots of Geofeedia’s feature set show:
As you know from a semester’s worth of social network analysis, a great deal can be determined about a person from the nature of their demography, content of their speech, and the structure of their personal associations. The United States Constitution declares in its First Amendment that the right of the people to peaceably assemble and speak their minds shall not be infringed. Is that right restricted when free association and free speech are regularly monitored for evidence of prosecutable offense?
Social Network Surveillance #2: Mining for Identity
How exposed are our lives to surveillance? It turns out that modern-day spies don’t need billions of dollars or the coercive power of government to conduct surveillance. All it might take is one well-placed app. Even when surveillance doesn’t gather complete information about you, a bit of clever sleuthing can uncover the juiciest of personal details about your life.
For one indication of the potential for surveillance on the sly and on the cheap, you can visit Take This Lollipop — a web page that apparently takes its name from the adage that you never, ever should take candy from strangers. The Take This Lollipop web page asks you to authorize a Facebook app with innocuous-seeming privileges very similar to the privileges for Candy Crush Saga, Farmville or Slotomania. It then shows you exactly what authorizing a Facebook app does — anyone behind that app can find out where you work, where you eat, where you live, who your girlfriend or boyfriend is, and more. For someone who wants to mess with your life, finding out more about your relations and associations can be unnervingly easy.
We only provide data to our advertising partners or customers after we have removed your name and any other personally identifying information from it, or have combined it with other people’s data in a way that it no longer personally identifies you….
Your trust is important to us, which is why we don’t share information we receive about you with others unless we have:
- received your permission;
- given you notice, such as by telling you about it in this policy;
- or removed your name and any other personally identifying information from it.
Of course, for information others share about you, they control how it is shared…. When others share information about you, they can also choose to make it public.
Information that is always publicly available
The types of information listed below are always publicly available, and they are treated just like information you decided to make public:
- Name:
This helps your friends and family find you. If you are uncomfortable sharing your real name, you can always delete your account.
- Profile Pictures and Cover Photos:
These help your friends and family recognize you. If you are uncomfortable making any of these photos public, you can always delete them. Unless you delete them, when you add a new profile picture or cover photo, the previous photo will remain public in your profile picture or cover photo album.
- Networks:
This helps you see who you will be sharing information with before you choose “Friends and Networks” as a custom audience. If you are uncomfortable making your network public, you can leave the network.
- Gender:
This allows us to refer to you properly.
Did you notice how the information that Facebook promises not to sell is the information it tells you will be always publicly available anyway? Conversely, the information Facebook hasn’t made available to everyone is the information that it intends to sell. Anyone who owns a public Facebook “page” representing a company, brand or organization can use this information to purchase tightly targeted advertisements. If I wished to advertise the UMA Social Media Certificate Program to residents of Augusta, Maine, I would start with this page, which asks me to identify certain combinations of demographic category, interest and places:
If I were to select for 25-45 year-old males in Augusta, Maine, Facebook would show me the number of those available to receive my advertisements and the number of those likely to respond:
The count of 2,400 people and a subset who might respond can be reduced further if I make certain selections. For instance, I can tell you that 7.9% of Augusta accounts are held by 13-18 year olds. 48% show an interest in education and 6.2% publicly report an interest in sex (with 58% of those being women). Among Augusta Facebook users, women interested in relationships with other women occur in three times greater numbers than men interested in men. All of a sudden, I’m not just advertising; I’m data mining the community of Augusta.
Is this kind of information any of my business? Regardless of whether it is my business or not, I can find it, even as a non-Facebook-insider and with no special access. This is just one among a broader range of techniques by which computer scientists with moderate analytical skill can accomplish two unnerving feats:
Feat #1. Deanonymization: taking anonymous social media accounts and figuring out the identity of people behind them. Yates, Shute and Rotman (2010) were able to unmask the identity of three “pseudonymous” bloggers, that is, bloggers writing under fake names. When the authors interviewed these bloggers, they found out that each had good reason to conceal his or her identity. “Quirky Slut,” one of the three bloggers, wrote about her active and varied sex life and worried about whether her activities would cause problems in her social life if her identity were revealed. By paying close attention to occasional odd pieces of information mentioned in Quirky Slut’s blog, and by purchasing information on Americans gathered from various sources by the Alesco Data Group, Yates et al. were able to pin down the likely identity of this blogger, even though she had never revealed any names, contact information, phone numbers or other unique information about herself. It was only the combination of various general pieces of information that nailed down the identity of Quirky Slut. From Yates et al. (2010, p. 5):
In the FAQ page of her blog she wrote “I live in the Albuquerque, NM area.” No more specific geographic detail could be found. The greater Albuquerque metropolitan area is comprised of 44 different zip codes, so by living in a big city and being consistently vague, Quirky Slut is actually doing a pretty good job of protecting her anonymity.
On September 19, 2007 Quirky Slut wrote “My birthday is over. So long teenage years.” She had a previous post on September 17 in which she made no mention of her birthday, so September 18 is most likely the day. She most likely turned 20 that year, making her birth year 1987. Later, on March 25, 2009 she confirmed the year when she described an upcoming vacation. “We can really enjoy Las Vegas since we’re both 21 now,” she wrote.
In her entry on August 19, 2009, Quirky Slut disclosed her marital status when she wrote “I’m not married, nor am I attached to anyone.” She revealed her dwelling size on April 19, 2007 when she described her living arrangements, “Well, I sort of live with my parents but I live in an apartment above the garage they used to rent to students.” This will actually turn out to be the crucial piece of access enabling information that will yield a high probability of uniquely identifying Quirky Slut.
The following criteria were used to create an Alesco leads list:
• Zip Code: 44 selected for the entire Albuquerque area
• Age: 20-21
• Gender: Female
• Marital Status: Single
• Dwelling Size: Single Family Home
The returned list included just 72 names.
Of those 72 names, just one name had a birthday of September 18, 1987. Quirky Slut’s blog has since been taken offline.
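The logic of this narrowing can be sketched in Python with made-up data. Nothing here comes from Alesco or from Yates et al.: the six records, the field names, the zip codes and the `narrow_down` helper are all invented, and the age filter from the list above is left out for brevity. The point is only that each criterion is individually vague, while together they leave one candidate.

```python
from datetime import date

# Six invented records standing in for a purchased marketing list.
records = [
    {"id": 1, "zip": "Z1", "gender": "F", "marital": "single",
     "dwelling": "single family", "birth": date(1987, 9, 18)},
    {"id": 2, "zip": "Z1", "gender": "F", "marital": "single",
     "dwelling": "single family", "birth": date(1988, 2, 3)},
    {"id": 3, "zip": "Z2", "gender": "F", "marital": "married",
     "dwelling": "single family", "birth": date(1987, 9, 18)},
    {"id": 4, "zip": "Z2", "gender": "M", "marital": "single",
     "dwelling": "single family", "birth": date(1987, 9, 18)},
    {"id": 5, "zip": "Z9", "gender": "F", "marital": "single",
     "dwelling": "single family", "birth": date(1987, 9, 18)},
    {"id": 6, "zip": "Z1", "gender": "F", "marital": "single",
     "dwelling": "apartment", "birth": date(1987, 5, 30)},
]

def narrow_down(records, filters):
    """Apply each (label, predicate) filter in turn.
    Returns the surviving records and the candidate count after each step."""
    remaining = list(records)
    counts = [len(remaining)]
    for _label, predicate in filters:
        remaining = [r for r in remaining if predicate(r)]
        counts.append(len(remaining))
    return remaining, counts

metro_zips = {"Z1", "Z2"}  # hypothetical zip codes for one metro area
filters = [
    ("lives in metro area", lambda r: r["zip"] in metro_zips),
    ("female", lambda r: r["gender"] == "F"),
    ("single", lambda r: r["marital"] == "single"),
    ("single family home", lambda r: r["dwelling"] == "single family"),
    ("birthday inferred from posts", lambda r: r["birth"] == date(1987, 9, 18)),
]

survivors, counts = narrow_down(records, filters)
print(counts)                        # candidate pool after each filter
print([r["id"] for r in survivors])  # who is left at the end
```

In this toy list the pool shrinks by one record at every step until a single record remains, even though no filter on its own identifies anyone.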
Yates et al. (2010) describe an informal process for stripping away a few social media users’ anonymity. Other academics have developed bulk methods for discovering the names of large numbers of people hiding behind anonymous social media accounts. Narayanan and Shmatikov (2009) reveal a method for obtaining the identities of anonymous Twitter users who also use the photo-sharing service Flickr. By looking for similarities in the patterns of connections to others made in the Flickr and Twitter networks, the authors were able to re-identify about a third of the users with accounts on both services, with an error rate of only 12%.
Feat #2. Inferring Data: accurately predicting very personal details about people’s relationships, political beliefs, religion, and sexual preferences even though those people have not revealed such information in their social media usage. These are the sorts of information that people can spend a great deal of energy trying to keep private. The “How Hetero?” campaign of Stockholm Pride uses a combination of word choice in Tweets and Twitter followers to give each Twitter user a “% Hetero” score that is perhaps meant to provoke reflection more than it is meant to be accurate:
“How Hetero” may have been a stunt, but other more serious efforts find success in uncovering sensitive information. Zheleva and Getoor (2009) are able to accurately predict characteristics of the social media users of Flickr, Facebook, Dogster and Bibsonomy (characteristics that those users didn’t share over social media) by looking at the presence of that characteristic among the other members of the groups joined by those users. Birds of a feather flock together; if you know the flock, you know the bird in the flock. Volkova and colleagues at Microsoft Research have recently (2015) published evidence of progress in their project to predict various aspects of individual identity, demography and psychology on the basis of the text of Twitter posts. Crandall et al. (2010) report success in determining which Flickr users have personal relationships with one another by simply looking for pairs of accounts whose photographs occur at the same time and place over and over again. Other social media sites that register location at particular times, like Foursquare, can be turned into tools for uncovering possibly secret social relations.
A pair of Stanford researchers suggest that once basic metadata regarding a person’s activities have been obtained by data thieves, the quest to remain private may be futile. Starting with public concern regarding the collection of metadata by the government, Jonathan Mayer and Patrick Mutchler show in “Metaphone: The Sensitivity of Telephone Metadata” that freely available (or easily purchased) information about individuals can wreak havoc on privacy. In combination with phone record metadata, these databases can uncover intensely private details regarding a person’s life. After obtaining nothing more than call records from some voluntary participants, Mayer and Mutchler were able to find out the following in their research:
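The “know the flock, know the bird” idea can be sketched as a simple vote. This is a deliberately crude illustration, not the classifier that Zheleva and Getoor actually built: the users, groups, attribute values and the `guess_attribute` function are all invented, and a plain majority vote among group co-members stands in for their statistical models.

```python
from collections import Counter

def guess_attribute(user, memberships, known):
    """Guess a user's hidden attribute as the most common value among
    other members of the groups that user joined (one vote per shared group)."""
    votes = Counter()
    for group in memberships.get(user, ()):
        for other, groups in memberships.items():
            if other != user and group in groups and other in known:
                votes[known[other]] += 1
    return votes.most_common(1)[0][0] if votes else None

# Invented group memberships; u1 and u6 keep the attribute private.
memberships = {
    "u1": {"g1", "g2"},
    "u2": {"g1"},
    "u3": {"g1", "g2"},
    "u4": {"g2"},
    "u5": {"g3"},
    "u6": set(),
}
# Attribute values that the other users chose to make public.
known = {"u2": "x", "u3": "x", "u4": "y", "u5": "y"}

print(guess_attribute("u1", memberships, known))  # guess for the private user
print(guess_attribute("u6", memberships, known))  # no groups, so no guess
```

Here u1 never disclosed anything, yet the public profiles of fellow group members cast three votes for one value and one vote for the other. The only user who is safe from the guess is the one who joined no groups at all.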
“Participant A communicated with multiple local neurology groups, a specialty pharmacy, a rare condition management service, and a hotline for a pharmaceutical used solely to treat relapsing multiple sclerosis.
Participant B spoke at length with cardiologists at a major medical center, talked briefly with a medical laboratory, received calls from a pharmacy, and placed short calls to a home reporting hotline for a medical device used to monitor cardiac arrhythmia.
Participant C made a number of calls to a firearm store that specializes in the AR semiautomatic rifle platform. They also spoke at length with customer service for a firearm manufacturer that produces an AR line.
Participant E had a long, early morning call with her sister. Two days later, she placed a series of calls to the local Planned Parenthood location. She placed brief additional calls two weeks later, and made a final call a month after.”
Through nothing more than our phone metadata, we reveal ourselves deeply.
So What Now?
The genie is out of the bottle. Barring an apocalypse, there will be no retraction of communications technology, and now that the techniques of data mining and social network analysis have been developed, there will always be people who are able and willing to use them for unsettling ends. So what do we as informed and educated citizens do now? What can reasonably, ethically and feasibly be accomplished to protect a person’s privacy and freedom in the network age? I don’t know a definitive answer to this question; can you come up with one? For this week’s discussion, do your best in the Padlet below:
If you cannot use this Padlet, share your answer in the comments section for the lecture, which you can find at the bottom of this page.
Finally, I’d like to remind you that other than participation in this week’s lecture using the above Padlet, there is no homework this week. Take the time to start to review social network material from recent weeks to prepare yourself for the take-home exam that takes place at the end of class. We’ll do a review and wrap-up of material next week, and in your last week of class you’ll complete that exam.
Crandall, David J., Lars Backstrom, Dan Cosley, Siddharth Suri, Daniel Huttenlocher and Jon Kleinberg. 2010. “Inferring Social Ties from Geographic Coincidences.” Proceedings of the National Academy of Sciences 107(52): 22436-22441.
Mayer, Jonathan and Patrick Mutchler. 2014. “Metaphone: The Sensitivity of Telephone Metadata.” Web Policy: March 12. Accessed 04-01-2014 at http://webpolicy.org/2014/03/12/metaphone-the-sensitivity-of-telephone-metadata/.
Narayanan, Arvind and Vitaly Shmatikov. 2009. “De-anonymizing Social Networks.” 30th IEEE Symposium on Security and Privacy.
Narayanan, Arvind and Vitaly Shmatikov. 2010. “Privacy and Security: Myths and Fallacies of ‘Personally Identifiable Information.’” Communications of the ACM 53(6): 24-26.
Volkova, Svitlana, Yoram Bachrach, Michael Armstrong, and Vijay Sharma. 2015. “Inferring Latent User Properties from Texts Published in Social Media.” In Association for the Advancement of Artificial Intelligence, pp. 4296-4297.
Yates, Dave, Mark Shute and Dana Rotman. 2010. “Connecting the Dots: When Personal Information Becomes Personally Identifying on the Internet.” ICWSM. 2010.
Zheleva, Elena and Lise Getoor. 2009. “To Join or Not to Join: the Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles.” International World Wide Web Conference 2009: 531-540.