Welcome to the ninth lecture for the undergraduate social networks course at the University of Maine at Augusta, a lecture organized around the idea that social networks don’t just happen randomly. There are patterns in social networks which have a strong impact on human existence even if individual humans have little choice in the matter. Unless we are very unusual, our friends will have more friends than we do. Although a catastrophic event may seem to strike a small number of people, the nature of networks sends out a ripple effect that quickly touches us all. And while the old saying is true — birds of a feather do flock together — the intensity of that experience depends on the size of the flock you fly with. In this lecture, I walk through one of the reasons we study social networks and patterns like homophily in them: to understand why people in meaningful social situations form coalitions in the way they do.
Before you start this lecture, be sure to read Scott Feld’s “Why Your Friends Have More Friends Than You Do,” Bernard et al.’s “Estimating the Ripple Effect of a Disaster,” and James Moody’s “Fighting a Hydra: A Note on the Network Embeddedness of the War on Terror.”
After you complete your reading and this lecture (including the participation element), be sure to complete your seventh homework assignment. For this homework assignment, due November 12, you should complete two steps. In Step One, identify an event (not including those described in the readings) in which a particular number of people were involved. Describe the event and how many people were involved. Using the techniques described by Bernard and Moody, identify how many people we can expect to find at a network distance of 1 from the event. Then identify how many people we can expect to find at a network distance of 2 from the event. In Step Two, post your work in the form of a single word processing document to our course Blackboard page, in the appropriate assignments section labeled “Homework #7: Trace the Ripples.”
The first pattern in social networks we’ll describe this week is homophily. The homophily principle declares that the probability of a social tie between two individuals becomes smaller the more different those two individuals are from one another in some socially-salient characteristic such as age, income, religion, racial/ethnic group or geographic location.
Sociologists Miller McPherson and Lynn Smith-Lovin (1987) identify two kinds of explanations for the existence of homophily in social relations. The theory of choice homophily supposes that people associate disproportionately with similar others because human beings prefer (for rational or irrational reasons) similar others. If choice homophily holds true, then if people enter into a room of similar and dissimilar strangers, they will seek out those who are like themselves and avoid those who are unlike them.
Induced homophily theorists begin with the observation that we rarely enter a room, a church, a neighborhood, a school, a business, or any other environment that is not in some way more homogeneous than the general population. In other words, the places and circumstances in which we meet people are already segregated (by law, through bias, or through unintentional processes that sort people). Induced homophily can occur because we form social ties with the people we encounter, and because the people we encounter are like us. No psychological preference is required for induced homophily to occur.
In the video below, we’ll look at some concrete and observable examples of homophily in real life. While homophily occurs in physical space, it also occurs in a virtual environment called “Blau space,” in which the main dimensions are not length, width or height but rather are social categories like age and education. Moving on, we consider how quickly a sense of personal connection to an event ripples through a society, traveling along the lines of pre-existing social ties. This is a fairly long video, comprising a significant portion of this week’s lecture:
Participate: Tell Me a Story About Homophily or Heterophily!
Do “birds of a feather flock together,” as the saying goes, or do “opposites attract,” as another popular saying has it? Homophily is the word we use to describe the former, and heterophily is the word we use to describe the latter. For this week’s student participation, I’d like you to tell a story about yourself or about someone you know in which either social ties formed between people who were similar (homophily) or who were dissimilar (heterophily). What kind of similarity or dissimilarity was involved, and for what characteristic? How did the homophilous or heterophilous ties form? Do you think that similarity or difference itself was the primary reason that these social ties formed, or was some other force at work?
If you can, please use the Padlet below to share your story. Remember, though, that this is a public website, so I’d like you to use your pseudonym rather than your real name. If for a technical reason you can’t get the Padlet below to work, please complete this week’s class participation by writing your contribution in the comments box at the end of this lecture.
Bringing Together Multiple Networks of Affiliation to Assess Homophily in Action: QAP Regression
This next section of the lecture is fairly advanced, having to do with an analysis technique called “QAP Regression.” I won’t expect you to produce your own QAP regression or to be able to discuss it on a test. This section of the lecture is for students who want to tackle something tough, who want to take their social network analysis a bit beyond the undergraduate level. Are you about ready to graduate and thinking about grad school? Do you have a research project underway, perhaps for another class, that could benefit from some more advanced social network analysis? Are you just curious? Then by all means read on! If you don’t feel ready for this section, however, that’s fine; just move on to the final sections of the lecture having to do with ripple effects and the class size paradox.
If you’re still with me… let’s jump in!
So far in this semester, we’ve talked about creating social networks with direct information about the social ties that bind nodes together. But as we’ve learned, other networks can be generated indirectly, through information about shared memberships in organizations, shared location, and similarity of various sorts. For any single set of nodes, many different networks exist, a property known as multiplexity.
To consider what multiplexity looks like in real life, and how multiplexity can be used in a practical way to predict behavior in social networks, let’s look at some real-life networks regarding some recent political players here in the state of Maine. Take a look at the following individual-level data regarding members of the 127th Maine State Senate (2015-2016), available in Microsoft Excel (.xlsx) and comma-delimited (.csv) formats:
Although this data is organized at an individual level, there are relations implicit within it. Let’s make those relations explicit.
Pulling out four columns from the dataset, let’s start with county:
Because some of the 35 members of the Senate represent districts in the same county, they can be seen as tied to one another in a network in which “represents the same county” is the relation. To get there, let’s recognize each county as its own “focus” (remembering Scott Feld’s article from earlier in the semester), give each county focus a column in an affiliation matrix in which 1=representing that county and 0=not representing that county, and finally convert this two-mode affiliation matrix into a one-mode adjacency matrix (following Lecture 7). The affiliation matrix of county representation for Senators would look like this:
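If you'd like to see that conversion in R, here is a minimal sketch. The five senators, the four counties, and their memberships are invented for illustration; they are not taken from the Maine dataset. The sketch multiplies the affiliation matrix by its own transpose to count shared counties, then reduces the counts to present/absent ties.

```r
# Hypothetical two-mode affiliation matrix: rows are senators, columns are counties.
# 1 = represents that county, 0 = does not. All names are made up.
affiliation <- matrix(
  c(1, 0, 0, 0,
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 1, 1, 0,
    0, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Senator", 1:5),
                  c("CountyA", "CountyB", "CountyC", "CountyD"))
)

# Multiplying by the transpose counts the counties each pair of senators shares.
shared <- affiliation %*% t(affiliation)

# Turn counts into a 0/1 one-mode adjacency matrix and ignore self-ties.
adjacency <- (shared > 0) * 1
diag(adjacency) <- 0

# Isolates are senators who share a county with nobody else.
isolates <- rownames(adjacency)[rowSums(adjacency) == 0]
print(adjacency)
print(isolates)
```

In this made-up example, Senator5 is the only senator from CountyD and so ends up as an isolate, just like the eight isolates in the real county network.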
… and the adjacency matrix of senators connected through joint county representation would look like this…
… and finally a network graph of the adjacency matrix would look like this:
Eight senators on the left are isolates because they are the only senators to represent their counties. The other 27 senators are connected to some of their peers by county.
Moving on, membership in gender and party groups can be thought of as individual traits:
But sharing those traits is like sharing group membership. This means that we can use the techniques of working with an affiliation matrix to construct a gender similarity network. I won’t burden the lecture with too many large affiliation matrices (and you know how to make one for yourself), so I’ll just link to a .csv file containing the affiliation matrix of senators tied by gender, then share its graph here…
and link to a .csv file containing the affiliation matrix of senators tied by party, then share its network graph here…
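Here is a rough R sketch of how a shared trait becomes a similarity network. The party labels below are hypothetical, not the real senators' parties. The sketch builds the network two ways: directly, by comparing every pair of labels, and through an affiliation matrix in which each party is treated as a group. Under these assumptions the two routes give the same result.

```r
# Hypothetical party labels for five senators (invented for illustration).
party <- c(Senator1 = "R", Senator2 = "D", Senator3 = "R",
           Senator4 = "D", Senator5 = "R")

# Route 1: compare every pair of labels directly; 1 = same party.
same_party <- outer(party, party, "==") * 1
diag(same_party) <- 0

# Route 2: treat each party as a group in an affiliation matrix,
# then convert the two-mode matrix to a one-mode matrix.
affiliation <- sapply(unique(party), function(p) as.numeric(party == p))
rownames(affiliation) <- names(party)
via_affiliation <- affiliation %*% t(affiliation)
diag(via_affiliation) <- 0

print(same_party)
print(all(via_affiliation == same_party))
```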
Finally, we might want to consider the consequential political behavior of our Maine State Senators over the last two years. The last portion of our individual-level database places a “1” in a cell if a Maine State Senator has cosponsored (signed on in formal support of) any of 26 bills (“LD” stands for “legislative document”) that make reference to domestic violence:
LD 150, LD 199, LD 552, LD 574, LD 600, LD 657, LD 861, LD 921, LD 1019, LD 1037, LD 1100, LD 1155, LD 1268, LD 1375, LD 1381, LD 1402, LD 1497, LD 1526, LD 1531, LD 1563, LD 1603, LD 1606, LD 1631, LD 1639, LD 1643, LD 1674
Support for a bill is like joining a group. This means that we can transform the above 2-mode information into a 1-mode adjacency matrix in which the cells in the matrix will tell us how often two members of the Maine Senate cosponsored bills regarding domestic violence together. You can look at a .csv version of the resulting adjacency matrix here, and see it graphed here:
In previous network graphs, ties were either present (“1”) or absent (“0”), but this last network graph represents tie strength (the number of bills cosponsored by both of a pair of senators) by drawing the ties between nodes more thinly or thickly. The maximum tie strength in the network, 3, represents the three bills mentioning domestic violence that are cosponsored by Senator Volk and Senator Burns.
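To see where tie strength comes from, here is a small R sketch with invented senators and bills (not the real cosponsorship data). This time the counts of shared bills are kept as they are rather than being reduced to 0 and 1, so each cell holds the number of bills a pair cosponsored together.

```r
# Hypothetical two-mode matrix: rows are senators, columns are bills.
# 1 = cosponsored the bill. All names and values are made up.
cosponsorship <- matrix(
  c(1, 1, 1, 0,
    1, 1, 1, 1,
    0, 1, 0, 1,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("SenatorA", "SenatorB", "SenatorC", "SenatorD"),
                  c("Bill1", "Bill2", "Bill3", "Bill4"))
)

# Each cell counts the bills that a pair of senators cosponsored together.
joint <- cosponsorship %*% t(cosponsorship)
diag(joint) <- 0

# Find the strongest tie and the pair of senators it connects.
max_strength <- max(joint)
strongest <- which(joint == max_strength & upper.tri(joint), arr.ind = TRUE)
strongest_pair <- rownames(joint)[strongest[1, ]]

print(joint)
print(max_strength)
print(strongest_pair)
```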
Now here’s the kicker: I’ve just shown you the multiplex relations between Maine state senators, the multiple different ways in which the same senators are tied to one another. What if one of these ways in which senators are connected to one another predicts another kind of connection?
What if being of the same gender predicts the number of bills regarding domestic violence two legislators cosponsor together? That outcome would show gender homophily in political relationships.
What if being in the same political party predicts the number of bills regarding domestic violence two legislators cosponsor together? That outcome would show partisan homophily in political relationships.
What if representing the same county predicts the number of bills regarding domestic violence two legislators cosponsor together? That outcome would show geographic homophily in political relationships.
These are hypotheses, and we can test them using an approach called “QAP Regression” that I discussed in a video during last week’s lecture. I’m not going to ask you to run your own QAP regression analyses to gain credit for this class, but you could do it. In case you’re interested, I’m going to show you how using the research program we’ve been using, R.
First, install the R package “sna.” You learned how to do this in Lecture 5.
Then, for a set of four networks like we’ve got, run the following script:
#Load Same-County Network
#Load Same-Gender Network
#Load Same-Party Network
#Load Number-Joint-Cosponsorships Network
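#Join The Predicting Matrices Together
#(gender_matrix, party_matrix and cosponsor_matrix are assumed names for the matrices loaded above)
predicting_matrices <- array(NA, c(3, 35, 35))
predicting_matrices[1,,] <- county_matrix
predicting_matrices[2,,] <- gender_matrix
predicting_matrices[3,,] <- party_matrix
#Now Run a QAP Regression called model and Show the Results
model <- netlm(cosponsor_matrix, predicting_matrices)
summary(model)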
The “loading” sections of this script work just like the loading sections of scripts you’ve written before, except that this time, we’re bringing in four matrices (for the four kinds of networks connecting senators that we reviewed above). After the “predicting matrices” (the county, gender and party similarity matrices) are joined together, the “netlm” command runs a QAP regression. The results look like this:
To interpret these results fully, you’d need to take a statistics class. But to cut right to the chase, let’s just look at three columns in the results. The first column labels rows “x1,” “x2,” and “x3”. These refer to county (the first matrix in the script), gender (the second matrix in the script), and party (the third matrix in the script). The next column, “Estimate,” shows the size of the effect of each of these networks in predicting our outcome network (the number of shared cosponsorships). If you have taken statistics, you might think of these as slopes. The larger the slope estimate, the bigger the effect of this network on the outcome we’re interested in (cosponsorship).
But most importantly, look at the last column. Multiplied by 100, it provides the percent chance that a randomly generated network would predict the outcome as well as the actual network (x1, x2, or x3) you’ve supplied. Typically, if this value (called a p value) is 0.05 or smaller, researchers accept the result as unlikely to be a product of chance alone.
This is where the punch line comes in. The p values for x1 (being in the same county) and x2 (being of the same gender) are far bigger than 0.05, indicating a pretty big chance that the effects seen in the “Estimate” column were just produced by chance. We CAN’T comfortably conclude that being in the same county or being of the same gender is associated with higher levels of joint support of domestic violence bills in the Maine Senate. However, we CAN comfortably conclude that senators who are in the same party jointly support more bills. How many more bills? Look to the “Estimate” column. About 0.98 more bills, on average. In other words, members of the same party on average support just about 1 more domestic violence bill together than members of different parties do. There seems to be little geographic or gender homophily in the Maine Senate, at least in regard to domestic violence bills. Partisan homophily, on the other hand, is notable in the Maine Senate.
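If you're curious where a p value like this comes from, here is a rough sketch in base R of the shuffling idea behind QAP. Everything in it is an assumption for illustration: the twelve legislators, their parties, and the outcome network are simulated, not drawn from the Maine data. It also uses only one predictor and the simplest kind of shuffle (scrambling the rows and columns of the outcome matrix together), so treat it as a picture of the logic rather than a substitute for the procedures the sna package offers.

```r
set.seed(42)                      # make the random example repeatable
n <- 12                           # hypothetical number of legislators

# Hypothetical predictor network: 1 if two legislators share a party.
party <- rep(c("D", "R"), each = n / 2)
same_party <- outer(party, party, "==") * 1

# Simulated outcome network: random counts of shared cosponsorships,
# made symmetric, with extra shared bills added inside each party.
noise <- matrix(rpois(n * n, lambda = 1), n, n)
noise[lower.tri(noise)] <- t(noise)[lower.tri(noise)]
outcome <- noise + 2 * same_party

# Use each pair of legislators once (upper triangle, no diagonal).
cells <- upper.tri(outcome)
slope <- function(y, x) cov(y[cells], x[cells]) / var(x[cells])
observed <- slope(outcome, same_party)

# Shuffle who is who in the outcome network many times. Each shuffle keeps
# the network's structure but breaks its link to the predictor.
reps <- 1000
permuted <- replicate(reps, {
  shuffle <- sample(n)
  slope(outcome[shuffle, shuffle], same_party)
})

# p value: the share of shuffled networks that do at least as well.
p_value <- mean(abs(permuted) >= abs(observed))
print(c(estimate = observed, p = p_value))
```

The shuffling matters because the pairs in a network are not independent of one another: each senator appears in many pairs, so the usual regression p values can't be trusted.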
Social network researchers use the same technique to test ideas about what leads to personal relationships between people, or about what grows trade relationships between countries, or about what affects a person’s tendency to join a terrorist movement. QAP regressions move us beyond describing the networks that are around us to helping us explain how and why networks come to take the shape they do.
As I noted at the start of this section, I don’t expect you to master this part of the lecture. Nevertheless, I wanted to introduce it to you to show you where network research can take you: quite far.
Ripples, Walks and Distances
The discussion of the ripple effects of an event in this week’s readings by Bernard et al. and James Moody is at base a discussion of network distance, which is a very particular sort of walk in a network. We’ve already encountered the idea of network distance, but it’s so vital to the understanding of the ripple effect that we should review it.
A walk is one way of thinking about distance in a social network, and always refers to a sequence of adjacent nodes. A walk can be imagined as a trip taken between two points: an origin node and a destination node in a graph.
Consider this sociogram, depicting relations between Al, Betty, Clem, Daphne, Edna and Frank. In this graph, a walk from Clem to Betty might follow the path Clem -> Daphne -> Al -> Betty. Such a walk would have a length of 3 (the number of intervening lines in the walk). This is the shortest possible walk between Clem and Betty.
Longer walks from Clem to Betty are possible. Consider the walk Clem -> Daphne -> Al -> Frank -> Betty, which has a length of 4. We can create even longer walks than that, because one node can appear more than once in a walk, just as people can take walks treading over the same piece of physical space again and again. The walk Clem -> Daphne -> Edna -> Daphne -> Al -> Frank -> Betty has a length of 6.
One of the reasons walks are called “walks” is that sociograms draw upon graph theory, a field of mathematics with an interesting anecdotal history. As you’ll recall from your reading earlier in the semester, social network analysis grows out of graph theory, which in turn was developed by mathematician Leonhard Euler. Euler simplified the complicated urban geography of the city of Königsberg into land masses (nodes) and bridges (lines) in order to answer the question, “Can one walk over all seven bridges of Königsberg without crossing the same bridge twice?” The answer, as it turns out, is no.
The idea of distance in a network stems from the idea of a walk. The distance between two nodes in a network involves a particular kind of walk: the shortest possible walk between them. In other words, the distance between two nodes is the smallest number of intervening ties one would need to travel in order to get from one node to the other.
In the sociogram below, there are many walks of different lengths that would take us from Clem to Betty…
Clem-Daphne-Al-Frank-Betty: Length 4
Clem-Daphne-Edna-Daphne-Al-Betty: Length 5
Clem-Daphne-Edna-Daphne-Al-Frank-Al-Frank-Betty: Length 8
… but the shortest walk from Clem to Betty (Clem-Daphne-Al-Betty) has a length of 3, and so we would say that the distance between Clem and Betty is 3.
The distance between two nodes, measured as the length of the shortest walk between them, is also called the geodesic distance.
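Here is a minimal R sketch of these ideas. The list of ties is my assumption: it includes only the ties that appear in the walks listed above, and the actual sociogram may contain others. The helper functions `is_walk`, `walk_length` and `geodesic` are names I've made up for this example. The `geodesic` function searches outward from the origin one step at a time until it reaches the destination.

```r
# Ties inferred from the walks described above (an assumption; the real
# sociogram may include ties that never appear in those walks).
edges <- rbind(
  c("Clem", "Daphne"), c("Daphne", "Al"), c("Daphne", "Edna"),
  c("Al", "Betty"), c("Al", "Frank"), c("Frank", "Betty")
)
people <- sort(unique(as.vector(edges)))
adjacency <- matrix(0, length(people), length(people),
                    dimnames = list(people, people))
adjacency[edges] <- 1            # tie from first name to second name
adjacency[edges[, 2:1]] <- 1     # and back again (ties are undirected)

# A walk is valid if every consecutive pair of nodes is adjacent.
is_walk <- function(adjacency, walk) {
  all(adjacency[cbind(head(walk, -1), tail(walk, -1))] == 1)
}

# The length of a walk is the number of lines it travels.
walk_length <- function(walk) length(walk) - 1

# Geodesic distance: expand outward one step at a time from the origin.
geodesic <- function(adjacency, from, to) {
  distance <- setNames(rep(NA_integer_, nrow(adjacency)), rownames(adjacency))
  distance[from] <- 0L
  frontier <- from
  while (length(frontier) > 0 && is.na(distance[to])) {
    reached <- colSums(adjacency[frontier, , drop = FALSE]) > 0
    neighbors <- colnames(adjacency)[reached]
    new_nodes <- neighbors[is.na(distance[neighbors])]
    distance[new_nodes] <- distance[frontier[1]] + 1L
    frontier <- new_nodes
  }
  unname(distance[to])
}

long_walk <- c("Clem", "Daphne", "Edna", "Daphne", "Al", "Frank", "Betty")
print(is_walk(adjacency, long_walk))          # TRUE
print(walk_length(long_walk))                 # 6
print(geodesic(adjacency, "Clem", "Betty"))   # 3
```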
To move from distance and walks to ripples, make sure you watch the video in the first part of this lecture! I walk through calculation of ripple effects starting at about minute 36.
Your Friends Have More Friends Than You Do
When reading Scott Feld’s “Why Your Friends Have More Friends Than You Do,” the most important figures to look at are Figure 1 and Table 1:
In Table 1 Feld discusses the sociogram from Figure 1, developing an argument across three columns. Let’s follow his example. In column 1, Feld tells us how many friends each person has: Betty has one friend, as you can see, and Sue has 4 friends. In column 2, Feld looks at each person’s friends to see how many friends they in turn have. Betty only has Sue for a friend, and Sue has 4 friends, so the total number of friends of Betty’s friends is 4. Sue, on the other hand, has 4 friends: Betty, Dale, Pam and Alice. Betty has 1 friend, Dale has 3 friends, Pam has 3 friends and Alice has 4 friends. 1+3+3+4=11. The total number of friends of Sue’s friends is 11. Moving on to column 3, Feld calculates how many friends on average a person’s friends have. Sue’s 4 friends have a total of 11 friends, but they’re divided between the 4 of them. The average friend of Sue has 2.75 friends (that’s 11 divided by 4).
You’ll notice that Feld’s social fact, that “Your Friends Have More Friends Than You Do,” is not true in all individual cases, but instead is a trend across all cases. Sue is an exception to the rule; she has 4 friends, but on average her friends have only 2.75 friends. Alice also has slightly more friends than her average friend does. But there are many more people in the network who have fewer friends than their friends do. At the bottom of Table 1, Feld shows us that the average person in this network has 2.5 friends, but that the average number of friends a person’s friends have is 2.99. On average across the network, a person’s friends tend to have more friends than they do.
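The three columns can be reproduced with a short R sketch. A word of caution: the list of friendships below is my assumption. It matches the friend counts quoted above for Betty, Sue, Dale, Pam and Alice, and it reproduces the network-wide averages of 2.5 and 2.99, but the remaining names and ties are filled in for illustration and should be checked against Feld's Figure 1.

```r
# Assumed friendship ties (check against Feld's Figure 1 before relying on them).
edges <- rbind(
  c("Betty", "Sue"), c("Sue", "Alice"), c("Sue", "Pam"), c("Sue", "Dale"),
  c("Alice", "Jane"), c("Alice", "Pam"), c("Alice", "Dale"),
  c("Jane", "Dale"), c("Pam", "Carol"), c("Carol", "Tina")
)
girls <- unique(as.vector(t(edges)))
friends <- matrix(0, length(girls), length(girls),
                  dimnames = list(girls, girls))
friends[edges] <- 1
friends[edges[, 2:1]] <- 1       # friendship runs in both directions

# Column 1: how many friends each person has.
number_of_friends <- rowSums(friends)

# Column 2: total number of friends held by each person's friends.
friends_of_friends <- drop(friends %*% number_of_friends)

# Column 3: how many friends a person's average friend has.
mean_friends_of_friends <- friends_of_friends / number_of_friends

print(cbind(number_of_friends, friends_of_friends, mean_friends_of_friends))
print(mean(number_of_friends))                   # 2.5
print(round(mean(mean_friends_of_friends), 2))   # 2.99
```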
Use this video to practice Feld’s technique on a different network. Pausing the video before the end, can you describe how many friends the average node has? Can you describe how many friends a node’s average friend has?
In an article for Psychology Today, Satoshi Kanazawa colorfully extrapolates from Feld’s finding to a very particular kind of friendship, concluding that “your lover has had more lovers than you have”:
The reason is the same. There are 12 men who have had a lover who has had (or will have had) 12 lovers, but there is only one man who has had a lover who has had only one lover. But you should be grateful. The reason you got to be her lover in the first place is because she has had (and will likely have) many lovers. You are 12 times as likely to have sex with a woman who has had 12 lovers as you are to have sex with a woman who has had only one lover. Quite paradoxically, if your lover only had one lover, you are probably not him. And if your lover has never had a lover, you are definitely not him.
As Feld points out, the fact that on average your friends have more friends than you do is a particular instance of the general class size paradox. Perceptions of events differ depending on whether we consider them from the point of view of the typical event or of the typical person participating in an event. Let’s take Feld’s example from school: the class sizes at a university.
Imagine that at University X there are four classes, with 5, 5, 10 and 20 students in them respectively. The average class has a size of 10 students. But the typical student is not likely to be in a small class; she or he is most likely to be in a large class, because that’s where most of the students are. The average student at University X is in a class of 13.75 students. The average student is in a larger than average class. Failure to understand this structural difference leads to misunderstanding. While university students complain that they seem to always be stuck in gigantic classes, university administrators point out that most of their classes have relatively few students in them. Both are correct, and that’s the essence of the class size paradox.
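The arithmetic behind the two averages can be checked with a few lines of R. This sketch uses only the four class sizes given above; the variable names are my own.

```r
# The four classes at University X.
class_sizes <- c(5, 5, 10, 20)

# The administrator's view: average over classes.
average_class_size <- mean(class_sizes)          # 10

# The student's view: list the class size experienced by each of the
# 40 students (5 students see a class of 5, 20 students see a class of 20...).
experienced <- rep(class_sizes, times = class_sizes)
average_experienced <- mean(experienced)         # 13.75

# The same number, computed by weighting each class by its enrollment.
weighted <- sum(class_sizes^2) / sum(class_sizes)

print(average_class_size)
print(average_experienced)
print(weighted)
```

Large classes count more in the student's average because more students are sitting in them, which is the same reason popular friends count more in the average of friends' friends.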