Social Media Data Mining with Raspberry Pi: 9 Videos for the Complete Beginner

Since the start of this year, I’ve been working on a project to take a $30 Raspberry Pi 2 computer and turn it into a social media data mining machine using the programming language Python. The words “programming language” may be off-putting, but my goal is to work through the process step by step so that even a complete beginner can follow along and accomplish the feat.

The inexpensive, adaptable $30 Raspberry Pi 2

I’m motivated by two impulses. My first impulse is to help people gain control over and ownership of the information about social interaction that surrounds us. My second impulse is to demonstrate that mastery of social media information is not limited to corporations, governments, and the otherwise well-funded. This is not a video series for those who are already technologically wealthy and adept. It’s for anyone who has $30 to spare, a willingness to tinker, and the feeling that they’ve been left out of the social media data race. I hope to make the point that anyone can use social media data mining to find out who’s talking to whom. The powers that be are already watching down on us: my hope is that we little folks can start to watch up.

I’m starting the project by shooting videos. The video series has further potential, but has proceeded far enough along to represent a fairly good arc of skill development. Eventually I’d like to transcribe the videos and create a written and illustrated how-to pamphlet; these videos are just the start.

Throughout the videos, I’ve tried not to cover up the temporary mistakes, detours and puzzling bugs that are typical of programming. No one I know hooks up the perfect computer system or writes a perfect program on the first try. Reading error messages and sleuthing out their causes is part of the process, and you’ll see that occasionally in these videos.

Please feel free to share the videos if you find them useful. I’d also appreciate any feedback you might have to offer.

Video 1: Hardware Setup for the Raspberry Pi

Video 2: Setting up the Raspberry Pi’s Raspbian Operating System

Video 3: Using the Raspberry Pi’s Text and Graphical Operating Systems

Video 4: Installing R

Video 5: Twitter, Tweepy and Python

Video 6: Debugging

Video 7: Saving Twitter Posts in a CSV File

Video 8: Extracting and Saving Data on Twitter URLs, Hashtags, and Mentions

Video 9: Custom Input

Graphing #MEPolitics, the Maine Politics Twitter Network

On the social media platform Twitter, users post messages of 140 characters or fewer. Those messages can include links to web pages or references to other Twitter accounts using the @ (“at”) sign. When a # sign is placed in front of a word in a Twitter post, the word becomes a “hashtag” and that post is added to a stream of all other posts using the same hashtag. Direct mentions and replies build pair bonds in the Twitter environment; hashtags build community.
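The anatomy of a tweet described above is easy to pull apart programmatically. Here is a minimal Python sketch; the sample tweet and the parse_tweet helper are my own inventions for illustration, not part of any Twitter library:

```python
import re

def parse_tweet(text):
    """Pull the hashtags and @-mentions out of a tweet's text."""
    hashtags = re.findall(r"#(\w+)", text)
    mentions = re.findall(r"@(\w+)", text)
    return hashtags, mentions

# An invented example tweet:
tags, ats = parse_tweet("Early voting starts today @MaineSecOfState #MEPolitics #vote")
print(tags)  # ['MEPolitics', 'vote']
print(ats)   # ['MaineSecOfState']
```

Applied to a whole archive of tweets, a loop over parse_tweet is enough to build the mention ties and hashtag streams discussed below.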

For years, people interested in discussing Maine politics have used the #MEPolitics hashtag to broadcast, to speak and to listen. As Election Day 2014 approaches, the volume of chatter on the #MEPolitics hashtag has increased. Who’s speaking most? Who is speaking to whom (and who isn’t)? What’s being talked about? To find out, I’ve gathered all posts (popularly called “Tweets”) using the #MEPolitics hashtag over the past weekend: October 24-26, 2014. The following is a graph of the resulting social network, in which each unique contributor to #MEPolitics is represented by a dot, each tie indicates that one contributor has mentioned or replied to another contributor in a Tweet, and contributors are placed closest to those in the network with whom they tend to communicate most:

Network of Twitter Posts using the #MEPolitics Hashtag from 10-24 to 10-26 2014. Ties indicate mentions or replies.

A few features of the #MEPolitics network are immediately apparent. First, nearly every one of the 603 participants in the #MEPolitics hashtag over the weekend is a communicator and not just a broadcaster; only 23 individuals posted Tweets during the period without referring to or being referred to in some way by another Twitter user (these are the loners colored light green in the lower-left of the graph). Second, most participants (565 out of 603 participants) are connected to one another either directly or indirectly in one giant conversation; the few unconnected conversations graphed in the lower-right corner are happening in small groups of 2 or 3. Third, the large conversation in which most Tweeters are participating is itself divided up into smaller clusters, in-groups whose members more frequently communicate with one another than with outsiders. These smaller clusters of conversation are color-coded in the graph above.
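The bookkeeping behind counts like these (participants, loners, one giant conversation) can be sketched in plain Python with a breadth-first search; the miniature mention network below is invented for illustration:

```python
from collections import defaultdict, deque

# Invented miniature mention network: each pair (a, b) means that
# account a mentioned or replied to account b in a tweet.
edges = [("alice", "bob"), ("bob", "carol"), ("dana", "erin")]
lone_posters = ["frank"]  # posted without mentioning or being mentioned

neighbors = defaultdict(set)
nodes = set(lone_posters)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)
    nodes.update([a, b])

def components(nodes, neighbors):
    """Group accounts into connected conversations via breadth-first search."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            n = queue.popleft()
            if n in comp:
                continue
            comp.add(n)
            queue.extend(neighbors[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

comps = components(nodes, neighbors)
print(len(nodes))                            # 6 participants
print(max(len(c) for c in comps))            # 3 in the largest conversation
print(sum(1 for c in comps if len(c) == 1))  # 1 loner
```

Tools like NodeXL do this same accounting (plus the layout and coloring) for networks of hundreds of accounts.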

What’s going on inside those clusters of communication? To help clarify, I’ve depicted each Maine candidate for governor or federal office not as a simple dot, but rather using their profile picture. The Twitter accounts of the Maine Democratic and Republican Parties are likewise rendered as their profile images. We can see from the graph that independent gubernatorial candidate Eliot Cutler and independent congressional candidate Richard Murphy are, not surprisingly, located in their own unique sub-community separated from the communities of discussion surrounding the major-party candidates. Perhaps more surprisingly, conversation involving Republican candidates is not embedded in a single Twitter community, but rather split among four sets. Indeed, both Senator Susan Collins and Governor Paul LePage have two Twitter accounts each, and each of their accounts is placed in its own community. The Democratic Party and Democratic Party candidates, in contrast, are all located in the same sub-group of accounts. It is fair to say, at least in the context of Twitter communication and at least for this time period, that Maine Democrats have a more cohesive social media community than Maine Republicans.

A careful observer may notice the absence of one candidate and one party from this graph. Where is Republican congressional candidate Isaac Misiuk, for instance? Where is the Maine Green Independent Party, which is fielding a slate of 13 candidates in this cycle? The answer is that neither Misiuk nor the MGIP are included in the graph because neither participated in the #MEPolitics discussion, at least over the weekend.

Finally, there are some notable clusters of communication with non-party, non-candidate accounts at the center; these are indicated with a text label identifying the most central account of a cluster. M.E. McRider (BikinInMaine) is a conservative citizen (“Fighting the spread of the disease which is liberalism!“) who posted 130 provocative Tweets over the period, attracting 48 responses:

M.E. McRider bikinInMaine Twitter user declares Harry Reid officially a domestic enemy of the United States

On the left, blogger Bruce Bourgoine posted 46 Tweets over the weekend, a smaller number than McRider, attracting 36 responses:

Bruce Bourgoine posts a criticism of Rand Paul as a user of misinformation

The Kennebec Journal (KJ_Online) and Bangor Daily News (bangordailynews) are two Maine newspapers sitting at the center of their own circles of conversation. The Portland Press Herald, another prominent Maine newspaper, isn’t in its own independent Tweeting group; rather, its Tweets are referred to predominantly by Democratic candidates and their followers.

Of course, it’s not just the structure of the #MEPolitics network that matters; the content of discussion this weekend matters too. With Election Day just a week and a half away, what subjects in Maine politics are being talked about the most? The ten most-used hashtags in last weekend’s #MEPolitics discussion were:

Top Ten Hashtags
1. #mepolitics: 2542 uses
2. #michaud2014: 386 uses
3. #michaud: 354 uses
4. #lepage: 320 uses
5. #hillaryclinton: 302 uses
6. #mike: 288 uses
7. #eliotcutler: 278 uses
8. #cutler: 246 uses
9. #maine: 224 uses
10. #poll: 206 uses
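A tally like the one above takes only a few lines of Python once the tweets are in hand; a sketch using invented tweet texts standing in for a real archive:

```python
import re
from collections import Counter

# Invented sample of tweet texts standing in for a real archive.
tweets = [
    "Polls tighten in the Blaine House race #mepolitics #poll",
    "Great rally in Portland today #mepolitics #michaud2014",
    "New ad out this morning #mepolitics #poll",
]

# Tally every hashtag, case-insensitively, across all tweets.
counts = Counter(tag.lower()
                 for text in tweets
                 for tag in re.findall(r"#(\w+)", text))

for tag, n in counts.most_common(10):
    print(f"#{tag}: {n} uses")
```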

The weekend visit by Hillary Clinton on behalf of Democratic candidates and the race for Governor appear to have garnered the highest volume of attention. This pattern is borne out in a listing of the ten most linked-to web pages in #MEPolitics Tweets:

Top Ten Page Links
1. Story: Paul LePage leads polls: 45 links
2. Story: Michaud does best one-on-one: 24 links
3. Story: Hillary Clinton endorses Mike Michaud: 22 links
4. Editorial: the Governor’s race will determine health outcomes of sick Mainers: 21 links
5. Story: A retrospective on Mike Michaud’s record in the U.S. Congress: 14 links
6. Story: poll on bear baiting: 13 links
7. Video: Eliot Cutler asks Mainers to vote for someone else if he can’t win: 11 links
8. Story: Eliot Cutler benefits from out-of-state money: 11 links
9. Another Story: Eliot Cutler benefits from out-of-state money: 10 links
10. Michaud Campaign TV Ad: Cutler supporters who will vote for Mike Michaud: 10 links

Remember bear baiting? Although there are many letters to the editor being published about this controversial referendum, relatively few Twitter users are discussing the possible ban over social media. The subject of a bear baiting ban garnered only one link in the top ten links of the weekend. All other stories have to do with the race for the Blaine House.

You may notice a trend toward citing newspaper articles in the top ten link list. For a deeper look, let’s turn to the ten most linked-to domains:

Top Ten Domains
1. 181 links
2. 109 links
3. 93 links
4. 37 links
5. (Kennebec Journal): 31 links
6. 22 links
7. 16 links
8. 16 links
9. 14 links
10. 14 links

Newspaper links are indeed the most popular, with the Portland Press Herald, the Bangor Daily News, the Kennebec Journal and the Lewiston Sun-Journal gaining spots in the top 10. Social media sites are also quite popular, with YouTube, the Huffington Post and the blogging platform Blogspot representing the form. Campaign websites for Paul LePage and Mike Michaud make the list (notably, Eliot Cutler’s page does not). The final entrant in the top ten list of linked sources is a website proposing a new Constitutional Convention to amend the U.S. Constitution. Tweets mentioning this website consist almost entirely of posts made by M.E. McRider (handle @BikinInMaine) and responses to those posts.
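A domain tally like this one can be produced with Python’s standard library; the URLs below are invented stand-ins for links harvested from tweets:

```python
from collections import Counter
from urllib.parse import urlparse

# Invented URLs standing in for the links found in archived tweets.
urls = [
    "http://www.pressherald.com/some-story",
    "http://bangordailynews.com/another-story",
    "http://www.pressherald.com/a-second-story",
]

def domain(url):
    """Reduce a URL to its bare domain, dropping any leading 'www.'."""
    host = urlparse(url).netloc
    return host[4:] if host.startswith("www.") else host

domains = Counter(domain(u) for u in urls)
for site, n in domains.most_common(10):
    print(f"{site}: {n} links")
```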

McRider made an impact this weekend in an otherwise election-centric week, and that impact is felt in discussion as well. Some Twitter users try to elevate the salience of their favorite websites by simply posting a link again and again, a kind of anti-social behavior that some say borders on spamming. Yet McRider elicited responses as well, as evidenced by this last list of the ten most mentioned or replied-to accounts:

1. Mike Michaud (Democratic candidate for Governor)
2. Hillary Clinton
3. Eliot Cutler (Independent candidate for Governor)
4. Maine Democratic Party
5. Amy S. Fried, University of Maine political science professor and political columnist
6. Shenna Bellows (Democratic candidate for Senate)
7. M.E. McRider
8. Paul LePage (Republican candidate for Governor)
9. Bangor Daily News
10. Randy Billings, reporter for the Portland Press Herald

Last weekend, these were the speakers closest to the center of Maine political discussion on Twitter.

Methodological note: analysis and visualization were performed using NodeXL, a free and open-source plugin for Microsoft Excel that makes social media analysis accessible to almost anyone with a computer.

Finding and Extracting Variables from Web Pages with PHP: A How-to for Social Scientists in the Rough

“Data Mining”: Just Another Way for Social Scientists to Ask Questions

If social science is the study of the structure of interactions, groups and classes, and if interactions, groups and classes are increasingly tied to the online environment, then it is increasingly important for social scientists to learn how to collect data online. Fortunately, the approach to “data mining” online interaction is fundamentally the same as the approach to studying offline social interaction:

  1. We approach the subject,
  2. We query the subject, and
  3. We obtain variables based on the responses we’re given.

Because the online environment and our online subjects are different, the way we make online queries must be different from the way we make offline queries. In data mining we don’t question human beings who can flexibly interpret a question; instead, we question computers responsible for the architecture of the online social system, and they will only respond if questioned in precisely the right way.


Learning to Mine the Web for Social Data — Without a Computer Science Degree

I’ve been trying to learn how to mine social information from websites on my own, without the benefit of any formal education in computer science.  This is kind of fun even when it’s frustrating, as long as I remember that getting information from the online environment is like solving a puzzle.  On most websites, social information (relations, communications, and group memberships) is stored in a database or structured format (like SQL, XML or JSON); some content management software (like WordPress, Joomla or Drupal) takes the information stored in a database and posts it on web pages, surrounded by code that makes the information comprehensible to humans like you and me.  If websites are researcher-friendly, they allow databases to be queried directly through an Application Programming Interface (API).

Many websites don’t let a person query their databases, even when all the information published on those websites is public.  What’s a social scientist to do?  Well, we could literally read every single web page, find the information about relations, communications and group memberships we’re interested in, write down that information, and enter it into our own database for analysis.  We could do this, hypothetically, but at the practical scale of the Internet it’s often impossible.  Manually collecting interactions on a website with 10,000 participants could take years — and by the time we were done, there would be a whole new set of interactions to observe!

Fortunately, because web pages on social websites are written by computers, there are inevitably patterns in the way they’re written.  Visit a typical page on a social media website and use your browser’s “View source” command to look at the raw HTML language creating that page.  You’ll find sections that look like this:

<div class="post" postid="32"><div class="comments"><a name="comments"></a><h3>3 Comments on "Lucille's First Blog Post"</h3><div class="commentblock">
<div class="comment" id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>
<div class="comment" id="445"><a href="/member.php?memberid="1181" usertitle="Lucille – click here to go to my blog"> Lucille</a>: Hey, Tom. I'm new here. How do I respond to your comment?</div>
<div class="comment" id="446"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Congratulations, Lucille, you just did!  Welcome to the community.</div>

That may look like a cluttered mess, but if you look carefully you can find important information.  Some of that information is the content that users write.   Other pieces of information track posts, comments and users by number or name. These names and numbers (the post id, comment ids, member ids and member names) can be thought of as social science variables, and encouragingly they’re placed in predictable locations in a web page:

variable       preceded by               followed by
post id        <div postid="             "><div
comment id     <div id="                 "><a href="/member.php?
member id      member.php?memberid="     " usertitle="
member name    usertitle="               – click here to go to my blog

There should be a set of rules for finding these predictable locations, and my goal in data mining is to express those rules in a computer program that automatically reads many pages on a website, much faster than I could read them.  In English, the rules would look like this:

“Find text that is preceded by [preceding text] and is followed by [following text].  This text is an instance of [variable name].”

Unfortunately, computers don’t understand English.  I am familiar with a language called PHP that can read the lines of a web page, but I didn’t know of a command in PHP that would let me carry out the rule described above.  What to do?  Ask a friend.  I asked a friend of mine with a PhD in Computer Science if he could identify such a command in PHP. His answer: “Well, you don’t want to use PHP. The first thing to do is teach yourself Perl.” The Perl programming language, he went on to explain, has a much more efficient and flexible approach to handling strings, and if I was going to be serious about data mining, I should use Perl.

I can’t tell you how many times some computer science expert has told me I shouldn’t follow a path because it was “inelegant” or “inefficient.”  Well, that may be wonderful advice for professional computer programmers who have to design and maintain huge information edifices, or to those who have a few extra semesters to spare in their learning quest, but in my case I say a hearty “Baloney!” to that.  Research does not need to and often cannot wait for the most efficient or elegant or masterful technique to be mastered.  Sometimes the most important thing to do is to get the darned research done.

In my case, this means that I’m going to use PHP, even though it may not be elegant or efficient or flexible or have objects to orient or [insert computer science tech phrase here].  I’m going to use PHP because I know it and it will — clumsily or not — get the darned job done.  Good enough may not be perfect but it is, by definition, good enough.  As long as the result is accurate, I can live with that.


A Rough but Ready Method for Extracting Variables from Web Pages with PHP — Explode!

It took a bit of reading through PHP’s online manual, but eventually I found a method that works for me — the “explode” command.  In what follows, I’m going to assume that you are familiar with PHP.  If you aren’t, that’s OK — you’ll just have to find another way to extract information out of a web page.

The PHP command “explode” takes a string — a line of text in a web page — and splits it into parts.  “Explode” splits your line of text wherever a certain delimiter is found.  A delimiter is nothing more than a piece of text you want to use as a splitting point.  Let’s use an example, the web page snippet listed above:

<div class="post" postid="32"><div class="comments"><a name="comments"></a><h3>3 Comments on "Lucille's First Blog Post"</h3><div class="commentblock">

<div class="comment" id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>

<div class="comment" id="445"><a href="/member.php?memberid="1181" usertitle="Lucille – click here to go to my blog"> Lucille</a>: Hey, Tom. I'm new here. How do I respond to your comment?</div>

<div class="comment" id="446"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Congratulations, Lucille, you just did! Welcome to the community.</div>


Let’s say I’d like to look through 5,000 web pages like this, representing 5,000 individual blog posts.  In each of these 5,000 web pages, the particular post id and comment ids and member ids may change, but the places where they can be found and the code surrounding them remain the same.  We’ll use the code surrounding our desired information as delimiters.

To get really specific, let’s say I’d like to extract a member id number from the above web page every place it occurs.

The first step is to find a line of the web page on which a member id number exists.  To do this, I’ll use the stristr command, which searches for text. The command if (stristr($line, '?memberid=')) {…} takes a look at a line of a website ($line) and asks if it contains a certain piece of text (in this case, ?memberid=).  If the piece of text is found, then whatever commands are inside the braces { } are executed.  If the piece of text is not found, then your computer won’t do anything with that line.

So far, we have:

if (stristr($line, '?memberid=')) {


What goes inside the braces?  Some exploding!  Our first line of code inside the braces tells the computer to split a line of website code using the text memberid=" (including the quotation mark that follows it) as the delimiter.

$cutstart = explode('memberid="', $line);

This leaves the line of website code in two pieces, with the delimiter memberid=" removed.  Those two pieces are set by the explode command to be $cutstart[0] and $cutstart[1]:

Original line of text: <div id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>

$cutstart[0]: <div id="444"><a href="/member.php?

$cutstart[1]: 201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>

Where’s the member id number we want?  It’s right at the start of $cutstart[1], just before a double quotation mark.  To get at it, let’s add a second line of code inside the braces, one that tells the computer to split $cutstart[1] into pieces wherever a double quotation mark appears:

$cutend = explode('"', $cutstart[1]);

and takes $cutstart[1] apart, turning it into the pieces $cutend[0], $cutend[1], $cutend[2] and $cutend[3] like so:

original $cutstart[1]: 201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>

$cutend[0]: 201

$cutend[1]:  usertitle=

$cutend[2]: Tim – click here to go to my blog

$cutend[3]: > Tim</a>: Greetings! How are you, Lucille?</div>

Which part am I interested in?  Only the member id number, and finally that’s what I’ve got in $cutend[0].  If I want, I can rename it to help me remember what I’ve got:

$memberid = $cutend[0];

Taken all together, the code looks like this:

if (stristr($line, '?memberid=')) {
    $cutstart = explode('memberid="', $line);
    $cutend = explode('"', $cutstart[1]);
    $memberid = $cutend[0];
}

This may not be the most elegant or efficient solution, but it’s pretty simple — and most importantly, gosh darn it, it works.  A novice data miner like me will never get hired away by Google for basic programming like this, and if you’re a social scientist with mad programming skills you may scoff at the elementary nature of this step.  That’s OK; this isn’t written for the Google corporation or wicked-fast coders.  I wrote all this out because the code was a big step for me in becoming a better, more complete social scientist.  If you’re looking to take the same step, I hope this post helps you along.

Credit goes to Tizag for helping me to understand the “explode” command a bit better. In turn, if you can think of a way for me to explain this more clearly or fully, please let me know by sharing a comment.

Combining Results of Multiple Twitter Searches into one File on the Cheap

Twitter is a great subject for social media research because 1) it is used by a lot of active and influential people and 2) its data is presumed public, obviating privacy concerns. Yet the sheer volume of Twitter data poses problems for researchers, especially those without the thousands of extra dollars needed to harness insane amounts of computing power. Part of the solution for modest researchers like me at small institutions is to study relatively small-scale subjects. Another part of the solution is to tie together multiple low-cost tools rather than look for one magic software package to address all needs.

I’m working on a project right now in which I’ve been following all tweets by and tweets mentioning members of the Maine State Legislature over time. I could write a program in PHP using the Twitter API to accomplish this… if I had a bit more time and know-how. I’ll try to pick those up later, but for now, I’m running multiple copies of the program Tweet Archivist Desktop, each of which captures and saves tweets by or regarding one Twitter account as they’re posted. Tweet Archivist Desktop costs just $9.99 for a perpetual license, which I consider well worth the price.

Tweet Archivist Desktop creates a separate .csv dataset for each of the searches I’m saving. To gather them all together, I’m following advice shared helpfully by solveyourtech. On my Windows laptop, I’m entering the command prompt and combining all csv files in a folder into a single csv file with a variant of the “copy” command.

copy command in Windows command prompt combines multiple csv files into one
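On Windows the combining step is a one-line variant of copy (for example, copy *.csv combined.csv). For anyone working outside the Windows command prompt, the same concatenation can be sketched in Python; the combine_csvs function below is my own helper, not part of Tweet Archivist, and it keeps only the first file’s header row:

```python
import csv
import glob

def combine_csvs(pattern, out_path):
    """Concatenate every CSV file matching pattern into one file,
    writing the header row only once."""
    paths = [p for p in sorted(glob.glob(pattern)) if p != out_path]
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        header_written = False
        for path in paths:
            with open(path, newline="") as f:
                rows = list(csv.reader(f))
            if not rows:
                continue  # skip empty files
            if not header_written:
                writer.writerow(rows[0])
                header_written = True
            writer.writerows(rows[1:])  # data rows only, header dropped

# Hypothetical usage:
# combine_csvs("archives/*.csv", "combined.csv")
```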

Blue for Boys, Pink for Girls? (Paoletti in Search Context)

I’ve been reading and enjoying Pink and Blue: Telling the Boys from the Girls in America. In the book, University of Maryland American Studies Associate Professor Jo B. Paoletti uses catalogs, sewing patterns, historical portraits, newspaper advertisements and similar media to document the emergence of pink as a color for girls’ clothing and blue as a color for boys’ clothing. Paoletti traces color preferences to changes in textile and cleansing technology, connection of local media outlets into national media networks, feedback between consumers and marketers, social mobility, changes in psychological theories of development and the reaction of new generations against the generations before.

Paoletti not only makes a case that the emergence of pink for girls and blue for boys in clothing is relatively new, but more strongly asserts that pink was often seen as a boy’s color and blue as a girl’s color in many areas of the United States early in the 20th Century. Paoletti’s “favorite primary source” for this claim in the book comes from a 1918 issue of the Chicago clothing trade magazine The Infants’ Department:

“Pink or Blue? Which is intended for boys and which for girls? This question comes from one of our readers this month, and the discussion may be of interest to others. There has been a great diversity of opinion on this subject, but the generally accepted rule is pink for the boy and blue for the girl. The reason is that pink being a more decided and stronger color, is more suitable for the boy; while blue, which is more delicate and dainty is prettier for the girl. In later years the shade of pink has been much improved. Perhaps if we had the delicate flesh tints when baby layettes were first sold, the rule might have been reversed.

“The nursery rhyme of ‘Little Boy Blue’ is responsible for the thought that blue is for boys. Stationers, too, reverse the colors, but as they sell only announcement cards and baby books, they can not be considered authorities.

“If a customer is too fussy on this subject, suggest that she blend the two colors, an effective and pretty custom which originated on the other side, and which after all is the only way of getting the laugh on the stork.”

Paoletti finds additional text sources that seem to display an inconsistent set of color choices around the turn of the century: for instance, Paoletti pairs a pink-for-girls and blue-for-boys quote of the novel Little Women with a blue-for-girls and pink-for-boys recommendation by the 1890 Ladies Home Journal (Paoletti, p. 87).

In a July 2012 letter to the Archives of Sexual Behavior, Assistant Professor of Psychology Marco Del Giudice counters Paoletti’s historical claim by searching the millions of books scanned by Google for phrases like “blue for boys,” “pink for girls,” “blue for girls” and “pink for boys.” You can perform just such a search for yourself right here, showing these results:

Pink for Girls, Blue for Boys, Blue for Girls, Pink for Boys Google Ngrams Book Search from 1820-2008

Del Giudice interprets these results as a systematic refutation of Paoletti’s “anecdotal” claims, even going so far as to term them “A Scientific Urban Legend”:

“Gender-coded references to pink and blue begin to appear around 1890 and intensify after World War II. However, all the gender-color associations found in the database conform to the familiar convention of pink for girls and blue for boys. An equivalent search of the British English corpus revealed exactly the same pattern. In other words, this massive book database contains no trace of the alleged pink-blue reversal; on the contrary, the results show remarkable consistency in gender coding over time in both the U.S. and the UK, starting from the late nineteenth century and continuing throughout the twentieth century.

“If one considers the totality of the evidence, the most parsimonious conclusion is that the Pink-Blue Reversal (PBR) as usually described never happened, and that the magazine excerpts cited in support of the PBR are anomalous or unrepresentative of the broader cultural context. Not only do the present findings run counter to the standard PBR account; they also fail to support Paoletti’s claim that pink and blue were inconsistently associated with gender until the 1950s.”

The replication of Del Giudice’s Google Books search shown above suggests that his contention may be slightly too strong: the phrase “blue for girls” is not entirely absent from the corpus of books Google has scanned, but rather is present at a low level throughout the 20th Century. Indeed, around the turn of the 20th Century, there appears to have been a period in which the appearance of “blue for girls” rivaled the appearance of “pink for girls” in books. One such source is a 1920 skit in the Chicago-published journal “Public Libraries,” performed by the staff of the Pomona Public Library for the Pomona, California City Council and library board of trustees to illustrate librarians’ daily activities:

Pink for Boys and Blue for Girls in 1920 Library Skit

“Dear me! Someone wants to know what colored ribbons to use for a boy baby and what for a girl. I can never remember! But thank goodness it’s catalogued here somewhere … (Paws through catalog drawer) Oh, here it is: “Infants: Colors for boy and girl” (Returns to phone) Hello, Pink is used for boys and blue for girls … Yes … That’s right. You’re welcome.”

But while Paoletti’s claim that blue for boys and pink for girls was not dominant before World War II appears to be supported, and while some mention of “blue for girls” does appear in early scanned books, Del Giudice’s more general point appears to be borne out: at no point are mentions of pink for girls and blue for boys swamped by contrary mentions.

Remembering Pete Seeger 1-28-14: Collective Memory, Shared on Twitter

Activist folksinger Pete Seeger died at the age of 94 on January 27, 2014. As word of Seeger’s death spread on January 28, Twitter was flooded with tributes, including 28,226 posts made to the social media outlet’s #PeteSeeger hashtag channel by 9 PM. Of those posts, 21,617 (some 76.8%) were “re-tweets” of others’ posts. Pete Seeger wouldn’t have minded: he was a staunch believer in people forming publics to sing together, hearing a call and issuing a response, finding a tune and amplifying it not by microphones but in sheer numbers.

What did the world sing today about Pete Seeger? To answer that question, I tuned the Tweet Archivist Desktop (a handy $10 tool) to the #PeteSeeger hashtag, where it archived users’ public posts silently and efficiently in a background window on my computer. I used NodeXL (free and open-source) to find the most common word pairs in posts and to visualize them in the graphic you see below. When pairs are connected into chains and webs, the result is a semantic network that captures the spirit of the day.

Remembering Pete Seeger: a data visualization of a semantic network of the most common words and their connections in the 28,226 #PeteSeeger Twitter contributions from midnight to 9 PM on January 28 2014
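The word-pair counting at the heart of a semantic network can be approximated in a few lines of Python; the tweets below are invented stand-ins for the #PeteSeeger archive:

```python
import re
from collections import Counter

# Invented tweets standing in for the #PeteSeeger archive.
tweets = [
    "Thank you Pete Seeger for the songs",
    "RIP Pete Seeger thank you for everything",
]

pairs = Counter()
for text in tweets:
    words = re.findall(r"[a-z']+", text.lower())
    # Count each adjacent word pair; chained together, these
    # pairs form the semantic network.
    pairs.update(zip(words, words[1:]))

print(pairs.most_common(3))
```

NodeXL does the same tallying at scale, then lays the most common pairs out as the graph above.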

In case you’re wondering, the word “communist” only appears 29 times in all those posts, far too rarely to reach the threshold required to appear in the image. “Thank” or “thanks” appears over 2,000 times.