Social Media Data Mining with Raspberry Pi: 9 Videos for the Complete Beginner

Since the start of this year, I’ve been working on a project to take a $30 Raspberry Pi 2 computer and turn it into a social media data mining machine using the programming language Python. The words “programming language” may be off-putting, but my goal is to work through the process step-by-step so that even a complete beginner can follow along and accomplish the feat.

The inexpensive, adaptable $30 Raspberry Pi 2

I’m motivated by two impulses. My first impulse is to help people gain control over and ownership of the information regarding interaction that surrounds us. My second impulse is to demonstrate that mastery of social media information is not limited to the corporate, the government, or the otherwise well-funded sphere. This is not a video series for those who already are technologically wealthy and adept. It’s for anyone who has $30 to spare and a willingness to tinker, but who feels they’ve been left out of the social media data race. I hope to make the point that anyone can use social media data mining to find out who’s talking to whom. The powers that be are already watching down at us: my hope is that we little folks can start to watch up.

I’m starting the project by shooting videos. The video series has further potential, but has proceeded far enough along to represent a fairly good arc of skill development. Eventually I’d like to transcribe the videos and create a written and illustrated how-to pamphlet; these videos are just the start.

Throughout the videos, I’ve tried not to cover up the temporary mistakes, detours and puzzling bugs that are typical of programming. No one I know of hooks up the perfect computer system or writes a perfect program on the first try. Reading error messages and sleuthing out their causes is part of the process, and you’ll see that occasionally in these videos.

Please feel free to share the videos if you find them useful. I’d also appreciate any feedback you might have to offer.

Video 1: Hardware Setup for the Raspberry Pi

Video 2: Setting up the Raspberry Pi’s Raspbian Operating System

Video 3: Using the Raspberry Pi’s Text and Graphical Operating Systems

Video 4: Installing R

Video 5: Twitter, Tweepy and Python

Video 6: Debugging

Video 7: Saving Twitter Posts in a CSV File

Video 8: Extracting and Saving Data on Twitter URLs, Hashtags, and Mentions

Video 9: Custom Input

Installing R and the package igraph on a Mac: As Always, Not Quite the Same

The incredibly useful research program called R is available on many platforms — Linux, Windows and Apple computers — and can run the same scripts across all three of its different versions.  That said, the experience of getting R to run those scripts is not quite the same on an Apple Mac.  This seems to be some kind of unwritten rule for Macs — whatever your program, on a Mac the menus, procedures and names of commands will somehow end up being different.

So what?  Well, if you’re just getting started with R, you’ll need to occasionally get some tips and tricks for making the program work.  Most of the how-to blog posts and videos you can find out there use examples using a Linux or Windows system — and they just won’t work for a Mac.  I found this out the hard way when teaching students to use the igraph package for R to perform social network analysis.  A few of my students have Macs at home, and it didn’t take long for them to cry for help, because the R program they were dealing with looked very little like the R program I’d been showing them.

If you find yourself in the same boat, and are running into trouble using R and igraph, I hope the following video will be of some help. Using a screen capture of a Mac running OS X, I briefly demonstrate the experience of installing R and running a script with the igraph package from an Apple vantage point.  One difference is that there are a few menu options you’ll need to select when installing igraph to actually make it run.  In another simple but crucial difference for Macs, you’ll need to select all the text in your script before running it.  THEN, and only then, use the “Execute” command.  That’s not necessary on a Windows computer, but it’s a make-or-break move on a Mac.

Why? Don’t ask me why. It’s the same old story that we’ve had for thirty years: it’s just different on a Mac.

The walkthrough video:

Please leave a comment if you have a question or need clarification, and I’d be glad to be of help if I can.

Combining Results of Multiple Twitter Searches into one File on the Cheap

Twitter is a great subject for social media research because 1) it is used by a lot of active and influential people and 2) its data is presumed public, obviating privacy concerns. Yet the sheer volume of Twitter data poses problems for researchers, especially those without the thousands of extra dollars needed to harness insane amounts of computer power. Part of the solution for modest researchers like myself at small institutions is to study relatively small-scale subjects. Another part of the solution is to tie together multiple low-cost solutions and not look for one magic software package to address all needs.

I’m working on a project right now in which I’ve been following all tweets by and tweets mentioning members of the Maine State Legislature over time. I could write a program in PHP using the Twitter API to accomplish this… if I had a bit more time and know-how. I’ll try to get these later, but for now, I’m running multiple copies of the program Tweet Archivist Desktop, each of which captures and saves tweets by or regarding one Twitter account as they’re posted. Tweet Archivist Desktop costs just $9.99 for a perpetual license, which I consider well worth the price.

Tweet Archivist Desktop creates a separate .csv dataset for each of the searches I’m saving. To gather them all together, I’m following advice shared helpfully by solveyourtech. On my Windows laptop, I’m entering the command prompt and combining all csv files in a folder into a single csv file with a variant of the “copy” command.

copy command in Windows command prompt combines multiple csv files into one
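
If you’d rather not use the command prompt, the same merge can be sketched in Python, the language used in the Raspberry Pi videos. This is only a rough illustration, not the method described above. It assumes every csv file in the folder shares the same column headings in its first row and is saved as UTF-8. Neither assumption has been checked against Tweet Archivist Desktop’s actual output. The function name combine_csv_files and the output name combined.csv are made up for this example. Unlike a plain concatenation, which would repeat each file’s header row in the middle of the data, this sketch writes the header only once.

```python
import csv
from pathlib import Path


def combine_csv_files(folder, output_name="combined.csv"):
    """Merge every .csv file in `folder` into one file with a single header row.

    Returns the number of data rows written (header not counted).
    Assumes all files share the same header row and are UTF-8 encoded.
    """
    folder = Path(folder)
    output_path = folder / output_name

    # List the source files up front, in a predictable order, and leave out
    # the output file itself in case an earlier run left one behind.
    sources = sorted(p for p in folder.glob("*.csv") if p.name != output_name)

    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for source in sources:
            # "utf-8-sig" reads plain UTF-8 and also drops a leading
            # byte-order mark if the file happens to have one.
            with open(source, newline="", encoding="utf-8-sig") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # completely empty file, nothing to copy
                if header is None:
                    # Keep the header from the first non-empty file only.
                    header = file_header
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

To try it, call combine_csv_files with the path of the folder holding your saved searches, for example combine_csv_files("C:/tweets"), where C:/tweets is a placeholder for wherever your csv files actually live. The merged file appears in that same folder.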