It’s an open secret: to be a university professor is to be a perpetual student. Learning doesn’t stop with the PhD; there’s always something new to read, always something new to discover, always something new to write, always something new to analyze, always a new technique to understand. This is why academics love the summer: finally, after teaching what we’ve already learned, we can learn some more!
One of my projects this summer is to bone up on the basics of R, a computer program for data analysis and visualization. When I was a graduate student in the 1990s, statistical software was sold exclusively by commercial vendors at a fairly steep price. Even now SAS 9.4, a software package used for data analysis in the academic and business communities, costs many thousands of dollars for an individual license (it’s so expensive that SAS won’t publish its price). If you were lucky, you had access to a university lab with the software already installed. If you didn’t have access and you wanted to run an analysis beyond the simplest level, you were simply out of luck.
All that changed with the introduction of R, a free and open-source program that runs on Windows computers, Mac computers, Unix computers and even web servers. Methodologists from all kinds of disciplines are increasingly devoted to the development and extension of R, meaning that the latest analytical techniques regularly become available through easily installed plug-ins called “packages.” R is easy to download, quick to install, and …
… well, I’d like to say it’s easy to run, but the truth is that for a generation that has grown up pointing and clicking, it may be a bit intimidating to see a program that opens to a bare command prompt and requires you to work almost entirely by typing text commands or by writing scripts of saved commands.
Still, with a bit of practice, it’s not much harder to type in text commands than it is to choose options in a drop-down menu. The difference is that with drop-down menus, all options are presented to you in an organized fashion. When you use R, you have to start out knowing what the commands are, and if you don’t know, you have to go find out. It’s not R’s responsibility to show you what to do; it’s your responsibility to learn what R can do. This is learning unbounded.
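To make "go find out" a little more concrete, here is a minimal sketch of how you might look things up from the R prompt itself. It uses only functions that ship with base R; the `heights` vector is made-up data for illustration, and the two help calls are left as comments because they open a help viewer in an interactive session.

```r
# Open the help page for a function whose name you already know.
# help("mean")                  # shortcut: ?mean

# Search the installed documentation when you don't know the name.
# help.search("linear model")   # shortcut: ??"linear model"

# List the available functions whose names start with "read."
matches <- apropos("^read\\.")
print(matches)

# Show a function's arguments without opening its help page.
print(args(read.csv))

# Once you know the command, using it is a one-liner.
heights <- c(150, 160, 170, 180)   # hypothetical data
mean_height <- mean(heights)
print(mean_height)
```

None of this replaces a menu, but `apropos()` and `help.search()` go some way toward listing the options that a drop-down would have shown you.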
I became familiar with R by necessity earlier this year, when I needed to generate robust variance estimates in order to account for clustering in a sample. That option isn’t available in most free menu-driven statistical programs, and I had a budget of $0 for my research project, so I installed R and the package rms by Frank E. Harrell, Jr. R got the job done.
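For readers curious what that looks like in practice, here is a rough sketch of a cluster-robust variance estimate using `robcov()` from rms. The data are simulated and the choice of an ordinary linear model via `ols()` is an assumption made for illustration; it is not the model or data from the project described above.

```r
# install.packages("rms")   # one-time install from CRAN
library(rms)

set.seed(42)

# Hypothetical clustered sample: 30 clusters of 10 observations each,
# with a shared random effect inside each cluster.
n_clusters  <- 30
per_cluster <- 10
cluster_id  <- rep(seq_len(n_clusters), each = per_cluster)
cluster_effect <- rnorm(n_clusters)[cluster_id]
x <- rnorm(n_clusters * per_cluster)
y <- 1 + 0.5 * x + cluster_effect + rnorm(n_clusters * per_cluster)
d <- data.frame(y = y, x = x, cluster_id = cluster_id)

# x = TRUE and y = TRUE store the design matrix and response,
# which robcov() needs in order to recompute the variances.
fit <- ols(y ~ x, data = d, x = TRUE, y = TRUE)

# Swap the model-based covariance matrix for a cluster-robust one.
fit_robust <- robcov(fit, cluster = d$cluster_id)

# The coefficients are unchanged; only the standard errors differ.
print(sqrt(diag(vcov(fit))))
print(sqrt(diag(vcov(fit_robust))))
```

The point estimates stay the same after `robcov()`; what changes is the covariance matrix, and with it the standard errors and tests reported for the model.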
Since then, I’ve become aware that R can do much more than run a statistical analysis. It can be used to gather data automatically. It can be used to write automated webpages. It can be used to create simulations. It can visualize patterns in data with amazing graphics and videos (browse through the Google+ community for Statistics and R to get a taste of the possibilities). But this level of high-end performance requires a more fundamental understanding of R than I’ve got right now. To get back to basics and build myself a good foundation of understanding, I’ve started edX’s Introduction to R Programming course. This is another example of learning unbounded. It’s an entirely online educational experience, I haven’t paid a cent to enroll, and I’m finding myself interacting with people from all over the globe in the course’s discussion sections. Students in this course are asked to introduce themselves and say a little bit about where they’re from. On a whim this morning, I tallied up the countries represented among students in the R course.
The United States isn’t even the top spot for R students; that position is taken by India, and there are 48 nations sending at least one student to the course. Just as the way we produce knowledge is changing, so is the way we learn how to produce knowledge.
P.S. Faced with a generation of academic and business analysts flocking to R, SAS has lost significant market share. Earlier this year, SAS responded by making a partial version of its software available for free. This software is called SAS University Edition and can be downloaded here. I’ve found installation to be more complicated and time-consuming than for R (the whopping 1.8 GB installation file and the need to first install Oracle VM VirtualBox management software account for most of this difficulty), but I’m hopeful that I’ll have this second package of analytical software up and running soon so that I can compare the ease and power of the two programs.