Convocation Remarks on the University of Maine at Augusta theme for 2014: “Innovation”

Convocation at the University of Maine at Augusta, September 19, 2014

UMA Convocation Fall 2014
Framing the Theme – “Innovation”

Good afternoon.  Last spring, the UMA Faculty Colloquium Committee identified a special theme of innovation to reflect the University’s 50th anniversary. The committee asks that every member of the faculty, staff and student body read and reflect upon a book about innovation, Outliers by Malcolm Gladwell.  Look for activities throughout the year celebrating UMA’s 50 years of innovation.  As we kick off the year today, I’ve been asked to frame the theme of innovation in a few remarks.

When most of us hear the word “innovation,” we focus on the creation of something new.  But there is more to innovation than newness.  The word “innovation” comes from the Latin innovare, to renew or to make new.  What do we renew?  What do we make new?  Something that was already there.  To innovate is to make something new out of what came before.

To write a “novel” means literally to create a story that is new.  But in the introduction to her novel Frankenstein, a novel of ghastly innovation, author Mary Shelley admits stitching together her story from the science, philosophy and mythology of the day before adding her own animating spark.  “Everything must have a beginning,” Shelley writes, but “that beginning must be linked to something that went before…. Invention does not consist in creating out of void… the materials must, in the first place, be afforded.[i]”  The innovative stories we tell are based on what came before.

Every human being on Earth is a unique innovation, a Frankenstein experiment of sorts, with a genome ripped from our parents and stitched together in a brand new way.  Thanks to mutation, even identical twins don’t have exactly the same set of genes.  But neither is any human being entirely new.  We are variations on the genetic themes set by our parents, and as social scientists know we draw heavily from our environment in fashioning our public selves.  The new, innovative you is based on what came before.

The University of Maine at Augusta is itself an innovation.  Our history tells us that 50 years ago, there was no college or university in Augusta – and when UMA held its first classes on September 12, 1965, it had no campus of its own.  Our first classrooms were in Cony High School, set aside for use after school hours; that’s innovative.  Our bookstore was fit into a Cony High School coat closet; that’s innovative[ii].  Even these humble beginnings were not completely new, but based on what came before: an existing school, repurposed and reimagined. In its next 50 years, UMA will rely on already existing strengths as it finds innovative new ways to fulfill its purpose.

And what is that purpose?  What is a university for?  At first glance, it may appear to some that a university is a business selling a product called a diploma to customers called students.  Once purchased, the diploma product can be redeemed by the customer for future economic profit.  Well, it certainly takes money for a person to live and for a university to run.  But is an education just another consumer purchase?  Is a university an assembly-line factory?  Are faculty here to sell?  Are students here to shop?

I think not.  We are here because we share a dream.  We dream of becoming more than we are.  We dream of remaking ourselves, putting parts of our lives that came before together with something new and adding an animating spark.  We know this dream of innovation can come true because we see it happen here every day: for some sooner, for some a bit later.  The poet Adelaide Anne Procter shares a truth we at UMA know well: if we miss our first shot at remaking ourselves, a second chance, a third chance will come.  It is never too late.  Procter writes:

“Have we not all, amid life’s petty strife,

Some pure ideal of a noble life

That once seemed possible? Did we not hear

The flutter of its wings, and feel it near,

And just within our reach? It was. And yet

We lost it in this daily jar and fret,

And now live idle in a vague regret;

But still our place is kept, and it will wait,

Ready for us to fill it, soon or late.

No star is ever lost we once have seen,

We always may be what we might have been[iii].”

 

This is the heart of innovation: to draw from what came before, to honor those who inspire your work today, to dream of being more than you are.


[i] Shelley, Mary. 1818.  Frankenstein, or, the Modern Prometheus.  London: Lackington, Hughes, Harding, Mavor and Jones.

[ii] Brookes, Kenneth. 1977.  The Story of the University of Maine at Augusta: The Jewett Years.  University of Maine at Augusta publication.

[iii] Procter, Adelaide Anne. 1864. “A Legend of Provence” (excerpt).  P. 191 in The Poems of Adelaide A. Procter.  Boston: Ticknor and Fields.

Finding and Extracting Variables from Web Pages with PHP: A How-to for Social Scientists in the Rough

“Data Mining”: Just Another Way for Social Scientists to Ask Questions

If social science is the study of the structure of interactions, groups and classes, and if interactions, groups and classes are increasingly tied to the online environment, then it is increasingly important for social scientists to learn how to collect data online. Fortunately, the approach to “data mining” online interaction is fundamentally the same as the approach to studying offline social interaction:

  1. We approach the subject,
  2. We query the subject, and
  3. We obtain variables based on the responses we’re given.

Because the online environment and our online subjects are different, the way we make online queries must be different from the way we make offline queries. In data mining we don’t question human beings who can flexibly interpret a question; instead, we question computers responsible for the architecture of the online social system, and they will only respond if questioned in precisely the right way.

 

Learning to Mine the Web for Social Data — Without a Computer Science Degree

I’ve been trying to learn how to mine social information from websites on my own, without the benefit of any formal education in computer science.  This is kind of fun even when it’s frustrating, as long as I remember that getting information from the online environment is like solving a puzzle.  On most websites, social information (relations, communications, and group memberships) is stored in a database or structured data format (using something like SQL, XML or JSON); some content management software (like WordPress, Joomla or Drupal) takes the stored information and posts it on web pages, surrounded by code that makes the information comprehensible to humans like you and me.  If websites are researcher-friendly, they allow databases to be queried directly through an Application Programming Interface (API).

Many websites don’t let a person query their databases, even when all the information published on those websites is public.  What’s a social scientist to do?  Well, we could literally read each single web page, find the information about relations, communications and group memberships we’re interested in, write down that information, and enter it into our own database for analysis.  We could do this, hypothetically, but at the practical scale of the Internet it’s often impossible.  Manually collecting interactions on a website with 10,000 participants could take years — and by the time we were done, there would be a whole new set of interactions to observe!

Fortunately, because web pages on social websites are written by computers, there are inevitably patterns in the way they’re written.  Visit a typical page on a social media website and use your browser’s “View source” command to look at the raw HTML language creating that page.  You’ll find sections that look like this:


<div class="post" postid="32"><div class="comments"><a name="comments"></a><h3>3 Comments on “Lucille’s First Blog Post”</h3><div class="commentblock">
<div class="comment" id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>
<div class="comment" id="445"><a href="/member.php?memberid="1181" usertitle="Lucille – click here to go to my blog"> Lucille</a>: Hey, Tim. I’m new here. How do I respond to your comment?</div>
<div class="comment" id="446"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Congratulations, Lucille, you just did!  Welcome to the community.</div>
</div></div></div>


That may look like a cluttered mess, but if you look carefully you can find important information.  Some of that information is the content that users write.   Other pieces of information track posts, comments and users by number or name. These names and numbers (the post id, the comment ids, and the member ids and member names in the snippet above) can be thought of as social science variables, and encouragingly they’re placed in predictable locations in a web page:

variable | preceded by | followed by
post id | <div class="post" postid=" | "><div class="comments">
comment id | <div class="comment" id=" | "><a href="/member.php?
member id | member.php?memberid=" | " usertitle="
member name | usertitle=" | – click here to go to my blog

There should be a set of rules for finding these predictable locations, and my goal in data mining is to explain those rules in a computer program that automatically reads many pages on a website, much faster than I can read them.  In English, the rules would look like this:

“Find text that is preceded by [preceding text] and is followed by [following text].  This text is an instance of [variable name].”

Unfortunately, computers don’t understand English.  I am familiar with a language called PHP that can read lines of a web page.  I didn’t know of a command in PHP that would let me carry out the rule described above.  What to do?  Ask a friend.  I asked a friend of mine with a PhD in Computer Science if he could identify such a command in PHP. His answer: “Well, you don’t want to use PHP. The first thing to do is teach yourself Perl.” The Perl programming language, he went on to explain, has a much more efficient and flexible approach to handling strings as variables, and if I was going to be serious about data mining efficiently, I should use Perl.

I can’t tell you how many times some computer science expert has told me I shouldn’t follow a path because it was “inelegant” or “inefficient.”  Well, that may be wonderful advice for professional computer programmers who have to design and maintain huge information edifices, or to those who have a few extra semesters to spare in their learning quest, but in my case I say a hearty “Baloney!” to that.  Research does not need to and often cannot wait for the most efficient or elegant or masterful technique to be mastered.  Sometimes the most important thing to do is to get the darned research done.

In my case, this means that I’m going to use PHP, even though it may not be elegant or efficient or flexible or have objects to orient or [insert computer science tech phrase here].  I’m going to use PHP because I know it and it will — clumsily or not — get the darned job done.  Good enough may not be perfect but it is, by definition, good enough.  As long as the result is accurate, I can live with that.

 

A Rough but Ready Method for Extracting Variables from Web Pages with PHP — Explode!

It took a bit of reading through PHP’s online manual, but eventually I found a method that works for me — the “explode” command.  In what follows, I’m going to assume that you are familiar with PHP.  If you aren’t, that’s OK — you’ll just have to find another way to extract information out of a web page.

The PHP command “explode” takes a string, such as a line of text in a web page, and splits it into an array of parts.  “Explode” splits your line of text wherever a certain delimiter is found.  A delimiter is nothing more than a piece of text you want to use as a splitting point; the delimiter itself is thrown away, and the pieces on either side of it are kept.  Let’s use an example, the web page snippet listed above:


<div class="post" postid="32"><div class="comments"><a name="comments"></a><h3>3 Comments on “Lucille’s First Blog Post”</h3><div class="commentblock">
<div class="comment" id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>
<div class="comment" id="445"><a href="/member.php?memberid="1181" usertitle="Lucille – click here to go to my blog"> Lucille</a>: Hey, Tim. I’m new here. How do I respond to your comment?</div>
<div class="comment" id="446"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Congratulations, Lucille, you just did! Welcome to the community.</div>
</div></div></div>


Let’s say I’d like to look through 5,000 web pages like this, representing 5,000 individual blog posts.  In each of these 5,000 web pages, the particular post id and comment ids and member ids may change, but the places where they can be found and the code surrounding them remain the same.  We’ll use the code surrounding our desired information as delimiters.
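Before pointing explode at real page source, it may help to watch it work on something simpler. The following is only a minimal sketch: the strings are made up for illustration and are not taken from any real website.

```php
<?php
// A made-up string, used only to show how explode() behaves.
$text = 'Tim;Lucille;Tim';

// Split the string wherever the delimiter ';' appears.
// The delimiter is dropped and the pieces come back as an array.
$pieces = explode(';', $text);
print_r($pieces);

// A delimiter can be longer than one character, such as a scrap of HTML.
// Here everything before id=" lands in $parts[0], everything after in $parts[1].
$parts = explode('id="', '<div id="444">');
print_r($parts);
```

The second call is the one that matters for data mining: the text we want sits at the very start of $parts[1], waiting for one more split to trim off what follows it.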

To get really specific, let’s say I’d like to extract a member id number from the above web page every place it occurs.

The first step is to find a line of the web page on which a member id number exists.  To do this, I’ll use the stristr command, which searches for text without regard to upper or lower case. The command if (stristr($line, '?memberid=')) {…} takes a look at a line of a website ($line) and asks if it contains a certain piece of text (in this case, ?memberid=).  If the piece of text is found, then whatever commands are inside the brackets { } are executed.  If the piece of text is not found, then your computer won’t do anything.

So far, we have:

if (stristr($line, '?memberid='))
{

}
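The skeleton above takes for granted that a variable called $line already holds one line of a web page. Here is one rough way that might be arranged, assuming the page source is already sitting in a string. The variable names $page and $matchingLines are my own inventions for this sketch, and the page text is an abbreviated copy of the sample; for a live page the string might instead come from PHP's file_get_contents function, if your server is configured to allow that.

```php
<?php
// Hypothetical page source, abbreviated from the sample snippet.
// For a real page this string might come from file_get_contents().
$page = <<<'HTML'
<div class="post" postid="32"><div class="comments"><a name="comments"></a>
<div class="comment" id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>
<div class="comment" id="445"><a href="/member.php?memberid="1181" usertitle="Lucille – click here to go to my blog"> Lucille</a>: Hey, Tim. How do I respond to your comment?</div>
</div></div></div>
HTML;

$matchingLines = 0;

// Break the page into lines and look at each line in turn.
foreach (explode("\n", $page) as $line) {
    // stristr() returns false when the text is absent from the line.
    if (stristr($line, '?memberid=') !== false) {
        // This is where the exploding will go; for now, just count.
        $matchingLines++;
    }
}

echo "Lines containing a member id: $matchingLines\n";
```

Comparing against false explicitly is a small precaution rather than a necessity here; it simply makes the test read the same way the manual describes the command.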

What goes inside the brackets?  Some exploding!  Our first line of code inside the brackets tells the computer to split a line of website code using the text memberid= as the delimiter.

$cutstart = explode('memberid=', $line);

This leaves a line of website code in two pieces, with the delimiter memberid= removed.  Those two pieces are set by the explode command to be $cutstart[0] and $cutstart[1]:

Original line of text: <div class="comment" id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>

$cutstart[0]: <div class="comment" id="444"><a href="/member.php?

$cutstart[1]: "201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>

Where’s the member id number we want?  It’s the number right at the start of $cutstart[1], sitting in between the double quotation marks.  To get at that, let’s add another line of code to explode $cutstart[1] which tells the computer to split $cutstart[1] into pieces at the spots where there are double quotation marks.  The command in the second line of code inside the brackets is:

$cutend = explode('"', $cutstart[1]);

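and takes $cutstart[1] apart, turning it into the pieces $cutend[0], $cutend[1], $cutend[2], $cutend[3] and $cutend[4], like so:

original $cutstart[1]: "201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings! How are you, Lucille?</div>

$cutend[0]: (empty, because nothing comes before the first double quotation mark)

$cutend[1]: 201

$cutend[2]:  usertitle=

$cutend[3]: Tim – click here to go to my blog

$cutend[4]: > Tim</a>: Greetings! How are you, Lucille?</div>

Which part am I interested in?  Only the member id number, and finally that’s what I’ve got in $cutend[1].  (Because $cutstart[1] begins with a double quotation mark, the very first piece, $cutend[0], comes back empty.)  If I want, I can rename it to help me remember what I’ve got:

$memberid = $cutend[1];

Taken all together, the code looks like this.


if (stristr($line, '?memberid='))
{
    // Split the line at memberid= and keep what follows it.
    $cutstart = explode('memberid=', $line);
    // Split what follows at each double quotation mark.
    $cutend = explode('"', $cutstart[1]);
    // Piece 0 is empty; piece 1 is the number between the first two quotation marks.
    $memberid = $cutend[1];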
}
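The same two-explode trick can be wrapped up so that it carries out the English rule from earlier: find the text that is preceded by one thing and followed by another. What follows is only a sketch under my own assumptions, not a polished tool. The function name text_between is hypothetical (it is not built into PHP), the sample lines are abbreviated from the snippet above, and the third argument to explode is a limit on the number of pieces, which keeps each split to two. Note one difference from the code above: here the opening quotation mark is made part of the first delimiter, so the wanted text lands in piece 0 rather than piece 1.

```php
<?php
/**
 * Return the text found between $before and $after in $line,
 * or null when either delimiter is missing.
 * (text_between is a hypothetical helper, not a built-in PHP function.)
 */
function text_between($line, $before, $after)
{
    // Split once at $before; piece 1 is everything that follows it.
    $cutstart = explode($before, $line, 2);
    if (count($cutstart) < 2) {
        return null;
    }
    // Split once at $after; piece 0 is the text we were looking for.
    $cutend = explode($after, $cutstart[1], 2);
    if (count($cutend) < 2) {
        return null;
    }
    return $cutend[0];
}

// Sample lines, abbreviated from the snippet in this post.
$lines = [
    '<div class="comment" id="444"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Greetings!</div>',
    '<div class="comment" id="445"><a href="/member.php?memberid="1181" usertitle="Lucille – click here to go to my blog"> Lucille</a>: Hey, Tim.</div>',
    '<div class="comment" id="446"><a href="/member.php?memberid="201" usertitle="Tim – click here to go to my blog"> Tim</a>: Welcome.</div>',
];

$rows = [];
foreach ($lines as $line) {
    if (stristr($line, '?memberid=') !== false) {
        // One row of variables per comment, using the delimiters from the table.
        $rows[] = [
            'commentid'  => text_between($line, '<div class="comment" id="', '"'),
            'memberid'   => text_between($line, 'memberid="', '"'),
            'membername' => text_between($line, 'usertitle="', ' – click here'),
        ];
    }
}

print_r($rows);
```

Each row ends up holding a comment id, a member id and a member name, which is about what a spreadsheet or statistics package would want as one case with three variables.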


This may not be the most elegant or efficient solution, but it’s pretty simple — and most importantly, gosh darn it, it works.  A novice data miner like me will never get hired away by Google for basic programming like this, and if you’re a social scientist with mad programming skills you may scoff at the elementary nature of this step.  That’s OK; this isn’t written for the Google corporation or wicked-fast coders.  I wrote all this out because the code was a big step for me in becoming a better, more complete social scientist.  If you’re looking to take the same step, I hope this post helps you along.

Credit goes to Tizag for helping me to understand the “explode” command a bit better. In turn, if you can think of a way for me to explain this more clearly or fully, please let me know by sharing a comment.