Combining Results of Multiple Twitter Searches into one File on the Cheap

Twitter is a great subject for social media research because 1) it is used by a lot of active and influential people and 2) its data is presumed public, obviating privacy concerns. Yet the sheer volume of Twitter data poses problems for researchers, especially those without thousands of extra dollars needed to harness insane amounts of computer power. Part of the solution for modest researchers at small institutions like myself is to study relatively small-scale subjects. Another part of the solution is to tie together multiple low-cost solutions and not look for one magic software package to address all needs.

I’m working on a project right now in which I’ve been following all tweets by and tweets mentioning members of the Maine State Legislature over time. I could write a program in PHP using the Twitter API to accomplish this… if I had a bit more time and know-how. I’ll try to get these later, but for now, I’m running multiple copies of the program Tweet Archivist Desktop, each of which captures and saves tweets by or regarding one Twitter account as they’re posted. Tweet Archivist Desktop costs just $9.99 for a perpetual license, which I consider well worth the price.

Tweet Archivist Desktop creates a separate .csv dataset for each of the searches I’m saving. To gather them all together, I’m following advice shared helpfully by solveyourtech. On my Windows laptop, I’m entering the command prompt and combining all csv files in a folder into a single csv file with a variant of the “copy” command.

copy command in Windows command prompt combines multiple csv files into one
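
For readers who would rather script this step than type it at the command prompt, here is a minimal Python sketch of the same idea. It is not the solveyourtech method. It assumes that every .csv file in the folder starts with the same header row and is encoded as UTF-8, which may or may not hold for Tweet Archivist Desktop exports. The function name `combine_csv_files` and the output name `combined.csv` are made up for illustration.

```python
import csv
from pathlib import Path


def combine_csv_files(folder, output_name="combined.csv"):
    """Append every .csv file in `folder` into one file, keeping one header row.

    Assumption: all files share the same header row and are UTF-8 encoded.
    """
    folder = Path(folder)
    output_path = folder / output_name
    # Skip the output file itself in case it is left over from an earlier run.
    sources = sorted(p for p in folder.glob("*.csv") if p.name != output_name)

    header_written = False
    with output_path.open("w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for source in sources:
            with source.open(newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Copy the remaining rows (everything after the header).
                writer.writerows(reader)
    return output_path
```

Calling `combine_csv_files("C:/tweets")` (a hypothetical folder) would write `combined.csv` into that folder. Unlike a plain file concatenation, this sketch writes the header row only once.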

YouTube, Socially HalfBaked

In undergraduate courses, I often exhort students to express their ideas in measurable terms and to make sure that what they think they’re measuring and what they’re actually measuring have a reasonable connection.  This could be seen as the worry of a fussy academic, but there are real consequences to fuzzy thinking and fuzzy measurement in what some people call “the real world.”  I recently came across a “real-world” example of fuzzy research in the field of social media analytics that I’d like to share with you.   As this example shows, the use of trendy and colorful infographics can’t always bridge an information gap.

Thinking about YouTube: All Views? Views Per Video? Average Video Length?

On November 27, 2013, the social media analytics company SocialBakers released a report in which it confidently declared that “Videos Under Two Minutes Generate the Most YouTube Views.” This is an ambiguous claim with at least two possible meanings:

Possible Meaning #1: If we count all YouTube views, most of the views will be of videos under two minutes long.
Possible Meaning #2: A video of less than two minutes in duration will tend to obtain more views than a longer video.

These possible meanings may sound similar, but they are substantially different. Meaning #1 brings to mind the saying that “most car crashes happen within a mile of home.” This may be true, but that fact doesn’t imply that driving close to home is more dangerous, because we also do most of our driving within a mile of home. In the same vein, it might be that most video views are for videos that are under two minutes long, but if most videos are under two minutes long, that’s not at all surprising.

What we really want to know if we’re driving is what locations are more risky. For every mile we drive closer to home, are we more or less likely to crash? If we’re posting YouTube videos with the hope of obtaining views, what we really want to know is whether a single short video tends to snag more views than a single medium-length video or a single extended-length video. That question is expressed in Meaning #2.

It appears from the following text that SocialBakers is interested in testing the question expressed in Meaning #2:

“Using YouTube to reach your Fan’s can be a tricky proposition. Done right, and you can create something that your audience will remember for a long time after, and will want to share with their friends. Videos have the potential to really go viral. But how long should a video be? Make it too long, and people will be yawning and looking for something more interesting to occupy their time. Make it too short, and you might risk your content being easily forgettable and your message undelivered. We did some data investigation to get to the bottom of what video length, on YouTube, will makes the biggest impact….”

Sounds straightforward, doesn’t it? But watch as SocialBakers nimbly shifts back to Meaning #1:

“To do this, we looked at the 300 most viewed channels among different industries. The first thing we noticed is that videos between 16 seconds to 120 seconds generate almost 50% of all views on YouTube. The most successful videos are almost unanimously below 2 minutes in length.”

Did you notice the shift? In the second sentence from that passage, they’re measuring the number of views for all videos and comparing it to the number of views for all videos between 16 and 120 seconds. The problem is that there may just be a whole lot of videos between 16 and 120 seconds long — if so, it’s no wonder that they account for all those views. What we need to know to figure out whether this information is useful is another piece of information: what percent of YouTube videos are between 16 seconds and 120 seconds long. If such videos make up 70% of YouTube videos, then it’s not at all impressive that they generate 50% of all views. In fact, that result would be underwhelming. If, on the other hand, such videos make up just 20% of YouTube videos, then it would be quite impressive for them to garner 50% of all views.
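
The comparison in the paragraph above can be put into a few lines of Python. This is only a sketch of the reasoning: the 70% and 20% figures are the hypothetical shares from the text, not measurements, and the function name `representation_ratio` is invented for this example.

```python
def representation_ratio(share_of_views, share_of_videos):
    """Divide a category's share of views by its share of videos.

    A ratio above 1 means the category draws more views than its numbers
    alone would predict; a ratio below 1 means it draws fewer.
    """
    if share_of_videos <= 0:
        raise ValueError("share_of_videos must be positive")
    return share_of_views / share_of_videos


# The 50% share of views is the figure SocialBakers reports; the two shares
# of videos are the hypothetical scenarios from the paragraph above.
for share_of_videos in (0.70, 0.20):
    ratio = representation_ratio(0.50, share_of_videos)
    print(f"{share_of_videos:.0%} of videos, 50% of views: ratio {ratio:.2f}")
```

In the first scenario the ratio comes out at about 0.71, which is the underwhelming case. In the second it is 2.5, which would be impressive. Without the share of videos, the 50% figure cannot tell us which case we are in.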

Well, what does SocialBakers actually measure? To figure this out, let’s look at the company’s slickly-produced infographics from its brief report:

SocialBakers: Videos under two minutes generate the most YouTube views

This infographic doesn’t clarify matters at all. The numbers reported are percentages, but what are they percentages of? If you look closely, you’ll notice the large-text title implies that the percentages in the graphic are percentages of views (“generate the most YouTube views”). On the other hand, the tiny text underneath the graphic tells us that what SocialBakers has calculated is the “average length of YouTube videos,” not the share of views generated by YouTube videos.

SocialBakers’ second infographic makes it clear what’s going on. Take a close look at the numbers listed below, which are labeled “Lengths of YouTube Videos”:

SocialBakers: Common Lengths of YouTube Videos

All of the counts at the top of each bar add up to 579,112 videos. Those must be counts of videos, not counts of views, because just one recent video from the top channel, PewDiePie, has gained nearly 2 million views. The number of videos of 0-15 seconds (50,505) is 8.8% of 579,112. The number of videos of 16-30 seconds (90,619) is 15.6% of 579,112. The second infographic confirms for us that the first infographic is measuring the commonality of videos of different lengths, not the share of views obtained by videos of various lengths. Those two different-looking infographics are really just sharing the same information in different layouts.

SocialBakers’ infographics don’t tell us whether a long video tends to obtain more views than a short video, because the infographics don’t measure the number of views per video. Those infographics don’t describe views at all (and there is no other data described in SocialBakers’ report to make up for this lack). Regardless, SocialBakers concludes that “Everyone Loves Short and Sweet Videos,” that “it is often far more effective to take up a small amount of viewing bandwidth in order to keep your audience entertained,” and that “you usually can’t go wrong by making sure your video is short and sweet.” Let’s not forget the title of SocialBakers’ report: “Videos Under Two Minutes Generate the Most YouTube Views.”

Check That Data… If You Can

SocialBakers’ conclusions in the headline and text of its report don’t follow at all from the information SocialBakers has presented, but the uncomfortable truth is that most people will nod their heads and accept those conclusions anyway. If video producers follow SocialBakers’ recommendations on the basis of this report alone, they do so at their peril. If you are a consumer of social media advice, it is wise for you to be in the minority who check out claims.

A more thorough way to check out claims would be to replicate SocialBakers’ study. In order to carry out a replication, however, we would need to know what SocialBakers actually did in its study. SocialBakers shares some information in its infographics: we know from those graphics, for instance, that SocialBakers studied videos in the date range of July 1 to September 23, 2013. But did it study all new videos introduced during that period? All existing videos introduced during that period? Some other quantity entirely? We don’t know. We’re also unclear about how many videos SocialBakers measured; was it “videos from the top 300 most viewed brand channels across different industries” (infographic #2) or “videos from a sample of the top 300 most viewed brand channels” (infographic #1)? What kind of sample? What industries were selected and by what standard? Since we don’t know these details, we can’t replicate SocialBakers’ study to directly test its claims. This is probably not a mistake. If SocialBakers told you exactly how to replicate its work, after all, it would be releasing a proprietary business secret. Social media consulting as a business thrives on some secrecy, unlike social research as an academic pursuit, which thrives on the sharing of technique.

What we’ll have to settle for is a more indirect replication. This indirect replication starts with SocialBakers’ central claim for video producers: that a short video will gather more views than a long video. SocialBakers has a stable of 230 employees that it can muster. As a single busy individual, I’ll have to look at YouTube videos on a more modest scale. I can take a fairly good look nonetheless: to follow the spirit of SocialBakers’ notion, I looked at the 10 YouTube channels with the most subscribers on November 30, 2013:

1. Spotlight
2. PewDiePie
3. Smosh
4. HolaSoyGerman
5. JennaMarbles
6. RihannaVEVO
7. nigahiga
8. RayWilliamJohnson
9. OneDirectionVEVO
10. Machinima

I’ve gathered information on the length of, and number of views of, the ten most recent videos from each channel, resulting in 100 videos. This is an admittedly small set compared to that obtained by SocialBakers, but it has two advantages. First, these are the most recent successful videos by the most successful channels on YouTube, so if we are interested in emulating success, this is where we ought to look. Second, the procedure by which I obtained these measurements is “transparent,” meaning that I’ve told you exactly how it’s done. If you don’t believe my results, you can replicate my work to show me I’m wrong.

Let’s look at the results I obtained in three ways. First, we’ll look at the simple number of videos of various lengths. Because there are 100 total videos, these counts can also be read as percentages:

Number of Videos of Various Lengths (source: 10 most recent videos from each of the 10 most-subscribed YouTube video channels)

The results here are quite striking: the most common video length is not between 31 seconds and a minute, as reported in SocialBakers’ chart, but rather between 5 and 10 minutes. The ten most successful YouTube channels produce relatively lengthy videos, not short ones: only 5 out of their most recent 100 videos are of a minute or less in length, and only 9 out of the most recent 100 videos run for two minutes or less.

Second, let’s look at the raw number of views of these 100 videos:

Number of Video Views in Ranges of Different Video Lengths for the 10 most recent videos of the 10 most popular YouTube Channels

With over 1.1 billion video views, the videos between 3 minutes and 10 minutes in length clearly have the most views. However, from our first chart above we also know that videos between 3 minutes and 10 minutes in length account for the largest number of videos (72 out of 100 of them). Is the dominant presence of video views in this range due simply to the number of videos in the range? To find out, we can divide the total number of views in each length category by the total number of videos in a category. The result is the average number of views per video in a category, graphed below:

Average Number of Views per Video, by Length of Video, YouTube November 2013
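
The views-per-video calculation described above (total views in a length category divided by the number of videos in that category) can be sketched in Python as follows. The cut points in `BINS` are assumptions chosen to match the ranges mentioned in this post; the charts themselves may use different bins. The function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical length bins: (upper bound in seconds, inclusive; label).
BINS = [
    (60, "1 min or less"),
    (120, "1-2 min"),
    (180, "2-3 min"),
    (300, "3-5 min"),
    (600, "5-10 min"),
    (float("inf"), "over 10 min"),
]


def length_bin(seconds):
    """Return the label of the first bin whose upper bound covers `seconds`."""
    for upper, label in BINS:
        if seconds <= upper:
            return label


def average_views_by_length(videos):
    """Compute average views per video in each length bin.

    `videos` is an iterable of (length_in_seconds, view_count) pairs.
    Bins that contain no videos are left out of the result.
    """
    totals = defaultdict(lambda: [0, 0])  # label -> [video count, total views]
    for seconds, views in videos:
        entry = totals[length_bin(seconds)]
        entry[0] += 1
        entry[1] += views
    return {label: views / count for label, (count, views) in totals.items()}
```

Fed a list of (length, views) pairs, one per video, `average_views_by_length` returns a dictionary of averages keyed by bin label. Dividing by the count of videos in each bin is what keeps a crowded category from looking popular just because it is crowded.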

Finally we can arrive at an answer to the question posed by SocialBakers: if we believe that the ten most popular video channels provide a model to emulate, and if we believe the length of a video is what drives people to view a video or not, then video producers seeking viewers would be well advised to upload videos of between 3 and 5 minutes in length. The next most advisable length for a video would be somewhere in the range of 5 to 10 minutes. Compared to the longer videos from these popular producers, videos of two minutes or less appear to be among the least popular on YouTube, not the most popular.

Keep Asking Questions

At this point, you may have more questions than answers. For instance, are the ten most popular video channels really the model to emulate? Could they have advantages that middle-range producers can’t touch? And is it possible that the length of a video isn’t what leads people to watch, but some other feature of a video that might itself be associated with length? To answer these questions, we’d need (yes) more research. But in order to get to this second tier of questions, we need to answer our first question — and that in turn means our measurements must be able to answer our question, and that we need to be specific in describing how our measurements are made.
