Category: Data Is Beautiful

Social Network Dynamics in Pharmacology

4/24/2016

The Department of Pharmacology at Creighton University School of Medicine is small, but mighty. There are only 10 professors or principal investigators (PIs) in the department, but this small size has its advantages. Or at least that is what we tell ourselves. A recent paper in Nature argued that bigger is not always better when it comes to labs and we are putting that to the test. Ideally with a smaller faculty, there would be more collaboration. Everyone knows what everyone else is doing, more or less, so they can more efficiently leverage the various expertise found throughout the department.

To measure how interconnected the pharmacology department is, I created a network visualization based on who published with whom. Using NCBI’s FLink tool I downloaded a list of the publications in the PubMed database for each PI in the pharmacology department at CU. A quick script in R formatted the authors and created a two-column “edge list” for each author, basically a list of every connection (one row per co-author pair). This was imported into the free, open-source network analysis program Gephi, which crunched the numbers and produced a stunning map of the connections in the pharmacology dept:

Gephi automatically detects clusters of closely connected authors (seen as different colors), which are unsurprisingly centered on the various PIs in the department since those are the publications I was looking at. Dr. Murray, the department chair, has the most connections, also known as the highest degree, at 292, followed by Dr. Abel. Drs. Dravid and Scofield are ranked 2nd and 3rd respectively for betweenness centrality, after Dr. Murray. They are the gatekeepers that connect Drs. Abel, Bockman, and Tu to Dr. Murray. Each point’s size is proportional to its eigenvector centrality, a measure of importance similar to Google’s PageRank.
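The R script itself is not shown here, so the following is only a rough Python stand-in for the edge-list step and the degree count, assuming each paper is available as a plain list of author names. The function names and the sample authors are hypothetical, and Gephi computes its own statistics; this is just meant to illustrate the idea.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_edge_list(papers):
    """Count how many papers each pair of co-authors shares."""
    edges = Counter()
    for authors in papers:
        # Sort so that (A, B) and (B, A) land on the same undirected edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def degree(edges):
    """Number of distinct co-authors per author, ignoring edge weight."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return {author: len(others) for author, others in neighbors.items()}


# Hypothetical author lists, one per paper (not real PubMed data).
papers = [
    ["PI One", "Student A", "Student B"],
    ["PI One", "Student A"],
    ["PI One", "PI Two"],
]

edges = build_edge_list(papers)
for (source, target), weight in sorted(edges.items()):
    print(f"{source},{target},{weight}")  # source,target,weight rows
print(degree(edges))
```

In this sketch a paper that gets counted twice raises an edge's weight but leaves every author's degree unchanged.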

I was a bit surprised at how dispersed the department was. 60% of the PIs could be connected, and many have strong relationships. However, the rest are floating on their own islands. Dr. Oldenburg is relatively new so this is not surprising. The Simeones (who are married) are closely connected. Also unsurprising.
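One way to put a number on the "islands" is to split the network into connected components and ask what share of the PIs sit in the largest one. The sketch below does that with a breadth-first search; the edge list, the PI labels and both function names are made up for illustration, and the 60% above came from the real data, not from this code.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group nodes into components using breadth-first search."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def share_in_largest_component(edges, pis):
    """Fraction of the listed PIs that sit in the single largest component."""
    largest = max(connected_components(edges), key=len)
    return sum(pi in largest for pi in pis) / len(pis)


# Hypothetical co-authorship edges; "PI 3" and "PI 4" are on their own islands.
edges = [
    ("PI 1", "Author A"),
    ("Author A", "PI 2"),
    ("PI 5", "PI 1"),
    ("PI 3", "Author B"),
    ("PI 4", "Author C"),
]
pis = ["PI 1", "PI 2", "PI 3", "PI 4", "PI 5"]
print(share_in_largest_component(edges, pis))
```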

This was a quick and dirty analysis and a few of the finer points slipped through the cracks. Some of the names are common in PubMed (especially Tu), so I did my best to filter what was there and only look at publications affiliated with Creighton. Unfortunately this filters out publications from other institutions by the same author. Also, not everyone is attributed the same way on every manuscript. This is especially true for Drs. KA Simeone and Gelineau-Van Waes, who have published under different last names, but it also happens because sometimes a middle name is given and sometimes it is omitted. I tried my best to standardize the spellings for each PI, but with over 700 nodes I could not double check every author to ensure there were not duplicates elsewhere. If more than one PI shows up on a paper, that paper may show up under both searches. This should not increase the number of edges, but would affect the “strength” of those connections.

The connections are about what I had imagined. The brain people are on one side, everyone else is on the other. Expanding the search to include the papers from coauthors outside of the pharmacology department might uncover more interesting connections. Just for fun I went ahead and pulled the data for every paper on PubMed with a Creighton affiliation. I could not even find my department on the visualization without searching for it. It is massive. The breadth of Creighton’s interconnectedness forces me to marvel at how vast the community of scientists must truly be. So many people working to improve the body of knowledge of the human race. We are really just small bacteria in a very large petri dish.

More Taylor Swift (GRAMMY edition)

2/17/2016

I don’t have cable, so I did not get the chance to watch the Grammys this year. I was, however, happy to hear that Taylor Swift won the Grammy for Album of the Year for 1989 (since I recently wrote a post about how great she is). When I was writing the aforementioned post I did notice that she was nominated, but I felt pretty confident the National Academy of Recording Arts and Sciences would give it to Kendrick Lamar’s To Pimp a Butterfly. This is Swift’s second Grammy for Album of the Year (she also won for Fearless as we all know).

Since the data have already been scraped from the Billboard Hot 100, I might as well get some mileage out of them. For each week since November of 2014 (around when 1989 and To Pimp a Butterfly were released) I assigned any song by any of the five artists nominated for Album of the Year a point value from 1 to 100 based on its position in the Hot 100. Songs ranked number 1 were given 100 points, songs ranked number 100 were given 1 point, and so on in between (that is, 101 minus the chart position). Then for each artist I added up the point values for each week, the results of which you can see below:

Notice someone missing from this visual? None of the songs from the Alabama Shakes’ album Sound & Color made it to the Hot 100. This is despite the fact that their album was on the Billboard 200 for album sales for 26 weeks, peaking at number 1. Chris Stapleton has a little purple blip around December, seven months after Traveller was released. Kendrick makes it on here and there, but the graph is clearly dominated by Taylor Swift and The Weeknd. The Weeknd has by far the highest peaks, but Taylor proves her popularity with the largest total area under the curve, 13,676 “points” vs. The Weeknd’s 11,156 “points”. TSwift also has the highest average per week, though not by much.
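For anyone who wants to reproduce the scoring, here is a minimal Python sketch of the point scheme, assuming the chart data comes as (week, artist, position) rows. The row format, the function names and the sample rows are all assumptions for illustration; they are not the scraped Billboard data.

```python
from collections import defaultdict


def chart_points(position):
    """Map a Hot 100 position to points: #1 -> 100, #100 -> 1."""
    if not 1 <= position <= 100:
        raise ValueError(f"position must be 1-100, got {position}")
    return 101 - position


def weekly_points(entries):
    """Sum points per (week, artist) from (week, artist, position) rows."""
    totals = defaultdict(int)
    for week, artist, position in entries:
        totals[(week, artist)] += chart_points(position)
    return dict(totals)


def total_points(entries):
    """Area under each artist's weekly curve: points summed over all weeks."""
    totals = defaultdict(int)
    for _week, artist, position in entries:
        totals[artist] += chart_points(position)
    return dict(totals)


# Hypothetical chart rows for two made-up artists.
entries = [
    ("2015-01-03", "Artist X", 1),
    ("2015-01-03", "Artist X", 40),
    ("2015-01-03", "Artist Y", 100),
    ("2015-01-10", "Artist X", 2),
]
print(weekly_points(entries))
print(total_points(entries))
```

An artist with several songs on the chart in the same week gets the sum of their points, which is how one act can score more than 100 in a single week.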

Fun fact: Swift’s “Bad Blood” was minimally successful until she added some bars by Kendrick Lamar...which went on to win the Grammy for Best Music Video. Does that say more about Taylor or Kendrick?

We can debate whether song popularity should be the metric by which we measure the value of an album. Obviously a lot of people thought Sound & Color was a world-class album despite its absence from the Hot 100. In fact the National Academy of Recording Arts and Sciences insists that Album of the Year is to “honor artistic achievement, technical proficiency and overall excellence in the recording industry, without regard to album sales or chart position.” However, of those nominated this year, they did pick the one with the best album sales and chart position.

Taylor Swift’s Data is Beautiful

2/5/2016

“If you're horrible to me, I'm going to write a song about it, and you won't like it. That's how I operate.” – Taylor Swift

Taylor Swift. Love her, hate her, love her, or love her. There really is not much of a choice as everyone seems to be constantly showering Ms. Swift with praise. While her fans are eager to show Tay some love she also has no problem sharing that love with others. As Gawker points out, she has dated what appears to be every man in the universe.

Regardless of Taylor Allison Swift’s extracurricular activities you cannot deny that she is a powerhouse in the music industry. Of Taylor’s five studio albums, three have sold over a million copies in their first week making Taylor the first female artist to have done so. That is remarkable considering that I cannot recall the last time I actually bought any music. John Lennon famously claimed that The Beatles were “more popular than Jesus.” TSwift is not quite that popular, but she is close according to Google Search traffic:

Whether Taylor or The Beatles before her, these musicians are having a large impact on the lives of the kids. Perhaps an impact comparable to The JC. (Justin Bieber actually IS a more popular search term than Jesus over the same time span, which should be worrisome.)

Being popular is nothing new for TSwift and we can measure that popularity using the charts produced by Billboard. Taylor first appeared on the Billboard Hot 100 on September 23, 2006 with her song “Tim McGraw” off of her debut album Taylor Swift. Billboard started tracking the hits of the day on its list Best Sellers in Stores way back in 1939. Shortly thereafter it started publishing two other lists, Most Played by Jockeys and Most Played in Jukeboxes. In 1958 Billboard published its magnum opus, the Hot 100, which coalesced these various charts into a single definitive ranking across genres.

Today the Hot 100 is composed of three components: sales (35-45%), airplay (30-40%) and streaming (20-30%) which can vary weekly in order to hit the target of 100 songs. The Streaming Songs chart is the newest and began in 2007 with data from AOL and Yahoo. It expanded to include services such as Spotify in January of 2013 and one month later added YouTube views. The Digital Songs chart tracks sales data for digital downloads while the Radio Music chart tracks radio airplay audience impressions. Both charts are generated from data furnished by Nielsen Music. Additionally Billboard publishes its YouTube data as a separate chart on its website.

The Billboard charts provide a rich, publicly available data set. Using the tools over at Kimono I scraped the available data from the Hot 100, Radio Music, Digital Songs, Streaming Songs and YouTube charts since 2000. Now that I have data on hundreds of artists over the last one and a half decades I feel like anything is possible. Taylor Swift is our test case because (1) she is very popular, (2) many of her songs make it onto the Hot 100, and (3) her most recent album, 1989, was released after Spotify and YouTube were included on the Streaming Songs chart. Here is what 1989’s travel through the Hot 100 looks like:

Yes it’s a bunch of lines and dots, but they tell some interesting stories. Compare the difference between “Shake It Off” and its debut at the #1 spot with the slow methodical rise of “Style.” Or Bad Blood’s resurgence after the addition of Kendrick Lamar and the release of its star-studded music video. Overall, nine of the 15 songs from Swift’s fifth album appeared on the Hot 100. At first it seemed odd that Taylor’s songs would drop off the chart once they dropped to position #50. Was this some brilliant marketing strategy? Is it not worth promoting a song anymore after it has fallen so far? Unfortunately the answer is much more mundane. Billboard labels songs as “recurrent” and removes them after 20 weeks on the chart and after falling below the 50th position.
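The recurrent rule is simple enough to write down as code. This is a hedged sketch, assuming "after 20 weeks" means more than 20 weeks on the chart and "below the 50th position" means a position number greater than 50; the exact boundary handling, the parameter names and the sample chart run are my assumptions, not Billboard's published wording.

```python
def is_recurrent(weeks_on_chart, position, max_weeks=20, cutoff=50):
    """True when a song has passed max_weeks on the chart and sits below the cutoff."""
    return weeks_on_chart > max_weeks and position > cutoff


def visible_run(weekly_positions):
    """Keep a song's chart run up to the first week it would be marked recurrent."""
    run = []
    for week_number, position in enumerate(weekly_positions, start=1):
        if is_recurrent(week_number, position):
            break
        run.append(position)
    return run


# Hypothetical run: 20 weeks at #10, then a slide to #45 and #55.
positions = [10] * 20 + [45, 55]
print(len(visible_run(positions)))  # the week at #55 is dropped
```

This is why a long-running song appears to vanish the moment it slips past #50, while a newer song can linger in the bottom half of the chart.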

In November 2014 Swift pulled her music from the popular streaming site Spotify. This severely crippled her ability to gain on the charts through the streaming component of the Hot 100. TSwift did agree to have her music featured on Apple Music’s new service which launched June 30, 2015 after some controversy regarding its 3-month free trial period. Her ability to sway the actions of the second most valuable public company legitimizes her powerhouse status and further validates all of this work.

Swift can make up for her refusal to stream by dominating on another “streaming” service, YouTube. Below is a breakdown of her trends on each of the Billboard charts I analyzed. While the Hot 100 ranks songs 1-100, the other charts only display the top 25 songs in each category.

(NB physical sales of a CD are also a component of the Hot 100 ranking, but are not included above.)

Some takeaways: Each song had a huge increase in digital downloads when 1989 was released at the end of October, but only “Blank Space” was able to maintain that (“Shake It Off” was released prior to 1989). However, the increase in digital downloads was enough for both “Bad Blood” and “Style” to make it onto the Hot 100. YouTube views are almost always above streaming views. This makes sense since Taylor did not allow streaming on other services. If “Shake It Off” and “Blank Space” had not been labeled as recurrent they would probably still be on the Hot 100 based on their YouTube views, which are very persistent. YouTube views were a leading indicator for “Bad Blood.” It made it to the top 5 of the YouTube chart a week before charting on digital downloads, the Hot 100, or radio songs. It would appear that TSwift’s mastery of music videos can single-handedly make a single.

Finally I calculated the area under the curve (AUC) for each of Taylor’s songs off of 1989 and compared it to the number of YouTube views for each song. The AUC was calculated by assigning 100 points to position 1, 99 points to position 2, and so on, then multiplying each value by the number of weeks the song spent at that position and summing. Without access to other streaming services like Spotify or Tidal, and given the visual correlation between the YouTube and Streaming Songs charts, I expected a strong relationship between YouTube views and success on the charts. The regression for TSwift was statistically significant. However, Justin Bieber, who does not have the same qualms about distributing his music on Spotify, had an even stronger correlation. This might have something to do with the fact that J-Biebs has a music video for every song off of his new album (although many of them have relatively few views) while Taylor only has videos for six of her songs. Would more music videos add buoyancy to some of her other songs? I suspect that it would.
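As a rough illustration of the AUC-versus-views comparison, the sketch below sums the weekly points for each song and then computes a plain Pearson correlation against view counts. The original analysis was a regression whose details are not given, so treat the Pearson coefficient as a swapped-in simplification; the songs, chart runs and view counts are invented.

```python
from math import sqrt


def song_auc(weekly_positions):
    """Sum (101 - position) over every week a song spent on the Hot 100."""
    return sum(101 - position for position in weekly_positions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical songs: weekly chart positions and YouTube views in millions.
chart_runs = {
    "Song A": [1, 1, 2, 5],
    "Song B": [30, 22, 18],
    "Song C": [90, 85],
}
views = {"Song A": 900.0, "Song B": 300.0, "Song C": 40.0}

aucs = [song_auc(chart_runs[name]) for name in chart_runs]
view_counts = [views[name] for name in chart_runs]
print(aucs, round(pearson_r(aucs, view_counts), 3))
```

Summing one value per week is the same as multiplying each position's points by the weeks spent there, just written week by week.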

I should quickly note that analyzing the Billboard Hot 100 is not a novel idea. On his blog Modern Insights (and minor observations) Michael Kling has plotted similar analyses looking at the rise and fall of artists and songs in the Billboard Hot 100. Also, very recently Cristian Cibils from Stanford University published a brief paper in which he used machine learning to predict a song’s future Hot 100 trajectory based on past chart performance. I would recommend you check their work out. Next Big Sound is a company that tracks streaming and social media to provide unique insights into what makes songs popular, and it actually makes money doing it.

This is just a quick overview of some of the interesting features that can be investigated using data from Billboard. Full disclosure: I did most of this as a part of WNYC’s Note to Self’s Infomagical week. Tweet me which other artists you think might have interesting stories to uncover. Taylor Swift is here to stay, and while she is here she will continue to win over our hearts and most of the Grammys.

Hobbit, Spanish for…Hobbit

11/16/2015

"En un agujero en el suelo, vivía un hobbit”

Did you ever get your report card and have to masterfully change some of those F’s to B’s on the bus ride home? I doubt that this has ever successfully fooled anyone, but this movie trope illustrates that report cards can be terrifying. They do, however, provide a useful piece of information: feedback. Without feedback, it is difficult to tell if learning is happening. Am I closer to mastery now than I was when I started?

Well, over a year ago I travelled to Mexico to visit my good friend Dave in Chiapas. I had worked my way through Duolingo and even did Rosetta Stone for a bit, so while I would not have considered myself fluent, I figured I could get by. This assumption was quickly smashed against the rocks of reality at Customs and I spent the remainder of the trip staring blankly at anyone who had the misfortune of attempting to speak to me.

The trip was still a blast, but I made a commitment to improve my Spanish over the next year. In the Spanish language section of Barnes & Noble I picked up a copy of El Hobbit by the legendary J.R.R. Tolkien. Maybe you’ve heard of it. I had been wanting to reread it for a while, so why not try it in Spanish? My goal was to read one page per day and finish it within a year. At 283 pages, this seemed more than achievable.

Just last week I finally learned how the story ended (happily ever after), but I had no way of knowing if I actually improved my Spanish at all. There was no feedback.

There was one piece of information that I could leverage to my advantage, the number of words I translated per page. In an effort to actually retain some information I would write the translation of words or phrases I did not know over the words in the book. So I counted these up for each page. You can see the breakdown of the page totals sorted by chapter in the graph below. The average for that chapter is shown in red.

Overall the chapters had a statistically significant downward trend! I may have learned something after all. This works out to an improvement of about 2/5 of a word per page with each chapter on average, from an average of 20 words per page in chapter 1 to just under 12 words per page in chapter 19.
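As a sanity check on that slope, a straight line from 20 words per page in chapter 1 to just under 12 in chapter 19 drops a bit over 0.4 words per chapter, which lines up with the 2/5 figure. A least-squares fit over the chapter averages could be sketched as below; the per-chapter numbers in the example are invented, since the real averages only appear in the graph, and the function name is hypothetical.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical average translated words per page for the first five chapters.
chapters = [1, 2, 3, 4, 5]
avg_words = [20.0, 19.5, 19.3, 18.6, 18.4]

slope, intercept = linear_trend(chapters, avg_words)
print(f"{slope:.2f} words per page per chapter")

# Endpoint estimate using the two figures quoted in the post.
print((20 - 12) / (19 - 1))
```

A negative slope means fewer words needed translating as the book went on, which is the report card result.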

I am still far from being a fluent speaker. In fact this did not teach me anything about speaking Spanish. But at least I have a report card to show my parents.

P.S. When I told my dad about this he responded, “Son, you’re a huge nerd.” Thanks pops.

Macklemore: Moped Market Maker

10/4/2015

“And I’m like ‘Honestly, I don’t know nothing about mopeds’” -Macklemore

Mopeds are a thing. A thing that exists. A thing that has had approximately zero influence on my life. People in Europe drive mopeds, people in America drive literally anything else. But then I watched the music video of Macklemore & Ryan Lewis’s new song “Downtown.” The video begins with Macklemore naively purchasing a moped which he proceeds to ride around the ‘Downtown’ for the remainder of the video.

“Oh yeah mopeds, those exist” I thought, and headed to Google to see if a moped is right for me [Spoiler: It is not]. But then I wondered how many other people were similarly inspired to dive into the fabulous(?) world of mopeds.

Google, of “Don’t be Evil” fame, obviously tracks every search performed on its site and the aggregated data is available through Google Trends. Below is the Google Trends plot for the search term “moped” from 2004 - present in the United States. An interactive version of this plot can be found here.

The most prominent feature is the large spike in the summer of 2008. When the financial crisis hit, people were thinking of ditching their Hummers and opting for something that gets 70 mpg. What is curious is that this was not a persistent trend, even though people were supposedly strapped for cash for the next few years of the recession. Everyone did their moped research in 2008 and decided that mopeds were not for them. Alternatively, everyone bought a moped in 2008 and they all still have them and have no need to ever search for anything moped related. Anecdotally, the number of mopeds I see is still approximately none so I will choose to believe the former.

Because Google Trends only displays relative search numbers, and the 2008 spike was such a huge outlier, I chose to rerun the analysis from December 2008 until the present to achieve greater “resolution” to answer the question of interest. Explicitly, how has “Downtown” affected searches for mopeds?

Because searches for mopeds are cyclical, we can average each calendar month across years to get an idea of how Macklemore’s song has shifted interest in mopeds. Here is a plot of the average number of relative moped searches by month (bold). Each individual year is also shown, with 2015 highlighted in red.

As you can see, 2015 has been a bit below average, but there was a small increase at the end of August, which is when “Downtown” was released (indicated as a dotted line). Then in September, instead of dramatically plunging as Fall approaches, the number of searches for mopeds stayed relatively stable. September’s search traffic scored a 74, almost 17 points higher than the average for this time of year, and significantly different from the mean. More people than usual are actually looking for mopeds because of this ridiculous song.

There is no way to know if this increase in traffic will actually lead to more mopeds on the streets, especially when Winter is Coming. Nevertheless there appears to be a causal relationship between this song and moped searches. As “Downtown” continues to rise in the Billboard Hot 100 (it has only been on there since September third and is sitting at number 16 at the time of writing) the last question we have to ask ourselves is: How much is Big Moped paying Macklemore?
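For the curious, the seasonal comparison boils down to averaging each calendar month across years and subtracting that baseline from the current year. Here is a minimal sketch, assuming the Google Trends export has been reshaped into a year-to-months dictionary and that the baseline uses earlier years only. Both of those are assumptions, and apart from the 74 quoted above the numbers are invented to mirror the shape of the result.

```python
def monthly_baseline(history):
    """Average each calendar month across years; history maps year -> {month: value}."""
    sums, counts = {}, {}
    for monthly_values in history.values():
        for month, value in monthly_values.items():
            sums[month] = sums.get(month, 0) + value
            counts[month] = counts.get(month, 0) + 1
    return {month: sums[month] / counts[month] for month in sums}


def deviation_from_baseline(current, baseline):
    """Current year's value minus the multi-year average, month by month."""
    return {
        month: value - baseline[month]
        for month, value in current.items()
        if month in baseline
    }


# Hypothetical relative search scores for August (8) and September (9).
history = {
    2012: {8: 80, 9: 58},
    2013: {8: 78, 9: 56},
    2014: {8: 76, 9: 57},
}
this_year = {8: 75, 9: 74}

baseline = monthly_baseline(history)
print(deviation_from_baseline(this_year, baseline))
```

A positive September value next to a negative August one is the pattern described above: searches that should have fallen off with the season held steady instead.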

Pretty girls and graphs

5/19/2015

I know what you are all thinking: which state has the most beautiful college-aged women? A question for the ages to be sure, one that if I ask again in a few years would probably end with me in jail. Total Frat Move has been faithfully performing the Lord’s work and compiling a database answering that very question.

TFM’s ‘Instagram Babe of the Day’ showcases one Instagram account featuring a woman wearing clothes that her father must...just...not know about. Using the tools at Kimono Labs I assembled a list of 299 IBoDs and sorted them by state. Here are the results in a fraternity-approved, salmon-colored map:

Florida is the winner and it is not even close at 67 ‘Babes’ of the 299, or 22%. California is the next closest at 49 followed by Arizona at 38, Texas at 21 and then Virginia and New York with 8 and 7 respectively. In fact together the top 3 states beat out the bottom 48 (including DC) combined. The entire country of Australia had 1 which puts it ahead of the six states with zero. America’s friendly neighbor to the North, Canada, came in at 4 beating out 34 states.
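The arithmetic behind those claims is easy to check. The snippet below uses only the counts quoted above and lumps everything else into a single remainder, so the remainder is an upper bound on what the other states (plus DC) could total. The helper name is made up.

```python
# Counts quoted in the post; every entry not listed here falls into "rest".
TOTAL = 299
counts = {
    "Florida": 67,
    "California": 49,
    "Arizona": 38,
    "Texas": 21,
    "Virginia": 8,
    "New York": 7,
}


def share(count, total=TOTAL):
    """Fraction of all IBoDs that a single count represents."""
    return count / total


top_three = sum(sorted(counts.values(), reverse=True)[:3])
rest = TOTAL - top_three

print(f"Florida share: {share(counts['Florida']):.0%}")  # Florida share: 22%
print(top_three, rest, top_three > rest)  # 154 145 True
```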

Clearly not all states are equally gifted. Warmer states seem to be dominating and the upper portion of the Midwest is just a deserted wasteland. But Wyoming only has 6 people per square mile. I didn’t expect there to be bronze bombshells patrolling the streets of Cheyenne. But even adjusting for the number of female college students per state does not even the playing field. Here is a graph of the total number of IBoDs plotted against female college students by state (data are in thousands, from 2008; available at the US Census. Thanks Obama).

Florida and Arizona are really just doing a great job. Keep it up gals. Texas and especially New York are under-performing their potential. Both have fewer TFM ‘Babes’ than would be expected based on their student population. Hawaii, with its low student population, actually has the highest rate of IBoDs. Around 1 in every 8500 college girls is on TFM. Florida’s rate is the only close competition at 1:9000 followed by a distant Arizona.
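The "1 in N" rates are just enrollment divided by the number of IBoDs. Here is a sketch with enrollment in thousands, as in the Census table; the state names and enrollment figures are placeholders chosen so that two of them reproduce the 1:8500 and 1:9000 rates mentioned above, not the actual Census values.

```python
def students_per_ibod(ibod_count, female_students_thousands):
    """Return N in '1 in N', or None when a state has no IBoDs."""
    if ibod_count == 0:
        return None
    return female_students_thousands * 1000 / ibod_count


def rank_by_rate(states):
    """Sort states from highest to lowest rate of IBoDs (smallest N first)."""
    rated = {
        name: students_per_ibod(count, enrolled)
        for name, (count, enrolled) in states.items()
    }
    return sorted((n, name) for name, n in rated.items() if n is not None)


# Hypothetical (IBoD count, female enrollment in thousands) per state.
states = {
    "State A": (4, 34.0),
    "State B": (67, 603.0),
    "State C": (7, 700.0),
    "State D": (0, 50.0),
}
for n, name in rank_by_rate(states):
    print(f"{name}: 1 in {n:,.0f}")
```

Ranking by rate instead of raw count is what lets a small state jump to the top of the list.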

Babes of the Day is only one metric though. Is it really accurate in predicting the most beautiful states? I looked up some other rankings and chose two (the first two that came up on Google). The first was from the internet’s bastion of journalism, Buzzfeed, and the other was from Comcast Xfinity (even they have lists?). Both used a combination of arbitrary numbers to reach their equally dubious conclusions about the most attractive states. Here are the top 5 for each in descending order. (I switched back to using total number of IBoDs for this one because it gives a better list. Rigor is not the goal of this analysis.)

No surprise, California shows up on each list. We also see Florida and Virginia show up on one of the other lists. The other two lists have Hawaii which would have been there if we had taken into account total number of college girls. So overall 3 of our top 5 ended up on the other lists which gives us a 60%. Plus 10% extra credit for Hawaii almost being on there gives us 70%. And C’s get degrees.

There is no way to differentiate between correlation and causation. Either warm weather attracts beautiful women, makes women beautiful or coerces them to send their names to TFMintern@grandex.co. Who cares though? I’m not complaining.
