Category: R

Cold City Surprises

7/16/2023

Globes are round, making it difficult to appreciate how far North or South different cities are. Here is a quick visualization of the latitudes for a pseudo-random selection of cities. The size of each circle corresponds to the average January high temperature in Fahrenheit. Dark blue circles indicate that the city is near water while light blue circles are farther inland.

Water obviously plays a big factor in determining winter temperature. Fargo, North Dakota is shorthand for "very cold place" but is south of Seattle with its average January high of 50 degrees. This is also seen in the latitude pair of Moscow (very cold but far from water) and Edinburgh (not too cold but right on the water).

Berlin is much farther North than I would have guessed! In my mind Berlin is a pleasant temperate city while it is much farther north than Fargo, or Minneapolis and almost as far north as frozen wasteland Edmonton, Canada.

Finally, I was also surprised to see Tokyo south of Washington DC as well as surpised to see how similar in latitude Beijing and DC are. My mental model of our planet had both of these East Asian capitals far north of DC

It is always interesting when you look at this planet of ours in a new way! Did anything surprise you in the figure?

0 Comments

District Running Recap

7/1/2023

0 Comments

During the height of the COVID-19 pandemic, everyone was trapped indoors. Various jurisdictions had different policies on appropriate justifications for venturing outside. I vividly remember the DC National Guard being deployed to manage the flow of people into the nearby fish market. However, throughout the pandemic, outdoor and socially distant running or exercising was generally allowed. Because the gyms were closed and the outdoors were open, my miles run per week shot up dramatically. I even wrote a blog post about it in April 2020.

Now the pandemic has subsided, but I the running has not. In fact, I’m running more than ever in a hunt for a Boston Marathon qualifying time. During this expanded training, and throughout the past three years, I have run many of the local DC streets. I have updated the running heatmap that I published in 2020 after several thousand additional miles in the District. I’ve been told the premium version of the Strava run tracking phone app will make a similar map for you. This data did come from Strava, but I’ve created my own map on the cheap using a pinch of guile (read: R and Excel).

This version of the map features a larger geographic area to capture some of the more adventurous runs that I have taken to places like the National Arboretum, the C&O Trail, and down past DCA airport. The highest densities of runs are along the main streets around downtown, Dupont Circle, and the National Mall. Rosslyn and south of the airport were two areas that I wanted to explore more in my 2020 post, both of which show up in darker red on this new figure.

To complement the geographic distribution of runs is a visual distribution of average pace vs. run distance. This was a challenge to analyze because the data and dates were downloaded in Spanish (don’t ask), but eventually I was able to plot each of the 798 (!) DC-based runs over the past five years. The runs in 2023 are highlighted in red.

Luckily, the analysis reveals that many of the fastest runs for each distance are in the past 6 months. Progress! My rededication to running at a more relaxed pace for my five mile recovery runs is also visible. The next step is to fill in the space this figure with red dots for the rest of 2023 with the ultimate goal of placing a new dot in the bottom right corner by 2024. Wish me luck.

0 Comments

Visualization of AAAS STPF Trends – 2023 Edition

6/28/2023

0 Comments

Back in January of 2020, when I was a brand-new AAAS Science and Technology Policy Fellow (STPF), I wrote a blog post centered around a plot of the history of the STPF. That figure contained data from all the way back in 1973 through 2019. Today’s post is an updated revisit to that post using the most recent trends in the fellowship. If you’re curious about the history of the AAAS STPF, I recommend checking out this timeline on their website and revisiting the previous version of this figure. Here’s the updated figure:

One major event has happened since the last update: the COVID-19 pandemic. This has not appeared to have had a dramatic affect on the trends in 2020 or after. In fact, the fellowship has recovered well following the pandemic with a record high number of fellows in 2022 (297). This was driven in part by increases in the number of fellows at State, USAID, and in “Other” agencies in the past three years. Since the previous analysis, three new agencies have received fellows for the first time: the Department of the Treasury, the Architect of the Capitol, and AAAS itself.

The AAAS STPF appears to be on an upward and healthy trajectory, unhindered by the global pandemic or any of our other global crises. One former fellow asked to comment on the updated figure said, "it shows the continued importance of science that drives solutions to global crises." Well said.

Coda: This is a quick post to update the trends. If nothing else, it keeps my R skills up to date and gives the new fellows the lay of the land. Since that original post, I've written two other AAAS STPF-themed posts on placement office retention and my own knowledge gained through the fellowship. Check those out to learn more about the fellowship.

0 Comments

AAAS STPF Redux

5/2/2020

2 Comments

Previously, I presented information on the agency composition over time of American Association for the Advancement of Science (AAAS) Science and Technology Policy Fellows (STPF). This analysis was based on the data collected from the publicly available database of AAAS fellows. The goal of the program is to place Ph.D.-level scientists into the federal government to broadly increase science- and evidence-based policy. The STPF program consists of both executive and legislative branch fellows. Executive branch fellows enter into a one-year fellowship, with the option to extend their fellowship after their first year. Additionally, fellows can reapply to transition their fellowship to other offices around the federal government.

To extend my previous analysis, I re-analyzed the data to discover the "destiny" of first-year fellows. How many fellows chose to stay in the same agency after their first year, leave the program, or switch agencies?

The number of fellows that “Do Not Switch” and remain in the same office for both years, has been trending upwards over time. Over the past five years, on average, 67.3 percent of fellows completed both years of their fellowship in the same agency. The remaining 32.7 percent either left after the first year or switched to a different agency. Reasons for this increasing trend are not discernible from the data. More and more offices (or even fellows) may be treating this as a full two-year fellowship rather than a one-and-one. Early executive branch fellows only had a one-year fellowship, leading to the dramatic 100 percent program exit after a single year.

Historically, the percentage of executive branch fellows who switch offices between their first and second years has been low and has never exceeded nine percent. However, the year with the largest number of switches was recent. The largest total number of switches was by fellows who began their first year in 2018, nine (8.1 percent) switched agencies. That year, six of the nine fellows who switched departments moved into the Department of State. The largest number of one-year switches out of a single agency was four.

The data were not analyzed for office switches within the same agency and only inter-agency transfers are recorded as changes. This may lead to under-reporting of position switches. Additionally, some fellows who leave the STPF program after their first year may have received full-time positions at their host agency. While they are no longer fellows, they did not actually leave leave their agency and conceivably could have stayed for a second year of the STPF program there.

Unlike the executive branch fellowship, AAAS’s legislative branch fellowship is only a one-year program. However, some legislative branch fellows apply for, and eventually receive, executive branch fellowships. Over the history of the fellowship program 106 fellows have moved directly from congress into an executive branch fellowship the following year. This is out of the total 1,399 congressional fellows. The previous five-year average for fellows moving from The Hill to the executive branch is 7.2 fellows per year or 36 total fellows in the past five years.

Overall, it appears that the AAAS STPF program’s opaque agency-fellow matching process is doing an increasingly good job of helping fellows find agencies where they are happy to live out the full two years of their executive branch fellowship experience.

2 Comments

How to Maximize your OPSEC Vulnerabilities Using Strava

4/18/2020

1 Comment

In these “strange times,” running has become a lifeline to the outdoors. It is one of the few legitimate excuses to venture outside of my efficiently-sized apartment. I started running in graduate school to manage stress and, even as my physical body continues to deteriorate, I continue to use running to shore up my mental stability. As the severity of the COVID-19 situation raises the stress floor across the nation, maintaining--or even developing--a simple running routine is restorative.

I use the Strava phone app to track my runs. This app records times and distance traveled which is posted to a social-media-esque timeline for others to see. I choose this app after very little market research, but it seems to function well most of the time and is popular enough that many of my friends also use it. My favorite feature of the app is the post-run map. At the end of each session, it shows a little map collected via GPS coordinates throughout my jog.

This feature is not without its flaws. In 2018, Strava published a heatmap of all its users’ data, which included routes mapping overseas US military bases. Publishing your current location data is a huge operational security (OPSEC) violation. Strangers could easily identify your common routes and even get a good idea of where you live. I recommend updating your privacy settings to only show runs to confirmed friends.

With all that said, I wanted to create my own OPSEC-violating heatmap. Essentially, can I plot all of the routes that I have run in the past 18 months on a single map? Yes! Thanks to the regulations in Europe’s GDPR, many apps have made all your data available to you, the person who actually created the data. This includes Strava, which allows you to export your entire account. It is your data so you should have access to it.

If you use Strava, it is simple to download all of your information. Just login to your account via a web browser, go to settings, then my account, and, under “Download or Delete Your Account,” select “Get Started.” Strava will email you a .zip folder with all of your information. This folder is chock full of all kinds of goodies, but the real nuggets are in the “activities” folder. Here you will find a list of files with 10-digit names, each one representing an activity. You did all of these!

These files are stored in the GPS Exchange (GPX) file format, which tracks your run as a sequence of points. The latitude and longitude points are coupled with both the time and elevation at that point. Strava uses this raw information to calculate all your run statistics! With this data an enterprising young developer could make their own run-tracking application.

But that’s not me. Instead, I am doing much simpler: plotting the routes simultaneously on a single map. Here is what that looks like:

Again, this is a huge OPSEC violation so please do not be creepy. However, the routes are repetitive enough that it is not too revealing. Each red line represents a route that I ran. Each line is 80% transparent, so lighter pink lines were run less frequently than darker red lines. You can see that I run through East Potomac Park frequently. Massachusetts Avenue is a huge thoroughfare as well. I focused the map on the downtown Washington D.C. area. I used the SP and OpenStreetMap packages in R for plotting.

The well-tread paths on the map are not really surprising, but it does give me some ideas for ways to expand my route repertoire. My runs are centered tightly around the National Mall. I need to give SW and NE DC a little more love. I should also do some runs in Rosslyn (but the hills) or try to head south towards the airport on the Virginia side of the river.

What did we learn from this exercise? Very little. This is an example of using a person’s own available data. What other websites also allow total data downloads? How can that data be visualized? Make yourself aware of where your data exists in the digital world and, if you can, use that data to learn something about your real world.

My R code is available on GitHub.

Note: Eagle-eyed readers may be able to identify a route where I walked across water. Is this an error or am I the second-coming? Who can say?

1 Comment

Visualization of AAAS STPF Trends

1/5/2020

0 Comments

Since 1973, the American Association for the Advancement of Science (AAAS) has facilitated the Science & Technology Policy fellowship (STPF). The goal of the program is to infuse scientific thinking into the political decision making process, as well as developing a workforce that is knowledgeable in both policymaking and science. Intuitively, it makes sense to place evidence-focused scientists in the government to support key decisions makers.

Each year doctoral-level scientists are placed throughout the federal government for one to two year fellowships. Initially the program placed scientists exclusively in the Legislative branch, but as the program grew, placements in the Executive branch became more common. In 2019, hundreds of scientists were placed in 21 different agencies throughout the federal government. As one of those fellows, I wanted to create a Microsoft Excel-based directory of current fellows. However, what began as a project to develop a simple CSV file turned into a visual exploration of the historic and current composition of the AAAS STPF program. Below are some of my observations.

Data was collected from the publicly available Fellow Directory.

In the beginning of the STPF program, 100% of fellows were placed in the Legislative Branch. This continued until the first Executive branch fellows around 1980 were placed in the State Department, Executive Office of the President (EOP), and the Environmental Protection Agency (EPA). In 1986, the number of Executive Branch fellows overtook the number of Legislative Branch Fellows for the first time.

Since those initial Executive Branch placements, fellows have found homes in 43 different organizations. The U.S. Senate has had the largest total number of fellows while the U.S. Agency for International Development (USAID) is the Executive Branch agency that has had the most placements. Unfortunately, for the clarity of the figure, agencies with fewer than twenty total fellow placements were grouped into a single "other" category. Despite the mundane label, this category represents strength and diversity of the AAAS STPF. The "other" category encompasses 25 different agencies including the Bureau of Labor Statistics, the World Bank, the Bill and Melinda Gates Foundation, and the RAND Corporation. In 2017, fellows were placed in 24 different organizations, the most diverse of any year.

The total number of fellows has dramatically increased over the past 45 years (as seen in the grey bar plot at the bottom of the figure). The initial cohort of congressional fellows in 1973 had just seven enterprising scientists. Compare that to 2013 when a total of 282 fellows were selected and placed. This year (2019) tied 2014 for the second highest number of placements with 268 fellows.

One of the most striking observations is the trends in placement at USAID. In 1982 USAID began to sponsor AAAS Executive Branch fellows, with one placement. Placements at USAID quickly grew, ballooning to over 50% of total fellow placements in 1992. However, just as rapidly, the placement fraction at USAID decreased during the 2000s despite only a small increase in the overall number of fellows. This trend ultimately began to reverse in 2010, and a large increase in the total number of fellows found placement opportunities at USAID. The reader is left to craft their own explanatory narrative.

One thing is clear from the data: the AAAS STPF is as strong as it has ever been. Placement numbers are close to all-time highs and fellows are represented at a robust number of agencies. Only time will tell if the experience these fellows gain will help them achieve the program's mission "to develop and execute solutions to address societal challenges."

If you want to learn more about the history of the STPF, including statistics for each class, AAAS has an interactive timeline on their website.

An unexpected surprise during the analysis was the discovery that Dr. Rodney McKay and John Sheppard (both of Stargate Atlantis fame) were STP fellows. Or--more likely--the developer for the Fellows Directory was a fan of the show. Unfortunately, as a Canadian citizen, Dr. McKay would be ineligible for the AAAS STPF.

0 Comments

FDA New drug Approval Trends 2019

7/9/2019

0 Comments

A little over halfway through the year and the US Food and Drug Administration (FDA) appears to be on track for either a big year of new drug approvals or....not. The number of new molecular entities (NMEs) approved by FDA's Center for Drug Evaluation and Research (CDER) are equal to the number approved at this point of the year in 2016 and only two product apporvals behind both 2018 and 2015. Despite starting the year off with the longest federal shutdown in history the FDA is keeping pace with past years.

However, the figure demonstrates another important fact: approval numbers mid-year do not correlate strongly with year-end approvals. While the number of approvals were similar in 2016 and 2018, the end year totals were wildly different. In 2018, CDER approved a record 59 NMEs while 2016 approved less than half of that number. Additionally, in 2017, the number of NME approvals at mid-year was much higher than any other year, but finished in line with the number of approvals in 2015 and well below the number of approvals in 2018.

It seems that the future could go either way. There could be a dramatic up-tic in CDER approval rate as in 2018 (perhaps from shutdown-delayed applications) or the rate could slow to a crawl like in 2016.

0 Comments

84 Lumber Won the Advertisement Game in Super Bowl LI

2/10/2017

1 Comment

Remember Super Bowl LI you guys? It happened, at minimum, five days ago and of course Tom Brady won what was actually one of the best Super Bowls in recent memory. Football, however, is only one half of the Super Bowl Sunday coin. The other half are the 60 second celebrations of capitalism: the Super Bowl Commercials. Everyone has a list of favorites. Forbes has a list. Cracked has a video. But it is no longer politically correct in this Great country to hand out participation trophies, someone needs to decide who actually won the Advertisement Game.

To tackle (AHAHA) this question I turned to the infinite online data repository, Google Trends, which tracks online search traffic. Using a list of commercials compiled during the game (AKA I got zero bathroom breaks) I downloaded the relative search volume in the United States for each company/product relative to the first commercial I saw for Google Home. [Author’s note: Only commercials shown in Nebraska, before the 4th quarter when my stream was cut, are included]. Here’s an example of what that looked like:

!The search traffic for a product instantly increased when a commercial was shown! You can see exactly in which hour a commercial was shown based on the traffic spike. Using the traffic spike as ground zero, I added up search traffic 24 hours prior to and after the commercial to see if the ad significantly increased the public’s interest in the product.

Below is a plot of each commercial, with the percent of search traffic after the commercial on the vertical axis and the highest peak search volume on the horizontal. If you look closely you will see that some of them are labeled. If a point is below the dotted line the product had less search traffic after the commercial than before (not good).

On average 86% of products had more traffic after their Super Bowl ad than before it. But there are no participation trophies in the world of marketing and the clear winner is 84 Lumber. Damn. They are really in a league of their own (another sports reference!). Almost no one was searching for them before the Super Bowl but oh boy was everyone searching for them afterwards. They used the ole only-show-half-of-a-commercial trick where you need to see what happens next but can only do that by going to their website. Turns out its a construction supplies company

Pizza Hut had a pretty large spike during their commercial, but it actually was not their largest search volume of the night. Turns out most people are searching for pizza BEFORE the Super Bowl. Stranger Things 2 also drew a lot of searches for obvious reason. We all love making small children face existential Lovecraftian horrors.

Other people loved the tightly-clad white knight Mr. Clean and his sensual mopping moves. The Fate of the Furious commercial drew lots of searches, most likely of people trying to decipher WTF the plot is about. Finally there was the lovable Avocados from Mexico commercial. No one was searching for Avocados from Mexico before the Super Bowl, but now, like, a couple of people are searching for them. Win.

So congratulations 84 Lumber on your victory in the Advertisement Game. I’m sure this will set a dangerous precedent for the half-ads in Super Bowl LII.

1 Comment

Football Game in One Graphic

10/24/2016

1 Comment

It’s possible to find play-by-play win probability graphs for every NFL game, but that does not tell me much about how the game itself was played. Additionally, I only sporadically have time to actually WATCH a game so using play-by-play data, R, and Inkscape I threw together this visualization of every play in this past Sunday’s game between the Kansas City Chiefs and New Orleans Saints. Why isn’t this done more often?

1 Comment

Social Network Dynamics in Pharmacology

4/24/2016

1 Comment

The Department of Pharmacology at Creighton University School of Medicine is small, but mighty. There are only 10 professors or principal investigators (PIs) in the department, but this small size has its advantages. Or at least that is what we tell ourselves. A recent paper in Nature argued that bigger is not always better when it comes to labs and we are putting that to the test. Ideally with a smaller faculty, there would be more collaboration. Everyone knows what everyone else is doing, more or less, so they can more efficiently leverage the various expertise found throughout the department.

To measure how interconnected the pharmacology department was I created a network analysis visualization based on who published with whom. Using NCBI’s FLink tool I downloaded a list of the publications in the PubMed database for each PI in the pharmacology department at CU. A quick script in R formatted the authors and created a two-column “edge list” for each author, basically a list of every connection. This was imported into the free, open-sourced network analysis program Gephi which crunched the numbers and produced a stunning map of the connections in the pharmacology dept:

Gephi automatically determines similar clusters (seen as different colors) which are unsurprisingly centered on the various PIs in the department since those are the publications I was looking at. Dr. Murray, the department chair, has the most connections, also known as the highest degree, at 292, followed by Dr. Abel. Drs. Dravid and Scofield are ranked 2nd and 3rd respectively for betweenness centrality, after Dr. Murray. They are the gatekeepers that connect Drs. Abel, Bockman, and Tu to Dr. Murray. Each point’s size is proportional to its eigenvalue centrality, similar to Google’s Pagerank metric of importance.

I was a bit surprised at how disperse the department was. 60% of the PIs could be connected, and many have strong relationships. However the rest are floating on their own islands. Dr. Oldenburg is relatively new so this is not surprising. The Simeones (who are married) are closely connected. Also unsurprising.

This was a quick and dirty analysis and a few of the finer points slipped through the cracks. Some of the names are common in PubMed (especially Tu). so I did my best to filter what was there and only look at publications affiliated with Creighton. Unfortunately this filters out publications from other institutions by the same author. Also not everyone is attributed the same way on every manuscript. This is especially true for Drs. KA Simeone and Gelineau-Van Waes who have published under different last names, but also because sometimes a middle name is given and sometimes it is omitted. I tried my best to standardize the spellings for each PI, but with over 700 nodes I could not double check every author to ensure there were not duplicates elsewhere. If more than one PI shows up on a paper, that paper may show up under both searches. This should not increase the number of edges, but would affect the “strength” of those connections.

The connections are about what I had imagined. The brain people are on one side, everyone else is on the other. Expanding the search to include the papers from coauthors outside of the pharmacology department might discover more interesting connections. Just for fun I went ahead and pulled the data for every paper on PubMed with a Creighton affiliation. I could not even find my department on the visualization without searching for it. It is massive. The breadth of Creighton’s interconnected-ness forces me to marvel at how vast the community of scientists must truly be. So many people working to improve the body of knowledge of the human race. We are really just small bacteria in a very large petri dish.

1 Comment

<<Previous

Cold City Surprises

District Running Recap

Visualization of AAAS STPF Trends – 2023 Edition

AAAS STPF Redux

How to Maximize your OPSEC Vulnerabilities Using Strava

Visualization of AAAS STPF Trends

FDA New drug Approval Trends 2019

84 Lumber Won the Advertisement Game in Super Bowl LI

Football Game in One Graphic

Social Network Dynamics in Pharmacology

Archives

Categories

Home

Projects

Blog