Crossing the Manhattan Bridge (or applying single-molecule models to transit data)

About a year ago I moved from the Upper East Side (where Rockefeller University and my research job are located) to Brooklyn. The move came at the cost of a much longer commute on the Q line of the infamous NYC subway system. In recent years subway delays had got so out of hand that in 2017 New York Governor Andrew Cuomo declared a state of emergency to accelerate repairs of the decaying system. Even…

Analysis of one month of subway data – part 2: correlation between the total number of trains and the probability of delays

When I first decided to track trains in the NYC subway system I was sure I would find a correlation between the number of trains in the system and the percentage of trains running with delays. I was also sure it would be a positive relationship. The more trains there are in the system, the more traffic jams, and the more delays one should find. Simple, right? Surprisingly, as we will see in the…

One month of subway tracking – part 1: an overview of the recorded data

What is the best time to catch a subway to beat rush-hour delays? What predictors have the largest impact on whether your train will get to its destination on time? To answer these questions I have been tracking the positions of all trains in the NYC subway system for several months at 20 s intervals. From these data, trains that run with delays can be identified, and, after some data wrangling (not shown here), the following…

Installing CUDA and cuDNN on openSUSE Tumbleweed

SuSE Linux 6.2, released 20 years ago, was my first experience with a Linux-based operating system. Even though I soon thereafter switched to Gentoo and later Ubuntu, I have always retained a soft spot for SUSE, so when I recently built a new data science machine I decided to give openSUSE Tumbleweed a try. Tumbleweed is a rolling release distribution, providing frequent updates of all packages without making you update the entire distribution in…

Towards a real-time map of subway delays

I have long wished a map of the NYC subway system existed that showed delays in real time, much like modern mapping software displays delays on street maps. Using our understanding of the states a subway line can exist in (including the current state’s mean transit times between stations and their standard deviations) and our ability to plot the NYC subway lines, we are now in a position to build such a map. Since the rest…
