For the Cloud Commuting Midterm, Jiwon, Haylee, and I analyzed CitiBike to identify communities of stations based on the number of trips between them. By analyzing bike trips between stations, would we discover a pattern of stations connected to each other by a high volume of trips, and what would this pattern reveal about current usage of the CitiBike system?
To identify “CitiBikeHoods”, we wrote a Python program to calculate the number of bike trips between each pair of stations in August 2014 (the most recent data available to us). Thanks to Salem Al-Mansoori for helping us write this program (code at end of post!).
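The counting step can be sketched in a few lines. This is only a minimal illustration, not the program Salem helped us write: the column names `start station name` and `end station name` are assumptions about the trip file's header, and the station names are made up for the sample.

```python
import csv
import io
from collections import Counter


def count_trips(rows):
    """Count trips for each (origin, destination) station pair.

    `rows` is any iterable of dicts, for example a csv.DictReader.
    The column names below are assumed, not taken from the real file.
    """
    counts = Counter()
    for row in rows:
        pair = (row["start station name"], row["end station name"])
        counts[pair] += 1
    return counts


# Tiny made-up sample standing in for the August 2014 trip file.
sample = io.StringIO(
    "start station name,end station name\n"
    "Station A,Station B\n"
    "Station A,Station B\n"
    "Station B,Station A\n"
    "Station C,Station A\n"
)

trip_counts = count_trips(csv.DictReader(sample))

for (origin, destination), trips in sorted(trip_counts.items()):
    print(origin, "->", destination, trips)
```

Note that this sketch treats A to B and B to A as separate pairs, which matches having an origin column and a destination column.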
Once we had our data, we imported it into Gephi, a software program that created a network graph of the stations and identified communities of stations. Initially, we hoped to write a program using a Python module called NetworkX to create the network graph (which allows more control over generating the graph and identifying communities from the data), but we had trouble writing the program, so we opted to use Gephi.
The tricky part of importing our data into Gephi was giving it the data in a format it liked. We had to rename our Origin Station and Destination Station columns (our nodes) “Source” and “Target”, and we renamed the column containing the number of bike trips between the stations “Edge”, since it would be the weight of the line connecting our nodes (stations!). We used a network graph to identify relationships between stations by grouping them into modules, or “hoods.”
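The renaming described above could be done by hand in a spreadsheet, but here is a rough Python sketch of writing the edge table directly. The function name `write_gephi_edges` and the sample counts are hypothetical. The weight column defaults to “Edge” as in our file; as far as we know, Gephi's spreadsheet import usually picks up edge weights automatically from a column named “Weight”, so the name is left as a parameter.

```python
import csv
import io


def write_gephi_edges(trip_counts, out_file, weight_column="Edge"):
    """Write (origin, destination) -> trips as a Source/Target edge table."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target", weight_column])
    # Sort so the output order is stable from run to run.
    for (origin, destination), trips in sorted(trip_counts.items()):
        writer.writerow([origin, destination, trips])


# Made-up trip counts standing in for the real station pairs.
trip_counts = {
    ("Station A", "Station B"): 2,
    ("Station B", "Station A"): 1,
}

buffer = io.StringIO()
write_gephi_edges(trip_counts, buffer)
edge_csv = buffer.getvalue()
print(edge_csv)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the in-memory buffer.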
In the above graph, the outer yellow nodes are Brooklyn stations, the stations in the center are located in Lower Manhattan, the green stations are Midtown stations, the lower purple stations are Lower East Side/East Village, and the upper purple stations are Murray Hill/Flatiron.
We then exported this data from Gephi to create a map using Mapbox, coloring each station according to the community it belonged to. We chose to go with 63 communities, but it might be better in the future to have fewer communities. View the code here. (Thanks to Adarsh Kosaru for helping us write code to color in our different communities with a randomly generated color in Mapbox!)
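The coloring idea is simple enough to sketch in Python. This is not the Mapbox code Adarsh helped with, just an illustration of the same approach: one random hex color per community, shared by every station in that community. The function name, the station names, and the community numbers are all made up.

```python
import random


def community_colors(community_ids, seed=None):
    """Assign one random hex color (like '#a1b2c3') to each community id."""
    rng = random.Random(seed)
    colors = {}
    for community in community_ids:
        if community not in colors:
            # Pick a random 24-bit number and format it as six hex digits.
            colors[community] = "#{:06x}".format(rng.randrange(0x1000000))
    return colors


# Hypothetical station -> community assignments, as exported from Gephi.
station_community = {"Station A": 0, "Station B": 0, "Station C": 1}

colors = community_colors(station_community.values(), seed=42)
station_color = {
    station: colors[community]
    for station, community in station_community.items()
}

for station, color in station_color.items():
    print(station, color)
```

With dozens of communities, purely random colors can land close together, which is one reason fewer communities (or a hand-picked palette) might make the map easier to read.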
What was immediately obvious in the mapping of communities was that Midtown, Brooklyn, and the Lower East Side are each treated as one neighborhood, while other groupings of stations appeared to be random. To better understand the CitiBikeHoods, we compared them with other maps of New York City.