Context
As part of a studio course project at Harvard, I worked with Bradley Scott to visualize and understand complex data from Mexico City’s open data repository. We explored a graphical way to analyze a problem that is normally presented only as charts and reports: an interactive data visualization that lets you browse the city’s municipalities and gauge their risk factors. We completed the entire project, start to finish, in five weeks.
Programs and tools used
The project was built in p5.js and designed in Figma.
Process
We began by collecting data from the open data repository and combing through datasets to see how much data was missing. Thankfully, Mexico City’s repository is well maintained, and most fields contained a value other than ‘NA.’
We then reviewed all of the available datasets and tried to figure out which ones would let us separate the water crisis into distinct lenses. The following image shows the collected datasets. One thing of note: by aggregating our own dataset of datasets, we inherently introduce a certain bias into the visualization. Furthermore, geological risk is composed of three different datasets, meaning that each individual risk within that category carries less weight in the total.
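To make the weighting effect concrete, here is a minimal sketch of how averaging sub-datasets within a category dilutes each one. The dataset names and scores are invented for illustration; they are not the project’s actual values.

```python
# Hypothetical category scores on a 0-100 scale (illustrative values only).
# A standalone category contributes its full score; geological risk is the
# mean of its three sub-datasets, so each sub-risk carries only 1/3 weight.
flood_risk = 72.0

seismic, subsidence, landslide = 68.0, 40.0, 55.0  # invented sub-scores
geological_risk = (seismic + subsidence + landslide) / 3

print(round(geological_risk, 2))  # → 54.33
```

A high seismic score of 68 thus moves the category less than the same score would move a standalone category, which is the bias the text describes.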
If we were to redo this assignment, we’d try to aggregate more datasets within each category, as well as add more sociogeographic factors such as governance and social vulnerability indices. I think that would give us much more fidelity.
For the second category, we only look at outbound cash flow per municipality, but I think it would be more robust to also include total spend and, for example, employment rates.
Data analysis was conducted through preprocessing and analysis in Python. Because these datasets were collected by different teams, the naming conventions and metrics varied slightly, so we went through and harmonized everything. I was able to apply many of the techniques from my machine learning/data science class. One thing I wish I had done was log-normalize the data rather than linearly scale it to a 0–100 range, because the linear scaling loses a lot of fidelity. However, given our representation of the work, it was an acceptable choice: more fidelity might have meant far more squares, which could have looked overwhelming.
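The normalization trade-off can be sketched in a few lines. This is a generic illustration, not the project’s actual pipeline: linear min–max scaling (what the text describes) lets a single outlier compress every other value toward zero, while log-transforming first spreads a skewed distribution more evenly.

```python
import math

def minmax_scale(values, lo=0.0, hi=100.0):
    """Linearly rescale values onto [lo, hi] (the 0-100 approach)."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) / (vmax - vmin) * (hi - lo) for v in values]

def log_scale(values):
    """Log-transform before rescaling; preserves more spread when
    the distribution is heavily skewed."""
    return minmax_scale([math.log1p(v) for v in values])

skewed = [1, 2, 5, 10, 1000]        # one large outlier
linear = minmax_scale(skewed)       # small values crushed near 0
logged = log_scale(skewed)          # small values remain distinguishable
print([round(v, 1) for v in linear])
print([round(v, 1) for v in logged])
```

With the linear scale, the four small values all land below 1 out of 100; after the log transform they are clearly separated, which is the fidelity loss the text alludes to.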
Video walkthrough
The p5.js program runs in the p5.js web editor. We created three interactive screens that let you examine the problem at different altitudes.
Stills of the program
Overview
This view allows you to take in the entire city in one go. By abstracting the features of the city into squares, we try to let you focus on the contents of the city without being influenced by prior conceptions of the different neighborhoods.
Single selection
This view allows you to see any municipality in greater detail. In the future we’d like to add a way to hover over each square and get more information!
Comparison Sankey diagram
The third view is a Sankey diagram that allows you to directly compare the inequities between two municipalities.
Here are older versions of the program. We originally started with a very technical design style, aiming to mimic a dashboard you might see in a spaceship. However, as we received feedback, we quickly realized it was far too visually heavy and detracted from the message.
I started researching precedents and found these normalized maps. We realized there was an opportunity to strip the entire project back, and luckily for us, Mexico City has 16 municipalities, which fit neatly into a 4×4 grid!
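The grid layout itself is simple to reproduce. The sketch below is a hypothetical illustration (cell size and gap values are invented, and the project’s actual rendering is in p5.js): it maps each of the 16 municipalities, row-major, to an (x, y) origin in a 4×4 grid.

```python
def grid_cells(n=16, cols=4, cell=120, gap=10):
    """Return (x, y) pixel origins for n cells laid out row-major
    in a grid with `cols` columns. Sizes are illustrative."""
    return [((i % cols) * (cell + gap), (i // cols) * (cell + gap))
            for i in range(n)]

cells = grid_cells()
print(len(cells))   # → 16
print(cells[5])     # → (130, 130): second row, second column
```

Because 16 divides evenly into 4 rows of 4, no cell is left dangling, which is what made the stripped-back square layout work so cleanly.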
The following are our earlier sketches of what this might’ve looked like in another world. Maybe one day I’ll get my dream of having a cool geometric seismic map.