EO Dashboard Hackathon

MLJC | A Comparative Analysis

Awards & Nominations

MLJC has received the following awards and nominations. Way to go!

Global Finalist

A Comparative Analysis

The COVID-19 pandemic has had different impacts in different regions of the world. Your challenge is to perform a comparative analysis of the pandemic’s economic impacts in urban areas for the USA, Asia, and Europe using the EO Dashboard.

How was the lockdown from up there? A cross-country comparison in air quality

Summary

2020: under the pandemic wave, the world slowed down. Industry and transport were heavily impacted and, beyond the effects on private lives and social interactions, the pandemic had repercussions on the environment. The question that guided us was how much of these changes we could actually see in satellite data. We focused on variations in atmospheric gases (SO2, NO2, O3, CO) and on how these variations differ across countries. To this end we performed a time-oriented analysis on time series and a complementary geography-oriented analysis, based on unsupervised feature extraction, to highlight differences between countries.

How I Addressed This Challenge

Our project produced two main elements:

A geography-oriented measure of "emission change similarity", based on a black-box machine-learning model (specifically an autoencoder), which differentiates between countries according to the global structure of their emission time series;
A description of pattern anomalies in the emission time series for the pandemic period, comparing the observed data to a sensible prediction for that period based on previous years.

Both of these theoretical achievements have a visualization counterpart and are easy to implement on the dashboard. One of the main strengths is the resulting two-level interface, which guides the user intuitively: in the first step the user is shown a map with countries color-coded so that clusters of "similar" emission series share the same color. Then, on clicking a single country, its actual time series are shown in a pop-up window.

How I Developed This Project

Patterns:

Our guiding star was the awareness that data is nothing without patterns, and patterns are what we, as human beings, actually see, feel, and give meaning to. In practice, this gave us two research directions: the need for pattern-focused data analysis techniques and the challenge of graphically representing pattern similarity across countries. We aimed to keep these components together in a fruitful entanglement between analysis and visualization.

To analyze the data patterns we selected two complementary routes, one space-oriented and the other time-oriented:

An autoencoder architecture to reduce the time series to a handful of scalar values representing emission changes during a given period; these scalar values could then be used to differentiate between countries.
A country-specific time series study to understand the change in emission pattern during the pandemic.

These routes correspond, in fact, to the user's journey through the dashboard, starting from a description of the whole globe and then narrowing down to a certain situation, a certain place.

Our data:

Our data source was the Copernicus Sentinel-5P Level-2 product (S5PL2). Data retrieval relied on the built-in functions of the xcube_sh library, which make requesting these data straightforward. The selected atmospheric variables were methane (CH4), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and nitrogen dioxide (NO2). These data have the important feature of being globally distributed, so the analysis can be reproduced for every country; this was the reason behind the choice of these parameters. Other variables are available, such as AER_AI_340_380; for lack of time they were not included in the current analysis, but they could be employed in future work. Another important point is the use of a JSON file to obtain the geometry of each selected country. It is worth mentioning that each country's area was replaced by its enclosing rectangular box, which introduces an approximation in our data. Further effort could be made to take the correct shape of countries into account; some functions in xcube_sh could be employed for this.
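The rectangular approximation described above can be sketched in a few lines. This is a minimal illustration, not the code used during the hackathon: the toy geometry and the helper names (iter_positions, bounding_box) are assumptions made for the example, and the actual xcube_sh request is not shown.

```python
import json

def iter_positions(coords):
    """Yield every (lon, lat) pair from arbitrarily nested GeoJSON coordinates."""
    if isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)

def bounding_box(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON geometry."""
    lons, lats = zip(*iter_positions(geometry["coordinates"]))
    return (min(lons), min(lats), max(lons), max(lats))

# Hypothetical toy geometry (two polygons), not a real country outline.
country_geojson = json.loads("""
{
  "type": "MultiPolygon",
  "coordinates": [
    [[[6.6, 36.6], [18.5, 40.0], [13.8, 47.1], [6.6, 36.6]]],
    [[[8.1, 38.9], [9.8, 39.2], [9.2, 41.3], [8.1, 38.9]]]
  ]
}
""")

# The enclosing rectangle that replaces the true country shape.
bbox = bounding_box(country_geojson)
print(bbox)
```

Everything inside the rectangle but outside the true border (sea, neighbouring countries) ends up in the country's average, which is the approximation mentioned above.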

Space patterns:

Autoencoders are neural networks able to learn, without supervision, a compressed representation of the input. After training, they can convert the input into a given number of features, corresponding to the neurons in their central "bottleneck" layer. Our autoencoder converts the whole time series for all the gases into an array of just five features. Perhaps surprisingly, the resulting representation is quite faithful. The unsupervised approach is particularly suited to our case since we do not know, a priori, [...]. Fine-tuning the model proved to be a challenge; moreover, the code was quite heavy on RAM.
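To make the bottleneck idea concrete, here is a minimal sketch of an autoencoder with a five-feature bottleneck, written with NumPy only. It is not the hackathon model: the data are synthetic, the layers are linear for brevity (a real model would typically add nonlinear activations and more layers), and all sizes and the learning rate are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 40 hypothetical "countries", each
# described by a flattened series of 60 values driven by 3 hidden factors.
n_countries, n_steps, n_features = 40, 60, 5
factors = rng.normal(size=(n_countries, 3))
X = factors @ rng.normal(size=(3, n_steps))
X += 0.05 * rng.normal(size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize every time step

# Encoder (60 -> 5) and decoder (5 -> 60) weights; linear layers for brevity.
W_enc = 0.1 * rng.normal(size=(n_steps, n_features))
W_dec = 0.1 * rng.normal(size=(n_features, n_steps))

def encode(x):
    return x @ W_enc

def decode(z):
    return z @ W_dec

def reconstruction_error(x):
    """Mean squared difference between the input and its reconstruction."""
    return float(np.mean((decode(encode(x)) - x) ** 2))

initial_error = reconstruction_error(X)

# Plain gradient descent on the squared reconstruction error,
# summed over time steps and averaged over countries.
learning_rate = 0.001
for _ in range(5000):
    Z = encode(X)
    grad_out = 2.0 * (decode(Z) - X) / n_countries
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)
    W_dec -= learning_rate * grad_W_dec
    W_enc -= learning_rate * grad_W_enc

final_error = reconstruction_error(X)
codes = encode(X)  # one row of five features per country
print(initial_error, final_error, codes.shape)
```

The rows of codes play the role of the five-feature arrays described above: each country is summarized by five numbers from which its series can be approximately rebuilt.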

Given a fixed time span, we could extract a handful of values representing the whole evolution of our variables during that period; we could then observe how different countries cluster in the space spanned by these values.

Visualization then follows immediately from an appropriate color-coding of the clusters, to be implemented on a world map.
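The step from feature vectors to map colors can be sketched as follows. The text does not say which clustering algorithm was used, so plain k-means is an assumption here; the country names, feature values, and palette are made up for the example.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain k-means; centroids start at the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical five-feature codes, as a bottleneck layer might produce.
features = {
    "Country A": [0.9, 0.1, 0.0, 0.2, 0.1],
    "Country B": [0.1, 0.8, 0.9, 0.1, 0.0],
    "Country C": [1.0, 0.2, 0.1, 0.1, 0.2],
    "Country D": [0.0, 0.9, 1.0, 0.2, 0.1],
    "Country E": [0.8, 0.0, 0.1, 0.3, 0.0],
}

labels = kmeans(list(features.values()), k=2)

# One color per cluster; countries in the same cluster share a color.
palette = ["#1f77b4", "#ff7f0e"]
country_colors = {name: palette[lab] for name, lab in zip(features, labels)}
print(country_colors)
```

The resulting dictionary is the kind of lookup a map layer would use to fill each country polygon.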

Time patterns:

To investigate the time-dependent changes in gas emissions for single countries we used Prophet. Prophet is open-source software released by Facebook's Core Data Science team: https://github.com/facebook/prophet.

It forecasts time series data by fitting yearly, weekly, and daily seasonality, plus holiday effects. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. The idea was simple: for a single country, we would extract the overall trend and the seasonal component of emissions from the pre-COVID years, predict the values for 2020, and then compare the prediction to the observed data. The main risk lay in the relatively short pre-COVID time series, which amounted to less than two years and had relatively wide seasonality; however, the results proved promising. Observed data during COVID were clearly outside the uncertainty bounds of the prediction, suggesting that the pandemic strongly affected gas emissions related to economic activity. Moreover, it is worth mentioning that, for certain countries (such as China), we observed that COVID brought the data back to the previous year's levels.
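The comparison logic (forecast from earlier years, then check whether the observations fall outside an uncertainty band) can be illustrated without Prophet. The sketch below is a deliberately simplified stand-in, not Prophet and not the hackathon code: the baseline is a month-by-month mean of previous years, the band is two standard deviations of the historical residuals, and all numbers are synthetic.

```python
import math
import statistics

def seasonal(month):
    """Hypothetical yearly cycle peaking in winter (arbitrary units)."""
    return 10.0 + 3.0 * math.cos(2 * math.pi * month / 12)

# Synthetic monthly series: two "pre-pandemic" years with small wiggles.
history = {
    2018: [seasonal(m) + 0.2 * (-1) ** m for m in range(12)],
    2019: [seasonal(m) - 0.2 * (-1) ** m for m in range(12)],
}
# Synthetic 2020: same cycle, with a drop in March to May (indices 2 to 4).
observed = [seasonal(m) - (2.5 if 2 <= m <= 4 else 0.0) for m in range(12)]

# Baseline "business as usual": month-by-month mean of the previous years.
baseline = [
    statistics.mean(year[m] for year in history.values()) for m in range(12)
]

# Uncertainty band: two standard deviations of the historical residuals.
residuals = [history[y][m] - baseline[m] for y in history for m in range(12)]
spread = statistics.pstdev(residuals)
lower = [b - 2 * spread for b in baseline]
upper = [b + 2 * spread for b in baseline]

# Months (0 = January) whose observation falls outside the band.
anomalies = [m for m in range(12) if not lower[m] <= observed[m] <= upper[m]]
print(anomalies)
```

Prophet replaces the crude baseline with a fitted trend plus seasonal components and supplies its own uncertainty interval, but the final comparison step has the same shape.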

How I Used Space Agency Data in This Project

We used data from the Copernicus Sentinel-5P Level-2 product (S5PL2). We chose it because it provides global coverage. We were able to extract data from 2018 to 2020 and analyze it, as described above.

Using this dataset was easy, since we applied our analysis through the Euro Data Cube interface.

Project Demo

The project presentation, with slides, can be found on the GitHub page: https://github.com/beavillata/EOChallenge

https://github.com/beavillata/EOChallenge/blob/main/hackathon_esa_pres.pdf

Here you can also find the code we used during the hackathon.

We also created a WeTransfer link to download the presentation: https://wetransfer.com/downloads/d4f9d09f693e991e35ac419193455dd520210629221015/8b5773500db0cf83496dd168f48c7fe020210629221049/f436da

Earth Observing Dashboard Integration

We developed an HTML map using the folium Python library to simulate a feasible implementation in the EO Dashboard. The map is a simple example compared to the Dashboard, but it represents well the features that could be implemented there. First, it shows the difference between the mean over the pre-COVID period and the mean over the pandemic period; a corresponding color code could be added to the current Dashboard. Thanks to the Prophet library, the trend of each of these variables can be shown in a pop-up on the map. Moreover, the forecast made with these libraries enables a comparison between the business-as-usual trend, i.e. as if the pandemic had never happened, and the observed trend. Last but not least, the most important feature is the number that identifies the cluster membership of every country. This can be added to the Dashboard as a new global variable to highlight similarities between countries in the effects of the pandemic.
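The first map feature, the difference between the pre-COVID mean and the pandemic-period mean, could be computed along these lines. This is a sketch under stated assumptions: the change is expressed as a percentage (the text only says "difference"), the 5% threshold and the three colors are arbitrary, the values are invented, and the folium calls that would attach the result to a map are left out.

```python
from statistics import mean

def mean_change(pre_values, pandemic_values):
    """Relative change (in percent) of the pandemic-period mean vs. the earlier mean."""
    before = mean(pre_values)
    return 100.0 * (mean(pandemic_values) - before) / before

def change_color(percent, threshold=5.0):
    """Map a relative change to a color: drop, roughly stable, or rise."""
    if percent <= -threshold:
        return "green"
    if percent >= threshold:
        return "red"
    return "gray"

# Hypothetical NO2 values (arbitrary units) for three made-up countries:
# (pre-pandemic samples, pandemic-period samples).
series = {
    "Country A": ([10.0, 11.0, 9.0], [7.0, 8.0]),
    "Country B": ([5.0, 5.0, 5.0], [5.1, 4.9]),
    "Country C": ([4.0, 4.0], [5.0, 5.0]),
}

# Per-country summary that a map pop-up or fill style could display.
popup = {}
for name, (pre, during) in series.items():
    change = mean_change(pre, during)
    popup[name] = {"change_percent": round(change, 1), "color": change_color(change)}
print(popup)
```

A relative change makes countries with very different absolute emission levels comparable on a single color scale, which is why it was chosen for this example.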

Data & Resources

We used Copernicus Sentinel-5P Level-2 products to obtain data for CH4, CO, NO2, O3 and SO2.

We also used Prophet, an open-source software package released by Facebook's Core Data Science team: https://github.com/facebook/prophet.

Additionally, folium was used to build a map for visual representation of the dataset.

Judging

This project has been submitted for consideration during the Judging process.