EO Dashboard Hackathon

Team-Tech 2.0 | A Comparative Analysis

A Comparative Analysis

The COVID-19 pandemic has had different impacts in different regions of the world. Your challenge is to perform a comparative analysis of the pandemic’s economic impacts in urban areas for the USA, Asia, and Europe using the EO Dashboard.

“Comparative Analysis” – ¡Regression REMIX!

Summary

Data mining and visualization
Linear regression, unsupervised Machine Learning model
Public health - COVID-19 impact
Social mobility - max/min aggregation, worst- and best-case responses from the citizenry
Filling the EO Dashboard gap for financial economic indicators - NASDAQ stock prices as prototype
Air pollution - NO2, spatially averaged over bounding boxes
Strategic Areas of Interest (SAOIs) = Asia (Karachi, Tokyo); Europe (Berlin, London); USA (Los Angeles, New York)

How I Addressed This Challenge

¿What did Team-Tech 2.0 develop?

We developed a database of daily observations: Madrid, Karachi, Tokyo, New York, Berlin, Miami, Mumbai, and many more! We found that the RIGHT-CLICK and OPEN IN NEW TAB menu options work best for following these links. Thanks!
We wrote and tested code and models: TokyoModel.py, Tokyo.pickle, LondonModel.py, London.pickle, LosAngelesModel.py, LosAngeles.pickle, and Book1.xlsx. Remember! Please right-click and use a new tab or window for the most stable view of these code and model files. Note: Book1.xlsx is the database that supports all of our models.
The user can analyze regression models for themselves: INTERCOMPARE_Los_Angeles_Karachi_Berlin and INTERCOMPARE_London_New_York_Tokyo
Through six (6) intense days we developed a strong group dynamic, which let us work together as an international team - effective, innovative, and trying to be inclusive

¿Why is our project important?

Leveraged data to answer questions about public health and the variability of socio-economic indicators BOTH pre- AND post-Covid-19
Illuminated the potential damage that an overly fast recovery poses to our environment
(BEFORE) What was Earth like in the decades before the pandemic?
(DURING) How was the modern world impacted by the arrival of a pandemic?
(AFTER) Air pollution post-Covid-19 might rise back to or exceed prior levels, if we are not cognizant of human impact

¿What does our project do?

Provides a framework to test, across different time spans and SAOIs, which proxies best characterize this complex event
Assumes economic data will move proportionally with NO2 air pollution
Demonstrates the best-performing linear regression model for NO2 based on community mobility and Covid-19 data

¿How does Regression REMIX work?

Combining ~2,250 lines of code in Python and R,
It constructs one CSV database of yearly-resolution data (pre-Covid-19) and another of daily-resolution data (during and post-Covid-19).
Then a classic linear regression, approachable and understandable, is computed
By training on part of the data and testing on a disjoint subset.
Next, by specifying different cities across the globe (the SAOIs), it keeps the spatial analysis consistent through time
In order to focus on short-term prediction of future daily observations
And create an opportunity to aggregate year-long predictions that can be contrasted with trends in the yearly pre-Covid-19 data.
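The train-and-test step described above can be sketched as follows. This is a minimal illustration, not the team's actual code: the data are synthetic, and the predictor names (`mobility`, `new_cases`), the 80/20 split, and the coefficients used to generate the fake NO2 series are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 200 days, two predictors.
rng = np.random.default_rng(0)
mobility = rng.normal(0.0, 1.0, 200)   # e.g. a scaled mobility index
new_cases = rng.normal(0.0, 1.0, 200)  # e.g. scaled daily Covid-19 cases
X = np.column_stack([mobility, new_cases])

# Fake NO2 response with a known linear relationship plus noise.
no2 = 30.0 + 5.0 * mobility - 2.0 * new_cases + rng.normal(0.0, 0.5, 200)

# Train on part of the data and test on a disjoint subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, no2, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() reports R^2 on the held-out days.
r2 = model.score(X_test, y_test)
print(f"coefficients: {model.coef_}, intercept: {model.intercept_:.2f}")
print(f"held-out R^2: {r2:.3f}")
```

With real SAOI data, `X` would hold the mobility, Covid-19, and stock columns for one city and `no2` the spatially averaged NO2 series for the same days.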

¿What do we hope to achieve with Regression REMIX?

Pattern analysis of these data to reveal which human activities impacted air pollution and what steps we can take to prevent future damages by controlling a particular activity

How I Developed This Project

¿What inspired Team-Tech 2.0 to choose this challenge?

¡To create something within reach for everyone!
To contribute a unique understanding of the Covid-19 pandemic
As global citizens, we appreciated how the "Comparative Analysis" was framed to focus on global similarities during this struggle

¿What was Team-Tech 2.0's approach to developing this project?

Days 1, 2, and 3

Consulted all data from NASA/ESA/JAXA - an immense amount

Evaluated what data might have been missing from the Dashboard: explicit financial/economic indicators

Worked as a team to build a collective vision, from our own perspectives, of how the pandemic impacted each of our worlds

Days 2, 3, 4, and 5

Our approach took advantage of ESA's NO2 database as the response variable to daily changes in the social economy during Covid-19

Day 6 and beyond...

Intercomparing daily models across SAOIs

Learning from the yearly ANOVA model across SAOIs

Discussing delivery of our model as a new EO Dashboard layer, and a methodology to make daily predictions, aggregate them by year, and compare them to historical data

¿What tools, coding languages, hardware, software did you use to develop your project?

Linear regression (daily prediction), limited testing of an ANOVA model (yearly archive)
Python (sklearn, pandas, matplotlib, numpy, pickle) and R (tidyverse, readr, stats)
Hardware: Kept it simple with laptops - anyone can replicate/compute this solution - although Hassan melted his graphics card from hacking too hard :)
Software: Earth Observing Dashboard, Excel for viewing CSV files, Internet access/browsers, RStudio, IDLE

¿What problems did your team have?

PROBLEMS: narrowing the scope from broad ideas, general data availability, reconciling schedules and time zones

¿What achievements did your team have?

ACHIEVEMENTS: an sklearn-style model with 70-90% accuracy for daily NO2 prediction, synthesis and publication of interconnected data layers describing Covid-19, community mobility via a Max Min Not Mean approach, cross-cultural communication and a truly global solution

How I Used Space Agency Data in This Project

Space agency data played a crucial role in inspiring and facilitating ¡Regression REMIX! We first looked to the Earth Observing Dashboard as a way to calibrate how we wanted to perform the "Comparative Analysis".

This interface also gave Team-Tech 2.0 an idea of how to effectively design, organize, and manage our own database. We needed to know how to take data from various sources and cast them into a logical framework.

Specifically, ESA's NO2 data layer provided the dependent variable to our model.

Also, NASA's and JAXA's data describing mobility and the progression of Covid-19 served as independent variables on which to build the regression.

Finally, we identified a gap where the Dashboard would benefit from financial economic indicators. Regression REMIX prototypes how to include one such indicator, daily NASDAQ stock variability: low/high/closing prices in USD and volume flows. This financial economic gap could also be filled by the per-country Consumer Price Index from this source: Consumer Price Index for All Urban Consumers: All Items in U.S. City Average (CPIAUCSL) | FRED | St. Louis Fed (stlouisfed.org).

We DID NOT include this independent variable in the short-term model, but future testing involving a nearest neighbor interpolation from monthly to daily resolution would let us immediately test its impact on NO2 prediction.
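As a rough illustration of the nearest neighbor interpolation mentioned above, the sketch below resamples a monthly series to daily resolution with pandas. The CPI values are made up for the example (they are not real CPIAUCSL figures), and placing each monthly value on the first day of its month is an assumption.

```python
import pandas as pd

# Hypothetical monthly CPI values (illustrative numbers only),
# each stamped on the first day of its month.
monthly_cpi = pd.Series(
    [100.0, 101.0, 103.0],
    index=pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
    name="cpi",
)

# Target daily index spanning the same period.
daily_index = pd.date_range("2021-01-01", "2021-03-01", freq="D")

# Each day takes the value of the closest monthly observation.
daily_cpi = monthly_cpi.reindex(daily_index, method="nearest")

print(daily_cpi.loc[pd.Timestamp("2021-01-10")])  # closest to 1 January
print(daily_cpi.loc[pd.Timestamp("2021-01-25")])  # closest to 1 February
```

The resulting daily column could then be joined to the daily database and added as one more independent variable in the regression.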

Project Demo

Please take a look at our project "demo" by clicking the following link:

REGRESSION REMIX PROJECT SLIDES

We also prepared a Spanish-language document about the impact of Covid-19 globally.

It contains some beautifully rendered figures showing data mined from owid-covid-data.xls

Análisis Escrito Inicio del Covid 19 vs hoy

Earth Observing Dashboard Integration

Team-Tech 2.0's ¡Regression REMIX! solution is simply an additional data layer for the Earth Observing Dashboard. It would give users access to a globally optimized model that synthesizes various factors.

The Next Level/El Próximo Nivel

Another level beyond just adding a static layer would be to give users a simple control for linear mixing of other layers. Our work can help users answer questions like, "Does a mix of 60% mobility + 30% Covid-19 infection counts + 10% stock market prices for Apple give a good explanation of NO2 levels in the atmosphere?"
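One way such a linear mixing control could work is sketched below. This is a hypothetical design, not an existing Dashboard feature: the layer names, the choice to standardize each layer before mixing, and the use of Pearson correlation to judge the "explanation" of NO2 are all assumptions, and the data are synthetic.

```python
import numpy as np


def zscore(values):
    """Standardize a series so layers with different units can be mixed."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def linear_mix(layers, weights):
    """Weighted sum of standardized layers; weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * zscore(layers[name]) for name, w in weights.items())


# Synthetic stand-in layers (hypothetical data, 100 days).
rng = np.random.default_rng(42)
layers = {
    "mobility": rng.normal(-20.0, 10.0, 100),
    "covid_cases": rng.normal(500.0, 150.0, 100),
    "stock_close": rng.normal(130.0, 5.0, 100),
}

# The 60% / 30% / 10% mix from the question above.
weights = {"mobility": 0.6, "covid_cases": 0.3, "stock_close": 0.1}
mix = linear_mix(layers, weights)

# Synthetic NO2 built from the mix plus noise, only so the example has a signal.
no2 = 25.0 + 4.0 * mix + rng.normal(0.0, 1.0, 100)

r = np.corrcoef(mix, no2)[0, 1]
print(f"Pearson correlation between the mix and NO2: {r:.2f}")
```

A user-facing control would let the weights be changed interactively and report how well each mix tracks the observed NO2 layer.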

Step-by-step instructions for integration

1. Automation of the CSV generation process

2. Create a headerless version of our CSV generation code

3. Upload the CSV directly in the Git repository (https://github.com/eurodatacube/eodash/tree/staging/app/public/data/trilateral)

4. Integrate CSV database and regression result into the Dashboard display - another indicator menu option

5. Work on the linear mixing control so users can customize the mix

Data & Resources

ESA's NO2 - TROPOMI

Served as dependent model variable for regression fitting

Spatial averaging via bounding box for Areas of Interest
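Spatial averaging over a bounding box can be sketched as below, assuming the NO2 field is available as a regular latitude/longitude grid with missing pixels stored as NaN. The function name `bbox_mean`, the grid values, and the coordinates are illustrative assumptions, not the team's actual boxes or data.

```python
import numpy as np


def bbox_mean(grid, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Mean of all valid grid cells whose centers fall inside the box."""
    lat_mask = (lats >= lat_min) & (lats <= lat_max)
    lon_mask = (lons >= lon_min) & (lons <= lon_max)
    window = grid[np.ix_(lat_mask, lon_mask)]
    # nanmean skips missing pixels (e.g. cloud-masked retrievals).
    return float(np.nanmean(window))


# Tiny illustrative grid: 3 latitudes x 4 longitudes.
lats = np.array([35.0, 35.5, 36.0])
lons = np.array([139.0, 139.5, 140.0, 140.5])
grid = np.arange(12, dtype=float).reshape(3, 4)
grid[1, 1] = np.nan  # one missing pixel

no2_mean = bbox_mean(grid, lats, lons, 35.5, 36.0, 139.5, 140.0)
print(f"box-averaged value: {no2_mean:.3f}")
```

Running this once per day and per SAOI would yield the single daily NO2 value that the regression uses as its dependent variable.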

Mobility (https://www.google.com/covid19/mobility/)

Independent variable in short-term model

Downloaded regional CSVs, processed into SAOIs

Where an SAOI spans multiple counties/city sub-areas, used Max Min Not Mean methods to reduce them to one point per day
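The exact Max Min Not Mean procedure is not spelled out here, so the sketch below shows one plausible reading: for each day, keep the maximum and minimum across sub-areas (the best- and worst-case responses) instead of their mean. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical mobility changes (% from baseline) for three sub-areas of one SAOI.
rows = [
    ("2020-04-01", "County A", -40),
    ("2020-04-01", "County B", -55),
    ("2020-04-01", "County C", -30),
    ("2020-04-02", "County A", -42),
    ("2020-04-02", "County B", -60),
    ("2020-04-02", "County C", -35),
]
df = pd.DataFrame(rows, columns=["date", "sub_region", "workplaces_change"])
df["date"] = pd.to_datetime(df["date"])

# One row per day: extremes across sub-areas rather than the average.
daily_extremes = df.groupby("date")["workplaces_change"].agg(["max", "min"])
print(daily_extremes)
```

Keeping the extremes preserves the spread between sub-areas that a mean would smooth away.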

Covid-19 dataset (owid-covid-data.xls)

Independent variable in short-term model

Mined the Excel file for SAOIs and columns of interest

Stock prices (https://www.nasdaq.com/market-activity/quotes/historical)

Independent variable in the short-term model

Used pandas to forward fill week-day only values into daily time series
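The forward fill of week-day only prices can be sketched as below. The prices are made-up values, and the variable names are illustrative; only the general pandas pattern (reindex to a daily calendar, then forward fill) is being shown.

```python
import pandas as pd

# Hypothetical closing prices on trading days only (Friday, Monday, Tuesday).
prices = pd.Series(
    [130.0, 132.5, 131.0],
    index=pd.to_datetime(["2021-01-08", "2021-01-11", "2021-01-12"]),
    name="close",
)

# Build a full daily calendar, then carry the last known price forward
# across the weekend gap.
daily_index = pd.date_range(prices.index.min(), prices.index.max(), freq="D")
daily_prices = prices.reindex(daily_index).ffill()

print(daily_prices)
```

Saturday and Sunday inherit Friday's close, so the stock series lines up day for day with the NO2, mobility, and Covid-19 columns.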

Judging

This project has been submitted for consideration during the Judging process.