From Ocean Science to COVID-19: An Algorithm Finds a New Purpose
07/20/2020 What began as an effort to help scientists study the ocean has evolved into an effort to study a global pandemic.
In 2015, Chaowei Phil Yang, a professor at George Mason University in Fairfax, Virginia, set out to develop a new Google-like search portal to help scientists effectively search and discover oceanographic data collected by satellites, airplanes, and water-based sensors.
The portal, called Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD), uses machine learning algorithms to better understand and respond to its users’ queries about the ocean. MUDROD was funded by NASA’s Earth Science Technology Office and made available to the public in 2017.
Approximately three years later, when cases of COVID-19 began to spread rapidly around the world, Yang turned to MUDROD’s algorithms to analyze the growing data about the deadly virus. Instead of learning oceanographic lingo, MUDROD’s machine learning algorithms are learning about COVID-19.
Yang and his team updated MUDROD’s algorithm for the COVID-19 Spatiotemporal Rapid Response Gateway to pull information and trends out of COVID-19 data and news stories. The gateway helps users explore how death and infection rates are changing in real time, how different countries are responding to quarantine restrictions, how people changed their travel patterns in response to government guidelines, how companies changed their hiring and layoffs during lockdowns, how reporters are covering the pandemic, and how air quality changed during the winter, spring, and summer of 2020.
The gateway’s algorithms parse multiple public health and government datasets, as well as peer-reviewed scientific publications and news reports from around the world. “Our machine learning algorithm, at its core, aims to solve a big spatiotemporal data problem,” Yang said.
Exploring ocean data with machine learning
MUDROD was designed specifically to search a massive amount of ocean and climate data stored at the Physical Oceanographic Distributed Active Archive Center (PO.DAAC).
The PO.DAAC is a NASA Earth Observing System Data and Information System data center operated by the Jet Propulsion Laboratory in Pasadena, California. The center began archiving oceanographic data in 1978, when Seasat, NASA’s first ocean-observing satellite, launched. Since then, the PO.DAAC has accumulated millions of remotely sensed measurements on sea surface topography, ocean temperature, ocean winds, salinity, gravity, and ocean circulation.
MUDROD built its powerful machine learning algorithms by analyzing how PO.DAAC users searched for data, asked questions, and clicked on results on the PO.DAAC websites. The end result? If a user asks MUDROD for data on ocean wind, its algorithm learns to interpret that query as interest in ocean wind, surface wind, and even a mackerel breeze (a breeze strong enough to ruffle water and aid fishermen on the hunt for mackerel).
Additionally, “if a user searches a term and ends up clicking on an Ozone Monitoring Instrument (OMI) dataset, MUDROD might also pull up a dataset from the Geostationary Operational Environmental Satellite (GOES),” Yang said. That GOES data may be exactly what the user wanted but didn’t know how to find.
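To make the idea concrete, here is a minimal sketch of how search terms could be linked through shared clicks. It is not MUDROD's actual code: the click log, the dataset names, the function names, and the 0.5 similarity threshold are all hypothetical, and cosine similarity over click counts is just one simple way to measure how alike two queries are.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical click log: (search query, dataset the user clicked on).
# The dataset names are invented for illustration only.
CLICK_LOG = [
    ("ocean wind", "wind_dataset_a"),
    ("ocean wind", "wind_dataset_b"),
    ("surface wind", "wind_dataset_a"),
    ("surface wind", "wind_dataset_b"),
    ("mackerel breeze", "wind_dataset_a"),
    ("sea surface temperature", "temperature_dataset_a"),
    ("ocean temperature", "temperature_dataset_a"),
]


def build_click_vectors(log):
    """Map each query to a {dataset: click count} vector."""
    counts = defaultdict(lambda: defaultdict(int))
    for query, dataset in log:
        counts[query][dataset] += 1
    # Convert to plain dicts so later lookups never create new keys.
    return {query: dict(clicks) for query, clicks in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_queries(query, log, threshold=0.5):
    """Return (other query, similarity) pairs whose users clicked similar datasets."""
    vectors = build_click_vectors(log)
    if query not in vectors:
        return []
    target = vectors[query]
    scored = [
        (other, cosine(target, vector))
        for other, vector in vectors.items()
        if other != query
    ]
    kept = [(other, score) for other, score in scored if score >= threshold]
    # Highest similarity first; ties broken alphabetically.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for other, score in related_queries("ocean wind", CLICK_LOG):
        print(f"{other}: {score:.2f}")
```

In this toy log, a search for "ocean wind" is linked to "surface wind" and "mackerel breeze" because users who typed those terms clicked on overlapping datasets, while the temperature queries share no clicks and are left out.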
MUDROD is now a web portal that has contributed to numerous journal and conference publications. MUDROD’s machine learning capabilities were also incorporated into a newer project called OceanWorks, which aims to allow users to not just pull relevant ocean data but to analyze and create visualizations with the data, too, Yang said.
Different question, similar algorithm
An unexpected continuation of the MUDROD project is its application to the COVID-19 pandemic. Like oceanographic data, COVID-19 data come from multiple sources around the world.
“We are converting text about COVID-19 into vectors and analyzing those vectors using machine learning tools,” said Yun Li, a PhD candidate studying Earth Systems and Geoinformation Sciences at George Mason University. Similar to how the team tackled ocean data, it’s now pulling out COVID-19-related tweets and turning those tweets into numerical values in order to search and rank the data.
The end results are displayed on the new COVID-19 gateway, which is updated in real time and supports rapid response to the pandemic. The gateway is part of the National Science Foundation’s Spatiotemporal Innovation Center and supported in part by NASA as well as various other government and non-government entities.
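As a rough illustration of what turning text into vectors and then ranking it can look like, the sketch below uses TF-IDF weighting and cosine similarity. This is an assumption made for the example: the article does not say which vectorization method the team uses, and the sample posts and function names are invented.

```python
import math
import re
from collections import Counter

# Invented sample posts, standing in for COVID-19-related tweets.
POSTS = [
    "Hospital beds are filling up as COVID-19 cases rise",
    "Air quality improved during the spring lockdown",
    "New travel restrictions announced for the summer",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency; rarer terms get larger weights.
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in doc_freq.items()
    }
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return the documents that match the query, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the documents are ignored.
    query_vec = {
        term: tf * idf[term]
        for term, tf in Counter(tokenize(query)).items()
        if term in idf
    }
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    ordered = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [docs[i] for score, i in ordered if score > 0]


if __name__ == "__main__":
    for post in rank("lockdown air quality", POSTS):
        print(post)
```

Each post becomes a dictionary of term weights, and a query is scored against every post by the angle between their vectors, so posts that share the query's rarer words rise to the top.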
“We’re currently trying to customize machine learning methods used in MUDROD to predict fatality for people who have tested positive for COVID-19,” said Melanie Alfonzo Horowitz, an undergraduate student studying chemistry and biophysics at Johns Hopkins University in Baltimore, Maryland. Horowitz is planning to attend medical school upon graduation.
The algorithm is able to synthesize randomized health data, including age, sex, location, symptoms and chronic diseases, to provide guidance to physicians and hospitals, which are often operating at capacity and with limited resources.
One of the challenges in analyzing COVID-19 data to predict fatality is that the death rate is currently less than five percent, so fatal cases make up only a small share of the examples the algorithm can learn from. “We’re exploring different ways to analyze symptoms in order to strengthen the algorithm,” Horowitz said. The team is also working to widen their dataset to consider more symptoms and underlying health issues.
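The short sketch below shows why a low death rate makes prediction hard, using synthetic labels in which exactly 5 of 100 patients die (an illustrative figure, not real data). A model that always predicts survival looks accurate yet never identifies a single fatal case. The class-weighting heuristic at the end is one common remedy and is not necessarily the approach the team is taking; all function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive cases that the model actually caught."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not positives:
        return 0.0
    return sum(1 for p in positives if p == positive) / len(positives)


def balanced_class_weights(y_true):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a learner that honors these
    weights is penalized more for missing them.
    """
    n_samples = len(y_true)
    classes = sorted(set(y_true))
    return {
        label: n_samples / (len(classes) * y_true.count(label))
        for label in classes
    }


# Synthetic labels: 1 = patient died, 0 = patient survived.
LABELS = [0] * 95 + [1] * 5

# A trivial baseline that predicts survival for everyone.
ALWAYS_SURVIVE = [0] * len(LABELS)

if __name__ == "__main__":
    print("accuracy:", accuracy(LABELS, ALWAYS_SURVIVE))
    print("recall on fatal cases:", recall(LABELS, ALWAYS_SURVIVE))
    print("class weights:", balanced_class_weights(LABELS))
```

On these labels the baseline is right 95 percent of the time while catching none of the fatal cases, which is why accuracy alone is a poor yardstick here. The weighting assigns the fatal class a weight of 10.0 versus roughly 0.53 for survivors.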
The team’s next steps will be to apply the algorithm to help predict the probability of patients needing hospitalization. “We want to help medical staff allocate their resources most effectively,” Yang said.