Project Selections for AIST-21
28 Projects Awarded Under the Advanced Information Systems Technology (AIST) Program
05/04/2022 – NASA’s Science Mission Directorate, NASA Headquarters, Washington, DC, has selected proposals for the Advanced Information Systems Technology Program (AIST-21) in support of the Earth Science Division (ESD). The AIST-21 awards will provide novel information systems and computer science technologies to reduce the risk, cost and development time of NASA space- and ground-based information systems, to significantly increase the accessibility and utility of science data, and to enable advanced observation measurements and Earth Science information capabilities.
ESD’s Earth Science Technology Office (ESTO) evaluated 65 proposals and selected 28 for award, each with a 2-year period of performance. The awards total $31M.
NASA’s Advanced Information Systems Technology (AIST) Program identifies, develops, and supports the adoption of software and information systems, as well as novel computer science technology, expected to be needed by the Earth Science Division in the next 5 to 10 years. Currently, the AIST Program is organized around three primary thrusts: New Observing Strategies (NOS), Analytic Collaborative Frameworks (ACF), and Earth System Digital Twins (ESDT). Proposals were solicited in one of these three thrusts or in several very advanced and promising software technology areas, and were expected to explicitly show how the resulting technology would be infused into at least one of ESD’s science domains. The AIST Program anticipates that the technologies in these proposals will mature by at least one Technology Readiness Level, with the eventual goal of demonstrating their value to the relevant science communities. The awards are as follows:
—
- Pixels for Public Health: Analytic Collaborative Framework to Enhance Coastal Resiliency of Vulnerable Populations in Hampton Roads, Virginia (Thomas Allen, Old Dominion University Research Foundation)
- Digital Twin Infrastructure Model for Agricultural Applications (Rajat Bindlish, NASA Goddard Space Flight Center)
- Sensor-in-the-Loop Testbed to Enable Versatile/Intelligent/Dynamic Earth Observation (VIDEO) (William Blackwell, Massachusetts Institute of Technology/Lincoln Lab)
- Detection of artifacts and transients in Earth Science observing systems with machine learning (Yehuda Bock, University Of California, San Diego)
- Edge Intelligence for Hyperspectral Applications in Earth Science for New Observing Systems (James Carr, Carr Astronautics Corporation)
- Intelligent Long Endurance Observing System (Meghan Chandarana, NASA Ames Research Center)
- An Analytic Collaborative Framework for the Earth System Observatory (ESO) Designated Observables (Arlindo da Silva, NASA Goddard Space Flight Center)
- Innovative geometric deep learning models for onboard detection of anomalous events (Yulia Gel, University Of Texas, Dallas)
- A hosted analytic collaborative framework for global river water quantity and quality from SWOT, Landsat, and Sentinel-2 (Colin Gleason, University Of Massachusetts, Amherst)
- DTAS: A prototype Digital Twin of Air-Sea interactions (Alison Gray, University Of Washington, Seattle)
- GEOS Visualization And Lagrangian dynamics Immersive eXtended Reality Tool (VALIXR) for Scientific Discovery (Thomas Grubb, NASA Goddard Space Flight Center)
- Stochastic Parameterization of an Atmospheric Model Assisted by Quantum Annealing (Alexandre Guillaume, Jet Propulsion Laboratory)
- Thematic Observation Search, Segmentation, Collation and Analysis (TOS2CA) System (Ziad Haddad, Jet Propulsion Laboratory)
- Towards a NU-WRF based Mega Wildfire Digital Twin: Smoke Transport Impact Scenarios on Air Quality, Cardiopulmonary Disease and Regional Deforestation (Milton Halem, University of Maryland Baltimore County)
- A scalable probabilistic emulation and uncertainty quantification tool for Earth-system models (Matthias Katzfuss, Texas A & M, College Station)
- Development of a next-generation ensemble prediction system for atmospheric composition (Christoph Keller, Universities Space Research Association, Columbia)
- Open Climate Workbench to support efficient and innovative analysis of NASA’s high-resolution observations and modeling datasets (Huikyo Lee, Jet Propulsion Laboratory)
- Ecological Projection Analytic Collaborative Framework (EcoPro) (Seungwon Lee, Jet Propulsion Laboratory)
- An Intelligent Systems Approach to Measuring Surface Flow Velocities in River Channels (Carl Legleiter, USGS Reston)
- Reproducible Containers for Advancing Process-oriented Collaborative Analytics (Tanu Malik, DePaul University)
- Terrestrial Environmental Rapid-Replicating Assimilation Hydrometeorology (TERRAHydro) System: A machine-learning coupled water, energy, and vegetation terrestrial Earth System Digital Twin (Craig Pelissier, Science Systems And Applications, Inc.)
- Knowledge Transfer for Robust GeoAI Across Space, Sensors and Time via Active Deep Learning (Saurabh Prasad, University Of Houston)
- Integration of Observations and Models into Machine Learning for Coastal Water Quality (Stephanie Schollaert Uz, NASA Goddard Space Flight Center)
- 3D-CHESS: Decentralized, distributed, dynamic and context-aware heterogeneous sensor systems (Daniel Selva, Texas A&M Engineering Experiment Station)
- Kernel Flows: emulating complex models for massive data set (Jouni Susiluoto, Jet Propulsion Laboratory)
- A New Snow Observing Strategy in Support of Hydrological Science and Applications (Carrie Vuyovich, NASA Goddard Space Flight Center)
- SLICE: Semi-supervised Learning from Images of a Changing Earth (Brian Wilson, Jet Propulsion Laboratory)
- Coupled Statistics-Physics Guided Learning to Harness Heterogeneous Earth Data at Large Scales (Yiqun Xie, University of Maryland, College Park)
—
Pixels for Public Health: Analytic Collaborative Framework to Enhance Coastal Resiliency of Vulnerable Populations in Hampton Roads, Virginia
Thomas Allen, Old Dominion University Research Foundation
Increasing coastal flooding, driven by sea level rise and climate-change-related extreme precipitation, threatens vulnerable communities with both imminent and evolving dynamic hazards. Coastal communities differ widely in their social, economic, and cultural vulnerabilities and coping capacities, yet decision support systems for response and planning alike remain disparate and siloed. Vulnerable urban communities also contend with the legacy of racial segregation and discrimination, with manifest disparities leading to unmet health-related social needs (HRSNs) such as access to basic resources and to health care for treating higher hazard exposures. Coastal cities such as Norfolk, VA, experience increased tidal, rainfall, and storm surge flooding owing to sea level and climate changes exacerbated by subsidence, lack high-resolution compound flood forecasting, and show disparities in exposure and inequitable outcomes.
To address these hazards and proactively mitigate future vulnerability, this project proposes an innovative analytic collaborative framework (ACF) and a Digital Twin approach that more fully utilize Earth observations and computing to provide improved predictive decision support tools. We propose to design, and demonstrate to an operational state, a system linking an Earth Observation (EO) data source (the Virginia Open Data Cube), a socio-spatial-health information “Digital Neighborhood” (DN) (Hampton Roads Biomedical Research Consortium), hydrodynamic models, and an in situ flood sensor network. This Digital Twin approach will connect the observational and physical environmental domains with human vulnerability. Although the individual technologies are fairly mature, they remain siloed and uneven, with limited interoperability, and are difficult to operationalize, to build predictive models upon, or to use for what-if scenarios. EO data from the Landsat, Sentinel, and MODIS (and forthcoming NISAR) missions are leveraged through the Virginia Data Cube to improve hydrodynamic model prediction of flood events, with calibration from satellites and autonomous unmanned systems and links to smart-community IoT flood sensors. The project raises the technology readiness level of this system of systems and emphasizes the catalytic role of geospatial integration of flood modeling, predictive analytics, and place-based community vulnerability. Dynamic uncertainties in sea level and flood processes are also analyzed to better plan for worsening future threats, and climate models are used to reconstruct historic flood attribution and estimate future probabilities of flooding, differentiating tidal, surge, extreme rainfall, and compound flood events.
A GeoHub is developed to implement the framework and lay a foundation for adoption by flood forecasters, planners, health practitioners, and emergency managers, reflecting growing recognition of the need for convergence of modeling with stakeholder engagement and participation (Baustian et al. 2020; Hemmerling et al. 2020). The hub provides a resource of open science data, models, and algorithms for fellow scientists and practitioners. We would extend this portal to build stakeholder adoption in a proposed third year, drawing end users in to learn and train through a functional exercise simulation. The hub serves as a resource for forecasters, emergency managers, and hazard mitigation planners that integrates diverse vetted data, geospatial tools, and predictive spatial analytics of flood exposure with improved granularity for human health.
The resulting technology will demonstrate new analytical and collaborative approaches for modeling, IoT sensor, and EO data integration, synergy between physical earth science and social science Digital Twins, and practical tools for timely and equitable flood response and planning.
—
Digital Twin Infrastructure Model for Agricultural Applications
Rajat Bindlish, NASA Goddard Space Flight Center
Fully coupled Earth system models are fundamental to timely and accurate weather and climate change information. These process-based estimates are then used as critical inputs in a variety of end-use Earth system environments. For example, agricultural models require precipitation, temperature, and moisture conditions, along with historical data, as key weather parameters to develop estimates of field operations schedules, from seeding to harvesting, with fertilizer and herbicide treatments in between. Unexpected extreme weather and climate change have a major socioeconomic impact on agriculture and food security. Though important, the integration of such end-use-oriented and socio-economic models is currently not adequately captured in Earth system modeling efforts and needs to be part of the vision for a ‘Digital Twin’. Here we propose the development of such a prototype by integrating land/hydrology process models, agricultural system models, and remote sensing information.
The aim of this project is to develop an agricultural productivity modeling system over the continental United States as an example of incorporating representations of infrastructure-oriented processes, so that our understanding of, prediction of, and mitigation/response to Earth system process variability can be characterized. Crop growth, yield, and agricultural production information is critical for commodity markets, food security, economic stability, and government policy formulation. The ability to model agriculture is important for the NASA Earth Sciences Program and its goals of providing timely and relevant information for “science decisions”. The current USDA National Agricultural Statistics Service (NASS) crop yield estimate is based mainly on agricultural surveys that rely on a limited number of samples, though linear regression models based on remotely sensed NDVI and NASS historical yield records are also used as a complement. Further improvement of crop yield estimation and better crop progress monitoring are needed.
This proposal will target Advanced and Emerging Technology (AET) by developing a modeling framework for an Agricultural Decision Support System (ADSS). The specific goals of the proposal are to (1) establish a digital twin framework that enables NASA remote sensing data products and land surface model products to be directly coupled with or assimilated into the crop growth model; (2) leverage the information from high-resolution remote sensing inputs (precipitation, temperature, solar radiation, soil moisture, snow water equivalent, ground water, leaf area index) through the NASA Land Information System (LIS) to estimate land surface variables (water and energy fluxes) at daily time scales; (3) implement crop growth models (APEX, RZWQM2 and DSSAT) to estimate crop growth stages, biomass, and crop yield under forecasted weather and projected climate conditions; (4) implement a Bayesian Neural Network (BNN) model to predict final county-level crop yield using NASS historical yield reports; yield, biomass, and phenology outputs from the APEX/DSSAT models; and other variables from LIS; (5) develop tools to conduct ‘what if’ investigations to provide agricultural guidance; and (6) develop the capability to disseminate non-confidential crop progress data, biomass, and crop yield maps through an operational web application. The proposed environment will be developed in an interoperable manner to allow for future interactions with other ESDT efforts.
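As a concrete illustration of goal (4), the sketch below shows one way a county-level yield predictor with uncertainty could be wired up, using Monte Carlo dropout as a lightweight stand-in for a full Bayesian Neural Network. The feature list, network size, and data are illustrative assumptions, not the project's actual design.

```python
# Illustrative only: Monte Carlo dropout as a lightweight stand-in for the proposed
# Bayesian Neural Network (BNN). Inputs (crop-model biomass/phenology, LIS land-surface
# variables, historical NASS yields) and all sizes are hypothetical.
import torch
import torch.nn as nn

class YieldBNN(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64, p_drop: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),          # county-level yield
        )

    def forward(self, x):
        return self.net(x)

def predict_with_uncertainty(model, x, n_samples: int = 100):
    """Keep dropout active at inference time to approximate a posterior predictive
    distribution; return the mean yield and its standard deviation per county."""
    model.train()                          # leaves dropout layers on
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean(dim=0), draws.std(dim=0)

# Hypothetical feature vector per county-year: [APEX/DSSAT biomass, phenology stage,
# LIS soil moisture, cumulative precipitation, mean historical NASS yield]
x = torch.randn(3000, 5)
model = YieldBNN(n_features=5)             # would be trained on NASS county yields
mean_yield, yield_sigma = predict_with_uncertainty(model, x)
```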
—
Sensor-in-the-Loop Testbed to Enable Versatile/Intelligent/Dynamic Earth Observation (VIDEO)
William Blackwell, Massachusetts Institute of Technology/Lincoln Lab
There has been significant recent interest in sensor systems that are reconfigurable, i.e., where one or more of the spectral, spatial, radiometric, and geometric (i.e., viewing angle) properties of the sensor can be changed dynamically. However, no laboratory-based test resources exist to evaluate and optimize end-to-end performance in a realistic fashion. In this proposed Advanced and Emerging Technology (AET) program, we develop a methodology and test approach that allows the scene measured by the sensor (or by other sensors acting collaboratively) to inform how the sensor is configured in real time during the scene measurement. This innovative New Observing Strategy (NOS) has the capability to dramatically improve the resolution of the retrieved atmospheric fields in regions where that improvement is most beneficial, while conserving resources in regions where the atmospheric fields are relatively homogeneous and therefore free of significant high spatial frequency content. The use case for this proposal is a highly versatile scanning microwave atmospheric temperature profiling radiometer, where ALL of the sensor response functions (spectral, spatial, radiometric, and geometric) are dynamically reconfigurable.
The technology to be developed and evaluated as part of this program has two components: (1) a Radiometric Scene Generator and its associated control software, and (2) intelligent processing and configuration software that would run on the sensor itself to detect and react to changes in the observed scene by dynamically optimizing the sensor response functions. This approach would significantly improve upon current state-of-the-art simulation-only approaches for this evaluation by placing an actual sensor in the observing loop, where the effects of sensor transfer function errors, calibration uncertainty, and processing algorithm imperfections are fully present in the end-to-end system evaluation and would be highly representative of on-orbit performance.
A key aspect of this program to enable development, test, and evaluation of Versatile, Intelligent, and Dynamic Earth Observation (VIDEO) is the recent emergence of metamaterials for use in high-performance blackbody radiometric targets. These materials are very thin (~200 microns) and lightweight (tens of grams), allowing them to be easily scaled up to realize very large targets (> 1 m^2) to subtend an entire sensor field of regard during laboratory measurements. Furthermore, the thin planar structure of the metamaterials provides a relatively small thermal mass, thereby permitting the projection of thermal features with very high spatial frequency content into the sensor field of view at the subpixel level. We propose to adapt the metamaterials developed for the Miniaturized Microwave Absolute Calibration (MiniMAC) ACT-20 project (S. Reising, PI) for use here to produce a 50 cm x 75 cm (20” x 30”) Radiometric Scene Generator (RSG) operating near 54 GHz with very large thermal contrast at the subpixel level for a typical spaceborne microwave radiometer full-width-at-half-maximum (FWHM) beamwidth in the range of 1-3 degrees.
The RSG will be used to project spatial features into the radiometer field of regard that can be detected and acted upon by the intelligent processing algorithms. The intelligent processing algorithm will use feature detection and machine learning techniques to recognize regions of interest in the atmospheric scene and cause the sensor to react to the scene characteristics by changing the sensor response function.
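The control loop implied by this description can be pictured with a very small sketch: scan coarsely, measure local scene variability, and re-task the sensor where high spatial-frequency content appears. The statistics, thresholds, and re-tasking interface below are placeholders, not the VIDEO toolkit's actual design.

```python
# Schematic sensor-in-the-loop cycle: look for high spatial-frequency content in a
# coarse scan and flag blocks that warrant a fine-resolution revisit. The window size,
# threshold, and synthetic scene are placeholders only.
import numpy as np

def local_variance(scene: np.ndarray, win: int = 4) -> np.ndarray:
    """Blockwise variance of a brightness-temperature scene (K^2)."""
    h, w = scene.shape
    blocks = scene[: h - h % win, : w - w % win].reshape(h // win, win, w // win, win)
    return blocks.var(axis=(1, 3))

def plan_next_dwell(scene: np.ndarray, var_threshold: float = 4.0):
    """Return block indices whose variance warrants a fine spectral/spatial revisit,
    or None if the scene is homogeneous and the coarse mode can continue."""
    hot = np.argwhere(local_variance(scene) > var_threshold)
    return hot if hot.size else None

# One loop iteration with synthetic data standing in for the Radiometric Scene Generator.
coarse_scan = 250.0 + np.random.randn(64, 64)   # mostly homogeneous scene (K)
coarse_scan[20:28, 40:48] += 15.0               # injected sub-pixel thermal feature
targets = plan_next_dwell(coarse_scan)
if targets is not None:
    print(f"re-tasking fine-resolution mode over {len(targets)} blocks")
```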
The project is led by MIT Lincoln Laboratory, which will provide the VIDEO software toolkit and execute the sensor-in-the-loop tests, in collaboration with Colorado State University, which will provide the Radiometric Scene Generator. Entry TRL is 3 and exit TRL is projected to reach 5 after Year 2.
—
Detection of artifacts and transients in Earth Science observing systems with machine learning
Yehuda Bock, University Of California, San Diego
Our proposal is responsive to AIST objective O2 under the Analytic Collaborative Frameworks (ACF) thrust area, addressing challenges in assimilating, manipulating, and visualizing data associated with geodetic observing systems (GNSS and InSAR). We seek to create open-source software that provides a rich, interactive environment in which machine learning (ML) models act as a collaborator, directing the attention of the human analyst to non-physical artifacts and real transient events that require interpretation. The proposed system will be realized through two coupled sub-systems: novel “back-end” ML software called the Transient and Artifact Continuous Learning System (TACLS), and a significant upgrade to our “front-end” interactive MGViz user environment, originally designed to view displacement time series and their underlying metadata, so that it can now interact with and display layers of spatiotemporal information.
Our unique strength is our archive of thousands of artifacts and transients, and the expertise we have acquired in creating calibrated and validated Earth Science Data Records (ESDRs) from thousands of GNSS stations and 30 years of data; these will be used to train the ML algorithms. The ESDRs include crustal deformation and strain rate fields, which reflect steady-state and transient motions due to postseismic processes, episodic tremor and slip (ETS), volcanic inflation, and mostly vertical motions due to other natural (tectonic, geomorphic) and anthropogenic processes (sea level rise, subsidence due to water extraction), as well as atmospheric precipitable water as a harbinger of extreme weather events. Our interactive environment will also be designed for displacement fields of higher precision and spatial resolution produced through GNSS/InSAR integration.
A primary bottleneck to the extraction of scientific insight from geodetic data is the challenge of separating non-physical artifacts and secular trends from scientifically relevant transients, which often requires costly and intensive manual intervention to achieve the most robust and accurate results. This process is often performed redundantly, inefficiently, and inconsistently by groups of students and individual researchers focused on a particular science problem. An example of a transient is slow slip deformation, which plays a crucial role in advancing our understanding of earthquake dynamics and hazards, indicating possible changes of stress on the fault interface, triggered earthquake swarms or seismicity, and release of accumulated elastic strain. Increasing evidence suggests that slow slip often precedes, and possibly leads to, large earthquakes. As another example, GPS-based integrated water vapor estimates enable improved forecasting skill for extreme weather events, improved understanding of long-term water vapor trends and probable maximum precipitation, and retrospective analysis of weather events and watch/warning situational awareness involving extremes in precipitable water.
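A much-simplified version of the screening that TACLS would automate is sketched below: fit a conventional trajectory model to a displacement time series and hand anomalous residuals to the analyst (or a classifier). The model form and thresholds are generic illustrations, not the proposers' algorithms.

```python
# Simplified screening of a GNSS displacement time series: fit a standard trajectory
# model (offset + secular rate + annual + semiannual terms) and flag epochs with
# anomalously large residuals as candidate transients or artifacts.
import numpy as np

def fit_trajectory(t_years: np.ndarray, disp_mm: np.ndarray) -> np.ndarray:
    """Least-squares fit of a standard geodetic trajectory model; returns the fit."""
    w = 2 * np.pi
    A = np.column_stack([
        np.ones_like(t_years), t_years,
        np.sin(w * t_years), np.cos(w * t_years),         # annual
        np.sin(2 * w * t_years), np.cos(2 * w * t_years), # semiannual
    ])
    coef, *_ = np.linalg.lstsq(A, disp_mm, rcond=None)
    return A @ coef

def flag_candidates(t_years, disp_mm, n_sigma: float = 4.0):
    """Epochs whose residual exceeds n_sigma robust standard deviations; these would
    be passed to ML classification and human review in the proposed system."""
    resid = disp_mm - fit_trajectory(t_years, disp_mm)
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust scatter
    return t_years[np.abs(resid) > n_sigma * sigma]

# Synthetic daily series: 3 mm/yr trend, seasonal cycle, noise, and a step-like event.
t = np.arange(0.0, 10.0, 1 / 365.25)
d = 3.0 * t + 2.0 * np.sin(2 * np.pi * t) + np.random.normal(0.0, 1.0, t.size)
d[t > 7.0] += 8.0
print(flag_candidates(t, d)[:5])
```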
We will demonstrate the MGViz/TACLS system with two representative science test cases. The first will address transient tectonic signals and associated hazards in the subduction zones of the Pacific Rim, with the participation of the NOAA/NWS Pacific Tsunami Warning Center (PTWC) in Hawaii. The second will track variations in atmospheric water vapor as precursors to weather events such as monsoons and atmospheric rivers to forecast flash flooding, with the participation of the National Weather Service’s (NWS) Weather Forecast Offices (WFOs) in southern California. We propose a third year of our project that will transfer the MGViz/TACLS system to the PTWC and the WFOs. All software developed under this proposal will be deposited in the existing, publicly accessible MGViz repository. This software will be freely available to anyone to use, with no restrictions other than those stipulated in the license. Our entry level is TRL 4, with an exit level of TRL 6 at the end of Year 2 and TRL 7 at the end of Year 3.
—
Edge Intelligence for Hyperspectral Applications in Earth Science for New Observing Systems
James Carr, Carr Astronautics Corporation
We use the SpaceCube processor and the TRL-5 SpaceCube Low-power Edge Artificial Intelligence Resilient Node (SC-LEARN) coprocessor [1], powered by Google Coral Edge Tensor Processing Units (TPUs), to implement two AI science use cases in hyperspectral remote sensing. The first (daytime) application uses learned spectral signatures of clear-sky scenes to retrieve surface reflectance and thereby increase the efficiency of collecting land observations on our ~68% cloudy planet [2], benefiting Surface Biology and Geology (SBG) decadal survey objectives. The second (nighttime) application classifies artificial light sources after training against a catalog of lighting types. SC-LEARN was developed at GSFC for AI applications such as neural networks and is packaged in a small, low-power 1U CubeSat form factor. SC-LEARN will fly on STP-H9/SCENIC to the ISS with a Headwall Photonics HyperspecMV [3] hyperspectral imager. SCENIC is not dedicated to fixed objectives; rather, investigators may reprogram SCENIC for experiments and demonstrations. In Year 1, we develop and test our science applications in a testbed environment on development boards and with actual SpaceCube hardware. In Year 2, we use SCENIC as an early flight opportunity to test and demonstrate our science application cases. We also take advantage of hyperspectral datasets from the TEMPO observatory, which will become available in Year 2, for further demonstrations in the testbed. Two flight builds for SCENIC are foreseen, allowing us to use flight experience to refine the applications. Our targeted outcome is two fully developed science cases implemented on the innovative SC-LEARN AI platform and tested in space, with lessons learned, best practices, and AI framework code to share. The framework code serves as a template for prototyping hyperspectral science applications for SC-LEARN, enabling others to prepare their applications more efficiently for porting onto SC-LEARN or similar hardware. By working two science cases, we ensure that the framework is not overly specific to a single target application. We aim to build a community of practice for AI developers in the Earth Science community. In doing so, we advance NOS objectives by advancing state-of-the-art technology to enable systems where data volume or latency considerations require Edge Intelligence.
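For orientation, the daytime clear-sky screening step can be illustrated with a simple spectral-angle test against a learned clear-sky reference spectrum. This is only a stand-in: the actual application is implemented as an AI model running on the SC-LEARN Edge TPUs, and the data below are synthetic.

```python
# Stand-in illustration of the daytime use case: compare each hyperspectral pixel
# against a learned clear-sky reference spectrum and keep only matching pixels, so
# surface-reflectance retrieval is not wasted on cloudy pixels. Synthetic data only.
import numpy as np

def spectral_angle(cube: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Angle (radians) between each pixel spectrum and a reference spectrum.
    cube: (rows, cols, bands); reference: (bands,)."""
    dot = np.einsum("ijk,k->ij", cube, reference)
    norms = np.linalg.norm(cube, axis=2) * np.linalg.norm(reference)
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))

# Hypothetical 100x100-pixel cube with 64 bands; in practice the reference would be
# learned from labeled clear-sky scenes during training on the ground.
cube = np.abs(np.random.randn(100, 100, 64))
clear_sky_reference = np.abs(np.random.randn(64))
clear_mask = spectral_angle(cube, clear_sky_reference) < 0.10   # threshold is illustrative
print(f"{clear_mask.mean():.0%} of pixels passed the clear-sky test")
```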
Our industry-government team is led by PI Dr. James Carr. He is PI of the successful StereoBit AIST-18 project. It is a Structure from Motion (SfM) application on SpaceCube that tracks motions of clouds in 3D [4]. Government partners are Dr. Christopher Wilson, Associate Branch Head of the Science Data Processing branch (Code 587), and Dr. Joanna Joiner, a NASA Earth Scientist. Dr. Wilson is the payload lead for STP-H9/SCENIC and has an established working relationship with Dr. Carr from StereoBit. Drs. Carr and Joiner belong to the TEMPO Science Team. Dr. Joiner is the author of the first science case, which has been demonstrated on conventional ground computers with data from the Hyperspectral Imager for the Coastal Oceans (HICO) instrument on ISS [5, 6]. Dr. Carr has proposed the second science case as a “green paper” activity for TEMPO [7] to take advantage of otherwise unutilized nighttime hours. Classification of nightlights has implications for the quality of the dark night sky on Earth, ecology, and human health. Dr. Virginia Kalb, Black Marble PI, is a Collaborator. Justin Goodwill (GSFC) is part of our team and the lead AI-application hardware/software developer for SC-LEARN on SCENIC.
[1] https://digitalcommons.usu.edu/smallsat/2021/all2021/185/
[2] https://doi.org/10.1175/BAMS-D-12-00117.1
[3] https://cdn2.hubspot.net/hubfs/145999/June%202018%20Collateral/HyperspecMV0118.pdf
[4] https://doi.org/10.1109/IGARSS39084.2020.9324477
[5] https://doi.org/10.31223/X5JK6H
[6] https://doi.org/10.1117/12.2534883
[7] https://lweb.cfa.harvard.edu/atmosphere/publications/TEMPO-Green-Paper-Aug2021.pdf
—
Intelligent Long Endurance Observing System
Meghan Chandarana, NASA Ames Research Center
Existing satellites provide coarse-grained (~10 km^2/pixel) data on surface features and column gas concentrations of climate-relevant trace gases. While these data can be supplemented by fine-pointing satellites and aircraft, the spatial and temporal resolution available is not sufficient to observe stochastic, ephemeral events that take place between observations. Emerging high-altitude long-endurance (HALE) unmanned aircraft systems (UAS) can operate for months at a time and loiter over targets to provide continuous daylight, geostationary-like observations, but must be integrated with existing satellites. We propose the development of the Intelligent Long Endurance Observing System (ILEOS), a science activity planning system. This Advanced and Emerging Technology (AET) proposal directly responds to Solicitation Objective O1: Enable new observation measurements and new observing systems design and operations through intelligent, timely, dynamic, and coordinated distributed sensing. ILEOS will help scientists build plans to improve the spatio-temporal resolution of climate-relevant gas observations by fusing coarse-grained sensor data from satellites and other sources (e.g., terrain, wind forecasts) and planning HALE UAS flights to obtain finer-grained (high spatio-temporal resolution) data. ILEOS will also enable observations over longer periods and of environments not accessible through in-situ observations and field campaigns.
ILEOS consists of 3 components: the Target Generation Pipeline (Targeter), the Science Observation Planner (Planner), and Scientists’ User Interface (Reporter). The Targeter identifies candidate target scenes for HALE UAS-mounted instrument observations. The Targeter leverages Science SME domain knowledge to fuse available coarse-grained data from satellites and other sources into pixel value maps, used to generate scenes and scene values. The Planner will use automated planning and scheduling technology to automatically generate a flight plan to observe the best identified target scenes while enforcing all operating constraints for a HALE UAS. The Reporter provides the user interface for science mission planners to visualize pixels, scenes, flight plans, and other data such as clouds and winds. The Reporter will allow mission planners to request explanations detailing pixels, scenes, and the underlying constraints and assumptions used to schedule measurements over scenes and generate a flight plan. The Reporter also allows scientists to modify constraints as necessary to adjust science objectives.
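To make the Planner's core trade-off concrete, the toy sketch below greedily selects candidate scenes by value per flight hour within an endurance budget. The real Planner uses automated planning and scheduling technology with full HALE UAS operating constraints; the cost model, numbers, and data structures here are placeholders.

```python
# Toy scene-selection planner: greedily pick the candidate scene with the best
# value-per-hour (transit plus dwell) until the endurance budget is exhausted.
# A placeholder for the full automated planner/scheduler, not its actual algorithm.
import math

def transit_hours(a, b, speed_kmh: float = 100.0) -> float:
    """Planar distance between (x, y) waypoints in km, converted to hours."""
    return math.dist(a, b) / speed_kmh

def plan_flight(scenes, start=(0.0, 0.0), budget_hours: float = 72.0):
    """scenes: list of dicts with 'center' (x, y in km), 'value', 'dwell_hours'."""
    plan, here, remaining = [], start, budget_hours
    candidates = list(scenes)
    while candidates:
        def rate(s):
            cost = transit_hours(here, s["center"]) + s["dwell_hours"]
            return s["value"] / cost if cost <= remaining else -1.0
        best = max(candidates, key=rate)
        if rate(best) < 0:                 # nothing else fits in the budget
            break
        remaining -= transit_hours(here, best["center"]) + best["dwell_hours"]
        here = best["center"]
        plan.append(best)
        candidates.remove(best)
    return plan

scenes = [
    {"center": (120.0, 40.0), "value": 9.0, "dwell_hours": 6.0},   # e.g., CH4 hotspot
    {"center": (300.0, -80.0), "value": 7.0, "dwell_hours": 4.0},  # e.g., NO2 plume
    {"center": (50.0, 10.0), "value": 3.0, "dwell_hours": 2.0},
]
print([s["center"] for s in plan_flight(scenes)])
```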
Diverse science use cases will be developed based on characterizing NO2 and CH4 emissions in different parts of the world (e.g., permafrost-thaw methane in the Arctic, high-altitude lightning-induced NOx (= NO + NO2), and anthropogenic sources such as cities and oil platforms in the Gulf of Mexico) that benefit from sustained observations by emerging HALE UAS technologies. Development during the first 2 years of the effort will be performed in parallel ‘spirals’. The first spiral will concentrate on the first science domain; the second domain effort will benefit from lessons learned from the first. In the 3rd year we will infuse ILEOS into NASA’s heritage Mission Tool Suite (MTS) application, used by the Airborne Sciences Program (ASP), and into NASA’s ETM project.
The project will advance the state of the art by:
– Developing a Target Generation Pipeline controllable by scientists to generate observation objectives for HALE UAS mounted instruments.
– Developing a Science Observation Scheduler controllable by scientists to generate flight plans for HALE UAS.
– Demonstrating the Scientists’ User Interface capability to task fleets of HALE UAS.
– Enabling a NOS employing a combination of existing satellites and instruments combined with new HALE UAS-borne instruments to complement coarse-grained climate-relevant data with targeted high spatio-temporal coverage.
The proposal team leverages technologists, human factors experts, science subject matter experts and instrument designers.
—
An Analytic Collaborative Framework for the Earth System Observatory (ESO) Designated Observables
Arlindo da Silva, NASA Goddard Space Flight Center
MOTIVATION: The groundbreaking observations of NASA’s Earth System Observatory will provide critical measurements to address societally relevant problems in climate change, natural hazard mitigation, fighting forest fires, and improving real-time agricultural processes. Central to the ESO vision is the notion of Open-Source Science (OSS), a collaborative culture, enabled by technology, that promotes the open sharing of data, information, and knowledge in order to facilitate and accelerate scientific understanding and the agile development of applications for the benefit of society. The larger vision of an Earth System Digital Twin (ESDT) set forth in the AIST solicitation is the linchpin for the implementation of OSS at NASA. It calls for integrated Earth science frameworks that mirror the Earth with a proxy digital construct comprising high-resolution Earth system models and data assimilation systems, along with an integrated set of analytic tools to enable the next generation of science discoveries and evidence-based decision making. Among the many applications of these frameworks is the design of future observing systems, from the selection of space mission architectures to the exploration of science and societal applications, well before launch. Thus, realistic simulations of the future measurements become a necessity if a mature OSS environment is to become available early in the lifetime of the mission, thereby maximizing the impact and societal benefits of NASA observations.
OBJECTIVES: Our ultimate vision is to develop an Analytic Collaborative Framework for ESO missions, based on realistic, science-based observing system simulations and the Program of Record (PoR). Tying it all together is a cloud-based cyberinfrastructure that will enable each uniquely designed satellite in the Earth System Observatory to work in tandem to create a 3D, holistic view of Earth. In this proposal, we lay the technological groundwork for enabling such a vision.
TECHNICAL APPROACH: Our approach consists of 3 main interconnected building blocks:
1) Cloud-optimized representative datasets for ESO missions and the PoR to serve as basis for developing and prototyping an Analytic Collaborative Framework.
2) An Algorithm Workbench for enabling experimentation and exploration of synergistic algorithms not only for instruments within a mission, but also including the PoR and other ESO missions.
3) A series of concrete Open-Source Science demonstrations including use cases that span science discovery and end-user applications with direct societal impact.
While our ultimate goal is to include all of the main missions comprising the Earth System Observatory, in our initial 2 years we will focus on AOS and SBG, two missions for which specific synergisms have been identified in a recent workshop.
RELEVANCE: The proposed effort directly addresses the AIST solicitation by developing a cloud-based Analytic Collaborative Framework for the ESO missions. In doing so, it enables investigative technologies to facilitate the “what-if” investigations inherent to ESDT systems. Our framework will enable global cloud-resolving OSSEs that contribute significantly to the exploration of synergistic new algorithms, enable trade studies during mission development, and provide a transparent mechanism for early engagement of the applications and scientific communities at large. In addition, such capabilities make a direct contribution to NASA’s emerging Open-Source Science Initiative.
—
Innovative geometric deep learning models for onboard detection of anomalous events
Yulia Gel, University Of Texas, Dallas
Artificial intelligence (AI) tools based on deep learning (DL), which have proven highly successful in many domains from biomedical imaging to natural language processing, are still rarely applied, not only to onboard learning of Earth Science processes but to Earth Science data analysis in general. One of the key obstructing challenges is the limited capability of current modeling tools to efficiently integrate the time dimension into the learning process and to accurately describe the multi-scale spatio-temporal variability that is ubiquitous in most Earth Science phenomena. As a result, such DL architectures often cannot reliably, accurately, and promptly learn many salient time-conditioned characteristics of complex interdependent Earth Science systems, resulting in outdated decisions and requiring frequent model updates.
To address these challenges, we propose to fuse two emerging directions in time-aware machine learning: geometric deep learning (GDL) and topological data analysis (TDA). In particular, GDL offers a systematic framework for learning non-Euclidean objects with a distinct local spatial structure, such as that exhibited by smoke plumes. As a result, GDL allows for more flexible modeling of complex interactions among entities in a broad range of Earth Science data structures, including multivariate time series and dynamic networks. In turn, TDA yields complementary information on the time-conditioned intrinsic organization of the underlying Earth Science system at multiple scales.
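For readers less familiar with the TDA ingredient, the central object can be stated compactly; the notation below is ours and only summarizes standard definitions, not the proposers' specific formulation.

```latex
% Sublevel-set persistent homology of a scalar field f (e.g., band radiance),
% summarized as a persistence diagram that can be vectorized for a DL model.
\[
  F_t = \{\, x \in X : f(x) \le t \,\}, \qquad t \in \mathbb{R},
  \qquad
  \mathrm{PD}_k(f) = \bigl\{ (b_i, d_i) \bigr\}_i ,
\]
where each pair $(b_i, d_i)$ records the filtration values at which a $k$-dimensional
topological feature of $\{F_t\}$ (a connected component for $k=0$, a loop for $k=1$)
is born and dies. Long-lived pairs ($d_i - b_i$ large) capture salient multi-scale
structure, such as the shape of a smoke plume, while short-lived pairs behave as
topological noise; multipersistence extends the same idea to filtrations over
several parameters at once (e.g., intensity and time).
```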
The ultimate goal of the project is to develop efficient, systematic, and reliable learning mechanisms for the onboard exploration by explicitly integrating both space and time dimensions into the knowledge representation at multiple spectral and spatial resolutions. Using radiance data from NASA’s GeoNEX project [Nemani et al. 2020] and High-End Computing (HEC) systems, we will address the following interlinked tasks:
T1. Develop time-aware DL architectures with shape signatures from multiple spectral bands for semi-supervised onboard learning of multi-resolution smoke observations.
T2. Detect smoke plumes and other anomalies in multi-resolution observations with time-aware DL with a fully trainable and end-to-end multipersistence module.
T3. Investigate the uncertainty in topological detection of smoke plumes.
T4. Improve the efficiency of GDL for onboard applications.
In addition to developing this novel early-stage technology, topological and geometric DL methods for onboard exploration, we will disseminate all new topological and geometric DL tools in the form of publicly available Python packages. We will maintain all software in a public GitHub repository and use GitHub’s built-in issue tracking system for collaborative software management.
—
A hosted analytic collaborative framework for global river water quantity and quality from SWOT, Landsat, and Sentinel-2
Colin Gleason, University Of Massachusetts, Amherst
NASA’s soon-to-be-launched SWOT mission promises a sea change for terrestrial hydrology. Principally, SWOT’s reservoir/lake volume change observations and SWOT’s derived river discharge product are each unprecedented in terms of their resolution, scale, and frequency. This water quantity information is among the primary reasons for SWOT’s development and launch. SWOT water quantity algorithms and products are well documented in the literature, and a robust plan is in place to produce these products globally.
Rivers are also more than just water quantity: the quality of river water is essential knowledge for ecosystems and society. There are currently no plans to assess river water quality from SWOT data. However, optical image analysis has a long history in hydrology and can detect river water quality, especially information regarding river sediment concentrations and algal blooms. Yet hydrology faces a knowledge gap: computer vision has advanced separately from image analysis as practiced by hydrologists, and computer vision capabilities as practiced in computer science are far more accurate, efficient, and robust for extracting information from imagery than traditional hydrologic measurement techniques. Many important hydrologic image-based tasks, such as water surface classification, river and lake dimensional measurements, and water quality quantification (e.g., sediment and algae), could potentially benefit from improvements as practiced in computer vision.
The launch of SWOT therefore presents a tremendous opportunity to combine SWOT and optical data in a single Analytic Collaborative Framework (ACF) to simultaneously co-predict river water quantity and quality at a scale that is not currently possible. This advance is non-trivial: multiplying the mass flux of water (estimated via SWOT) by its constituent concentrations (estimated via optical data) provides a constituent mass loading (e.g. sediment, algae) in the world’s river systems and therefore a direct benefit to society and ecosystems.
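The quantity at stake can be written down directly (symbols are ours, added for clarity):

```latex
% Constituent mass loading from combined SWOT and optical retrievals.
\[
  L(t) \;=\; Q_{\mathrm{SWOT}}(t)\; C_{\mathrm{opt}}(t),
\]
where $Q_{\mathrm{SWOT}}$ is the river discharge estimated from SWOT observations
($\mathrm{m^3\,s^{-1}}$), $C_{\mathrm{opt}}$ is the constituent concentration
(e.g., suspended sediment or algal biomass, $\mathrm{kg\,m^{-3}}$) retrieved from
Landsat/Sentinel-2 imagery, and $L$ is the resulting constituent mass loading
($\mathrm{kg\,s^{-1}}$) through the river reach at time $t$.
```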
This project seeks to integrate data from the soon-to-be launched SWOT mission with traditional optical imagery into “a common platform to address previously intractable scientific and science-informed application questions” as solicited for an ACF. Specifically, we will build on an existing ACF already in development as part of the SWOT Science Team named ‘Confluence.’ Confluence currently seamlessly integrates with SWOT data (as solicited) but has a very narrow scope and mission: to analyze SWOT data and deliver the parameters needed for the SWOT river discharge product to NASA’s JPL. In addition, Confluence is only available to its developers and not the broader community. Confluence also does not currently integrate optical data into its analysis framework and has no ability to predict water quality.
We argue that the ability to co-predict river discharge and river water quality is currently intractable, and that our proposed ACF will make it tractable. The outputs of our ACF will be 1) a unique seamless data environment for SWOT and optical data, 2) an extended library of algorithms for water quantity, 3) a novel library of computer vision algorithms for water quality, and 4) an automated computational environment to produce river water quantity and quality products, globally. As an AET, we plan to transition our ACF to PO.DAAC in year 3, allowing our ACF and its outputs to reside ‘alongside’ the SWOT mission products (PO.DAAC is the designated NASA archive for SWOT) for ease of discovery and use by the hydrology community. Our proposed ACF should dramatically and uniquely advance our understanding of the world’s river water quality and quantity, informing the management, use, and study of rivers as it is transitioned to PO.DAAC for year 3 and beyond.
—
DTAS: A prototype Digital Twin of Air-Sea interactions
Alison Gray, University Of Washington, Seattle
Boundary layer exchanges between the ocean and the atmosphere are essential for predicting long-term climatic changes and the increasing occurrence of extreme weather events. These exchanges are critical indicators of climatic change, especially regarding floods, droughts, storms, and hurricanes, and are increasingly studied to better understand extreme weather events. However, they are also a significant source of uncertainty in climate models, as they are notoriously hard to observe directly and often require expensive instrumentation, which introduces scaling difficulties. Traditionally, modelers have used various kinds of numerical physics-based parameterizations to understand these linkages. While these are helpful, their computational and memory needs make it inefficient to incorporate advanced parameterizations into larger climate models or to run uncertainty quantification analyses. Moreover, these models tend to be one-directional, as they are generally initialized with a boundary layer condition to assess the outcome.
In this research project, we propose to develop a hybrid physics-informed artificial intelligence model that ingests several existing flux estimates and observational data products to train against flux estimates computed from measurements collected by Saildrones, state-of-the-art autonomous platforms for simultaneous ocean-atmosphere observation. Hybrid models have been increasingly used in several domains as they can remarkably decrease the computational effort and data requirements. While data-driven models can handle high-dimensional complex systems and provide rapid inference, they are often considered “black box” models with poor interpretability, and they typically do not extrapolate well beyond the training data. Our novel hybrid approach of combining physics-based models with neural networks will overcome these deficiencies. It will provide rapid scalability and fast inference of data-driven models and has the advantages of traditional numerical physics-based models regarding data efficiency, interpretability, and generalizability.
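One way to picture such a hybrid architecture is a physics-based bulk-formula baseline corrected by a learned residual and trained against Saildrone-derived fluxes. The sketch below is generic: the bulk formula, inputs, and network are placeholder assumptions, not the DTAS design.

```python
# Schematic hybrid physics + ML flux model: a classical bulk formula provides a
# physically grounded baseline and a small neural network learns the residual against
# Saildrone-derived flux estimates. Constants, inputs, and sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

RHO_AIR, CP_AIR, C_H = 1.2, 1004.0, 1.2e-3   # kg/m^3, J/(kg K), bulk transfer coeff.

def bulk_sensible_heat_flux(wind_speed, t_sea, t_air):
    """Bulk-aerodynamic sensible heat flux (W/m^2)."""
    return RHO_AIR * CP_AIR * C_H * wind_speed * (t_sea - t_air)

class HybridFluxModel(nn.Module):
    """Physics baseline plus learned residual correction."""
    def __init__(self, n_inputs: int = 5, hidden: int = 32):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, x):
        # x columns (illustrative): wind speed, SST, air temperature, humidity, pressure
        physics = bulk_sensible_heat_flux(x[:, 0], x[:, 1], x[:, 2]).unsqueeze(1)
        return physics + self.residual(x)

# The training target would be flux estimates computed from Saildrone observations;
# random tensors stand in for real inputs and targets here.
x = torch.randn(256, 5)
saildrone_flux = torch.randn(256, 1)
model = HybridFluxModel()
loss = F.mse_loss(model(x), saildrone_flux)
loss.backward()
```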
We will use the hybrid model for two primary purposes: (1) to ascertain the spatiotemporal uncertainty of existing flux measurements compared to those computed from Saildrone observations; and (2) to find the possible combinations of near-real-time data of existing flux products (satellite-based and reanalysis) and observational data of oceanic and atmospheric variables (remotely sensed and in situ) to obtain the best estimates for a given spatiotemporal slice. The near-real-time aspect of the hybrid model enables the development of a “Digital Twin” of boundary layer air-sea interactions. We will complete the framework with a front-end visual analysis system, which lets the user perform several actions: 1) identify the possibility space of future predictions based on a set of parameter choices (“what-if” investigations); 2) identify the parameter sweep of initial conditions for a given future prediction; and 3) perform sensitivity analysis of parameters for different scenarios. The model developed in this research investigation will focus on the Gulf Stream region. However, this is the first step towards building a Digital Twin for the Planetary Boundary Layer, which would be game-changing for scientists and decision-makers looking to advance our understanding of weather and climatic changes, to better forecast extreme weather events such as floods, hurricanes, and marine heatwaves, and to better manage and mitigate changes to ocean ecosystems.
—
GEOS Visualization And Lagrangian dynamics Immersive eXtended Reality Tool (VALIXR) for Scientific Discovery
Thomas Grubb, NASA Goddard Space Flight Center
Traditionally, scientists view and analyze the results of calculated or measured observables with static 1-dimensional (1-D), 2-D or 3-D plots. It is difficult to identify, track and understand the evolution of key features in this framework due to poor viewing angles and the nature of flat computer screens. In addition, numerical models, such as the NASA Goddard Earth Observing System (GEOS) climate/weather model, are almost exclusively formulated and analyzed on Eulerian grids with fixed grid points in space and time. However, Earth Science phenomena such as convective clouds, hurricanes and wildfire smoke plumes move with the 3-D flow field in a Lagrangian reference frame, and it is often difficult and unnatural to understand these phenomena with data on Eulerian grids.
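The Eulerian/Lagrangian distinction amounts to integrating parcel positions through the model's wind field, as in the minimal advection sketch below; the wind field and integrator are purely illustrative, with GEOS supplying the real time-varying 3-D fields.

```python
# Minimal Lagrangian trajectory integration: advect parcels through a synthetic
# horizontal wind field with a midpoint (RK2) step, dx/dt = u(x, t). Illustrative only.
import numpy as np

def wind(x: np.ndarray, t: float) -> np.ndarray:
    """Placeholder wind field u(x, t) in m/s for parcel positions x of shape (n, 2)."""
    u = 10.0 + 2.0 * np.sin(2 * np.pi * t / 86400.0)    # zonal, diurnally varying
    v = 3.0 * np.cos(x[:, 0] / 1.0e6)                   # meridional, position-dependent
    return np.column_stack([np.full(len(x), u), v])

def advect(positions: np.ndarray, t0: float, hours: float = 48.0, dt: float = 900.0):
    """Integrate dx/dt = u(x, t) with a midpoint scheme; positions are in meters."""
    x, t = positions.astype(float).copy(), t0
    for _ in range(int(hours * 3600 / dt)):
        k1 = wind(x, t)
        k2 = wind(x + 0.5 * dt * k1, t + 0.5 * dt)
        x += dt * k2
        t += dt
    return x

parcels = np.array([[0.0, 0.0], [5.0e5, 1.0e5]])        # two seed parcels (x, y in m)
print(advect(parcels, t0=0.0))
```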
We propose to develop a scientific exploration and analysis mixed augmented and virtual reality tool with integrated Lagrangian Dynamics (LD) to help scientists identify, track, and understand the evolution of Earth Science phenomena in the NASA GEOS model. VALIXR will:
– Enhance GEOS to calculate Lagrangian trajectories of Earth Science phenomena and output budget terms (e.g., momentum) and parcel attributes (e.g., temperature) that describe their dynamics
– Enhance the NASA open source eXtended Reality (XR, i.e., AR and VR) tool, the Mixed Reality Exploration Toolkit (MRET) developed by the PI, to visualize and animate GEOS fields as well as initialize and track LD features (i.e., parcel trajectories)
This project has wide applicability to the Earth Sciences, from analysis of smoke plumes moving around the globe, to organized convection in hurricanes, to eddies associated with the polar vortex, and more. VALIXR will provide Earth scientists:
– Enhanced scientific discovery of key phenomena in the Earth system through the combination of advanced visualization and quantitative LD with NASA models and data
– An immersive, interactive, and animated visualization of GEOS fields and particle trajectories to allow scientists to intuitively initialize LD for subsequent GEOS model runs
– Intuitive initialization, manipulation and interaction with GEOS data and trajectory paths through the use of XR
Leveraging the NASA open source MRET tool and integrating it with a generalized open-source point cloud system has huge applicability to any Earth Science domain. MRET provides an open-source foundation for collaborative XR with dispersed groups working together in a common 3D space that can combine DEM terrains, 3D engineering models and point clouds, including LIDAR data. This will enable the following benefits:
– Visualizing enormous scientific point clouds as one dataset. Current XR tools for visualizing point clouds are deficient. Problems include losing precision; requiring all points to fit on the GPU; poor visual quality for data from multiple sources; and requiring commercial licenses.
– Enabling collaboration. Remote collaboration is a powerful benefit of XR, allowing geographically separated scientists to work together and is particularly relevant given the emphasis on teleworking.
– Improving Analysis. XR enables improved ways of looking at science data. For example, using another tool by the PI, astrophysicists examined eight nearby young moving star groups and found seven new likely disk-hosting members. None of these objects had been identified via clustering algorithms or other tools in the literature.
– Improving Interaction. Not only does XR provide superior visualizations, but interaction is more intuitive. For example, if scientists need to specify a point in 3D space, traditionally this required specifying a point in three 2D coordinate systems (XY, XZ, and YZ) which is cumbersome. Specifying a location in a 3D environment is much easier as you just point at the location.
This proposal will address the development of an Earth System Digital Twin that provides both a scientific discovery tool and a model analysis and improvement tool.
—
Stochastic Parameterization of an Atmospheric Model Assisted by Quantum Annealing
Alexandre Guillaume, Jet Propulsion Laboratory
Despite the continuing increase of computing power, the multiscale nature of geophysical fluid dynamics implies that many important physical processes cannot be resolved by traditional atmospheric models. Historically, these unresolved processes have been represented by semi-empirical models, known as parameterizations. Stochastic parameterization is a method that is used to represent subgrid-scale variability. One of the reasons stochastic parameterization is necessary is to compensate for computing limitations. Independently, quantum annealing (QA) has emerged as a quantum computing technique and is now commercially available. This technique is particularly adapted to optimize or solve machine learning problems defined with binary variables on a regular grid.
Our goal is to create a quantum computing framework to characterize a stochastic parameterization of boundary layer clouds constrained by remote sensing data. By exploiting the formal similarity of both the stochastic parameterization and the quantum hardware to a regular lattice known as the Ising model, we can take advantage of quantum computing efficiency.
First, we will implement a Restricted Boltzmann Machine (RBM) using a quantum annealer to learn the horizontal distribution of clouds as measured by the Moderate Resolution Imaging Spectroradiometer (MODIS). Second, we will use this machine learning method to determine the parameters of a stochastic parameterization of the stratocumulus cloud area fraction (CAF) when it is coupled to the dynamics of large-scale moisture. We will use the stochastic parameterization developed by Khouider and Bihlo (2019) to model the boundary layer clouds and stratocumulus phase transition regimes.
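The formal similarity invoked above can be made explicit: the RBM's Gibbs distribution and the annealer's native Ising Hamiltonian share the same quadratic form in binary variables. The summary below restates standard definitions for orientation; the lattice encoding is an illustrative reading of the proposal, not its exact formulation.

```latex
% Restricted Boltzmann Machine energy and Gibbs distribution, alongside the Ising
% Hamiltonian natively sampled by a quantum annealer.
\[
  E_{\mathrm{RBM}}(\mathbf{v}, \mathbf{h})
    = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h}
      - \mathbf{v}^{\top} W \mathbf{h},
  \qquad
  p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E_{\mathrm{RBM}}(\mathbf{v},\mathbf{h})}}{Z},
\]
\[
  H_{\mathrm{Ising}}(\mathbf{s})
    = -\sum_{i<j} J_{ij}\, s_i s_j - \sum_i h_i\, s_i,
  \qquad s_i \in \{-1, +1\},
\]
where $\mathbf{v}$ could encode, e.g., the binary cloudy/clear state of lattice cells
derived from MODIS and $\mathbf{h}$ are hidden units. The change of variables
$s_i = 2 v_i - 1$ maps the RBM's quadratic binary energy onto the Ising form, so the
annealer can return low-energy samples that approximate draws from the Gibbs
distribution needed during generative training.
```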
The main outcome of the proposed work will be the delivery of a quantum computing framework that improves upon the current state of the art of the conventional computing framework and enables the full characterization of an atmospheric stochastic parameterization using remote sensing data. Until now, it has been too computationally expensive to dynamically retrieve the parameters of the local lattice describing the CAF of stratocumulus when it is coupled to the large-scale moisture evolution. The innovation here is to use the quantum annealer to sample from the Gibbs distribution more efficiently than a conventional Markov Chain Monte Carlo (MCMC) method would. Numerical experiments have shown that the quantum sampling-based training approach achieves comparable or better accuracy with significantly fewer iterations of generative training than conventional training. We emphasize that the current effort is a rare example of a problem that can be solved with current quantum technology. This should be contrasted with most real-life optimization problems, which are usually too big to be solved on a quantum annealer. Current quantum annealer chips, e.g., D-Wave’s, already have a number of quantum bits commensurate with the number of lattice cells in the current formulation of the cloud stochastic model.
—
Thematic Observation Search, Segmentation, Collation and Analysis (TOS2CA) System
Ziad Haddad, Jet Propulsion Laboratory
Over the years, AIST has supported the development of a large and diverse set of science software resources – technical algorithms, data ingestion and processing modules, visualization capabilities, cloud computing, immersive environments – for the analysis and representation of science data and scientific results for different instruments, missions, and applications. These capabilities were mostly tailored to a particular mission or even to a particular set of users, such as mission personnel, science teams, or individual researchers. This proposal is to remedy the fact that we still do not have a practical framework for inter- or intra-coordination and collaboration to promote and reuse these developed and refined resources, particularly to search for, collate, analyze, and visualize data relevant to Earth System Observatory (ESO) investigations.
The best analyses of a phenomenon typically require identifying scenarios of specific interest in the observations, often in nearly coincident, nearly simultaneous data from different observing systems. For NASA’s current Program of Record (PoR) and the incipient ESO, the need for this ability to identify, collate, and analyze data from different sources – including different missions – that are relevant to a single user-specified scientific process or phenomenon is all the more pressing.
TOS2CA is a user-driven, data-centric system that can identify, collate, statistically characterize, and serve Earth system data relevant to a given phenomenon of interest to ESO. Designed as a multi-discipline analytic collaborative framework (ACF), TOS2CA will not only facilitate the collation and analysis of data from disparate sources, it will also make it possible for scientists to establish science-traceability requirements, quantify detection thresholds, define uncertainty requirements, and establish data sufficiency to formulate truly innovative missions. To do this, TOS2CA will allow one to characterize the joint distribution of several variables, and to quantify and visualize how well that joint distribution would be sampled from different orbits over different durations.
As an ACF, the components of TOS2CA will include: 1) a user-driven thematic data collector; 2) a statistical analyzer; and 3) a user-friendly visualization and data exploration toolkit. One of the first objectives of TOS2CA is to develop a means for efficiently identifying, managing, and utilizing relevant PoR datasets – typically massive, heterogeneous, and with varied access mechanisms – for analysis. TOS2CA will streamline and encapsulate the ingestion of these data in support of Earth System Observatory analyses, maintaining data fidelity, provenance, and repeatability of the data collation, and providing a realistic quantitative evaluation of results. The data collation will be developed by adapting software already developed for the joint NASA-ESA “Multi-mission Analysis Platform Project” collaboration for the eco-systems community.
The first component of TOS2CA is a data curation service with an ontological approach to identifying data relevant to a user-defined phenomenon, and an information extraction and collation module made very efficient by the systematic use of metadata records stored in NASA’s Common Metadata Repository (CMR). The efficiency of this data curation approach is driven by the definition of the specific Earth Science phenomenon. A vocabulary defined by subject matter experts from GCMD keywords will allow users to formulate a phenomenon so that it is mathematically supported on a bounded topological subset of space-time. The connected components of this subspace will then allow TOS2CA to break the curation service down chronologically, and thereby drastically reduce the number of potentially relevant data granules that need to be interrogated for each connected component.
—
Towards a NU-WRF based Mega Wildfire Digital Twin: Smoke Transport Impact Scenarios on Air Quality, Cardiopulmonary Disease and Regional Deforestation
Milton Halem, University of Maryland Baltimore County
Recent persistent droughts and extreme heatwave events over the Western states of the US and Canada are creating highly favorable conditions for mega wildfires that are generating broad regions of deforestation. Smoke from these Western wildfires, which depends on the atmospheric state, the fires’ intensity, and the vegetation fueling them, can be observed in distant cities and towns over the Eastern US, significantly affecting the air quality of these communities and leading to adverse human health effects, such as increased Covid-19 morbidity owing to the smoke as well as respiratory and smoke-related heart disease. Such mega wildfire conditions are expected to continue occurring globally with increasing frequency and intensity over forested regions, as reported in the IPCC AR6.
Our goal is to develop and implement a Regional Wildfire Digital Twin (WDT) model at sub-km resolution to enable mega wildfire smoke impact scenarios to be run at various spatial scales and arbitrary locations over N. America. The WDT will provide a valuable planning tool for running parameter impact scenarios by season, location, intensity, and atmospheric state. We will augment the NASA Unified WRF (NUWRF) model with an interactive, locality-fueled SFIRE parameterization coupled to GOCART, CHEM, and the HRRR4 physics. We will implement a data-driven, near-time-continuous data assimilation scheme for ingesting and assimilating mixed boundary layer heights, cloud heights, and precipitation from a streaming sensor web of radars, ceilometers, and satellite lidar observational systems into a nested regional WDT model. We will accelerate NUWRF model performance, to enable high resolution, by emulating the microphysics and GOCART parameterizations with a deep, dense machine learning progressive neural net architecture that can track the penetration of wildfire smoke into the planetary boundary layer. The SFIRE model will be a unique contribution to NUWRF, fully enabling the interaction of smoke aerosols with observed clouds, the microphysics precipitation, convection, and the GOCART Chem, which is currently unavailable in other fire forecasting models.
This proposal builds upon the AI expertise the CO-Is gained from a prior AIST incubation grant exploring the potential of machine learning technologies to infer boundary layer heights from lidar aerosol backscatter. We have successfully implemented a sensor web to simultaneously stream, process and assimilate ceilometer data from Bristol, PA, Catonsville, MD and Blacksburg, VA in real time into the WRF model. In addition, our CO-Is from U/CO at Denver, SJSU and Howard University are the developers and analysts of the SFIRE data products and will implement the SFIRE system into NUWRF as a fully interactive physics package.
This proposal is in keeping with the Science and Applications priorities of the Decadal Strategy for Earth Observations from Space 2018, which considers it most important to “Determine the effects of key boundary layer processes on weather, hydrological, and air quality forecasts at minutes to sub-seasonal time scales”. Further, it supports NASA diversity goals by engaging students at Howard University, an HBCU, and three MSI schools: SJSU, U/C Denver and UMBC. Thus, if awarded, we expect to transform NUWRF into a wildfire digital twin model, advancing it from a current TRL 3/4 to a TRL 6/7 by the end of Year 2. For Year 3, we propose to test an extension of the WDT scenario implementations to globally distributed forested regions, allowing one-way feedback interactions between the GFS and GEOS global models and the NUWRF-based WDT, thereby laying the groundwork for a fully global interactive digital twin model for operational analytics. We further expect to test the implementation of the SFIRE and Data Assimilation System in the EPA operational Community Model for Air Quality.
—
A scalable probabilistic emulation and uncertainty quantification tool for Earth-system models
Matthias Katzfuss, Texas A&M, College Station
We propose to develop a new, fully automated toolbox for uncertainty quantification in Earth-system models, to provide insight into the largest and most critical information gaps in Earth Sciences and to identify where potential future observations would be most valuable. We will achieve this goal by building a probabilistic emulator that is able to learn the non-Gaussian distribution of spatio-temporal fields from a small number of nature runs from Earth-system models, allowing users to, for example, discover and examine nonlinear dependence structures. In a significant step toward an Earth system digital twin (ESDT), the learned distribution can be a function of covariates (e.g., emissions scenarios), which allows interpolation between observed covariate values and running extensive what-if scenarios. Our proposed software tool is a crucial component in societal decision-making and in numerous NASA applications and missions, including studying climate projections, efficient observing system simulation experiments (OSSEs), uncertainty-quantification efforts for existing missions, and “what-if” investigations for potential future observing systems.
The project will consist of three broad goals. As the first goal, the team will develop statistical methodology for efficient estimation of and simulation from the probability distribution of multiple geophysical fields of interest. The approach, based on Bayesian transport maps, learns the spatio-temporal and multivariate dependence structure from a small- to moderate-sized ensemble of runs from an Earth-system model. The second goal is to implement the methodology in a user-friendly, open-source software package with documented examples. The third goal is to demonstrate use of the toolbox in assessing the probability distribution of important hydrological variables, as represented in Earth system models.
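As a highly simplified picture of the emulation workflow (a linear, Gaussian stand-in rather than the nonlinear Bayesian transport maps the project proposes), the sketch below learns a lower-triangular map from a small synthetic ensemble of flattened spatial fields and pushes standard-normal reference samples through it to generate new surrogate members.

```python
# Simplified sketch: learn a linear, lower-triangular transport map from a
# small ensemble of fields and simulate new members. The actual proposal uses
# nonlinear Bayesian transport maps; this only illustrates the workflow.
import numpy as np

rng = np.random.default_rng(1)

# Pretend ensemble: 30 runs of a field on a 20x20 grid, flattened to vectors.
n_runs, n_grid = 30, 400
truth_L = np.tril(rng.normal(scale=0.2, size=(n_grid, n_grid))) + np.eye(n_grid)
ensemble = rng.normal(size=(n_runs, n_grid)) @ truth_L.T      # correlated fields

# Fit: mean plus lower-triangular factor of a regularized sample covariance.
mean = ensemble.mean(axis=0)
cov = np.cov(ensemble, rowvar=False) + 1e-3 * np.eye(n_grid)  # ridge since n_runs << n_grid
L = np.linalg.cholesky(cov)                                   # triangular map: z -> mean + L z

# Simulate: push standard-normal reference samples through the learned map.
new_members = mean + rng.normal(size=(1000, n_grid)) @ L.T
print("emulated ensemble shape:", new_members.shape)
```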
In order to characterize the uncertainty in spatio-temporal dependence that besets so many variables in the Earth sciences, we will specifically examine precipitation, snow water equivalent, and runoff in CMIP6 simulations to demonstrate how powerful a tool like this can be for future NASA applications. Our technology provides the flexibility to identify impactful characteristics of the water cycle under current conditions and its response to climate change, including nonlinear changes in time and under multiple emission scenarios. Further, the investigation will provide insight into hydrological processes and associated scales that are particularly uncertain, providing motivation for future observing needs for terrestrial hydrology, including upcoming missions like Surface Water and Ocean Topography (SWOT) and the mass change (MC) designated observable.
The proposed project addresses the advanced and emerging technology (AET) topic area in the AIST-21 solicitation. It will provide technologies and tools for use in ESDT. Specifically, the project will “enable running large permutations of what-if scenarios using large amounts of data and high-resolution and high-fidelity models” and comprises “statistical methodologies that optimize the computational efficiency of such what-if investigations.”
—
Development of a next-generation ensemble prediction system for atmospheric composition
Christoph Keller, Universities Space Research Association, Columbia
We propose to develop a next-generation modeling framework for the real-time simulation of reactive gases and aerosols in the atmosphere. The core innovations of this project are (a) the deployment of computationally efficient parameterizations of atmospheric chemistry and transport and (b) the development of generative models based on machine learning (ML) to predict model uncertainties. Combined, these innovations will enable improved and novel applications related to atmospheric composition, including probabilistic air quality forecasts at increased horizontal resolution, advanced use of satellite observations using ensemble-based data assimilation techniques, and scenario simulations for real-time event analysis. The proposed simulation capability will be developed and tested within the NASA GEOS Earth System Model (ESM), and its utility will first be demonstrated in the GEOS Composition Forecast system (GEOS-CF). In a second step, we propose to transfer the technology to NOAA’s air quality forecasting (AQF) system. This project will greatly advance NASA’s capability to monitor, simulate, and understand reactive trace gases and aerosols in the atmosphere. It directly supports NASA’s TEMPO mission – scheduled to launch in November 2022 – and other upcoming NASA missions including PACE and MAIA. It alleviates a major limitation of existing ESMs, namely the prohibitive computational cost of full chemistry models. We address this issue by implementing simplified parameterizations for the slowest model components, the simulation of atmospheric chemistry and the advection of chemical species. We further propose the use of conditional generative adversarial networks (cGAN) ML algorithms for the estimation of probability distributions to enable ensemble-based applications.
A key aspect of the proposed system is that the original numerical model and the accelerated models can be used in tandem. This way, the full physics model can be deployed for the main analysis stream, and the accelerated system is used to improve overall analytic and predictive power during forecast and data assimilation. This minimizes the impact of compounding errors that can arise from the use of ML models alone.
The project comprises two science tasks: Task 1 is to implement simplified parameterizations for atmospheric chemistry and tracer transport. This task builds on extensive previous work by the proposal team. For instance, PI Keller has developed an ML emulator for atmospheric chemistry based on gradient-boosted regression trees, but this algorithm has not yet been tested for high-resolution applications such as GEOS-CF. Here we propose to do so, along with the development of an accelerated tracer transport capability based on optimizing the number of advected tracers and the time stepping. Computation of atmospheric chemistry and transport consumes more than 80% of the total compute time of full chemistry simulations, and we project that the accelerated system will deliver a 3-5 fold model speed-up. The second task is to develop an efficient methodology for generating probabilistic estimates of atmospheric composition. This task leverages the fact that cGANs offer a natural way to estimate probability distributions from a limited set of samples. The accelerated model developed in Task 1 will make it feasible to produce such samples, and we will combine that model sampling capability with the density estimation power of cGANs to dynamically estimate model uncertainties. Combined, Tasks 1 and 2 will offer an ensemble-style modeling framework for reactive trace gas and aerosol simulations that is applicable to a wide array of systems and applications. We will demonstrate these capabilities by integrating the framework into the NASA GEOS-CF system, and in Task 3 we will transfer the technology to NOAA’s air quality forecasting system. Thus, the proposed project will greatly advance the composition modeling, prediction, and monitoring capabilities of NASA and NOAA.
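The sketch below is a toy illustration of the Task 2 mechanism, not the GEOS-CF implementation: a tiny conditional GAN learns to sample a scalar "model error" whose spread depends on a conditioning variable, which is the basic way a cGAN estimates a probability distribution from limited samples. The network sizes and the synthetic data generator are arbitrary choices made for the example.

```python
# Toy conditional GAN sketch: learn p(error | condition) where the spread of
# the error grows with the condition. Illustrative only; the proposed system
# conditions on full model state and produces ensemble perturbations.
import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_real(n):
    cond = torch.rand(n, 1) * 2.0                    # conditioning variable in [0, 2]
    err = torch.randn(n, 1) * (0.1 + 0.5 * cond)     # heteroscedastic "model error"
    return cond, err

G = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # (noise, cond) -> err
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # (err, cond) -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(3000):
    cond, real = sample_real(128)
    noise = torch.randn(128, 1)
    fake = G(torch.cat([noise, cond], dim=1))

    # Discriminator: real vs generated samples, each paired with its condition.
    d_loss = bce(D(torch.cat([real, cond], 1)), torch.ones(128, 1)) + \
             bce(D(torch.cat([fake.detach(), cond], 1)), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool the discriminator.
    g_loss = bce(D(torch.cat([fake, cond], 1)), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Ensemble-style use: many draws at a fixed condition give a spread estimate.
with torch.no_grad():
    cond = torch.full((1000, 1), 1.5)
    draws = G(torch.cat([torch.randn(1000, 1), cond], 1))
    print("sampled std at cond=1.5:", float(draws.std()))
```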
—
Open Climate Workbench to support efficient and innovative analysis of NASA’s high-resolution observations and modeling datasets
Huikyo Lee, Jet Propulsion Laboratory
We propose to develop an Analytic Collaborative Framework (ACF) that can power the processing flow of large and complex Earth science datasets and advance the scientific analysis of those datasets. In the proposed ACF development, we aim to address one of the current, fundamental challenges faced by the climate science community: bringing together vast amounts of both model and satellite observation data at different spatial and temporal resolutions in a high-performance, service-based cyberinfrastructure that can support scalable Earth science analytics.
The Regional Climate Model Evaluation System (RCMES), developed by the Jet Propulsion Laboratory in association with the University of California, Los Angeles, has undertaken systematic evaluation of climate models for many years with NASA’s ongoing investments to advance infrastructure for the U.S. National Climate Assessment (NCA). RCMES is powered by the Open Climate Workbench (OCW; current public version v1.3), an open-source Python library that handles many of the common evaluation tasks for Earth science data, such as rebinning, metrics computation, and visualization. Over the next three years, we propose to significantly advance the use and analysis of large and complex Earth science datasets by improving and extending the capabilities of OCW to version 2.0; OCW v2.0 will serve as the ACF.
The primary goal of developing OCW v2.0 is to improve and extend the capabilities of OCW for characterizing, compressing, analyzing, and visualizing observational and model datasets with high spatial and temporal resolutions. As an open-source ACF for climate scientists, OCW v2.0 will run on AWS Cloud with special emphasis on developing two use cases: air quality impacts due to wildfires and elevation-dependent warming. Our four specific objectives are to:
O1. Migrate the RCMES database (RCMED) to AWS and provide observational datasets for the upcoming fifth NCA
O2. Optimize the scientific workflows for common operations, applying data compression techniques and an autonomic runtime system
O3. Integrate cross-disciplinary algorithms for analyzing spatial patterns
O4. Provide a comprehensive web service and supporting documentation for end-users
The primary outcome of our proposed work will be a powerful ACF with enhanced capabilities for utilizing state-of-the-art observations and novel methodologies to perform comprehensive evaluation of climate models. By infusing recent advances in data management and processing technology, OCW v2.0 will be able to optimize scientific workflows when users analyze high-resolution datasets from RCMED, CMIP6 on S3, and NASA’s Distributed Active Archive Centers (DAACs). All of OCW v2.0’s capabilities will be made available from both command-line scripts and Jupyter Lab notebooks, which capture the end-to-end analysis workflow from collaborating climate scientists for reuse and modification.
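For readers unfamiliar with this kind of evaluation workflow, the sketch below shows the rebin-then-score pattern that OCW automates, written with plain xarray rather than the OCW API (which is not reproduced here): a synthetic model field is interpolated onto an observation grid and a mean bias and RMSE are computed.

```python
# Generic sketch of a model-vs-observation evaluation step (rebin + metrics).
# This uses plain xarray, not the OCW API, purely to illustrate the workflow.
import numpy as np
import xarray as xr

# Synthetic "observation" on a coarse grid and "model" on a finer grid.
obs = xr.DataArray(
    20 + 5 * np.random.rand(18, 36), dims=("lat", "lon"),
    coords={"lat": np.linspace(-85, 85, 18), "lon": np.linspace(-175, 175, 36)},
    name="tas_obs",
)
model = xr.DataArray(
    21 + 5 * np.random.rand(90, 180), dims=("lat", "lon"),
    coords={"lat": np.linspace(-89, 89, 90), "lon": np.linspace(-179, 179, 180)},
    name="tas_model",
)

# Rebin: interpolate the model field onto the observation grid.
model_on_obs = model.interp_like(obs)

# Metrics: mean bias and RMSE over the common grid.
bias = (model_on_obs - obs).mean().item()
rmse = float(np.sqrt(((model_on_obs - obs) ** 2).mean()))
print(f"mean bias = {bias:.2f}, RMSE = {rmse:.2f}")
```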
The proposed ACF development will meet one of the three AIST program’s main objectives by “fully utilizing the large amount of diverse observations using advanced analytic tools, visualizations, and computing environments.” Our data compression and runtime system will “address the Big Data challenge associated with observing systems and facilitate access to large amounts of disparate datasets” by compressing the datasets while assessing the trade-offs between lowering spatial resolution and accuracy. In addition, the ACF will “make unique topological data analysis (TDA) tools accessible and useful to the climate science community.” By testing two use cases with our ACF, we will enable our long-term vision of “moving from custom-built ACF systems to reusable frameworks” for supporting various research activities in climate science, and broadening NASA’s footprint in earth-science analytics in a way that increases the utility of current data products and cultivates demand for future ones.
—
Ecological Projection Analytic Collaborative Framework (EcoPro)
Seungwon Lee, Jet Propulsion Laboratory
In this time of global heating and rapid climate change, Earth’s ecosystems are under great stress for their survival, and Earth’s biodiversity is being rapidly reduced and widely redistributed. Despite the importance of biodiversity for humanity and the imminent nature of the threat, efforts to project these losses over the coming decades remain crude, especially when contrasted with other processes of global importance such as relative sea level rise. As a discipline, ecological projection is still in its early stages, and it will become increasingly important as stress drivers increase and losses mount. A systematic framework that includes advanced data science tools, vetted and refined iteratively during application to multiple use cases, will accelerate progress.
Following the 2017 Earth Science Decadal Survey, NASA has made biodiversity a priority. This is reflected in ongoing and upcoming missions, e.g., ECOSTRESS, UAVSAR, NISAR, PACE, and SBG. In the next few years, these biodiversity remote sensing missions will create, improve, and lengthen high-resolution records of changing environmental variables and indicators for ecological systems. This provides a unique and well-timed opportunity to advance ecological projection on multi-decadal timescales, as well as ecological forecasting on shorter timescales.
We propose to build the Ecological Projection Analytic Collaborative Framework (EcoPro) to support multidisciplinary teams in conducting ecological projection studies, collaborations, applications, and new observation strategy developments. EcoPro will contain (1) an analytic toolkit to perform the multidisciplinary analyses, (2) a data gateway to organize, store, and access key input and output datasets, and (3) a web portal to publish and visualize the results of the studies and to provide a virtual collaborative workspace.
The project aims to achieve the following three outcomes by building EcoPro and applying it to ecological projection and forecasting.
Outcome 1: Perform scientific studies to demonstrate the scientific use of EcoPro. Giant sequoia forests, giant kelp forests, and coral reefs, all important local ecosystems threatened by climate change, will be used as case studies. The case studies will demonstrate the sufficiency of the EcoPro Analytic Toolkit for ecological projection studies.
Outcome 2: Generate application-usable datasets and visualize them to demonstrate the application use of EcoPro. Climate stressor variables and habitability and mortality indicators of our three case-study ecosystems will be generated and visualized for use by the application community (e.g., conservation managers from the California Department of Fish and Wildlife, the California Ocean Protection Council, the National Park Service, the US Forest Service, the NOAA Coral Reef Conservation Program, and the Great Barrier Reef Marine Park Authority). This will demonstrate the sufficiency of the EcoPro Web Portal for sharing and visualizing ecological projection datasets with the application community. In particular, we plan to transition and infuse EcoPro to the SBG Application Working Group in the optional third year.
Outcome 3: Conduct experimental studies to demonstrate the New Observing Strategies (NOS) use of EcoPro. We will experiment with different resolutions of predictor observations to quantify the adequacy of the observation datasets used in generating and validating ecological projection datasets. This will lead to new observation requests to existing observing systems or new observation requirements for future mission formulations. In the optional third year, we plan to transition and infuse EcoPro to the SBG Science Team to support assessment of data sufficiency of SBG products and formulation of complementary missions.
—
An Intelligent Systems Approach to Measuring Surface Flow Velocities in River Channels
Carl Legleiter, USGS Reston
The goal of this project is to develop a New Observing Strategy (NOS) for measuring streamflow from a UAS using an intelligent system. This framework will satisfy the AIST program objectives of enabling new measurements through intelligent, timely, and dynamic distributed sensing; facilitating agile science investigations that utilize diverse observations using advanced analytic tools and computing environments; and supporting applications that inform decisions and guide actions for societal benefit. More specifically, by focusing on hydrologic data collection, this project is consistent with NASA’s vision for NOS: optimize measurement campaigns by using diverse observing and modeling capabilities to provide complete representations of a critical Earth Science phenomenon – floods.
The USGS operates an extensive monitoring network, but maintaining streamgages is expensive and places personnel at risk. This project will build upon a UAS-based payload for measuring surface flow velocities in rivers, developed jointly by the USGS and NASA, to improve the efficiency and safety of data collection. The sensor package consists of thermal and visible cameras, a laser range finder, and an embedded computer, all integrated within a common software middleware. At present, these instruments provide situational awareness for the operator on the ground by transmitting a reduced-frame-rate image stream. Our current concept of operations involves landing the UAS, downloading the images, and performing Particle Image Velocimetry (PIV). This analysis includes image pre-processing, stabilization, geo-referencing, and an ensemble correlation algorithm that tracks the displacement of water surface features. For this project, the workflow will be adapted for real-time implementation onboard the platform.
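As a simplified illustration of the correlation step at the heart of PIV (not the project's ensemble-correlation implementation), the sketch below cross-correlates one interrogation window from two synthetic frames, locates the correlation peak, and converts the pixel offset to a surface speed; the frame interval and ground sample distance are assumed values.

```python
# Minimal PIV-style sketch: estimate the displacement of water-surface texture
# between two frames by cross-correlating one interrogation window.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(2)

# Synthetic frames: random surface texture shifted by (dy, dx) pixels.
true_dy, true_dx = 3, 5
frame1 = rng.random((64, 64))
frame2 = np.roll(np.roll(frame1, true_dy, axis=0), true_dx, axis=1)

def piv_displacement(win1, win2):
    """The peak of the cross-correlation gives the most likely pixel shift."""
    a = win1 - win1.mean()
    b = win2 - win2.mean()
    corr = fftconvolve(b, a[::-1, ::-1], mode="same")
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    center = np.array(corr.shape) // 2
    return peak[0] - center[0], peak[1] - center[1]

dy, dx = piv_displacement(frame1, frame2)

# Convert pixels/frame to m/s (assumed frame interval and ground sample distance).
dt = 1 / 30.0          # seconds between frames (assumption)
gsd = 0.02             # meters per pixel (assumption)
speed = np.hypot(dy, dx) * gsd / dt
print(f"estimated shift = ({dy}, {dx}) px, surface speed ~ {speed:.2f} m/s")
```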
Developing this NOS is timely because the impacts of climate change on rivers create a compelling need for reliable hydrologic information not only through regular monitoring but also in response to hazardous events. Our intelligent system will be designed to address both of these scenarios. First, we will facilitate quality control during routine streamgaging operations by quantifying uncertainty. For example, the UAS could be directed to hover at a fixed location above the channel and acquire images until a threshold that accounts for natural variability and measurement error is reached; only after this criterion is satisfied would the platform advance to the next station. We refer to this mode of operations as stationing autonomy. Second, during hazardous flood conditions identified via communication with other sensors, the focus of the intelligent system would shift to autonomous route-finding. To enable dynamic data collection, heavy precipitation or abrupt rises in water level within a basin would trigger deployment of a UAS to measure streamflow during a flood. Onboard PIV will provide the intelligent system with real-time velocity information to direct the UAS to focus on high velocity zones, areas likely to scour, and threatened infrastructure. This information could be transmitted wirelessly and used to inform disaster response. By developing these capabilities, we will introduce a NOS that significantly enhances both hydrologic monitoring and response to extreme events.
Our intelligent systems framework will be implemented in three phases. Initially, simulations will be used to develop methods of characterizing uncertainty and selecting optimal routes. These simulations will be based on field data sets from a range of river environments and created within a real-time robotics simulator. In the second stage, the algorithms will be applied to data recorded during previous UAS flights. The third phase will apply the intelligent system to live data during a flight, with the sensor payload being used for verification. This progression will transition the NOS from an initial TRL of 4 to 6 by the conclusion of the two-year effort.
—
Reproducible Containers for Advancing Process-oriented Collaborative Analytics
Tanu Malik, DePaul University
For science to reliably support new discoveries, its results must be reproducible. This has proved to be a severe challenge. Lack of reproducibility significantly impacts collaborative analytics, which are essential to rapidly advance process-oriented model diagnostics (PMD). As scientists move from performance-oriented metrics toward process-oriented metrics of models, routine tasks require diagnostics on analytic pipelines. These diagnostics help to understand biases, identify errors, and assess processes within the modeling and analysis framework that lead to a metric.
To conduct diagnostics, scientists refer to the “same pipeline”. Referring to the same software and data, however, becomes contentious—scientists iteratively tune and train pipelines with parameters, changing model and analysis settings. Often, reproducibility in terms of sharing a common analytics pipeline and methodically comparing against different datasets cannot be achieved. While tracking logs, provenance, and sufficient statistics are used, these methods remain disjoint from analysis files and only provide post hoc reproducibility. We believe a critical impediment to conducting reproducible science is the lack of software and data packaging methods that methodically encapsulate content and record associated lineage. Without such methods, deciding on a common reference pipeline and scaling up collaborative analytics, particularly in operational settings, becomes a challenge.
Container technologies such as Docker and Singularity provide content encapsulation, improve software portability, and are being used to conduct reproducible science. Containers are useful for well-established and documented analysis pipelines, but, in our experience, the technology has a steep learning curve and significant overhead of use, especially for iterative, diagnostic methods.
This proposal aims to establish reproducible scientific containers that are easy to use and lightweight. Reproducible containers will transparently encapsulate complex, data-intensive, process-oriented model analytics, will be easy and efficient to share between collaborators, and will enable reproducibility in heterogeneous environments. Reproducible containers, developed by the PI so far, rely on reference executions of an application to automatically containerize all necessary and sufficient dependencies associated with the application. They record application provenance and enable repeatability in different environments. Such containers have met with considerable success, demonstrating a lightweight alternative to regular containers for computational experiments conducted by individual geoscientists in the domains of Solid Earth, Hydrology, and Space Science. However, in their current form, containers, reproducible or otherwise (such as Docker), are not data-savvy—they are oblivious to the spatio-temporal semantics of data and either include all data used by an application or exclude it entirely. When all data is included, containers become bloated; alternatively, when data is excluded, they cause network contention at the virtual file system.
The target outcome of this project is to develop reproducible containers that are data-savvy—that is, containers that retain their original properties of automatic containerization, provenance tracking, and repeatability guarantees, but provide ease of operation with spatio-temporal scientific data and are efficient to share and repeat even when an application uses a large amount of data. This outcome will be achieved by (i) developing an I/O-efficient data observation layer within the container, (ii) including spatio-temporal data harmonization methods when containers encapsulate heterogeneous datasets, (iii) applying data-savvy, reproducible containers to process-oriented precipitation feature (PF) diagnostics, and (iv) finally assessing how diagnostics improve with the use of data-savvy, provenance-tracking reproducible containers.
—
Terrestrial Environmental Rapid-Replicating Assimilation Hydrometeorology (TERRAHydro) System: A machine-learning coupled water, energy, and vegetation terrestrial Earth System Digital Twin
Craig Pelissier, Science Systems And Applications, Inc.
The Earth’s environment is changing rapidly, resulting in more extreme weather and increased risk from weather and climate related phenomena. Land Surface Models (LSMs) are a critical component of climate and weather forecasting models, and are integral tools for regional drought monitoring, agricultural monitoring and prediction, famine early warning systems, and flood forecasting, among other things. A vital part of NASA’s Earth Science mission includes supporting terrestrial models (LSMs) that can leverage available Earth observation data (EOD) to provide accurate and timely information about the terrestrial water, energy, and carbon cycles. The increase in adverse weather conditions makes near real-time and short-term capabilities increasingly critical for early response systems and mitigation.
In the past 5 years, Machine Learning (ML) has emerged as one of the most powerful ways to extract information from large and diverse sets of terrestrial observation data (see references in the Open Source Software Licensing section). Our group and others in the hydrological community have successfully developed models of most of the key states and fluxes (streamflow, soil moisture, latent and sensible heat, and vegetation) that are significantly more accurate than the traditional process-based models (PBMs) currently deployed by NASA. Importantly, they run several orders of magnitude faster than traditional PBMs. This arises from a simpler numerical structure and from the ability to efficiently use hardware accelerators (GPUs, TPUs). Their rapid adaptation and increased throughput can provide unprecedented near real-time and short-term forecasting capabilities that far exceed the current PBM approaches in use today. Although ML models do not currently provide process-based scientific explainability, and there remains skepticism about long-term forecasting in the presence of non-stationarity (a changing climate), their accuracy and their ability to enhance current near real-time and short-term capabilities are undeniable.
The current NASA LSM software infrastructures (SI) (e.g., the NASA Land Information System; [1]) are not designed in ways that allow them to fully leverage ML technologies, due to several differences such as native programming languages, software stacks, numerical algorithms, methodologies, space-time structures, and High-Performance Computing (HPC) capabilities. The effort to merge these technologies into a unified SI, if feasible, would be significant and would likely result in something brittle, not easily extensible, hard to maintain, and impractical. We propose to develop a terrestrial Earth System Digital Twin (TESDT) that is designed from the ground up to couple state-of-the-art ML with NASA (and other) EOD. This TESDT will combine the best ML hydrology models with capabilities for uncertainty quantification and data assimilation to provide a comprehensive TESDT. The software infrastructure will be developed in Python and specifically designed to provide a flexible, extensible, modern, and powerful framework that will serve as a prototype AI/ML-based TESDT. It will be able to perform classically expensive tasks like ensemble and probabilistic forecasting, sensitivity analyses, and counterfactual “what if” experiments that will provide critical hydrometeorological information to aid in decision and policy making.
We will build the SI to integrate and couple the land surface components, including data management, training, testing, and validation capabilities. Different coupling approaches will be deployed, researched, and tested, as well as an ML-specific data assimilation framework. In the optional year, relevant hydrometeorological events, e.g., the 2006-2010 Syrian drought and current changes to water storage in the Himalayan mountains, will be used to demonstrate and validate the performance of the aforementioned capabilities in real-world applications.
—
Knowledge Transfer for Robust GeoAI Across Space, Sensors and Time via Active Deep Learning
Saurabh Prasad, University Of Houston
Recent advances in optical sensing technology (e.g., miniaturization and low-cost architectures for spectral imaging in the visible, near and short-wave infrared regimes) and sensing platforms from which such imagers can be deployed (e.g. handheld devices, unmanned aerial vehicles) have the potential to enable ubiquitous passive and active optical data on demand to support sensing of our environment for earth science. One can think of the current sensing environment as a vast sensor-web of multi-scale diverse spatio-temporal data that can inform various aspects of earth science. Although this increase in the quality and quantity of diverse multi-modal data can potentially facilitate improved understanding of fundamental scientific questions, there is a critical need for an analysis framework that harmonizes information across varying spatial-scales, time-points and sensors. Although there have been numerous advances in Machine Learning models that have evolved to exploit the rich information provided by multi-channel optical imagery and other high dimensional geospatial data, key challenges remain for effective utilization in an operational environment. Specifically, there is a pressing need to have an algorithm base capable of harmonizing sensor-web data under practical imaging scenarios for robust remotely sensed image analysis.
In this project, we propose to address these challenges by developing a machine learning algorithmic framework and an associated open-source toolkit for robust analysis of multi-sensor remotely sensed data that advances emerging and promising ideas in deep learning and multi-modal knowledge transfer between sensors, space and time, and provides capability for semi-supervised and active learning. Our model would seek to harmonize data from heterogeneous sources, enabling seamless learning in a disparate ensemble of multi-sensor, multi-temporal data. Our proposed architecture will comprise a generative adversarial learning-based knowledge transfer framework that will use optics-inspired and sensor-node-specific neural networks, multi-branch feed-forward networks to transfer model knowledge from one or more source sensor nodes to one or more target sensor nodes, and semi-supervised knowledge transfer. This game-changing model transfer and cross-sensor super-resolution/sharpening capability will enable end-users to leverage training libraries that provide disparate or complementary information (for example, imparting robustness to spatio-temporal non-stationarities and enabling learning from training libraries from different geographical regions, sensors, times and sun-sensor-object geometries). We will develop and validate active deep learning capability within our knowledge transfer framework that will seek to strategically facilitate additional labeling in the source and/or target sensor nodes to further improve performance. A functional prototype of our framework will be implemented on a commercial cloud for dissemination and access to stakeholders and the broader research community. Our algorithms will be developed with a specific earth science application focus – earth observation based agricultural sensing. However, the tools developed in this project will be readily applicable to other domains, and will have far-reaching benefits to NASA earth science – including ecological impacts of climate change, forestry, wetlands, etc. – using a wide array of spaceborne data sources such as: (1) multispectral imaging systems (e.g. Landsat, Sentinel), (2) imaging spectroscopy (e.g. DESIS and the future HyspIRI and CHIME missions), (3) SAR (Sentinel, (future) NISAR), and a rich archive of NASA, ESA and commercial satellite imagery, as well as airborne platforms (e.g. AVIRIS-NG, G-LiHT, commercial). The proposed capability will also be important for successful analysis of data acquired by constellations of satellites, for which seamless learning is a key objective.
—
Integration of Observations and Models into Machine Learning for Coastal Water Quality
Stephanie Schollaert Uz, NASA Goddard Space Flight Center
Coastal areas are impacted by population growth, development, aging infrastructure, and extreme weather events causing greater runoff from land. Monitoring water quality is an urgent societal need. A growing fleet of satellites at multiple resolutions provides the ability to monitor large coastal areas using big data analytics and machine learning. Within our AIST18 project, we started working closely with state agencies that manage water resources around the Chesapeake Bay. We propose to build upon these activities to improve the integration of assets to monitor water quality and ecosystem properties and how they change over time and space. Initially we are taking advantage of technologies and data collected in and around the Chesapeake Bay, with a plan to expand to other watersheds.
As the largest estuary in North America, the Chesapeake Bay receives runoff from approximately 100,000 tributaries, carrying sediment, fertilizer, and pollutants from farms, developed communities, urban areas, and forests. These constituents degrade water quality and contribute to its optical complexity. Resource managers tasked with enforcing pollution reduction goals for these point and non-point sources are also challenged by shrinking budgets with which to monitor multiple aspects of the ecosystems while the use of the Bay for recreation, fishing, and aquaculture is increasing. Of particular concern are the increasing number of harmful algal blooms (HABs) and septic tank leaks due to aging infrastructure and rising sea level [Wolny et al., 2020; Mitchell et al., 2021]. State agencies already work closely with NOAA and EPA and are looking to NASA to apply advanced technologies to further improve their natural resource management.
Our AIST18 project demonstrated promising results by training multispectral optical, medium-spatial-resolution satellite data against geophysical model variables within a machine learning (ML) architecture that extracts multi-source feature maps. The nearshore environment demands finer spatial resolution than government assets alone can provide, thus we plan to build on this work by utilizing higher-spatial-resolution data from commercial satellites. Following our demonstration of feasibility using medium resolution satellite imagery from one sensor, we will now derive feature maps from many sensors of varying spatial, spectral, and temporal resolution. These can be effectively merged regardless of initial source resolution at progressively higher (hierarchical) contextual levels by fusing at multiple layers within the ML model, as sketched below. Heterogeneous feature maps can be adaptively scored and weighted, which influences their significance in the resulting predictions. We plan to analyze higher spectral information from in situ inherent optical property observations to determine the minimum set of requirements for remote sensing of water quality, e.g. water clarity, phytoplankton blooms, and the detection of pollutants. In situ observations will facilitate ML training using higher spectral and spatial resolution imagery from commercial satellites at the coast. We are also collaborating with community experts to evaluate the utility of hyperspectral remote sensing for detecting aquatic features not discernible through multispectral imaging, such as phytoplankton community structure and the likelihood of harmful blooms. In situ observations will likewise facilitate ML training using hyperspectral and higher spatial resolution imagery from commercial satellites at the coastal margins and land-water interface. Finally, we aim to eventually integrate upstream assets of land cover classification, elevation, vertical land motion, and hydrology as inputs to the ML architecture, leveraging other projects that characterize the watershed and runoff of sediments and nutrients to coastal water bodies. Adapting our process to an open science framework will facilitate future integration of these data beyond the aquatic community.
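The fragment below is a schematic of the multi-resolution fusion idea, not the project's architecture: two hypothetical sensor branches produce feature maps at different resolutions, the coarser map is resampled to the finer grid, and learned per-source weights score the branches before a prediction head. Band counts, tile sizes, and the output variable are illustrative assumptions.

```python
# Schematic two-source fusion block (illustrative, not the project architecture):
# feature maps from sensors at different resolutions are brought to a common
# grid, scored with learned weights, and merged before the prediction head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_fine = nn.Conv2d(4, 16, 3, padding=1)    # e.g., commercial high-res, 4 bands
        self.enc_coarse = nn.Conv2d(6, 16, 3, padding=1)  # e.g., medium-res multispectral, 6 bands
        self.source_scores = nn.Parameter(torch.zeros(2)) # adaptive per-source weights
        self.head = nn.Conv2d(16, 1, 1)                   # e.g., a water-quality indicator map

    def forward(self, fine, coarse):
        f1 = F.relu(self.enc_fine(fine))
        f2 = F.relu(self.enc_coarse(coarse))
        f2 = F.interpolate(f2, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        w = torch.softmax(self.source_scores, dim=0)      # weights sum to 1
        fused = w[0] * f1 + w[1] * f2                     # weighted mid-level fusion
        return self.head(fused)

net = FusionNet()
fine = torch.randn(1, 4, 128, 128)      # hypothetical high-resolution tile
coarse = torch.randn(1, 6, 32, 32)      # hypothetical medium-resolution tile
print(net(fine, coarse).shape)          # -> torch.Size([1, 1, 128, 128])
```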
—
3D-CHESS: Decentralized, distributed, dynamic and context-aware heterogeneous sensor systems
Daniel Selva, Texas A&M Engineering Experiment Station
The overarching goal of the 3D-CHESS Early-Stage Technology proposal is to demonstrate proof of concept (TRL 3) for a context-aware Earth observing sensor web consisting of a set of nodes with a knowledge base, heterogeneous sensors, edge computing, and autonomous decision-making capabilities. Context awareness is defined as the ability for the nodes to gather, exchange, and leverage contextual information (e.g., state of the Earth system, state and capabilities of itself and of other nodes in the network, and how those relate to the dynamic mission objectives) to improve decision making and planning. We will demonstrate the technology and characterize its performance and main trade-offs in a multi-sensor in-land hydrologic and ecologic monitoring system performing four inter-dependent missions: studying non-perennial rivers and extreme water storage fluctuations in reservoirs, and detecting and tracking ice jams and algal blooms.
The Concept of Operations is as follows. Nodes in the sensor web can be ground-, air- or space-based. Nodes may be manually operated or fully autonomous. Any node can send a request for a mission to the sensor web (e.g., measuring geophysical parameter p at point x and time t +/- dT with a certain resolution dx and accuracy dp). Upon reception of a mission request, each node uses a knowledge base to decide if, given its own state and capabilities, it can perform part or all of the proposed mission. If so, it enters a planning phase in which, based on its own goals and utility function, it decides whether and how much to bid for the proposed mission. A market-based decentralized task allocation algorithm is used to coordinate assignments across nodes.
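The toy sketch below conveys the flavor of market-based task allocation with a single greedy auction round; the real 3D-CHESS scheme is decentralized and iterative, and the node capabilities, costs, and task priorities here are invented for illustration.

```python
# Toy sketch of market-based task allocation across sensor-web nodes
# (a greedy single-round auction; the proposal uses a full decentralized scheme).
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capability: dict      # parameter -> measurement quality in [0, 1]
    cost_per_task: float  # e.g., energy or scheduling cost

@dataclass
class Task:
    parameter: str        # geophysical parameter requested
    priority: float       # science value of fulfilling the request

def bid(node, task):
    """A node bids only if it can observe the parameter; bid = value - cost."""
    quality = node.capability.get(task.parameter, 0.0)
    return task.priority * quality - node.cost_per_task if quality > 0 else None

def auction(nodes, tasks):
    assignments = {}
    for task in sorted(tasks, key=lambda t: -t.priority):   # high priority first
        bids = [(bid(n, task), n.name) for n in nodes if bid(n, task) is not None]
        if bids:
            best_bid, winner = max(bids)
            if best_bid > 0:
                assignments[task.parameter] = (winner, round(best_bid, 2))
    return assignments

nodes = [
    Node("smallsat-1", {"water_level": 0.9, "water_quality": 0.2}, cost_per_task=0.3),
    Node("uas-2", {"water_quality": 0.8}, cost_per_task=0.1),
    Node("gauge-3", {"water_level": 0.6}, cost_per_task=0.05),
]
tasks = [Task("water_level", priority=1.0), Task("water_quality", priority=0.7)]
print(auction(nodes, tasks))   # e.g., {'water_level': ('smallsat-1', 0.6), ...}
```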
To establish proof of concept for a sensor web that works as described in the previous paragraph, we will develop a multi-agent simulation tool by integrating existing tools developed by the team and apply it to a relevancy scenario focusing on global inland hydrologic science and applications. A continuous monitoring system will be simulated that will provide global, continuous measurements of water levels, inundation, and water quality for rivers and lakes using a variety of sensors and platforms. The system will start with a default scientific mission objective to study extreme water storage fluctuations in reservoirs and wetting and drying processes in non-perennial rivers (science-driven). In addition, two applications-driven missions of opportunity will be modeled and considered: ice jams and corresponding upstream flood events, and harmful algal blooms in lakes.
We recognize that 3D-CHESS represents an “aggressive” vision that departs significantly from the state of practice. Therefore, in addition to comparing the value of an implementation of the full 3D-CHESS vision against the status quo (Goal 1), we will also systematically study “transition” architectures that lie somewhere in between the full 3D-CHESS concept and the status quo (Goal 2). For example, in these transition architectures, new task requests may come only from human operators, or planning may be manually done by operators for some nodes while being fully autonomous for others – although still allowing for humans to update the nodes’ utility functions and intervene in case of contingency.
The proposed work has direct relevance to the O1 objective of the AIST program solicitation as it develops new technologies that enable unprecedented degrees of autonomy, decentralization, and coordination to achieve new science capabilities and improved observation performance while reducing development and operational costs. The combination of the knowledge-based technologies and decentralized planning technologies integrated within a multi-agent system framework enables the EOS to respond to scientific and societal events of interest faster and more effectively.
—
Kernel Flows: emulating complex models for massive data sets
Jouni Susiluoto, Jet Propulsion Laboratory
Inference about atmospheric and surface phenomena from remote sensing data often requires computationally expensive empirical or physical models, and always requires uncertainty quantification (UQ). Running these models to predict or retrieve geophysical quantities for very large data sets is prohibitive, and Monte Carlo experiments for UQ, which involve rerunning these models many times, are out of the question except possibly for small case studies. These problems can be overcome with emulators: machine learning models that “emulate” physical models. Emulators are trained on carefully selected input-output examples, acquired either directly from observations or generated by a physical model under specific conditions that are representative of the problem space. Then, the emulator is applied to new inputs and produces estimates of the corresponding outputs, ideally with uncertainties due to the emulation itself. The latter are crucial for interpreting emulator output, and must be included in the total uncertainty ascertained from Monte Carlo-based UQ experiments.
We propose a general-purpose, versatile emulation tool that (1) provides fast, accurate emulation with little tuning, (2) scales up to very large training sets, (3) provides uncertainties associated with outputs, and (4) is open source. This tool set will facilitate large-scale implementation of forward modeling and retrievals, and of UQ at production scales. We choose two science application areas to showcase these capabilities: (A) nowcasting the evolution of convective storms, an example of empirical modeling, and (B) radiative transfer for Earth remote sensing, an example of physical modeling.
Our methodology is based on Gaussian Processes and cross-validation. These are combined in an algorithm called Kernel Flows, hereafter KF (Owhadi and Yoo, 2019). KF has been used in a variety of settings with excellent results, including climate model emulation (Hamzi, Maulik, and Owhadi, 2021). Preliminary results applying KF to radiative transfer problems for OCO-2, MLS, and imaging spectroscopy show that KF is well suited to the high-dimensional emulation required in the types of problems represented by our applications. Our method is general enough to apply to a wide range of analysis and prediction problems, and will enable agile science investigations as called for in Objective O2 of the Notice of Funding Opportunity. Software interfaces will be lightweight, simple, and general, to enable easy integration with different data sources.
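For intuition, the sketch below implements a stripped-down Kernel Flows-style loop: an RBF length scale is adjusted by stochastic gradient steps so that a Gaussian-process fit loses little accuracy when half of each random batch is withheld (the rho criterion of Owhadi and Yoo, 2019). It uses finite differences instead of automatic differentiation and a synthetic target function, so it is only a conceptual illustration of the method.

```python
# Stripped-down Kernel Flows-style sketch: tune an RBF kernel length scale so
# that a Gaussian-process fit loses little accuracy when trained on half of
# each batch. Finite differences replace automatic differentiation, and the
# "physical model" being emulated is a synthetic function.
import numpy as np

rng = np.random.default_rng(3)

def rbf(X, Y, ls):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def rho(X, y, idx, ls, nugget=1e-4):
    """rho = 1 - ||y_half||^2_K(half) / ||y||^2_K(full) for a fixed half-subset idx."""
    K = rbf(X, X, ls) + nugget * np.eye(len(X))
    Kc = K[np.ix_(idx, idx)]
    full = y @ np.linalg.solve(K, y)
    half = y[idx] @ np.linalg.solve(Kc, y[idx])
    return 1.0 - half / full

# Toy "physical model" output to emulate.
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) * np.cos(0.5 * X[:, 1])

ls, lr, eps = 2.0, 0.1, 1e-3
for step in range(200):
    batch = rng.choice(len(X), 64, replace=False)
    Xb, yb = X[batch], y[batch]
    idx = rng.choice(64, 32, replace=False)                  # random half of the batch
    grad = (rho(Xb, yb, idx, ls + eps) - rho(Xb, yb, idx, ls - eps)) / (2 * eps)
    ls = max(ls - lr * grad, 0.05)                           # keep the length scale positive
print("learned RBF length scale:", round(ls, 3))
```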
The two science applications involve running models on very large data sets, doing so faster than is currently possible, and quantifying the uncertainties that result from the emulation process. In the nowcasting example, the model to be emulated is the relationship between the vertical structure of storm clouds and convective storm formation. In the other application, it is a radiative transfer model. The Gaussian Process underlying our method is based on a rigorous probabilistic model that can be used to generate Monte Carlo replicates of the predicted fields. This enables forward UQ experiments to derive uncertainties on predicted quantities, including emulator uncertainty.
—
A New Snow Observing Strategy in Support of Hydrological Science and Applications
Carrie Vuyovich, NASA Goddard Space Flight Center
This proposal aims to develop a new observing strategy (NOS) for snow that considers the most critical snow data needs along with existing and expected observations, models, and a future snow satellite mission. The Snow Observing System (SOS) will be used to estimate SWE and snow melt throughout the season, targeting observations (e.g. peak SWE and the onset of melt) with the greatest impact to hydrological metrics as they occur in different regions.
Snow is a seasonally evolving process that results in a reflective, insulating cover over the Earth’s land mass each year, provides water supply to billions of people and supports numerous ecosystems. Snow also contributes to short-term and long-term disasters. It is a critical storage component of the global water cycle, yet we currently do not have global snow observations that provide data needed to understand its role in hydrological regimes and respond to snow-related events. While numerous existing or expected satellite sensors are sensitive to snow and provide information on different snow properties, none provide global snow water equivalent (SWE) data, the essential information to address hydrologic science questions, at the frequency, resolution and accuracy needed.
While snow contributes water resources to a large portion of the Earth’s terrestrial area, its coverage and role evolve throughout the season, affecting different regions, elevations and latitudes at different times of the year. For instance, in North America, peak SWE, peak SWE uncertainty and melt onset shift from lower latitudes and elevations early in the year (Jan – Apr) to northern latitudes and elevations later (May – June), indicating that data needs may also shift throughout the year. Seasonal snow is a perfect candidate for an optimized observational strategy that leverages existing sensors and focuses future mission concepts on monitoring the most critical areas to provide cost-effective and robust information.
The National Academy of Science identified snow water equivalent or snow depth as a critical observation in the 2017-2027 Decadal Survey and recommended it as a measurement priority in the Earth System Explorer (ESE) class missions. An ESE announcement of opportunity is expected within the next year (Earth Science “Earth System Explorers” AO Community Announcement NNH22ZDA002L), which could provide an opportunity to launch a SWE-focused mission. An optimized observing strategy for snow will be integral to the mission design to target the most critical data needs.
Our approach will be to evaluate observations from existing missions that have previously not been combined in an optimized way; create a hypothetical experiment to determine an optimal observing strategy focused on specific hydrological events; and assess the value of new potential sensors, such as commercial smallsats, to fill observing gaps and provide higher-frequency observations during critical time periods. We will also evaluate the potential for focusing higher-density observations in regions where flood, drought or wildfire concerns would benefit from early warning. These dynamic observations could help rally ground, UAS or airborne observations in regions showing snow volumes outside the normal range or experiencing unexpected snowpack conditions.
The expected outcome is a demonstrated plan for a new observing system that responds to the dynamic nature of seasonal terrestrial snow and focuses on regions of interest in a timely manner. This strategy will allow high-resolution observations in critical areas for improving science and application understanding, reduce the potential cost of a global snow-observing satellite mission by not observing non-snow-covered areas, and take advantage of new commercial smallsat assets which are advancing rapidly and becoming increasingly available at the frequencies of interest.
—
SLICE: Semi-supervised Learning from Images of a Changing Earth
Brian Wilson, Jet Propulsion Laboratory
The field of computer vision (CV) is advancing rapidly, enabling significant accuracy improvements for image classification, segmentation, and object detection problems. In particular, multiple self- and semi-supervised learning (SSL) approaches have been published in recent years:
– SimCLR (v2) from Google (A Simple Framework for Contrastive Learning of Visual Representations),
– FLASH from University of Pennsylvania and Georgia Tech (Fast Learning via Auxiliary signals, Structured knowledge, and Human expertise),
– DINO / PAWS from Facebook (Self-Supervised Vision Transformers with DINO / Predicting View-Assignments with Support Samples),
– EsViT from Microsoft (Efficient Self-supervised Vision Transformers for Representation Learning).
We propose to investigate and characterize the efficacy of multiple SSL techniques for representative image problems on Earth imagery, and then select the best for further infusion into mission and science workflows. The challenge for assessing climate change is to build a flexible Cloud-based system, with additional GPU training on supercomputers, that can apply cutting-edge computer vision techniques at scale on years of satellite-based Earth and ocean imagery.
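As background on the kind of objective these SSL methods share, the sketch below implements a minimal SimCLR-style NT-Xent contrastive loss on two augmented views of a batch of image tiles; the encoder and augmentations are placeholders, and none of this reflects the SLICE implementation.

```python
# Minimal NT-Xent (SimCLR-style) contrastive loss sketch, shown only to
# illustrate the self-supervised objective these frameworks build on.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: embeddings of two augmented views of the same batch of tiles."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # 2n x d, unit norm
    sim = z @ z.t() / temperature                             # cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))                # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                      # positive = the other view

# Toy usage: embeddings of 8 image tiles under two random "augmentations".
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
tiles = torch.randn(8, 3, 32, 32)
view1 = tiles + 0.1 * torch.randn_like(tiles)
view2 = tiles + 0.1 * torch.randn_like(tiles)
loss = nt_xent(encoder(view1), encoder(view2))
print("contrastive loss:", float(loss))
```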
Therefore, we propose to build a Cloud and supercomputing-based platform for cutting-edge computer vision at scale. The three top-level goals of the SLICE system, “Semi-supervised Learning from Images of a Changing Earth” are:
1. Establish the SLICE framework and platform for applying scalable semi-supervised computer vision models to Earth imagery, running in AWS Cloud and supercomputing environments, that can be easily adopted as a reusable platform by NASA data centers, mission science teams and NASA PIs.
2. Investigate and characterize the accuracy of multiple SSL models (i.e. SimCLRv2, DINO, EsViT) on a variety of relevant remote sensing tasks with minimum labels (here ocean phenomena).
3. Build and publish self- and semi-supervised learning models with a focus on the upper ocean small-scale processes in anticipation of several on-going and upcoming NASA missions (i.e. SWOT, WaCM, and PACE). Ocean eddy properties and derived heat flux will be modeled and predicted from SST, SSH, and SAR data.
The proposed work is directly traceable to AIST Objective O2 by developing a task-agnostic deep learning framework that facilitates large-scale image analytics on disparate, multi-domain datasets. The development of a framework and platform to train SOTA deep learning models with limited labels is responsive to the Analytic Collaborative Frameworks (ACF) thrust of the AIST element because such models can be used as individual, high-performing building blocks of a unified ACF, or for that matter surrogate models in an ESDT.
The SLICE platform will provide correct example ML workflows, parallel image tile preparation, and best-of-breed data augmentation and training frameworks that perform distributed training on multiple GPUs, and it will publish multiple tuned SOTA SSL and vision transformer models for reuse. The pretrained SSL models will give scientists a head start in that they can be immediately applied and fine-tuned on the target problem, with less training time required. By providing models pretrained on physics model outputs (not ImageNet), the starting model weights can be “physics informed” for the geophysical system being studied. Since all of the algorithms will be “pluggable”, they can be replaced with modified data preprocessing and augmentation, or with the latest cutting-edge DL approach and network architecture. The CV field applied to Earth imagery is growing rapidly and is overdue for a standard platform that can evolve rapidly, so that science applications stay on the cutting edge of ML. SLICE is a first step toward standardizing and spreading SOTA semi-supervised learning approaches to CV in all science areas.
—
Coupled Statistics-Physics Guided Learning to Harness Heterogeneous Earth Data at Large Scales
Yiqun Xie, University of Maryland, College Park
Despite recent advances of machine learning (ML) in computer vision and machine translation, creating learning techniques that are spatially generalizable and physics-conforming remains an understudied and challenging task in Earth Science (ES). In particular, direct applications of typical ML models often fall short due to two major challenges posed by ES data. First, a fundamental property of spatial data is spatial heterogeneity, which means the functional relationships between target variables (e.g., land cover changes, water temperature and streamflow) and Earth observations tend to be non-stationary over space. The footprints of such heterogeneous data generation functions are often unknown, adding an extra layer of complication. Second, annotated data available in ES applications are often limited or highly localized due to the substantial human labor and material cost of data collection. As a result, pure data-driven attempts – which are often carried out without consideration of the underlying physics – are known to be susceptible to learning spurious patterns that overfit limited training data and cannot generalize to large and diverse regions.
We aim to explore new model-agnostic learning frameworks that explicitly incorporate spatial heterogeneity awareness and physical knowledge to tackle these challenges in ES. First, to harness spatial heterogeneity, we will explore a statistically guided framework to automatically capture the spatial footprints of data generated by different functions (e.g., target predictions as functions of spectral bands) and transform a user-selected deep network architecture into a heterogeneity-aware version. We will also investigate more effective spatial knowledge sharing models using the spatial-heterogeneity-aware architecture. Second, to further improve interpretability and generalizability for data-sparse regions, we will explore new physics-guided ML architectures to incorporate domain knowledge, e.g., water temperature dynamics driven by the heat transfer process and other general physical processes embedded in physics-based models. To address the biased parameterizations of physics-based models, we will also investigate new learning strategies to effectively extract general physical relations from multiple physics-based models. Finally, we will explore synergistic integration of the statistically and physically guided frameworks to create a more holistic solution that addresses various challenging ES application scenarios. Results will be evaluated using important ES tasks, including land cover and land use change (LCLUC) mapping and surface water monitoring, with support from chief scientists of the NASA LCLUC and USGS water monitoring programs.
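To make the physics-guided idea concrete, the sketch below trains a small network on sparse water-temperature labels while penalizing predictions whose implied density decreases with depth, a simple stable-stratification constraint; the density formula is a crude proxy and the whole setup is an illustrative assumption rather than the proposed frameworks.

```python
# Schematic physics-guided training loss (illustrative, not the proposed
# architectures): a data term on sparse labels plus a penalty that discourages
# physically implausible predictions, here density decreasing with depth in a
# water column. The density formula is a simplified proxy, not an exact one.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # depth -> temperature

def density_proxy(temp_c):
    return 1000.0 - 0.007 * (temp_c - 4.0) ** 2   # crude freshwater density proxy

depths = torch.linspace(0, 30, 61).unsqueeze(1)            # meters
labeled = torch.tensor([[0.0], [10.0], [30.0]])            # only 3 observed depths
labels = torch.tensor([[22.0], [15.0], [6.0]])             # observed temperatures (deg C)

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(2000):
    data_loss = ((model(labeled) - labels) ** 2).mean()
    rho = density_proxy(model(depths))
    # Physics penalty: density should not decrease with depth (stable stratification).
    physics_loss = torch.relu(rho[:-1] - rho[1:]).mean()
    loss = data_loss + 10.0 * physics_loss
    opt.zero_grad(); loss.backward(); opt.step()

print("final loss terms:", float(data_loss), float(physics_loss))
```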
To improve the confidence in the success of the new technology, our preliminary exploration has created prototypes of the frameworks to perform statistically-guided spatial transformation for data with spatial heterogeneity, and physics-guided learning for scenarios with limited data. Preliminary case studies using ES data have demonstrated feasibility and the potential of the new frameworks: (1) for land cover mapping, our prototype of spatial transformation improved the F1-score by 10-20% over existing deep learning baselines; and (2) for water temperature and streamflow prediction, the preliminary physics-guided learning model demonstrated improvements over both existing process-based models used by USGS and ML models over large-scale river basins and lakes, and the model has been included in USGS’s water prediction workplan. The proposal team includes experts from both computer science and ES. Targeted deliverables of this project include the new technology as well as its open-source implementation, and related ES benchmark datasets used for validation.