Project Selections for AIST-23
24 Projects Awarded Under the Advanced Information Systems Technology (AIST) Program
12/03/2024 – NASA’s Science Mission Directorate, NASA Headquarters, Washington, DC, has selected proposals for the 2023 solicitation of the Advanced Information Systems Technology Program (A.58 of the 2023 Research Opportunities in Space and Earth Sciences omnibus solicitation) in support of the Earth Science Division (ESD). The 2023 AIST awards will provide novel information systems and computer science technologies to reduce the risk, cost, and development time of NASA space- and ground-based information systems, to enable advanced observation measurements, and to grow scientific understanding of the Earth’s systems, improve predictive capabilities, and deliver actionable science and applications to inform decisions.
NASA’s AIST Program identifies, develops, and supports adoption of software and information systems, as well as novel computer science technology expected to be needed by the ESD in the next 5 to 10 years. The AIST Program is organized around two primary thrusts: Novel Observing Strategies (NOS) and Earth System Digital Twins (ESDT). Proposals were solicited within these two thrusts or in several other advanced and promising software technology areas, and were expected to show explicitly how the resulting technology would be infused into at least one of ESD’s Science or Earth Action domains. The AIST Program anticipates that the technologies in these proposals will mature by at least one Technology Readiness Level, with an eventual goal of demonstrating their value to the relevant science communities.
ESD’s Earth Science Technology Office (ESTO) evaluated 66 proposals and will award a total of 24 proposals with periods of performance of 1.5 to 3 years. The total amount of all the awards is $38M. The abstracts as provided by proposers are as follows:
—
- Explainable AI Model Development via Model-Agnostic and Physics-Informed Approaches For Earth Science - James Carr, Carr Astronautics Corporation
- Flight Demonstration of Federated New Observing Strategies for Multiple Science Applications - Steve Chien, Jet Propulsion Laboratory
- Integrating Explainable Machine Learning with Physics for Enhanced Wildfire Detection in Observation-Constrained Environments - Leah Ding, American University
- Creating a NASA Surface Topography and Vegetation (STV) Novel Observing System (NOS) - Andrea Donnellan, Jet Propulsion Laboratory
- Pix4DCloud: A Suite of Physics-Constrained Transformer Models to Retrieve 4D Clouds in Real World and Digital Twins - Jie Gong, NASA Goddard Space Flight Center
- An Innovative Sunlight Denoising Technique To Improve Measurement Quality and Reduce Cost of Future Spaceborne Lidars - Yongxiang Hu, NASA Langley Research Center
- A Software Tool for Probabilistic Calibration and Data Assimilation for Geophysical Models - Matthias Katzfuss, University of Wisconsin, Madison
- Mapping Anthropogenic Water Cycle Impacts in a Future Climate: A Global Digital Twin for Scenario-Driven Exploration - Sujay Kumar, NASA Goddard Space Flight Center
- Earth System Digital Twin for Central Africa Carbon and Biodiversity Corridors - Seungwon Lee, Jet Propulsion Laboratory
- Connecting a Broad Community to Earth System Digital Twin Technologies at the Interface of Atmospheric Composition with the Earth System - Randall Martin, Washington University
- Machine-Learning to Improve Cycling and Forecasts with GEOS and Expedite the Evaluation of Assimilating Observations from New Instruments - Romit Maulik, Pennsylvania State University
- Optimal Estimation with a Generative AI Prior for Remote Sensing Retrievals and Observing System Design - Adam Milstein, Massachusetts Institute of Technology/Lincoln Lab
- A Novel Observation Strategy with Heterogeneous and Dynamic Microwave-Sensing (NOS-HDM) in Response to STV - Mahta Moghaddam, University of Southern California
- A Forecasting Scheme for Accelerated Harmful Algal Bloom Monitoring (FASTHAB) - Nima Pahlevan, Science Systems And Applications, Inc.
- Pilot Deployment of TERRAHydro: A Framework, Demonstration, and Vision for Earth System Digital Twins - Craig Pelissier, Science Systems And Applications, Inc.
- A Terrestrial Water Budget Digital Twin - Fritz Policelli, NASA Goddard Space Flight Center
- Event- and Feature-Based Observing System Design: Quantifying Science and Applications Benefit for Diverse Measurement Combinations - Derek Posselt, Jet Propulsion Laboratory
- An Earth System Digital Twin for Wildfire: Predicting Wildfire Progression and Behavior, and Its Downstream Impacts on Air Quality - Mohammad Pourhomayoun, California State University, Los Angeles
- Physics-Aware Quantum Neural Network Modeling of Earth Science Phenomena - Eleanor Rieffel, NASA Ames Research Center
- 3D-CHESS FO: Autonomous Sensor Web for Inland Water Ecosystem Monitoring - Daniel Selva, Texas A&M Engineering Experiment Station
- AI Climate Tipping-Point Simulator (ACTS) - Jennifer Sleeman, Johns Hopkins University
- A Digital Twin Integrating Knowledge and AI for Understanding Carbon and Biodiversity Corridors in Central Africa - Yiqun Xie, University of Maryland, College Park
- Building Earth System Digital Twins with Deep Generative Models for Improved Simulation of Clouds and Their Feedbacks and Impacts - Tianle Yuan, University of Maryland Baltimore County
- Time Series Multi-Modal Foundation Model for Near-Real-Time Land Surface Dynamics Characterization in Support of ESDT - Hankui Zhang, South Dakota State University
—
Explainable AI Model Development via Model-Agnostic and Physics-Informed Approaches For Earth Science
James Carr, Carr Astronautics Corporation
Modern Artificial Intelligence (AI) applications are proving valuable to NASA science and engineering, with impressive performance for Earth Observation (EO) applications. However, a weakness of these models is their lack of explainability: their complex internal architectures, with potentially billions of parameters, make them black-box universal function approximators. This lack of explainability often slows their adoption in a science community that expects full transparency and independent verification. While there is substantial benefit to applying AI models to the expansive science data published by NASA, such models are difficult to replicate and explain in a human-understandable form due to their opaque nature and the complex distribution of data used to train them. The true value of AI will be realized when AI explains itself to humans and finds new insights into nature.
To provide reasonable insight into these systems, this proposal will explore Explainable AI (XAI), that is, AI algorithms that can be readily parsed, verified, reproduced, and explained by human users, using TEMPO (Tropospheric Emissions: Monitoring of Pollution) mission science data as a reference example. Specifically, this project will examine XAI techniques with DNN (Deep Neural Network) models developed for TEMPO using both (1) model-agnostic approaches and (2) physics-informed neural networks (PINNs). The proposal will publish a methodology for XAI models that can be independently understood and verified for incorporation into broader science data processing pipelines beyond TEMPO. Our targeted outcome is one or more developed TEMPO science use cases with a corresponding AI framework illustrating lessons learned and best practices for others to leverage. The framework code serves as a template for prototyping other XAI science applications and will enable the greater community to incorporate these types of models more effectively into their own mission use cases. We aim to build a community of practice centered on XAI applications to advance the objectives of the AIST Early-Stage Technology (EST) category, which specifically emphasizes emerging state-of-the-art AI concepts for generalization, explainability, understanding, extensibility, validation, and reproducibility of results, all of which are driving milestone objectives for XAI applications.
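As an illustration of the model-agnostic side of such an approach, the sketch below shows permutation feature importance, a generic XAI technique that treats any retrieval model as a black box; the model, features, and data are placeholders rather than the TEMPO pipeline.

```python
# Illustrative sketch only: permutation importance, a model-agnostic XAI
# technique. The "model" and data below are toy stand-ins, not TEMPO products.
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=None):
    """For any black-box `predict(X) -> y_hat`, return the mean increase in
    squared error when each input feature is shuffled (higher = more important)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores.append(np.mean((predict(Xp) - y) ** 2) - baseline)
        importances[j] = np.mean(scores)
    return importances

# Toy usage with a stand-in "retrieval model" where only two features matter
X = np.random.default_rng(0).normal(size=(500, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]
model = lambda A: 3.0 * A[:, 0] + 0.5 * A[:, 1]
print(permutation_importance(model, X, y))
```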
Furthermore, while initial efforts will establish science examples from TEMPO, this proposal team is uniquely qualified to address deployment of these techniques for onboard satellite systems. It is well known by the space engineering community that onboard processing options are limited, with the most recent engineering thrust toward the development of Microchip’s HPSC (High Performance Spaceflight Computing) processor. This proposal will develop the architecture and initial schematic for RADICAL (Rad-tolerant Application Data processor for Integrated CubeSat Architecture with LS1046), a new CubeSat card featuring the Teledyne LS1046-Space radiation-tolerant quad-core processor. As part of the NASA SpaceCube family of processors, this design will fill the gap between the low-power SpaceCube GHOST (Frontgrade GR740-based) and the higher-power HPSC, enabling the deployment of these new XAI models.
Our industry-government team is led by PI Dr. James Carr. He is PI of several successful previous AIST proposals and an influential contributor to several Earth science missions including TEMPO. The government partner is Dr. Christopher Wilson, Associate Branch Head of the Science Data Processing Branch, who leads a team of expert researchers in onboard processing and AI techniques. Dr. Wilson has led several successful NASA missions (STP-H5, STP-H6, STP-H9). As part of the project team at GSFC, Justin Goodwill is the author of several highly cited publications on onboard AI applications and was the lead AI-application hardware/software developer for SC-LEARN, one of the only TRL-9 AI microchip flight cards.
—
Flight Demonstration of Federated New Observing Strategies for Multiple Science Applications
Steve Chien, Jet Propulsion Laboratory
Earth science measurements have predominantly been untargeted “mow the lawn” missions that collect nadir measurements with only slight adaptivity (e.g., collecting only over land and coastal areas, or only during the daytime). These missions have enabled incredible science discoveries and increased our understanding of the processes that govern the Earth.
However, these breathtaking advances represent only the tip of the iceberg in key measurements to drive Earth science. Increasingly precise and pinpoint measurements of critical, rare, and transient phenomena are needed to continue the rapid pace of Earth Science. While traditional Earth Science missions excel at acquiring global datasets, measuring rare, transient phenomena at finer scales requires a different approach.
Why not acquire global coverage at finer and finer scales? Fundamental physics imposes trades between spatial resolution and swath in remote sensing instruments. Additionally, active sensors demand more energy to image larger swaths, increasing mission costs.
This proposal matures and demonstrates in space Federated Autonomous MEasurement (FAME), a New Observing System (NOS) technology, to address this exact problem. Instead of acquiring enormous amounts of data in the hope of capturing rare, transient science events of interest, FAME uses flexible, taskable assets to direct targeted observations to where and when these critical events occur.
In FAME, observation requests are submitted to a service portal and a federated, heterogeneous pool of assets is allocated to satisfy the requests. However, many questions arise from this radically new approach — what do requests look like? How do requests get mapped to and fulfilled by actual spacecraft? How do the numerous entities operating the spacecraft coordinate? Our effort will answer such questions by executing the largest flight and ground demonstration of autonomous spacecraft operations to date. Over fifty (50) spacecraft from diverse entities such as NASA, ESA, Ubotica/Open Cosmos, Loft, Apex, Planet, ICEYE, Capella, and others will be linked with a core of 7 spacecraft in full NOS configuration with edge computing and flight software to enable analysis of imagery and rapid notification to other spacecraft. We are working to extend this to 3 more spacecraft via an ESA-led coalition.
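To make the request-and-allocation concept concrete, here is a purely hypothetical sketch of what a federated observation request and a naive allocation step could look like; the field names and the greedy policy are illustrative assumptions, not the FAME interface.

```python
# Hypothetical sketch of a federated observation request and a naive allocation
# step; all field names and the selection policy are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObservationRequest:
    target_lat: float
    target_lon: float
    phenomenon: str          # e.g. "wildfire", "algal_bloom", "volcano"
    window_hours: float      # how soon the observation is needed
    min_resolution_m: float  # coarsest acceptable resolution

@dataclass
class Asset:
    name: str
    resolution_m: float
    revisit_hours: float
    taskable: bool

def allocate(request: ObservationRequest, assets: list) -> Optional[Asset]:
    """Pick the fastest-revisit taskable asset meeting resolution and timing needs."""
    candidates = [a for a in assets
                  if a.taskable
                  and a.resolution_m <= request.min_resolution_m
                  and a.revisit_hours <= request.window_hours]
    return min(candidates, key=lambda a: a.revisit_hours) if candidates else None

assets = [Asset("wide-swath imager", 300.0, 12.0, True),
          Asset("high-res SAR", 5.0, 6.0, True)]
req = ObservationRequest(34.2, -118.5, "wildfire", window_hours=8, min_resolution_m=50)
print(allocate(req, assets))
```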
We will first (Year 1) mature flight capabilities and demonstrate autonomous operations and science-directed measurement in the NOS testbed. Year 1 will also include several on-orbit tests, including onboard AI analysis of imagery with rapid notification leading to responsive tasking and acquisition of follow-up data. In Years 2 and 3 we will scale up these demonstrations to the full set of assets, processing thousands of alerts, simulated taskings, and workflows, and executing hundreds of such tests onboard while tasking actual spacecraft. This demonstration will prove the NOS promise of achieving high temporal density, on-demand measurement of science phenomena. We will demonstrate FAME on numerous science use cases from the NOS Workshop: Atmosphere Use Cases #1 Pollution transport between boundary layer and free troposphere, #2 Clouds and Convection, and #4 Environment Interaction due to explosion; Carbon & Ecosystems Use Cases #1 Monitoring of Plant Species and Habitat Protection and #3 Algal Bloom; Earth Surface & Interior Use Cases #2 Landslides, #3 Volcanoes, and #4 Land Use Land Cover Change – Wildfires; Oceans Use Case #1 Algal Bloom Tracking — Fisheries & Sea Life; and Flooding (multiple use cases). This Year 2 and Year 3 on-orbit demonstration will move the NOS paradigm into the mainstream via sustained flight on numerous (7) spacecraft driving many more (50+) spacecraft.
—
Integrating Explainable Machine Learning with Physics for Enhanced Wildfire Detection in Observation-Constrained Environments
Leah Ding, American University
Satellite-based fire detection provides critical data for fire management, fire spread modeling, air quality forecasts, and assessments of fire impacts on ecosystems and communities. Current fire detection algorithms, whether physics-based or machine learning (ML)-based, frequently fail when wildfires are obscured by dense clouds or smoke, creating data gaps that degrade estimates of air quality and fire emissions. Part of the problem is the lack of training data for fires beneath clouds, and another part is the influence of legacy computational limitations on the approach to satellite data analysis. Separate data products for fire, clouds, and aerosols, rather than joint solutions that explicitly consider the influence of atmospheric conditions on attenuation of the thermal emissions from fire events, perpetuate this issue. These missing fire detections hamper our understanding of changing fire activity in response to climate warming and inhibit our ability to attribute aerosols and greenhouse gas emissions to fire activity.
This proposed work aims to address these challenges by developing an explainable multitask ML model for fire detection and integrating it with cloud and aerosol retrieval. Our primary focus lies in addressing the specific challenges posed by the presence of clouds and dense smoke, where active fires are not consistently detected with current approaches. Our objective is to develop a multitask ML approach with the primary task focused on fire detection, aided by subtasks related to cloud and smoke aerosol retrievals. All tasks are collectively learned to facilitate situational awareness. We will build upon our prior research in physics-informed and explainable ML for 3D cloud reconstruction and extend it to include both cloud and smoke aerosol retrievals for enhancing active fire detection. In observation-constrained environments, where atmospheric interference increases uncertainty, understanding the rationale behind ML model predictions is essential for reliable fire detection under challenging cloudy and smoky conditions. To ensure physical, spatial, and temporal consistency, we will integrate explainable ML techniques with physics in the proposed multitask transformer architecture, ensuring the transparency of ML models and their consistency with the physics of fire activity and atmospheric radiative transfer. Additionally, we will utilize the attention mechanism to integrate temporal and spatial contextual information, including near-coincident measurements from geostationary (GEO) and low Earth orbit (LEO) satellites, as well as atmospheric profiles from reanalysis data. This will further enhance the accuracy of active fire detection under adverse atmospheric conditions by ensuring spatial and temporal consistency.
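For illustration, a minimal sketch of the kind of multitask objective described above follows: a primary fire-detection loss plus auxiliary cloud and aerosol losses and a simple temporal-consistency penalty. Task weights, tensor shapes, and the penalty form are assumptions for this sketch, not the proposal's actual architecture.

```python
# Minimal sketch of a multitask loss: fire detection aided by cloud/aerosol
# subtasks and a toy temporal-consistency term. All weights and shapes are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def multitask_loss(outputs, targets, w_cloud=0.3, w_aer=0.3, w_time=0.1):
    """outputs/targets: dicts of [batch, time, H, W] tensors for each task."""
    loss_fire = F.binary_cross_entropy_with_logits(outputs["fire"], targets["fire"])
    loss_cloud = F.mse_loss(outputs["cloud"], targets["cloud"])
    loss_aer = F.mse_loss(outputs["aerosol"], targets["aerosol"])
    fire_prob = torch.sigmoid(outputs["fire"])
    # Encourage smooth evolution of fire probability between adjacent time steps
    loss_time = F.mse_loss(fire_prob[:, 1:], fire_prob[:, :-1])
    return loss_fire + w_cloud * loss_cloud + w_aer * loss_aer + w_time * loss_time

# Toy usage with random tensors standing in for model outputs and labels
shape = (2, 4, 16, 16)
out = {k: torch.randn(shape) for k in ("fire", "cloud", "aerosol")}
tgt = {"fire": torch.randint(0, 2, shape).float(),
       "cloud": torch.randn(shape), "aerosol": torch.randn(shape)}
print(float(multitask_loss(out, tgt)))
```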
The outcomes of this proposed project include: (1) a fire detection ML model developed through the joint learning of vertical cloud and aerosol distributions, (2) a representative fire dataset under cloudy and smoky conditions, and (3) an explainable ML-based approach with broader applications for the retrieval of other surface properties obscured by clouds and aerosols.
Despite having a lower entry TRL, this proposed effort has the potential to be “game-changing” as it tackles critical challenges in wildfire monitoring and tracking under observation-constrained conditions. By integrating physical constraints into ML models and learning from multiple satellite observations and reanalysis data over time, this project bridges the gap between traditional physics-based models and data-driven ML techniques. The project’s emphasis on explainability ensures the transparency of the ML models’ predictions and maintains spatiotemporal consistency with physical processes. The project will be carried out by an interdisciplinary team with expertise in ML, spaceborne fire detection, and satellite remote sensing of clouds and aerosols.
—
Creating a NASA Surface Topography and Vegetation (STV) Novel Observing System (NOS)
Andrea Donnellan, Jet Propulsion Laboratory
We propose to develop a novel observing system (NOS) demonstration to enable the implementation of NASA’s Surface Topography and Vegetation (STV) Targeted Observable. This includes developing a scalable architecture and smart tasking strategies, standardizing processing workflows and outputs, and fusing data into higher-level products. Science and application objectives for STV would be best met by baseline and repeated 3-dimensional observations from lidar, radar, and stereoimaging. Observing characteristics vary for each measurement type, but the assets should be coordinated. The current STV architecture is one of distributed sensors, or nodes, interconnected by a communications fabric that enables dynamic and intelligent operations. NOS employs flexible multi-source and multi-sensor measurements, which for STV will be combined to create standard STV data products from a variety of orbital and sub-orbital assets in an optimal architecture.
Fusing the radar, lidar, and stereoimaging data will result in robust STV data products enabling the separation of bare Earth topography from vegetation density profiles. Each method contributes unique and complementary measurements. Lidar measurements are excellent at determining digital surface models and finding ground returns but can be sparse. Stereoimaging measurements cover a broad area and produce an excellent digital surface model, though canopy height can be underestimated as tree crowns narrow below the image pixel size. Lidar and stereoimaging observe the surface only when it is not obstructed by clouds. Radar measurements provide broad area coverage while penetrating clouds. Multi-frequency radar can further penetrate vegetation and, when combined with multi-baseline radar, can indirectly measure vegetation 3-dimensional structure. Fusion algorithms, data pipelines, and processing infrastructure must be developed to leverage the combined strengths of these measurement approaches.
The breadth of these varying measurement needs and the requirement to separate different signals of interest are difficult or impossible to address using a single sensor. They can be addressed, however, through the timely and dynamic coordination of multiple existing Earth Science sensors in a distributed web of federated sensors that will constitute an NOS. Fusion of such diverse data products as lidar, multi-look radar, and stereoimagery remains a challenge. For example, measurements from all sensors must be collected simultaneously, or within a target window. The data should also be captured in a manner that optimizes the complementary nature of the different sensors.
To address these challenges, we propose to: 1) develop a scalable architecture to produce global baseline topography maps; 2) coordinate an STV team to develop smart tasking/processing strategies to reduce redundancy and maximize coverage for measurement-driven novel observations and event-driven responses; 3) standardize processing workflows and outputs for all sensors in the sensor web; and 4) fuse the collected data into higher-level products that will be bound together through NOS software.
Numerous land surface and ecosystem processes would be better understood through observations of STV. Characterizing the complex interactions of these processes will, in turn, help society manage existing resources and respond to environmental and ecological changes. A baseline STV map will allow scientists to understand the history of processes written in the landscape and to anticipate future changes. Because of the dynamic and ever-changing nature of these processes, repeat measurements of STV are necessary at a variety of temporal and spatial scales, ranging from sub-daily to decadal and sub-centimeter to multi-kilometer. The 2018 Earth Science Decadal Survey identified STV for maturation into an observing system architecture.
—
Pix4DCloud: A Suite of Physics-Constrained Transformer Models to Retrieve 4D Clouds in Real World and Digital Twins
Jie Gong, NASA Goddard Space Flight Center
Atmospheric clouds exhibit both vertical and horizontal structure. The 3D structure of clouds exerts overarching impacts on the top-of-atmosphere (TOA) radiation budget and surface precipitation characteristics. Moreover, the vertical cloud structure also impacts the quality of downstream tasks (e.g., aerosol, fire, or ocean color retrievals). It is, however, extremely challenging to retrieve cloud vertical structures from spaceborne wide-swath passive sensors using physics-based models because of the high computational cost and large uncertainties involved in the radiative transfer calculation. On the modeling side, clouds usually form as an “ad-hoc” process depending on fixed thresholds calculated from atmospheric fields, which does not necessarily represent the weather-dependent physical processes in nature.
Many recent works, including ours, explored a variety of machine learning (ML) approaches to predict cloud vertical structures using passive sensor measurements trained on “truths” from active sensors. These works still employ traditional ML methods for each individual instrument, while temporal contextual information from adjacent overpasses of multiple similar instruments has thus far not been utilized.
This proposal aims to introduce and evaluate the transformer neural network architecture (often used in “foundation models” as the core of large language models) to overcome a common hurdle in applying ML/AI to satellite data: cross-time and cross-mission knowledge transfer. We target both improving the observational 3D cloud retrievals and creating an ESDT subcomponent cloud generator. Through creating and utilizing multi-timestep, multi-spectral, and multi-instrument pre-trained all-sky radiance foundation models, we will for the first time quantify (1) the merits of using a foundation model to improve the 3D cloud mask and type retrieval from the Advanced Baseline Imager (ABI) by leveraging available temporal and spatial contextual information; and (2) the advantage of using CloudSat radar observations and an ABI radiance-based foundation model to accurately represent 3D cloud fields in global models. Moreover, as deep convective systems (DCs) are the prominent source of extreme precipitation events, we will (3) generate A 3D Vertical Object-oriented deep Convective systems And their DevelOpment stages model (A3DVOCADO) that auto-identifies and predicts the development stages and lifespan of DCs from the 3D cloud fields in both retrieved and DT-clouds, and we will (4) investigate whether the DT-clouds respond to inter-annual variabilities the same way clouds in nature do. The resulting suite, “Pix4DCloud”, employs temporal information to predict 3D clouds. This project enters with TRL 3 and is expected to exit at TRL 4 or 5, which fits the AET category.
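For readers unfamiliar with the foundation-model approach, the sketch below illustrates masked-reconstruction pretraining of a small transformer on multi-spectral radiance patches; the architecture sizes, masking scheme, and toy data are assumptions for illustration and do not reflect the Pix4DCloud design.

```python
# Rough sketch of masked-reconstruction pretraining for a radiance "foundation
# model"; sizes, masking ratio, and data are illustrative assumptions.
import torch
import torch.nn as nn

n_tokens, n_chan, d = 64, 8, 32                    # patches per scene, spectral channels, embedding size
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
embed = nn.Linear(n_chan, d)
decode = nn.Linear(d, n_chan)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(embed.parameters()) + list(decode.parameters()), lr=1e-3)

for step in range(3):                              # toy training loop
    radiances = torch.randn(16, n_tokens, n_chan)  # batch of multi-spectral patch sequences
    mask = torch.rand(16, n_tokens, 1) < 0.3       # hide 30% of patches
    tokens = embed(radiances * (~mask))            # masked patches are zeroed before encoding
    recon = decode(encoder(tokens))
    loss = ((recon - radiances) ** 2 * mask).mean()  # error counted only on hidden patches
    opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))
```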
As one of the pioneering efforts to introduce the transformer/foundation model to NASA atmospheric science applications, and the first comprehensive evaluation of its merits against both physics and traditional ML models, this proposal fits squarely within the core scope of the AIST program for “novel computer science technologies expected to be needed by the Earth Science Division in the 5-10-year timeframe”. It specifically responds to both O3 and O2, as Pix4DCloud contains a suite of ML models that allow flexibility in generating 3D cloud structures or further detecting and tagging DC development stages in observations or model simulations. DCs not only fall into one of the 5 NOS categories (water cycle), but are also one of the 8 objectives of the upcoming decadal survey AOS mission. Therefore, outcomes from this AIST project will bring broader benefits to NASA’s future Earth science missions by potentially transforming the retrieval and flying strategy for NASA’s next-generation instruments. Last but not least, this project will deliver a foundation model to the public that can bring direct benefits to a wide variety of atmospheric science applications.
—
An Innovative Sunlight Denoising Technique To Improve Measurement Quality and Reduce Cost of Future Spaceborne Lidars
Yongxiang Hu, NASA Langley Research Center
Spaceborne lidar is a central part of the Earth science observing system. In the 2017 Decadal Survey, spaceborne lidar was emphasized to address numerous science questions in topical panels such as climate variability and change, weather, air quality, and marine and terrestrial ecosystems. Lidar systems were suggested in more than half of the Decadal Survey Designated/Explorer/IIP missions. The goal of this project is to develop a new approach to space lidar data analyses across disciplines, for maintaining NASA’s position as an innovation leader in Earth System science and technology.
While spaceborne lidar systems provide unique scientific information about the weather and climate system of the Earth, NASA faces significant challenges executing these lidar missions because achieving the required signal-to-noise ratios (SNRs) in daytime makes most lidar concepts expensive. It is well known that sunlight is a major source of noise in space lidar measurements; thus, raising the signal strength is a major cost driver of space lidar missions. In low Earth orbit, the chance of a reflected photon reaching the telescope of a space lidar is around 10^(-12), making it very difficult for space lidars to achieve the required SNRs. For example, the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations mission (CALIOP/CALIPSO) had difficulty detecting a large fraction of aerosol layers during daytime due to sunlight noise. Improving SNR requires the use of high-power lasers and large telescopes, leading to quite large payloads and thus exceedingly high cost. So far, spatial averaging has been the primary denoising method in space lidar instrument designs and operational data analysis. If an advanced sunlight denoising technique can be developed, the cost of space lidar missions will be much reduced and more science data can be derived accurately.
We propose to develop an innovative quantum-computing technique to subtract the sunlight noise in space lidar data, thereby improving the quality of science data products. In our proposal, we translate the sunlight denoising problem into a constrained minimization problem that fits the computational architecture of quantum annealing / optimization, based on the facts that (i) sunlight noise is spatially incoherent while backscatter signals (and their shot noise) from the atmosphere and the ocean are spatially coherent; and (ii) sunlight noise follows a Poisson distribution, of which the mean equals the variance, and is measured between two laser shots. Here we propose three different quantum methods to subtract sunlight noise from lidar images by solving constrained minimization problems using the latest Dirac-3 quantum computer developed by Quantum Computing Inc (QCI). These problems are extremely time-consuming (if not impossible) for conventional computers to solve, but can be solved efficiently on quantum machines. The objectives of this study include:
1. Optimizing and testing an innovative quantum-computing data analysis technique to subtract sunlight noise in space lidar data;
2. Comparing this innovative approach to a deep-learning autoencoder method for select Cloud Aerosol Transport System and ICESat-2 lidar measurements;
3. Applying the technique to improve the quality of CALIPSO daytime aerosol observations;
4. Evaluating its advantage over classical computers in both noise reduction performance and computational time in improving the quality of CALIPSO observations;
5. Evaluating the impact of this innovative quantum computing sunlight denoising technique on the potential of cost reduction in spaceborne lidar missions in the future.
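For illustration, the classical sketch below poses sunlight denoising as the kind of constrained minimization described above, separating a spatially coherent backscatter profile from a sunlight background estimated between laser shots; the proposal maps such formulations onto quantum annealing hardware, and all weights and data here are toy assumptions.

```python
# Illustrative classical analogue of the constrained minimization described in
# the abstract; the cost weights, toy profile, and solver are assumptions and
# do not represent the actual Dirac-3 formulation.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
nbins = 200
truth = 40.0 * np.exp(-0.5 * ((np.arange(nbins) - 90) / 8.0) ** 2)  # coherent backscatter layer
sun = 25.0                                                           # sunlight background level
m = rng.poisson(truth + sun).astype(float)                           # daytime photon counts
b0 = rng.poisson(sun, size=nbins).mean()                             # background estimated between laser shots

def cost(s, lam=4.0):
    resid = m - s - b0                        # leftover should look like incoherent noise
    smooth = np.sum(np.diff(s) ** 2)          # backscatter signal is spatially coherent
    return np.sum(resid ** 2) + lam * smooth

res = minimize(cost, x0=np.clip(m - b0, 0.0, None),
               bounds=[(0.0, None)] * nbins, method="L-BFGS-B")
s_hat = res.x                                 # denoised profile estimate
print("denoised error:", np.abs(s_hat - truth).mean(),
      "raw error:", np.abs((m - b0) - truth).mean())
```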
Implications for Earth Science: The success of this project will potentially lead to significant improvement in the SNR of lidar data collected in past spaceborne missions, as well as a significant reduction in the cost of future lidar missions.
—
A Software Tool for Probabilistic Calibration and Data Assimilation for Geophysical Models
Matthias Katzfuss, University of Wisconsin, Madison
We propose to develop a new software toolbox for probabilistic model-data fusion. Calibration of model parameters and inferring geophysical states via data assimilation are fundamental components of Earth System Digital Twins (ESDTs) and involve merging noisy and heterogeneous data with complex, expensive, and imperfect models. Our software can flexibly handle these challenging tasks while accounting for and quantifying uncertainties and enabling efficient exploration of what-if scenarios. Our proposed software tool is a crucial component in societal decision-making and in numerous NASA applications and missions, including environmental monitoring, calibrating and improving Earth system models, climate forecasting, uncertainty quantification for existing and future missions, and what-if investigations for potential future observing systems.
In a current AIST-21 project, the team has successfully developed a state-of-the-art probabilistic emulator based on Bayesian transport maps (BTMs), which is able to learn the non-Gaussian distribution of spatio-temporal fields from a small number of model nature runs. We propose to build on this work by employing the BTM emulator in the context of offline and online model-data fusion. Specifically, the proposed project consists of three aims. One, we will develop a tool for distributional offline calibration of geophysical models, which requires very few runs of the expensive geophysical model, accounts for uncertainty, and ensures that even fine-scale and nonlinear model behavior is properly aligned with observations. In addition to finding optimal values for the calibration parameters, we will provide methods for obtaining a posterior distribution that quantifies parameter uncertainty, which can then be accounted for in what-if investigations. Two, we will create software for ensemble-based data assimilation that is capable of online updating of the state of the system of interest using real-time data. Our approach can take into account uncertainty and non-Gaussianity in the forecast distribution, leading to optimal nonlinear uncertainty-aware updates of a potentially high-dimensional state, which improves over state-of-the-art data-assimilation methods. The approach can also be extended to online calibration of the geophysical forward model in the presence of unknown parameters. Three, we will demonstrate the use of the toolbox in a modeling and assimilation system for monitoring river discharge. This framework enables monitoring of river systems through the combination of a geophysical model and ingestion of multiple complementary datasets, including river discharge products from NASA’s Surface Water and Ocean Topography (SWOT) mission. We will carry out what-if investigations and a calibration effort that incorporates observational and model uncertainty.
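As context for the ensemble-based assimilation aim, the sketch below shows a standard stochastic ensemble Kalman update, the Gaussian baseline that the proposed non-Gaussian, BTM-based approach generalizes; the dimensions, observation operator, and noise levels are illustrative.

```python
# Minimal stochastic ensemble Kalman filter (EnKF) update; a Gaussian baseline
# for illustration only. Dimensions and noise levels are arbitrary toy choices.
import numpy as np

rng = np.random.default_rng(0)
n_state, n_obs, n_ens = 50, 10, 40
obs_idx = np.arange(0, n_state, 5)                 # observe every 5th state element
H = np.zeros((n_obs, n_state)); H[np.arange(n_obs), obs_idx] = 1.0
R = 0.5 * np.eye(n_obs)                            # observation-error covariance

truth = np.sin(np.linspace(0, 2 * np.pi, n_state))
y = H @ truth + rng.multivariate_normal(np.zeros(n_obs), R)

# Forecast ensemble (in practice, from the geophysical model or its emulator)
X = truth[:, None] + rng.normal(0, 0.8, size=(n_state, n_ens))

Xm = X.mean(axis=1, keepdims=True)
A = X - Xm
Pf_HT = A @ (H @ A).T / (n_ens - 1)                # cross-covariance P_f H^T
S = (H @ A) @ (H @ A).T / (n_ens - 1) + R          # innovation covariance
K = Pf_HT @ np.linalg.inv(S)                       # Kalman gain

Y_pert = y[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, size=n_ens).T
Xa = X + K @ (Y_pert - H @ X)                      # analysis (updated) ensemble
print("forecast RMSE:", np.sqrt(((Xm.ravel() - truth) ** 2).mean()),
      "analysis RMSE:", np.sqrt(((Xa.mean(axis=1) - truth) ** 2).mean()))
```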
The proposed project addresses the advanced and emerging technology (AET) topic area in the AIST-23 solicitation. It will provide crucial technologies for use in ESDT. The project will provide tools to assimilate real-time observations, contribute to the integration of uncertainty quantification in important components of ESDTs, offer the capability to investigate many hypothetical scenarios, and comprise statistical methodologies (including surrogate models) that optimize the computational efficiency of what-if investigations. The project includes a water-cycle use case, and it involves significant transdisciplinary expertise from all relevant domains, as requested.
—
Mapping Anthropogenic Water Cycle Impacts in a Future Climate: A Global Digital Twin for Scenario-Driven Exploration
Sujay Kumar, NASA Goddard Space Flight Center
Accurate representation of the hydrological cycle, especially in the context of climate change and human interventions (anthropogenic stressors), is crucial for ensuring robust quantitative assessments of water availability, climate risk, and effective management. The intricacies of non-linear processes inherent in the hydrological cycle are further compounded by the dynamic nature of climate processes and anthropogenic influences, governing the risks associated with water availability and extremes (such as floods and droughts) across diverse landscapes. Integrating climate, land, and anthropogenic-related processes is essential to creating a more realistic depiction of the regional to global hydrological cycle, enabling accurate quantification of water risks in critical hotspot regions. The current suite of physical models is highly deficient in representing such management impacts and is computationally expensive to deploy over large spatio-temporal scales. Consequently, assessment reports from major climate authorities do not adequately include realistic representations of the changes in water use stemming from human interventions.
Leveraging remote sensing data of the water cycle through data assimilation methods has been proven to be an effective approach for characterization of human management processes. However, those modeling and data assimilation systems are only effective for characterizing changes and impacts during the historical record. The proposed work will focus on incorporating and extending this knowledge for future scenario development through the use of a suite of machine learning-based water cycle digital twin models at a global scale. We plan to build deep-learning-based Digital Twin (DT) models, leveraging land reanalysis, remote sensing, and climate projection datasets. The land reanalysis is developed based on the most comprehensive inclusion of available land remote sensing datasets using the NASA Land Information System (LIS). Thus, the proposed work will capitalize on the advancements in physical models, data assimilation, machine (deep) learning concepts, parallel computing, and transfer learning, along with the utilization of DT technology, enabling scenario development of water availability risks and predictions of extreme events. Our proposed DT technology-driven platform is designed to deliver a comprehensive and robust quantitative assessment of complex hydrological cycle systems, addressing the key limitation of unobserved anthropogenic impacts in traditional hydrologic model simulations.
The proposed work is relevant to the Earth Science Digital Twin (ESDT) theme of the solicitation under the Early-Stage technology category. The deep learning models developed through the proposed work will be able to provide characterization of the historical record as well as projections and what-if scenarios, with a particular focus on extending the inferences on human activities to future predictions. The proposed work is timely, addressing the urgent need for reliable scenarios incorporating anthropogenic impacts. This is crucial for analyzing how climate change exacerbates stresses on the water cycle and for projecting water availability and climate extremes. It is also crucial to realizing NASA’s ‘Advancing the Climate Strategy (2023)’ Climate Action plan. By delivering scenarios of terrestrial water cycle availability that incorporate explicit scenarios of human water use, the proposed work will provide capabilities that enable planning, adaptation, and mitigation strategies in the face of climate change impacts.
—
Earth System Digital Twin for Central Africa Carbon and Biodiversity Corridors
Seungwon Lee, Jet Propulsion Laboratory
We will design and build an Earth System Digital Twin (ESDT) to represent and forecast the dynamics in carbon storage and habitat connectivity across Central Africa. The proposed ESDT will interconnect models designed to understand and forecast the effects of major threats on Central Africa’s rich ecosystems, such as climate change, land use change, and biodiversity loss. We aim to prioritize conservation strategies effectively, addressing scientific and application challenges in managing carbon and preserving biodiversity while considering socio-economic regional challenges.
Our ESDT intends to:
1. (What Now) Represent the current state of forest structure, intactness, and carbon storage using spaceborne and airborne remote sensing and in-situ field measurements.
2. (What Now) Analyze regional drivers of change in forest extent and fragmentation using a record of Earth observations and socio-economic data.
3. (What Now) Model habitat suitability and biodiversity corridors using data products derived in #1 and animal movement tracked from space.
4. (What Next) Forecast forest extent, fragmentation, and carbon storage using #1 and #2.
5. (What Next) Forecast forest structure, intactness, and carbon storage under future climate scenarios using the Ecosystem Demography Model (ED2.2) parametrized by #2 and #3.
6. (What Next) Forecast habitat suitability and biodiversity corridors using models in #3 and data products in #5.
7. (What if) Assess the effects of various conservation strategies and potential land use plans on forest intactness, fragmentation, carbon stocks, and animal movement.
8. (What if) Assess the potential effects of adherence with international frameworks and agreements such as the Paris Climate Agreement, REDD+, the Sustainable Development Goals, and the Global Biodiversity Framework 2030 targets.
Our ESDT will integrate advanced remote sensing data products and algorithms into ecological, landscape, Earth system, socio-economic, and animal movement models. The past and current state of forest extent and fragmentation will be assessed by analyzing Landsat-derived data products on deforestation, degradation, and recovery, while forecasting will be done by calibrating the TerrSet Land Change Modeler with auxiliary data such as socio-economic factors, topography, and road networks. The current state of forest structure and carbon will be developed by integrating measurements from field campaigns, drone lidar, airborne lidar, spaceborne lidar, and flux towers, while their predictions will be performed using the vegetation demographic ED2.2 model. The current state of habitat suitability and connectivity corridors will be assessed through the lens of African forest elephants (Loxodonta cyclotis). We will map forest habitats suitable for the species and study the connectivity of such habitats across the entire Congo Rainforest using Movebank data acquired from GPS tags installed on 53+ elephants, while we will forecast them using step selection functions and agent-based models. Finally, our ESDT will assess the impacts of conservation and management policies such as greenhouse gas emission levels, the design of new protected areas, and the prioritization of forest management and land use strategies. For the impact assessment, we will integrate environmental data, socio-economic data, and policy scenarios into the Environment-Vulnerability-Decision-Technology (EVDT) framework.
—
Connecting a Broad Community to Earth System Digital Twin Technologies at the Interface of Atmospheric Composition with the Earth System
Randall Martin, Washington University
Atmospheric chemistry models are essential components in the representation of the Earth system through Earth System Digital Twins (ESDTs) to interpret observations and enable predictions for a range of scientific investigations at the interface of atmospheric composition with the Earth system. These models must operate not only online as components of Earth system models (ESMs) but also offline, using meteorological data as input, because the user community relies on the more easily accessible offline version for model development and applications. The GEOS-Chem global 3-D model of atmospheric composition, developed and managed with NASA support, is used offline by hundreds of research groups worldwide with meteorological fields from the NASA Goddard Earth Observing System (GEOS), and the exact same model also operates online as a chemical module within the GEOS ESM at the NASA Global Modeling and Assimilation Office (GMAO). Through partnership with GMAO and with support from AIST, we have recently developed a high-performance configuration of GEOS-Chem (GCHP) to enable the atmospheric chemistry community to conduct global simulations of stratosphere-troposphere oxidant-aerosol chemistry including aerosol microphysics at up to cubed-sphere C720 (~12 km). Interactive regional simulations at even finer scales (order 1 km) are available through WRF-GC which couples the Weather Research and Forecasting (WRF) meteorological model with GEOS-Chem. We have also developed the Integrated Methane Inversion (IMI) as a cloud-based tool built on GEOS-Chem enabling stakeholders to infer carbon fluxes by inversion of satellite methane and CO2 data. Here we solicit support from AIST to connect the Earth science community to ESDT technologies at the interface of atmospheric composition with the Earth system to enable seamless integration of cloud- and high-end computing into what-if investigations, to accelerate and advance multi-scale capabilities, to expand multi-discipline capabilities with dynamic atmospheric chemistry-aerosol microphysics-meteorology coupling, and to extend a community tool for inversion of greenhouse gas satellite data to GCHP. Specifically, we propose to:
- Enable seamless integration of cloud- and high-end computing into GCHP what-if investigations by including NASA technologies for module coupling and data file formats, and extension to the Google Cloud.
- Accelerate and advance multi-scale capabilities using the GCHP stretched grid accelerated with operational GEOS-CF chemical boundary conditions and advanced by a GEOS analysis at 3 km resolution over the TEMPO field-of-regard.
- Expand multi-discipline, multi-scale capabilities through integration of GEOS-Chem chemistry and aerosol microphysics modules into the NASA-Unified WRF (NU-WRF) regional ESM and assimilation system for regional two-way coupling with clouds, precipitation, and land processes.
- Enable use of GCHP for global high-resolution inversions of greenhouse gas satellite data with the IMI software tool.
Our proposed work will promote scientific discovery by increasing accessibility and usability of climate and Earth science information, thus engaging more users of NASA data and research, and delivering more applied Earth science. It will support the new geostationary satellite constellation for air quality, including NASA TEMPO. It will enable investigation of numerous multi-discipline analytic concepts such as air quality and climate impacts due to fires; effects of climate change on atmospheric composition and in turn on human and ecosystem health; and effects of atmospheric composition on meteorology, climate, and solar power generation. It will increase the value of satellite data for stakeholders to quantify carbon fluxes. Our proposed developments address several priorities of NASA’s Climate Strategy including 1.1, 2.1, 2.2, and 3.1.
—
Machine-Learning to Improve Cycling and Forecasts with GEOS and Expedite the Evaluation of Assimilating Observations from New Instruments
Romit Maulik, Pennsylvania State University
Primary Objective: This is a two-tiered effort to use machine learning (ML) surrogate models to: (i) accelerate integration of the GEOS Atmospheric Data Assimilation System (ADAS) to facilitate studying the potential improvement and impact resulting from the addition of new instruments to GEOS ADAS; and (ii) demonstrate this capability by generating forecasts of sea surface temperature and sea ice to improve short- to mid-range (10-day) forecasts with GEOS.
Motivation: Machine learning (ML) surrogate models for dynamical systems promise dramatic gains in predictive capabilities for various science applications. Surrogate models provide an avenue for constructing the Earth System Digital Twins (ESDT) envisioned in the AIST solicitation. More specifically, these AI-based models have become central to so-called ‘outer-loop’ applications, in which the nonlinear and linearized (or perturbation) models used in data assimilation (DA) procedures are replaced with surrogate models, resulting in DA that is many times faster. Such faster DA algorithms can be used to expedite improvements to the DA procedures themselves, including the assimilation of new observations.
Objectives: The main objective of this proposal is to demonstrate how the use of surrogate models can (i) help expedite the assessment of the contribution of new observations to the GEOS assimilation system, and (ii) illustrate how the GEOS near-real-time forecasts can be improved by using AI predictions of ocean boundary conditions, namely SST and sea ice. These areas are of direct interest to NASA Earth Science Missions and its future Earth System Observatory.
Approach: We propose to adapt to GEOS the existing ERA5-based surrogate model developed by the PI and some of the Co-Is. This project will rely on (1) the development of data-driven surrogate models for both SST/sea ice and GEOS and (2) the investigation of how the GEOS surrogate may be used for accelerated DA, specifically with a focus on the addition of new instruments to ADAS. The AI architecture we wish to leverage is the vision transformer neural network, which has proven superior for learning complex dynamical systems, with particularly impressive results for atmospheric variable forecasting. Once the GEOS surrogate model is trained, our goal is to use it to accelerate the GEOS ensemble DA system underlying the hybrid 4D Ensemble DA (4DEnVar) algorithm used in the GMAO near-real-time DA system.
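To illustrate the surrogate idea at a toy scale, the sketch below fits a cheap data-driven emulator to state transitions generated by an expensive model and then uses it to propagate a DA ensemble; the linear emulator stands in for the vision transformer, and nothing here reflects the actual GEOS system.

```python
# Toy sketch of a data-driven surrogate for ensemble propagation; the linear
# emulator, stand-in model, and sizes are illustrative assumptions, not GEOS.
import numpy as np

rng = np.random.default_rng(0)
n = 64

def expensive_model_step(x):
    """Stand-in for one step of the full (expensive) forecast model."""
    return np.roll(x, 1) * 0.99 + 0.01 * np.sin(x)

# Training pairs (state_t, state_t+1) sampled from the expensive model
X_t = rng.normal(size=(2000, n))
X_tp1 = np.array([expensive_model_step(x) for x in X_t])

# Fit the surrogate: least-squares linear operator, x_{t+1} ~= x_t @ W
W, *_ = np.linalg.lstsq(X_t, X_tp1, rcond=None)

def surrogate_step(x):
    return x @ W

# Propagate a 32-member ensemble with the cheap surrogate instead of the model
ensemble = rng.normal(size=(32, n))
for _ in range(10):
    ensemble = surrogate_step(ensemble)
print("ensemble spread after 10 surrogate steps:", float(ensemble.std()))
```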
Expectations: The main effort of this proposal is expected to facilitate the assessment and evaluation of the benefits of introducing new observations into the GEOS data assimilation system. The series of experiments routinely employed by researchers when adding new observations to the DA system and validating the quality of the resulting products and forecasts will be dramatically accelerated with the use of the GEOS surrogate model introduced by this proposal. This expediency will add to the agile procedures already in place in the Joint Effort for Data assimilation Integration (JEDI) in association with incorporation of new observing types in the unified observation operator. The second of these efforts will serve to demonstrate the ability of surrogate models to be used for direct system improvement. This is expected to bring an improvement in the skill of GEOS forecasts. Once the GEOS surrogate is trained, it is conceivable to use it to investigate rapid evaluation of model gradients via automatic differentiation and thus consider improvements in the DA to bring it from Hybrid 4DEnVar to Hybrid 4D Variational DA (4DVar).
Relevance: The proposed work responds to the AIST objectives by providing a prototype for enabling agile science analysis that utilizes diverse observations using an advanced ML tool. Our demonstration will directly improve the near-real-time GEOS forward processing system.
—
Optimal Estimation with a Generative AI Prior for Remote Sensing Retrievals and Observing System Design
Adam Milstein, Massachusetts Institute of Technology/Lincoln Lab
In recent years, Earth observation missions have evolved from large, expensive spacecraft that are infrequently launched to include constellations of distributed, often small spacecraft. Future missions are expected to incorporate onboard intelligence to optimize resources and science value. Such missions must be designed to collect observations efficiently to maximize science, both at the mission concept stage and in dynamic scenarios where observations are reconfigured. Scientific interpretation of Earth remote sensing data requires a retrieval algorithm for estimating the scientific quantity of interest — for example, temperature or moisture content — from the instrument radiances. To enable these missions, accurate and explainable approaches for performing retrievals are needed which can fully exploit these datasets, as well as quantify uncertainty and information content of the observing system.
Currently, a widely-accepted physical retrieval approach is optimal estimation (OE). OE is based on Bayes’ theorem, and computes an estimate of an unknown geophysical state (which may consist of 2D or 3D imagery or 1D profiles) from a set of observations and a model of the sensing physics. The prior model and observation errors are assumed to be Gaussian. OE is attractive in principle because it promises explainability: Assuming Gaussian models are correct, the uncertainty of the computed state estimate is readily computed, and the contributions to this uncertainty readily identified. The OE formalism also provides for computing the information content of the measurement system being modeled, allowing the observations to be optimized. However, OE has key drawbacks. The geophysical variable, in general, has complicated joint dependencies and is not Gaussian at all. As a result, the OE retrieval and its uncertainty estimates are not guaranteed to be accurate.
Another widely used retrieval approach is to use a neural network (NN) as a regression. These NNs are trained using supervised learning in which an ensemble of observations is paired with a collocated dataset assumed to be close to the “truth” state. These NNs are often more accurate than OE due to their ability to learn complex, nonlinear and non-Gaussian statistics empirically that may otherwise be very difficult to model. However, a key drawback of regression NNs is the lack of explainability in the result. NN regression techniques typically cannot directly identify what the source of underlying uncertainty is.
We will develop a new approach that offers the best of both worlds, with explainable solutions, accurate uncertainty quantification, and high-quality imagery that maximizes the use of available information: OE with a generative AI prior model. A generative AI model, in the form of a deep neural network generator function, is trained to create random samples of the desired state variable from a Gaussian input vector (a “latent vector”) using a learned representation of the underlying joint statistics. We then formulate OE to estimate the Gaussian latent vector as the unknown state, rather than the geophysical variable of interest directly, a change of variables. Because both the observation error and the latent vector are truly Gaussian, and the prior relies on accurate joint statistics learned from real imagery, this approach is far more accurate than traditional OE while retaining its key advantages: clear uncertainty quantification and information content analysis.
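The change of variables at the heart of this approach can be sketched as follows: optimal estimation is performed over the Gaussian latent vector, with a trained generator mapping latents to geophysical states and a forward model mapping states to simulated observations. The toy generator and forward model below are placeholders; only the cost-function structure is meant to be illustrative.

```python
# Sketch of OE posed over a Gaussian latent vector z, with a generator G(z) and
# forward model F(.). G, F, dimensions, and noise here are toy assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_latent, n_state, n_obs = 8, 30, 12

G_weights = rng.normal(size=(n_latent, n_state))      # stands in for a trained generator
def G(z):                                             # latent vector -> geophysical state
    return np.tanh(z @ G_weights)

F_op = rng.normal(size=(n_state, n_obs)) / np.sqrt(n_state)
def F(x):                                             # state -> simulated radiances
    return x @ F_op

Se = 0.05 * np.eye(n_obs)                             # observation-error covariance
Se_inv = np.linalg.inv(Se)

z_true = rng.normal(size=n_latent)
y = F(G(z_true)) + rng.multivariate_normal(np.zeros(n_obs), Se)

def cost(z):
    r = y - F(G(z))
    return r @ Se_inv @ r + z @ z                     # Gaussian likelihood + N(0, I) prior on z

z_hat = minimize(cost, x0=np.zeros(n_latent)).x
print("mean state retrieval error:", float(np.abs(G(z_hat) - G(z_true)).mean()))
```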
Over the course of an 18-month project, we will demonstrate the proposed methodology for retrievals and NOS system design for microwave sounding of temperature and humidity in the atmosphere to support weather, climate, and planetary boundary layer phenomenology. This work is responsive as it is Early Stage Technology relating to physics-informed AI and uncertainty quantification. The technique is currently at an overall TRL 2 and will be TRL 4 at conclusion.
—
A Novel Observation Strategy with Heterogeneous and Dynamic Microwave-Sensing (NOS-HDM) in Response to STV
Mahta Moghaddam, University of Southern California
This proposal is focused on the Novel Observing Strategies (NOS) AIST program thrust and responds to Sub-element (3) of the current AIST solicitation, namely, “Demonstrations and Prototypes (D&P).” We propose to develop, demonstrate, and prototype advanced information system technologies that support the NASA NOS goal for autonomous, science-driven observations of the Earth system via coordinated measurements of multiple sensing nodes. As required by the program, we will demonstrate the entire autonomous operational end-to-end concept in the NOS framework. Our science use cases, in support of the NASA Surface Topography and Vegetation (STV) mission, will include monsoon-season near-surface hydrology and freeze-thaw transitions in northern-high latitudes including permafrost active layer soils. Both of these cases need rapid readjustment of observation strategy for on-demand deployment of observational systems. We will consider up to four sensor node types for the demonstration and prototyping of the proposed system: (1) satellite observations from NISAR and Spire Global, (2) airborne flexible-configuration synthetic aperture radar (FlexDSAR) implemented on the next-generation airborne synthetic aperture radar (AIRSAR-NG) at L- and P-bands, (3) uncrewed aerial vehicle software-defined radar (UAV-SDRadar) observations for on-demand field deployment, and (4) in-situ wireless sensor network (WSN) nodes for soil moisture and temperature. The NOS scenario would be that the satellite and/or in-situ sensors would detect an emerging event in need of immediate high spatial and temporal resolution observations (as in the onset of freeze/thaw), and would autonomously determine whether an airborne asset carrying an appropriate sensor (such as the AIRSAR-NG L+P band radar or an SDRadar) needs to be deployed and/or spaceborne constellations such as Spire GNSS-R satellites need to be tasked to provide the needed spatiotemporally dense observations. These autonomously generated datasets will be used to operationally respond to the emerging event or to create the requisite analysis for science purposes. As such, the targeted outcomes will be both the closed-loop autonomously controlled multi-node prototype observation system and a set of demonstration data sets that provide proof-of-concept for the proposed prototype.
This project will be requesting Level-2 support for NOS-Testbed (NOS-T) Full Demonstration. This support “comprises multiple components that dynamically exchange information during an end-to-end scenario.”
The proposed project utilizes a comprehensive suite of observation nodes. More specifically, the proposed concept includes a large-class, NASA flagship satellite (NISAR), a commercial satellite constellation (Spire) to task autonomously in response to the NOS science use case objectives, flexible architecture distributed (crewed) airborne SAR (FlexDSAR), low-cost uncrewed airborne software-defined radar, and ground-based wireless in-situ sensor networks (WSNs, such as SoilSCAPE).
The technology elements developed and demonstrated by NOS-HDM will address elements of “Advancing NASA’s Climate Strategy,” especially Priority 1 items 1.1 and 1.2. This is achieved by developing novel multi-modal observation strategies and modeling tools to better understand Earth and its climate while leveraging emerging and existing space technology and remote sensing platforms.
—
A Forecasting Scheme for Accelerated Harmful Algal Bloom Monitoring (FASTHAB)
Nima Pahlevan, Science Systems And Applications, Inc.
About half of the world’s population lives within ~200 kilometers of ocean or freshwater coastlines. These coastal zones and their ecosystem services massively contribute to the well-being and economy of the immediate residents. The changing climate, which brings extreme weather patterns (e.g., heatwaves, extended wet/dry periods), along with human developments (e.g., urbanization, intensified agriculture/aquaculture), poses significant risks to human settlements and coastal environments. One cascading effect of climate change and anthropogenic activities is the increased frequency, intensity, and extent of harmful algal blooms (HABs) in coastal oceans, estuaries, and freshwater ecosystems. Forecasting these HAB characteristics is crucial for effective resource management and decision-making.
A reliable and advanced early warning system for water quality (WQ) conditions and HABs is a desired functionality of any forecasting framework, regardless of its underlying mechanisms or observations. For instance, existing coupled hydrodynamic-biogeochemical models (i.e., process-based models; PBMs) are complex, computationally demanding, and require extensive calibration/tuning; hence, they are not commonly used for operational short-term (monthly) forecasting. On the other hand, satellite-based prediction tools are hampered by cloud coverage, sometimes leaving large swaths of an ecosystem unmeasured for extended periods and yielding poor forecasting skill.
By leveraging NASA’s invaluable historical ocean color (OC) products, simulated PBM outputs (e.g., salinity), and physical forcing data (e.g., wind speed, air temperature), we propose to develop a fast and efficient machine learning (ML) scheme for predicting WQ variables and potential HAB events. These WQ variables include chlorophyll-a (Chla), Total Suspended Solids (TSS), Secchi disk depth (Zsd), temperature (T), and salinity (S), all of which are essential indicators and drivers of HABs. Our core spatiotemporal ML model (a Convolutional Long Short-Term Memory network) uses the Monte Carlo Dropout technique to forecast WQ conditions and their associated uncertainties 1-30 days in advance. We will initially train our scheme (FASTHAB) on data from the 2000-2021 timespan and thoroughly validate its forecasting skill over the 2022-2024 timeframe using in situ and/or OC-derived WQ data for any chosen 30-day period, for which only physical forcing data are required for prediction.
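A minimal sketch of how such a forecaster could be assembled, assuming gridded daily inputs and using Keras; the layer sizes, dropout rate, and five-channel input (standing in for Chla, TSS, Zsd, T, and S) are illustrative rather than the FASTHAB configuration:

```python
# Illustrative only: a ConvLSTM forecaster with Monte Carlo Dropout kept active
# at prediction time so that repeated forward passes yield an uncertainty estimate.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_forecaster(timesteps=30, height=64, width=64, channels=5):
    """Map a 30-day history of gridded WQ/forcing fields to a one-step-ahead field."""
    inputs = tf.keras.Input(shape=(timesteps, height, width, channels))
    x = layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=False)(inputs)
    x = layers.Dropout(0.2)(x)                    # stays on at inference via training=True
    outputs = layers.Conv2D(1, kernel_size=1)(x)  # e.g., a chlorophyll-a field
    return tf.keras.Model(inputs, outputs)

def mc_forecast(model, history, n_samples=50):
    """Stochastic forward passes: the mean is the forecast, the spread its uncertainty."""
    draws = np.stack([model(history, training=True).numpy() for _ in range(n_samples)])
    return draws.mean(axis=0), draws.std(axis=0)

model = build_forecaster()
history = np.zeros((1, 30, 64, 64, 5), dtype="float32")  # placeholder input batch
forecast_mean, forecast_std = mc_forecast(model, history)
```

Keeping dropout active at inference turns each forward pass into a sample from an approximate predictive distribution, which is one common way to attach uncertainty estimates to this kind of forecast.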
Our forecasting scheme will be prototyped for the Chesapeake Bay ecosystem, the largest estuary in North America, for two reasons: a) its significant regional/national socioeconomic importance, owing to its fisheries, aquaculture, and recreation/tourism, and b) its long-standing in situ WQ monitoring program, which makes it a suitable testbed for research and development. We will ultimately integrate FASTHAB into an existing (AIST-funded) Artificial Intelligence (AI) framework to ensure its future use within Earth System Digital Twins (ESDT).
Our team, composed of remote sensing experts, computer scientists, modelers, HAB ecologists, and aquaculture specialists, will address the challenges of this Early-Stage Technology (EST) project to advance the nation’s WQ and HAB forecasting skill by incorporating past, current, and future OC observations and products. Our readily generalizable and scalable FASTHAB scheme will allow for early identification of hotspots across the Chesapeake Bay ecosystem in support of resource management and decision-making, enabling timely mitigation actions essential to public health, ecosystem recovery, fisheries, and aquaculture operations.
—
Pilot Deployment of TERRAHydro: A Framework, Demonstration, and Vision for Earth System Digital Twins
Craig Pelissier, Science Systems And Applications, Inc.
Our project has two intertwined goals in its pursuit of developing the next generation of Earth System Digital Twins (ESDT):
- Development of the Coupled Reusable Earth System Tensor (CREST) Framework: This AI-first framework provides the infrastructure for constructing, operating, and deploying large community-developed federated ESDTs. It takes a hierarchical graph-based approach to building Earth System Models (ESMs) that enables integration of traditional and AI-based models within the ESDT. Utilizing a tensor-based software (TBS) backend, CREST enables high-performance computing and seamless AI integration, serving as a comprehensive middleware for building and deploying ESDTs.
- Implementation of the Terrestrial Environmental Rapid-Replicating and Assimilation Hydrometeorological (TERRAHydro) AI-based Land Surface Digital Twin: Leveraging CREST, TERRAHydro combines the latest in data-driven hydrology to create an advanced land, vegetation, and water digital twin. Its capabilities include performing rapid recalibration, counterfactuals, and extensive scenario analysis, thereby addressing critical What-Now, What-Next, and What-If questions, and showcasing the value of the proposed technology.
The currently funded grant (21-AIST21-0003) will produce infrastructure for building, training, and testing the TERRAHydro Earth Systems Model or digital replica, as well as forecasting and data assimilation capabilities. In addition, a final third-year full-scale demonstration will be given. To achieve this, a variety of CREST infrastructure is being produced under the current grant such as data management, model specification and building, inference, and archiving. However, a significant amount of deployment and interoperability capabilities for targeting operational settings still needs to be developed. The thrust of this proposal and award is to develop deployment and interoperability capabilities, culminating in the deployment of the TERRAHydro pilot system on the NASA Science Managed Cloud Environment (SMCE). This includes a control and monitoring system, impact and assessment capabilities, and a web-based front-end accessible through a user portal.
Interoperability — or the ability to integrate a larger class of models — is critical for federation and wider adoption of an ESDT framework. To address this, current interoperability in CREST will be extended to include AI models written in different tensor-based languages, as well as traditional models (e.g., Fortran-based). For the former, we will expand the current infrastructure to allow seamless integration of models written in TensorFlow, PyTorch, and JAX by leveraging existing technology (e.g., Open Neural Network Exchange) to automatically detect and translate models appropriately. For the latter, we will explore two options, with the success and viability being demonstrated using TERRAHydro:
- Re-writing the model in JAX: This comes at the cost of rewriting models but provides auto-differentiation and seamless integration within the current framework. As a low-risk approach to interoperability, it offers a demonstration of the benefits that come with tightly coupling process-based and data-driven models — but requires higher effort on the part of end users.
- Black-box gradient estimates: This will investigate approaches to estimating gradients of black boxes and test their overall effectiveness (a minimal sketch of the idea follows this list). While higher risk, if successful this approach will require virtually no integration effort from end users, but it comes at the cost of an approximate gradient and a different performance profile from what a full model rewrite can achieve.
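A minimal sketch of the black-box option, assuming an element-wise legacy model and a simple central-difference gradient estimator; the function names are hypothetical and this is not CREST code:

```python
# Illustrative only: an opaque legacy model wrapped for use inside a JAX-based
# ESDT graph, with its gradient approximated by central finite differences.
import jax
import jax.numpy as jnp
import numpy as np

def legacy_model(x):
    """Placeholder for a non-differentiable, element-wise process model (e.g., Fortran-based)."""
    x = np.asarray(x)
    return np.tanh(x) + 0.1 * x**2

@jax.custom_vjp
def blackbox(x):
    out = jax.ShapeDtypeStruct(x.shape, x.dtype)
    return jax.pure_callback(legacy_model, out, x)

def blackbox_fwd(x):
    return blackbox(x), x  # keep the input as the residual for the backward pass

def blackbox_bwd(x, g):
    # Element-wise central differences approximate the (diagonal) Jacobian.
    eps = 1e-4
    out = jax.ShapeDtypeStruct(x.shape, x.dtype)
    plus = jax.pure_callback(legacy_model, out, x + eps)
    minus = jax.pure_callback(legacy_model, out, x - eps)
    return (g * (plus - minus) / (2.0 * eps),)

blackbox.defvjp(blackbox_fwd, blackbox_bwd)

# The wrapped component can now appear in differentiable workflows such as calibration.
loss = lambda x: jnp.sum(blackbox(x) ** 2)
gradient = jax.grad(loss)(jnp.array([0.3, -1.2]))
```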
Our vision for CREST is to allow the creation of always-online systems that continuously ingest data, update their state, and remain accessible through a web browser and an interactive, clickable front-end.
—
A Terrestrial Water Budget Digital Twin
Fritz Policelli, NASA Goddard Space Flight Center
The Earth science community has made steady progress over the recent decades in measuring precipitation, evapotranspiration, and terrestrial water storage change (the last thanks to the NASA GRACE mission), and now, with the advent of the NASA SWOT mission, the water budget research community is finally able to take advantage of satellite observation-based measurements of discharge. The input to the terrestrial water budget of a given area of interest is precipitation (P). The outputs are evapotranspiration (evaporation and transpiration, ET), and runoff (Q, equal to river discharge for river basins). The difference between the input and the outputs is water storage change. We propose to bring observation-based projections of these water budget components together in a digital twin for the purposes of visualizing the community’s progress toward balancing the terrestrial water budget for areas of interest and for facilitating the generation of best estimates of the value and uncertainty of the components of the water budget. We further propose to provide best-estimate projections of the water budget components for future conditions of global basins and sub-basins of interest based on a range of climate change scenarios. Projections of water budget components will rely on the World Climate Research Program Coupled Model Intercomparison Project 6 (CMIP6) climate model projections for precipitation and evapotranspiration and an innovative new machine learning model trained on past observations to generate projected mass change and runoff (or river discharge for river basins). We believe this is an opportune time to provide a web-based terrestrial water budget digital twin to greatly facilitate water budget analyses, broaden this research community, and extend its relevance to water basin managers and the public. Our user community will include:
- scientists interested in understanding and contributing to improvements in the state of the science for balancing observation- and projection-based water basin budgets,
- scientists interested in observation-based best-estimates of the water budget components for calibration of or assimilation into their models, or evaluating model analyses of such budgets,
- water basin managers interested in the current state of the water budget for their areas of interest and in scenario-based estimates of future conditions, and
- members of the public who have interest in learning the basics of the water cycle and terrestrial water budgets as well as understanding NASA capabilities to monitor and project the components of these budgets.
We believe that our project is well-focused on the “What-Now, What-Next, and What-If” capabilities of a proper Earth System Digital Twin (ESDT) and consistent with NASA AIST Program goals. Additionally, we have a strong interest in developing our ESDT to be compatible with, and interact with, other existing or future digital twins. We consider the capability/system described in this proposal to be currently at Technology Readiness Level (TRL) 3 and expect to raise it by two levels.
—
Event- and Feature-Based Observing System Design: Quantifying Science and Applications Benefit for Diverse Measurement Combinations
Derek Posselt, Jet Propulsion Laboratory
There is an increasing need for observing systems that are not only global in scope, but also flexibly responsive to emerging events. In addition, new measurements must not only advance the state of science, but also serve the needs of applications end users. Furthermore, any new measurements will be made in the context of an observing system that is ever increasing in its diversity and complexity. The key challenge is to measure the benefits of proposed new measurements in the context of the current and planned global observing system, allowing for a diversity of measurement types (spaceborne, airborne, and ground-based), enabling a focus on feature- and event-based observing, and returning quantitative estimates of benefit to science and applications. At this time, there is no capability to 1) quantitatively evaluate multi-agent (spaceborne, land-based, and aircraft-based) observing systems, 2) quantitatively evaluate an event-based adaptive observing system for the atmosphere, or 3) generally evaluate observing systems for applications use cases.
In this new AIST project, we will construct a multi-agent object-based observing system simulation experiment (OSSE) framework that:
- considers spaceborne, airborne, and ground-based fixed and adaptive measurements individually and together,
- enables targeted observing of features of interest, and
- quantifies the benefit of new measurements for both science and applications.
It utilizes recent advances in Bayesian inverse problem theory, data fusion, and machine learning. Specifically, Spatial-Temporal Statistical Data Fusion (STDF), a generalization of optimal interpolation (OI), combines information from datasets with diverse sampling and error characteristics to produce an optimal synergistic estimate of one or more geophysical variables. STDF also propagates uncertainties from individual estimates into the merged estimate. Probabilistic measures of differences among distributions (e.g., the Kullback-Leibler divergence) can be used to provide quantitative estimates of the effects of differences in spatial and temporal sampling characteristics. Bayesian inverse methods (e.g., Optimal Estimation and Markov chain Monte Carlo) can be used to map changes in expected observation uncertainty to uncertainty in geophysical variables and thence to the ability to (dis)prove a hypothesis or provide actionable applications information. Random Forest-based machine learning can be used to learn the relationships between a change in observing system configuration and a change in geophysical variable error and application utility. We will test our system on observations of the planetary boundary layer (PBL), a specific example from the 2020 Novel Observing Systems workshop. The PBL is home to nearly all of the Earth’s human population and regulates the exchanges of mass, energy, and momentum between the Earth’s surface and the free troposphere.
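As a simple illustration of this kind of probabilistic benefit metric, assuming Gaussian estimates of a geophysical variable before and after a candidate measurement is added; the numbers are placeholders, not project results:

```python
# Illustrative only: the benefit of a candidate observation expressed as the
# Kullback-Leibler divergence between Gaussian estimates of a geophysical variable
# with and without the new measurement.
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for one-dimensional Gaussians; larger values mean a larger update."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
            - 0.5)

# q: program of record only; p: with a hypothetical added PBL sensor.
benefit = gaussian_kl(mu_p=1.8, sigma_p=0.3, mu_q=2.0, sigma_q=0.8)
```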
The result of this project will be a multi-agent, feature-based OSSE toolkit that quantifies the benefit of new observations in the context of the existing program of record for both global and regional sampling, is capable of being used for PBL observing system design, and is easily extensible to evaluating the science and applications benefit of measurements for other observing system use cases. The end of our two-year project will immediately precede the release of the 2027 Earth Science Decadal Survey, and it is our hope that our system will prove useful in the initial studies that result.
—
An Earth System Digital Twin for Wildfire: Predicting Wildfire Progression and Behavior, and Its Downstream Impacts on Air Quality
Mohammad Pourhomayoun, California State University, Los Angeles
This proposal presents an Earth System Digital Twin (ESDT) for Wildfire, delivering an advanced and integrated system designed to enhance the accuracy, efficiency, and real-time responsiveness of wildfire forecasting and management. This wildfire ESDT utilizes a comprehensive set of technologies, including novel AI-based frameworks with near real-time, high-resolution predictive models to forecast wildfire spread and progression, as well as their downstream impact on air quality.
Wildfires in North America have become increasingly prevalent and severe, posing significant threats to human health, the environment, and the economy. The smoke generated by wildfires contributes to hazardous air quality, exposing people to harmful pollutants that can exacerbate respiratory conditions and lead to long-term health issues. Beyond human impacts, wildfires have devastating effects on ecosystems, causing habitat destruction, loss of biodiversity, and releasing vast amounts of carbon dioxide into the atmosphere, further exacerbating climate change.
Despite notable progress in wildfire modeling and mitigation technologies in recent years, understanding and predicting wildfire behavior and progression in near real-time, along with its adverse impacts, remains highly challenging and complex.
The proposed Wildfire ESDT includes:
- A unified data platform integrating the latest available information from satellite and in situ observations to visualize the current status of wildfires and air quality (What-Now).
- High-resolution and near real-time AI-based predictive models to forecast active wildfire spread, progression, and trajectory, as well as their short-term and long-term impacts on air quality (What-Next).
- Impact Assessment models and tools that provide projections and predictions for different response scenarios, actions, and conditions, and let the user explore scenario-based assessments (What-If).
- Low-latency, user-friendly visualization models and interactive user interfaces (including VR/AR) to visualize the current and future status of fire and air quality, potential response scenarios (What-Now, What-Next, What-If), and uncertainty quantifications.
Building upon our prior initiatives and successful AIST projects including Predicting What We Breathe (PWWB), Fire Alarm, and Air Quality Analytic Collaborative Framework (AQ-ACF), this project aims to incorporate unique data processing algorithms, novel AI-based predictive models, and state-of-the-art machine learning (ML) algorithms for precise forecasting of active wildfire behavior and progression over time, as well as its air quality impacts.
This project will also address the challenges of integrating large-scale datasets from various sources by establishing a unified system for processing satellite observations, ground-based data, and land information for comprehensive analysis and accurate forecasting. Furthermore, the project places a strong emphasis on user experience by incorporating interactive visualization methods. These methods serve as a dynamic interface, delivering the obtained insights and forecast outcomes to the user and facilitating a more intuitive and informed decision-making process.
The proposed Wildfire ESDT will significantly support firefighters, emergency responders, and various stakeholders in optimizing resource allocation, setting priorities, and executing targeted responses to wildfires. It also plays a pivotal role in efficiently evacuating individuals to secure locations, ensuring a prompt and coordinated approach to saving lives during wildfire incidents. Additionally, it serves as a vital tool for policymakers and researchers, enhancing their ability to analyze wildfire patterns, providing data-driven insights into wildfire behavior, and facilitating informed decision-making and effective management strategies.
—
Physics-Aware Quantum Neural Network Modeling of Earth Science Phenomena
Eleanor Rieffel, NASA Ames Research Center
The increasing number of extreme weather events has fueled a desire for accurate, timely prediction tools. An example of recent research success lies in data-driven models for short-to-mid-term weather forecasting, which have outperformed traditional numerical weather predictions in hundreds of test cases [1, 2]. These results have sparked interest in high-resolution modeling of large-scale, complex phenomena for use in digital twins, such as NVIDIA’s Earth-2 project and NASA’s current Earth System Digital Twin effort. The main driver behind this interest is the potential of learning-based methods (e.g., [2]) for solving partial differential equations (PDEs), which form the foundation of many Earth Science focus areas. In comparison to traditional solvers, these methods are capable of solving problems with more complex geometries and can sample the solution at arbitrary precision. In realistic scenarios where the complex processes may not be completely understood, they have the potential for pattern discovery and prediction of phenomena given partial knowledge, and they have the advantage of using empirical data to inform their solutions. Learning-based methods hold promise but can be hard to optimize and can scale poorly as the problem becomes increasingly complex [3, 4]. On the other hand, quantum computers offer a fundamentally different paradigm of computation and can solve certain classes of problems exponentially faster than their classical counterparts [5–8]. Recent results promise an advantage for certain specialized differential equations in the long term [9–13]. However, much remains unknown about the full potential of quantum algorithms for differential equations in general outside a few initial studies [14–18], each with its own promise and roadblocks.
The objective of this proposal is to develop physics-aware, quantum-classical hybrid neural networks to solve complex partial differential equations describing multiple interconnected Earth systems. We will assess our method on ocean dynamics studies by solving the shallow-water, barotropic, and baroclinic equations. We will first numerically and analytically analyze and compare current state-of-the-art quantum and classical learning-based approaches for solving PDEs. Current architectures can be difficult to train because they can be too expressive and are agnostic to the physical problem: they use physics-based terms only in the loss as a soft constraint, which does not guarantee that the PDE is satisfied exactly. We will explore using PDEs to initialize the training parameters and to select the training architecture. We will also explore embedding the PDE in the learning ansatz for quantum-compatible approaches. This will help alleviate the trainability problem of quantum learning methods and increase their effectiveness in solving complex coupled PDEs, allowing us to use quantum and quantum-compatible approaches to obtain fast and accurate solutions to PDEs describing various Earth science systems. Finally, we will present a report and a publication and release software showcasing our findings.
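For context, a minimal sketch of the classical soft-constraint formulation described above, using a toy 1-D advection equation and an illustrative small network rather than the proposed quantum-classical architecture:

```python
# Illustrative only: a classical physics-informed loss where the PDE residual is a
# soft penalty added to the data misfit (toy 1-D advection: u_t + c * u_x = 0).
import jax
import jax.numpy as jnp

def init_params(key, sizes=(2, 16, 16, 1)):
    params = []
    for i in range(len(sizes) - 1):
        key, sub = jax.random.split(key)
        w = 0.1 * jax.random.normal(sub, (sizes[i + 1], sizes[i]))
        params.append((w, jnp.zeros(sizes[i + 1])))
    return params

def u_net(params, t, x):
    """Small fully connected surrogate u_theta(t, x)."""
    h = jnp.array([t, x])
    for w, b in params[:-1]:
        h = jnp.tanh(w @ h + b)
    w, b = params[-1]
    return (w @ h + b)[0]

def pde_residual(params, t, x, c=1.0):
    u_t = jax.grad(u_net, argnums=1)(params, t, x)
    u_x = jax.grad(u_net, argnums=2)(params, t, x)
    return u_t + c * u_x

def loss(params, obs, colloc, lam=1.0):
    # Data misfit at observation points (t, x, u) ...
    data_term = jnp.mean(jnp.stack([(u_net(params, t, x) - u) ** 2 for t, x, u in obs]))
    # ... plus the PDE residual at collocation points, enforced only as a soft constraint.
    phys_term = jnp.mean(jnp.stack([pde_residual(params, t, x) ** 2 for t, x in colloc]))
    return data_term + lam * phys_term

params = init_params(jax.random.PRNGKey(0))
obs = [(0.0, 0.1, 0.5), (0.0, 0.4, 0.2)]        # synthetic (t, x, u) observations
colloc = [(0.2, 0.3), (0.5, 0.7), (0.8, 0.1)]   # collocation points for the residual
grads = jax.grad(loss)(params, obs, colloc)     # gradients for a standard optimizer step
```

Because the residual enters only through the penalty weight, the trained network need not satisfy the PDE exactly, which is the limitation the proposed initialization and ansatz-embedding strategies aim to address.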
Our work will provide a proof of principle that quantum technology can be applied to predict complex Earth science phenomena. As quantum technology advances, it will offer a roadmap for addressing much more complex problems on a large scale in the future. Applications of this technology will mostly be explored in the context of the fluid mechanics of ocean dynamics. Natural applications include coastal zone digital twins and the interface between ocean currents, land, and atmosphere. Further applications could integrate this technology with ocean carbon process analysis and intermediate-term weather forecasting. Our goal will be to develop our model and analysis in a way that is interoperable with other Earth system models and extensible to other settings and governing equations.
—
3D-CHESS FO: Autonomous Sensor Web for Inland Water Ecosystem Monitoring
Daniel Selva, Texas A&M Engineering Experiment Station
The goal of the proposed project is to conduct a hardware-in-the-loop validation in a laboratory environment of an autonomous and context-aware network of interconnected space, air and ground sensors for inland water and ecosystems monitoring. Specifically, we will consider three application scenarios: harmful algal blooms in lakes, wetting and drying processes in non-perennial rivers, and extreme storage state fluctuations in rivers and reservoirs.
The technology seeks to significantly increase the level of operational autonomy of Earth Observing Systems in order to drastically improve the spatio-temporal sampling of relatively rare and/or short-lived scientific processes, such as the onset of an algal bloom. This is increasingly important due to climate change, which has made dramatic changes in the state of inland water bodies more frequent and difficult to predict. In addition, the increased autonomy will also improve our response time for natural disasters such as floods. Such autonomous capability can be a cost-effective alternative to the use of numerous in situ or remote sensors for increased sampling.
The sensor web consists of a set of nodes with heterogeneous sensors on space, air, and ground platforms. Each node performs its own default mission but commits to help with any missions of opportunity that appear. Nodes with edge processing capabilities perform on-board data processing to detect and/or predict Events Of Interest (e.g., unusually high water level for that location and time of the year). For example, a node (e.g., a ground sensor) may be monitoring the water level in a reservoir. When an Event Of Interest is detected, the node generates a Task Request according to the mission specification and sends it to the network. The Task Request may ask for additional observations of the same location with different sensors within a certain decorrelation time. As the other nodes in the network receive the Task Request, they use reasoning techniques to determine whether their state and capabilities allow them to help with the Task Request. This is what we call context awareness. If able to help, the nodes exchange limited information with the other nodes in the network that are also capable of participating in order to decide who should perform the task. Task allocation and planning algorithms use science-driven utility functions to assess and compare the expected scientific value of the possible observations they can perform.
The technical approach combines four key enabling technologies: 1) onboard or edge processing for detection and prediction of Events of Interest, such as a high flow event or a sudden increase in algae; 2) uncertain knowledge graph reasoning to reason about the capabilities of the agents vis-a-vis the mission objectives and Task Requests; 3) decentralized coordination strategies so agents can autonomously decide who does what and when; and 4) decentralized estimation and recalibration so that nodes can collectively estimate the Earth system’s state.
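A minimal sketch of the Task Request and utility-based allocation idea, with hypothetical node attributes and a toy utility function rather than the 3D-CHESS algorithms:

```python
# Illustrative only: a Task Request and a science-driven utility comparison used to
# decide which capable node responds.
from dataclasses import dataclass

@dataclass
class TaskRequest:
    location: tuple        # (lat, lon) of the Event Of Interest
    variable: str          # e.g., "water_level"
    deadline_hr: float     # decorrelation time within which the observation is useful
    min_resolution_m: float

@dataclass
class Node:
    name: str
    resolution_m: float
    response_hr: float     # time the node needs before it can observe the target

def can_help(node, task):
    """Context awareness, reduced here to two capability checks."""
    return node.resolution_m <= task.min_resolution_m and node.response_hr <= task.deadline_hr

def utility(node, task):
    """Toy science-driven utility: finer resolution and faster response score higher."""
    return 1.0 / node.resolution_m + 1.0 / (1.0 + node.response_hr)

def allocate(nodes, task):
    candidates = [n for n in nodes if can_help(n, task)]
    return max(candidates, key=lambda n: utility(n, task), default=None)

task = TaskRequest(location=(30.6, -96.3), variable="water_level",
                   deadline_hr=12.0, min_resolution_m=30.0)
nodes = [Node("smallsat", 10.0, 24.0), Node("uav_radar", 1.0, 6.0), Node("gauge", 0.5, 0.5)]
responder = allocate(nodes, task)   # in the real system this negotiation is decentralized
```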
A previous project called 3D-CHESS, funded under NASA’s ROSES 2021 AIST solicitation, has demonstrated proof of concept (TRL 3) for this technology by using a low-fidelity multi-agent simulation on a single computer to validate the feasibility and scientific value of the concept compared to the status quo. In this effort, we seek to increase the TRL of 3D-CHESS to TRL 4 by performing a higher-fidelity validation and characterization of the value of the technology using a hardware-in-the-loop distributed simulation with realistic computing and networking constraints. In addition, we will validate some of the key functionality using a swarm of small quadcopters and a simulated inland water ecosystem. Compared to the AIST21 project, this effort adds several new thrusts, including ground sensors, realistic networking protocols, and an emphasis on predictive models, data assimilation, and decentralized estimation and recalibration.
—
AI Climate Tipping-Point Simulator (ACTS)
Jennifer Sleeman, Johns Hopkins University
As the climate continues to become more unstable, the ability to predict when major shifts, or tipping points, in our climate system may occur is essential. Because their occurrence in climate models depends on a number of physical processes governed by poorly constrained parameters [1,2], predicting these shifts is computationally challenging (if not impossible) using traditional numerical methods. We propose to develop an artificial intelligence (AI) climate tipping point digital twin for climate tipping point and cascade discovery using a deep learning generative approach, to be integrated into the NASA Advanced Information Systems Technology (AIST) Earth Systems Digital Twin (ESDT).
The core innovations of this project are: (a) the development of large foundational models trained on global circulation models, NASA-generated observations, and data-assimilating model output, which will enable the dynamic generation of surrogate (reduced) models; (b) a generative adversarial tipping point method, based on our previous work [1,2], built to discover tipping points and cascades across surrogate climate models; (c) a learned causal model that identifies factors with strong influences on tips and that works across models to enable cascading experiments and intervention exploration; and (d) a neuro-symbolic Large Language Model (LLM) interface, based on our previous work [73], that enables asking “what-if” scientific questions. When combined, these innovations will provide a general machinery for studying the triggers that could lead to climate tipping point occurrences and for anticipating how tips within one system will impact other tipping points. The capacity to ask “what-if” questions will support scenarios for individual climate tipping points as well as cascades across climate tipping points, and will further include support for questions on the impact of climate interventions. A key component of this innovation will be learning the causal paths in ways that are explainable, i.e., showing how parameters influence the path that led to a tip and providing an explainability model for how conclusions were formed from responses to “what-if” questions.
To show the power of the AI simulator, this work will focus on a set of interwoven climate tipping points as exemplars of the general machinery: the Meridional Overturning Circulation collapse, the Amazon dieback, and the West African monsoon/Sahel rainfall. These have been selected to enable the study of cascading effects while also demonstrating that the foundation model and surrogate model generator can dynamically enable new discoveries across a variety of Earth systems. They are also aligned with NASA’s Earth science activities as described in the most recent Decadal Survey [74], and will allow for studies of cascading behavior across oceans and land. Intervention-related questions, such as “Would increasing carbon storage make a sizable difference in slowing Amazon deforestation?”, could be studied using the proposed AI simulator. This will be an invaluable tool for scientists as more geo-engineering and regional climate interventions are explored in the wild without much regard for consequential effects [75].
—
A Digital Twin Integrating Knowledge and AI for Understanding Carbon and Biodiversity Corridors in Central Africa
Yiqun Xie, University of Maryland, College Park
Home to the world’s second-largest contiguous rainforest, Central Africa is a crucial land carbon reservoir and the major habitat of thousands of endemic species of plants and wildlife. Past decades of land use activities have threatened Central Africa, resulting in the loss of millions of hectares of humid primary forest and fragmented landscapes. Carbon and biodiversity corridors, connecting protected habitats across landscapes, can mitigate the effects of land use and climate change on biodiversity and enhance carbon storage capacity. Identifying and managing these corridors requires scientific understanding and technical tools to assess the current conditions of carbon storage and biodiversity, along with the vulnerability to climate change, land use change, resource exploitation, and wildfires in the future.
This project aims to build a digital twin of carbon and biodiversity corridors in Central Africa by integrating knowledge-based models and AI to enable detailed analysis of the current status and future forecasts at high resolution under a broad spectrum of scenarios. To achieve this, several research gaps and challenges need to be addressed. First, most existing high-resolution forest maps derived from remote sensing products focus on the spatial extent of forests but do not accurately reflect their carbon status due to misalignments between optical signals and height structures, leading to inaccurate carbon estimates. Moreover, despite their relevance, 3D forest structure and connectivity have yet to be considered in existing biodiversity intactness assessments and associated conservation priorities. Second, existing mechanistic ecosystem models consider an extensive range of factors for forecasting quality. However, this leads to high computational costs, significantly constraining the forecasting ability at both high resolution and large spatial extents. Furthermore, the ecosystem models, biodiversity, and connectivity models lack connections with each other, despite their strong linkage. Third, the impact assessment of carbon and biodiversity corridors needs to explicitly consider diverse scenarios, including climate change, land use change, and wildfires. It also needs to explore different management plans (e.g., protected area allocation) in response to different scenarios. Such analyses require a highly efficient simulation and optimization framework.
To bridge the technical gaps, this project aims to make the following advances to enable the digital twin capabilities: (1) We will develop a four-dimensional approach that combines the two-dimensional geographical space with orbital LiDAR measurements (e.g., GEDI, ICESat-2, AfriSAR) and time (i.e., history of disturbance) to enable a high-fidelity reconstruction of the digital replica of forest structure and aboveground carbon stocks. The new information will be incorporated to develop novel layers of 3D forest structural connectivity and enhance the Biodiversity Intactness Index. (2) We will develop a new computational paradigm of ecosystem modeling by creating a high-accuracy, AI-accelerated version of the Ecosystem Demography model using deep learning to realize scalable forecasting at high spatial resolution. We will also enhance data assimilation capabilities with knowledge-guided learning. Additionally, we will integrate carbon and biodiversity modeling in a corridor framework to maximize co-benefits for biodiversity and climate mitigation. (3) To enable impact assessment capabilities, we will further enhance the simulation models’ robustness under a wide range of future scenarios, including alternate climate assumptions from CMIP6. Furthermore, we will develop an optimization framework that integrates multiple dimensions (e.g., climate, land use change, wildfire) to inform management decisions, including protected area prioritization and allocation.
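As a generic illustration of the AI-acceleration pattern, a surrogate can be trained on input/output pairs from a mechanistic model; the data and regressor below are synthetic placeholders, not the Ecosystem Demography emulator:

```python
# Illustrative only: a neural surrogate fit to archived process-model runs so that
# forecasts can be produced cheaply over large spatial extents and many scenarios.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)

# Stand-in for archived process-model runs: environmental drivers -> carbon flux.
drivers = rng.uniform(size=(5000, 6))
flux = 2.0 * drivers[:, 0] - drivers[:, 1] ** 2 + 0.5 * np.sin(6.0 * drivers[:, 2])

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
emulator.fit(drivers[:4000], flux[:4000])

# The cheap surrogate is then evaluated on held-out runs and queried per pixel/scenario.
held_out_r2 = emulator.score(drivers[4000:], flux[4000:])
```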
—
Building Earth System Digital Twins with Deep Generative Models for Improved Simulation of Clouds and Their Feedbacks and Impacts
Tianle Yuan, University of Maryland Baltimore County
Modeling clouds and simulating their feedbacks to climate change remain a grand challenge for global weather and climate models. Cloud systems result from complex interactions of processes from sub-micron to planetary scales, a fact that fundamentally limits our ability to resolve all processes due to prohibitive computing cost. Even more critical and challenging is properly modeling clouds and their changes under climate change, because of their great radiative impact. It is therefore not surprising that cloud feedbacks are the leading factor that determines the range of projected warming. Most current approaches use parameterizations that simplify the representation of modeled clouds, which introduces biases and uncertainties to the host global climate model (GCM) and negatively impacts the quality of GCM and Earth System Digital Twins (ESDTs) simulations.
We propose to develop deep generative models (DGMs) to serve as ESDT components to improve the simulation of clouds and cloud feedbacks to climate change while drastically reducing the computational cost. DGMs take advantage of the expressive power of deep neural networks and can be used to model extremely complex problems in high-dimensional space. They are the technology backbone of the current generative AI revolution and of tools such as GPT, Claude, Gemini, and DALL·E 3. Our preliminary work and previous relevant work by others show that DGMs are an ideal fit for challenging modeling tasks such as clouds and cloud feedbacks. They can capture non-linearities in a complex process with local realism and at much lower computational cost than the current state-of-the-art global cloud-resolving models. Specifically, we will:
- Develop DGMs to model instantaneous cloud fields;
- Develop DGMs to model long-term cloud feedbacks and their dependence on evolving surface warming patterns for various warming scenarios;
- Apply these models for studying climate feedback and impacts, visualization, and communications, and for synthetic data generation.
We will build DGMs using data from GCM output in existing repositories, NASA MERRA-2 reanalysis, and NASA satellite observations such as those from MODIS, DSCOVR EPIC, GOES ABI, and CERES. Fast DGMs that are also accurate in representing cloud feedbacks are highly valuable in reducing uncertainty in climate prediction, designing climate intervention and mitigation strategies, and understanding extreme weather in a changing climate. DGMs that model clouds accurately will improve accessibility of NASA Earth science information, extend data records through synthetic data creation, and increase awareness about NASA data and model capabilities to the broader community through intuitive and accurate visualizations.
The technology we propose to develop is general-purpose and can be readily adapted to new machine learning techniques, new datasets, and new research topics. It provides a novel method to integrate NASA observations, simulation results, and reanalysis data. It will also benefit data fusion and utilization of recent and future missions such as PACE and AOS.
—
Time Series Multi-Modal Foundation Model for Near-Real-Time Land Surface Dynamics Characterization in Support of ESDT
Hankui Zhang, South Dakota State University
Soil moisture and fuel moisture content play critical roles in wildfire prediction, fire behavior simulation, emergency response, and air pollution estimation. However, current remote sensing algorithms for these parameters fall short of meeting the accuracy and near-real-time requirements of Earth System Digital Twins (ESDT) due to insufficient training samples and limited spatial-temporal resolution. Existing algorithms that combine multi-modal observations often suffer reduced temporal resolution because they can only retrieve moisture or other surface parameters on dates when multiple sensors acquire data contemporaneously. This limitation significantly reduces the usable data, since many satellites, particularly multi-modal sensors like Sentinel-1 C-band SAR, Landsat 8/9, Sentinel-2 optical data, and the upcoming NISAR L-band SAR, are not coordinated to acquire data simultaneously.
Deep learning foundation models offer an avenue to overcome the challenge of limited training data and integrate multi-modal data for reliable near-real-time land surface characterization. Building on the success of models like ChatGPT and Masked Autoencoders (MAE), foundation models have become a cornerstone in various fields, including the interpretation of earth observation data. These models are trained through self-supervised tasks and fine-tuned using domain-specific training samples. However, existing foundation models like NASA-IBM Prithvi often view earth observations as individual static images rather than time series data, relying solely on selected cloud-free images while discarding valuable good-quality observations from partially cloud-covered images. Moreover, these models tend to overlook surface seasonal dynamics, such as phenology, which are essential for dynamic soil and fuel moisture retrieval.
This proposal aims to (i) develop a foundation model to fuse time series multi-modal data, including Sentinel-1 C-band SAR, NISAR L-band SAR, and Harmonized Landsat Sentinel-2 (HLS) optical reflectance data; (ii) fine-tune the foundation model for near-real-time mapping of soil moisture and live and dead fuel moisture content at any satellite data acquisition date; and (iii) interpret the models in terms of multi-modal data fusion efficacy and input predictor importance. The proposed method utilizes a year of multi-modal satellite data to estimate soil moisture and fuel moisture content at any satellite acquisition date in order to exploit seasonal dynamics information. The proposed model builds on the Transformer architecture, which is known for its excellence in time series modeling (e.g., ChatGPT), and uses a novel cascade Transformer structure to handle the uncoordinated acquisition dates of different satellites. The foundation model will be pre-trained using a masked mechanism, which has been shown to perform well on HLS data. It will be fine-tuned using publicly available training samples (e.g., International Soil Moisture Network, National Fuel Moisture Database) across the conterminous United States, alongside additional data collected by the research team, and evaluated by comparison with existing soil moisture and fuel moisture datasets.
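A minimal sketch of the masked pre-training objective on a multi-modal time series; the model here is a trivial placeholder standing in for the proposed cascade Transformer:

```python
# Illustrative only: masked-reconstruction pre-training on a per-pixel time series,
# where each date may carry SAR and/or optical features.
import numpy as np

rng = np.random.default_rng(0)

def mask_timesteps(series, mask_ratio=0.75):
    """Randomly hide a fraction of time steps; the model must reconstruct them."""
    n_dates = series.shape[0]
    idx = rng.choice(n_dates, size=int(mask_ratio * n_dates), replace=False)
    masked = series.copy()
    masked[idx] = 0.0                      # a zero token stands in for [MASK]
    return masked, idx

def reconstruction_loss(model, series):
    masked, idx = mask_timesteps(series)
    pred = model(masked)                   # encoder-decoder reconstruction
    return float(np.mean((pred[idx] - series[idx]) ** 2))

series = rng.normal(size=(36, 8)).astype("float32")  # 36 dates x 8 SAR/optical features
identity_model = lambda x: x                          # trivial placeholder model
loss = reconstruction_loss(identity_model, series)
```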
The proposed research is responsive to the AIST Early-Stage Technology area by using “Transformers and foundation models” and by seeking to “understand” the models. It could “maximize science mission return” by effectively fusing daily, uncoordinated multi-modal satellite acquisitions. It could be a game-changer for accurate and near-real-time soil moisture and fuel moisture content retrieval in support of ESDT by leveraging foundation models. Furthermore, the foundation model is expected to work for other land surface parameters. The proposal has a Technology Readiness Level (TRL) entry of 1 and exit of 3.