2014 AIST Projects Awarded

NASA Science Mission Directorate Awards Funding for 24 Projects Under the Advanced Information Systems Technology (AIST) program 2014 ROSES A.41 Solicitation (NNH14ZDA001N-AIST)

12/01/2014 – NASA’s Science Mission Directorate, NASA Headquarters, Washington, DC, has selected proposals for the Advanced Information Systems Technology Program (AIST-14) in support of the Earth Science Division (ESD). AIST-14 will provide technologies to reduce the risk and cost of evolving NASA information systems to support future Earth observation and to transform those observations into Earth information.

Through ESD’s Earth Science Technology Office, a total of 24 proposals will be awarded over a 2-year period. The total value of the awards is roughly $25M.

The Advanced Information Systems Technology (AIST) program sought proposals for technology development activities to enable science measurements, make use of data for research, and facilitate practical applications for societal benefit by directly supporting each of the core functions within ESD: research and analysis, flight, and applied sciences. The objectives of the AIST Program are to identify, develop, and demonstrate advanced information system technologies that reduce the risk and cost of evolving NASA information systems, support future Earth observations, and help transform those observations into Earth information.

A total of 124 proposals were evaluated, 24 of which have been selected for award. The awards are as follows:

Robert Brakenridge, University of Colorado, Boulder
Global Flood Risk from Advanced Modeling and Remote Sensing in Collaboration with Google Earth Engine
Petya Campbell, Goddard Space Flight Center
Next Generation UAV Based Spectral Systems for Environmental Monitoring
Aashish Chaudhary, Kitware, Inc.
Prototyping Agile Production, Analytics and Visualization Pipelines for big-data on the NASA Earth Exchange (NEX)
Martyn P. Clark, National Center for Atmospheric Research
Development of computational infrastructure to support hyper-resolution large-ensemble hydrology simulations from local to continental scales
Thomas Clune, Goddard Space Flight Center
DEREChOS: Data Environment for Rapid Exploration and Characterization of Organized Systems
Kamalika Das, Ames Research Center
Uncovering effects of climate variables on global vegetation
Matthew French, University of Southern California Information Sciences Institute
SpaceCubeX: A Hybrid Multi-core CPU/FPGA/DSP Flight Architecture for Next Generation Earth Science Missions
Jonathan Gleason, Langley Research Center
Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)
Milton Halem, University of Maryland Baltimore County
Computational Technologies: Feasibility Studies of Quantum Enabled Annealing Algorithms for Estimating Terrestrial Carbon Fluxes from OCO-2 and the LIS Model
Hook Hua, Jet Propulsion Laboratory
Agile Big Data Analytics of High-Volume Geodetic Data Products for Improving Science and Hazard Response
Thomas Huang, Jet Propulsion Laboratory
OceanXtremes: Oceanographic Data-Intensive Anomaly Detection and Analysis Portal
William Ivancic, Glenn Research Center
Multi-channel Combining for Airborne Flight Research using Standard Protocols
Kristine Larson, University of Colorado
AMIGHO: Automated Metadata Ingest for GNSS Hydrology within OODT
Seungwon Lee, Jet Propulsion Laboratory
Climate Model Diagnostic Analyzer
Jacqueline LeMoigne, Goddard Space Flight Center
Tradespace Analysis Tool for Designing Earth Science Distributed Missions
Mike Lieber, Ball Aerospace
Model Predictive Control Architecture for Optimizing Earth Science Data Collection
Constantine Lukashin, Langley Research Center
NASA Information and Data System (NAIADS) for Earth Science Data Fusion and Analytics
Christian Mattmann, Jet Propulsion Laboratory
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics for Scientific Data and Analysis
Victor Pankratius, Massachusetts Institute of Technology
Computer-Aided Discovery of Earth Surface Deformation Phenomena
Rahul Ramachandran, Marshall Space Flight Center
Illuminating the Darkness: Exploiting untapped data and information resources in Earth Science
Shawn Smith, Florida State University
A Service to Match Satellite and In-situ Marine Observations to Support Platform Inter-comparisons, Cross-calibration, Validation, and Quality Control
Tomasz Stepinski, University of Cincinnati
Pattern-based GIS for Understanding Content of very large Earth Science Datasets
Wei-Kuo Tao, Goddard Space Flight Center
Empowering Data Management, Diagnosis, and Visualization of Cloud-Resolving Models by Cloud Library upon Spark and Hadoop
Chaowei Yang, George Mason University
Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

Robert Brakenridge, University of Colorado at Boulder
Global Flood Risk from Advanced Modeling and Remote Sensing in Collaboration with Google Earth Engine

As predictive accuracy of the climate response to greenhouse emissions improves, measurements of sea level rise are being coupled with modeling to better understand coastal vulnerability to flooding. Predictions of rising intensity of storm rainfall and larger tropical storms also imply increased inland flooding, and many studies conclude this is already occurring in some regions.

Most rivers experience some flooding each year: the seasonal discharge variation from low to high water can be 2-3 orders of magnitude. The mean annual flood is an important threshold: its level separates land flooded each year from land only affected by large floods. We lack adequate geospatial information on a global basis defining floodplains within the mean annual flood limit and the higher lands still subject to significant risk (e.g. with exceedance probability of greater than 3.3%; the 30 yr. floodplain). This lack of knowledge concerning changing surface water affects many disciplines and remote sensing data sets, where, quite commonly, a static water mask is employed to separate water from land. For example, inland bio-geochemical cycling of C and N is affected by flooding, but floodplain areas are not well constrained.
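
To make the return-period arithmetic above concrete, an annual exceedance probability of 3.3% corresponds to a recurrence interval of roughly 1/0.033 ≈ 30 years. The short sketch below, using entirely synthetic annual peak discharges and a standard Weibull plotting position, illustrates how such a threshold can be estimated from a gauged record; the station, numbers, and units are hypothetical.

```python
# Illustrative only: estimate the ~3.3% annual exceedance (roughly 30-year)
# flood level from a synthetic record of annual peak discharges.
import numpy as np

rng = np.random.default_rng(0)
annual_peaks = rng.gumbel(loc=1000.0, scale=300.0, size=60)   # hypothetical m^3/s

ranked = np.sort(annual_peaks)[::-1]                 # largest first
n = ranked.size
exceedance_prob = np.arange(1, n + 1) / (n + 1)      # Weibull plotting position
return_period = 1.0 / exceedance_prob

# Discharge whose annual exceedance probability is closest to 3.3% (~30 yr).
i = np.argmin(np.abs(exceedance_prob - 0.033))
print(f"~30-year flood discharge: {ranked[i]:.0f} m^3/s "
      f"(P_exceed = {exceedance_prob[i]:.3f}, T = {return_period[i]:.0f} yr)")
```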

Measurements and computer models of flood inundation over large areas have been difficult to incorporate because of a scarcity of observations in compatible formats, and a lack of the detailed boundary conditions, in particular floodplain topography, required to run hydrodynamic models. However, the available data now allow such work, and the computational techniques needed to ingest such information are ready for development. Optical and SAR sensing are providing a near-global record of floodplain inundation, and passive microwave radiometry is producing a calibrated record of flood-associated discharge values, 1998-present. Also, global topographic data are of increasingly fine resolution, and techniques have been developed to facilitate their incorporation into modeling. Several of us have already demonstrated the new capability to accurately model and map floodplains on a continent scale using input discharges of various sizes and exceedance probabilities.

Work is needed to accomplish global-scale products, wherein results are extended to all continents, and downscaled to be locally accurate and useful. Floodplain mapping technologies and standards vary greatly among nations (many nations have neither): the planned effort will provide a global flood hazard infrastructure on which detailed local risk assessment can build. Our project brings together an experienced team of modeling, remote sensing, hydrology, and information technology scientists at JPL and the University of Colorado with the Google Earth Engine team to implement and disseminate a Global Floodplains and Flood Risk digital map product. This project addresses major priorities listed in the AIST program: with Google, we would identify, develop, and demonstrate advanced information system technologies that increase the accessibility and utility of NASA science data and enable new information products. The work will address the Core Topic Data-Centric Technologies, including Technologies that provide opportunities for more efficient interoperation with observation data systems, such as high-end computing and modeling systems, and Capabilities that advance integrated Earth science missions by enabling discovery and access to Service Oriented Architectures. It will also address the Special Subtopic Technology Enhancements for Applied Sciences Applications in regard to natural disasters, and contribute to the GEOSS architecture for the use of remote sensing products in disaster management and risk assessment.



Petya Campbell, Goddard Space Flight Center
Next Generation UAV Based Spectral Systems for Environmental Monitoring

At present, UAVs used in environmental monitoring mostly collect low spectral resolution imagery, capable of retrieving canopy greenness or properties related to water stress. We propose a UAV-based capability for accurate, stable measurement of spectral reflectance at high temporal frequencies to depict diurnal/seasonal cycles in vegetation function. We will test our approaches first using spatially resolved discrete point measurements characterizing VNIR reflectance and solar-induced fluorescence in Year 1, followed in Year 2 by imaging spectroscopy. The ultimate goal is to produce science-quality spectral data from UAVs suitable for scaling ground measurements and comparison against airborne or satellite sensors. Because of the potential for rapid deployment, spatially explicit data from UAVs can be acquired irrespective of many of the cost, scheduling and logistic limitations of satellite or piloted aircraft missions. Provided that the measurements are suitably calibrated and well characterized, this opens up opportunities for calibration/validation activities not currently available. There is considerable interest in UAVs from the agricultural and forestry industries, but there is a need to identify a workflow that yields calibrated comparisons through space and time. This will increase the likelihood that UAVs are economically feasible for applied and basic science, as well as land management. We target the consistent retrieval of calibrated surface reflectance, as well as biological parameters including chlorophyll fluorescence, photosynthetic capacity, nutrient and chlorophyll content, specific leaf area and leaf area index, all important to vegetation monitoring and yield. Scientifically, deployment of UAV sensors at sites such as flux towers will facilitate more frequent (e.g. within-day) and spatially comprehensive assessment of vegetation physiology and function within tower footprints than is possible on foot, from sensors fixed to the tower, or from irregular aircraft missions. We propose a rapid data assimilation and delivery system based on past SensorWeb efforts to move calibrated reflectance data and derived retrievals directly from the UAV to users. We will utilize SensorWeb functionalities to strategically run a data gathering campaign to optimize data yield. We also propose a mission deployment system to optimize flight paths based on real-time in-flight data processing to enable effective data collection strategies. All spectral data will also be uploaded to NASA’s in-development EcoSIS online spectral library, and we will employ a cloud system to manage the intermediate products. Ultimately, we will demonstrate the acquisition of science-grade spectral measurements from UAVs to advance the use of UAVs in remote sensing beyond the current state of application, providing measurements of a quality comparable to those from handheld instruments or well-calibrated air- and spaceborne systems. A key benefit is that UAV collections at 10-150 m altitude bridge the gap between ground/proximal measurements and airborne measurements typically acquired at 500 m and higher, allowing better linkage of comparable measurements across the full range of scales from the ground to satellites.
This proposal is directly responsive to the AIST NRA in that it: bridges the gap in Earth observation between field and airborne measurements; reduces risk to NASA through development of methods to make well-characterized measurements from UAVs for integration, calibration and validation of NASA satellite and airborne data; makes use of a data delivery system in which measurements and derived products are rapidly distributed to users; and provides spatially explicit data of calibrated reflectance and vegetation traits at new temporal and spatial scales not currently available. We submit in the Core Topic area “Operations Technologies” and are applicable to the “Ecological Forecasting” subtopic. We will enter at TRL 3 and exit at TRL 5.



Aashish Chaudhary, Kitware, Inc.
Prototyping Agile Production, Analytics and Visualization Pipelines for big-data on the NASA Earth Exchange (NEX)

The goal of this project is to develop capabilities for an integrated petabyte-scale Earth science product development, production and collaborative analysis environment. We will deploy this environment within the NASA Earth Exchange (NEX) and OpenNEX in order to enhance existing science data production pipelines in both high-performance computing (HPC) and cloud environments. Bridging HPC and the cloud is a fairly new concept under active research. This system will significantly enhance the ability of the scientific community to accelerate transformation of Earth science observational data from NASA missions, model outputs and other sources into science data products and facilitate collaborative analysis of the results. We propose to develop a web-based system that seamlessly interfaces with both HPC and cloud environments, providing tools that enable science teams to develop and deploy large-scale data processing pipelines; perform data visualization, provenance tracking, analysis and QA of both the production process and the data products; and share results with the community. In terms of the NRA, the project is proposed under the “Data-Centric Technologies” category. The HPC component will interface with the NASA Earth Exchange (NEX), a collaboration platform for the Earth science community that provides a mechanism for scientific collaboration, knowledge and data sharing together with direct access to over 1 PB of Earth science data and a 10,000-core processing system. The cloud component will interface with NASA OpenNEX – a cloud-based component of NEX. The project aligns well with a number of goals of “NASA’s Plan for a Climate-Centric Architecture” and will be capable of supporting missions such as LDCM, OCO-2, or SMAP. There will be immediate benefit to a number of existing and upcoming projects. The WELD (Web Enabled Landsat Data) project, sponsored by the NASA MEaSUREs program, will benefit immediately through improved production and QA monitoring capabilities as well as more efficient execution. There are also a number of projects that are ready to build on the WELD results. The first of these is NASA GIBS, a core EOSDIS component, which requires delivery of native-resolution imagery from WELD (this will be about a 5 PB production system). There are also science projects that hope to build on WELD results by implementing MODIS algorithms such as FPAR/LAI using the high-resolution Landsat data. In order to demonstrate the capabilities of the system, we will deploy a prototype on the existing NEX Landsat WELD processing system – a complex 30-stage pipeline, which delivers derived vegetation products by processing over 1.5 PB of data. The project will be developed in several stages, each addressing a separate challenge: workflow integration, parallel execution in either cloud or HPC environments, and big-data analytics and visualization. We will first develop the capability and best practices to assist science teams with integration of their large-scale processing pipelines with the workflow system. We will continue with enabling users to launch seamless data production on either cloud or HPC environments, while tracking the data and process provenance. This effort will be based on previous ESTO-funded activities. Finally, we will integrate the system with web-based visualization tools to enable efficient big-data visualization and analytics of the results.
The period of performance of the project is two years, with a possible start in March 2015. However, the exact start date is not critical for this project and can be readily adjusted. We estimate the entry TRL of the effort at 4 and will deliver a system with an exit TRL of 6. The detailed TRL justification is provided in the proposal.



Martyn P. Clark, National Center for Atmospheric Research
Development of computational infrastructure to support hyper-resolution large-ensemble hydrology simulations from local to continental scales

A move is currently afoot in the hydrologic community towards multi-model ensembles to capture the substantial uncertainty in environmental systems. However, the current generation of operational and experimental simulation platforms can be characterized as “small ensemble” systems. Typically, fewer than five hydrologic models are used to characterize model uncertainty, and the multi-model systems have a poor probabilistic portrayal of risk. As a result, water resource management decisions based on these “small ensemble” systems will often be suboptimal and in the worst case will simply be wrong. The relatively small size of the ensembles in current multi-model systems is largely dictated by the considerable difficulty and human resources needed to implement and parameterize individual hydrological models so that they can be operated within a common framework. We have developed an advanced hydrologic modeling approach, SUMMA (the Structure for Unifying Multiple Modeling Alternatives), which enables explicit representation of the ambiguities in a myriad of modeling decisions. SUMMA can be used to design ensemble systems with specific properties and improve the probabilistic characterization of risk. SUMMA is currently at Technology Readiness Level 3, and focused effort on computational infrastructure is necessary to develop the next-generation modeling system needed to support water resources planning and management throughout the USA. The goal of this project is to develop the computational infrastructure to enable hyper-resolution large-ensemble hydrology simulations from local to continental scales. By hyper-resolution we mean hydrology simulations on spatial scales of the order of 1 km, and by large ensemble we mean a multi-model ensemble of the order of 100 ensemble members. The continental-scale domain for this project is the contiguous USA. The goal of hyper-resolution, large-ensemble, and continental-domain hydrology simulations will be accomplished with focused effort on the following four tasks:

(1) Satisfy data requirements of multiple modeling approaches (e.g., spatial parameter fields);
(2) Improve SUMMA’s numerical robustness and computational efficiency;
(3) Embed SUMMA in the NASA Land Information System (NASA-LIS) to expand multi-model simulation capabilities; and
(4) Evaluate and refine the multi-model ensemble.

These tasks represent the obvious next steps to advance probabilistic continental-domain modeling capabilities at spatial scales relevant for water managers. The first two tasks, fulfilling data requirements for multiple modeling approaches and improving the computational performance of hydrologic models, are necessary to enable hyper-resolution, large-ensemble, continental-domain hydrology simulations. The third task of embedding SUMMA in NASA-LIS takes advantage of NASA’s state-of-the-art multi-model framework, especially its ensemble modeling and model benchmarking capabilities. It also extends the capabilities of NASA-LIS to support large multi-model ensembles. The final task to evaluate and refine the multi-model ensemble is necessary to improve the probabilistic characterization of risk. By using a common modeling core in SUMMA, the overhead associated with implementing alternative modeling approaches is significantly reduced, a major departure from current “small ensemble” methods. By embedding SUMMA within NASA’s Land Information System (LIS), meaningful “large ensemble” systems will become a reality and can provide improved guidance for water resource management decisions by our development partners at the U.S. Army Corps of Engineers and the Bureau of Reclamation.



Thomas Clune, Goddard Space Flight Center
DEREChOS: Data Environment for Rapid Exploration and Characterization of Organized Systems

Motivation/Problem Statement: DEREChOS is a natural advancement of the existing, highly-successful Automated Event Service (AES) project. AES is an advanced system that facilitates efficient exploration and analysis of Earth science data. While AES is well-suited for the original purpose of searching for phenomena in regularly gridded data (e.g., reanalysis), targeted extensions would enable a much broader class of Earth science investigations to exploit the performance and flexibility of this service. We present a relevancy scenario, Event-based Hydrometeorological Science Data Analysis, which highlights the need for these features that would maximize the potential of DEREChOS for scientific research.

Proposed solution:  We propose to develop DEREChOS, an extension of AES, that: (1) generalizes the underlying representation to support irregularly spaced observations such as point and swath data, (2) incorporates appropriate re-gridding and interpolation utilities to enable analysis across data from different sources, (3) introduces nonlinear dimensionality reduction (NDR) to facilitate identification of scientific relationships among high-dimensional datasets, and (4) integrates Moving Object Database technology to improve treatment of continuity for the events with coarse representation in time. With these features, DEREChOS will become a powerful environment that is appropriate for a very wide variety of Earth science analysis scenarios.
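
As a small illustration of the re-gridding step in item (2), the sketch below interpolates irregularly spaced point observations onto a regular grid. It is not DEREChOS or SciDB code; the data, grid, and variable names are hypothetical.

```python
# Illustrative re-gridding: scatter-to-grid interpolation of point observations.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
# Hypothetical scattered observations (lon, lat, value), e.g. swath samples.
lon = rng.uniform(-10, 10, 200)
lat = rng.uniform(30, 50, 200)
val = np.sin(np.radians(lat)) + 0.1 * rng.normal(size=200)

# Target regular latitude/longitude grid.
grid_lon, grid_lat = np.meshgrid(np.linspace(-10, 10, 41), np.linspace(30, 50, 41))
gridded = griddata((lon, lat), val, (grid_lon, grid_lat), method="linear")

print("gridded field shape:", gridded.shape)
```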

Research strategy:  DEREChOS will be created by integrating various separately developed technologies. In most cases this will require some re-implementation to exploit SciDB, the underlying database that has strong support for multidimensional scientific data. Where possible, synthetic data/inputs will be generated to facilitate independent testing of new components. A scientific use case will be used to derive specific interface requirements and to demonstrate integration success.

Significance: Freshwater resources are predicted to be a major focus of contention and conflict in the 21st century. Thus, the hydrometeorology and hydrology communities are particularly attracted to the superior research productivity offered by AES, which has been demonstrated for two real-world use cases. This interest is reflected in the participation in DEREChOS of our esteemed collaborators, who include the Project Scientist of NASA SMAP, the Principal Scientist of NOAA MRMS, and lead algorithm developers of NASA GPM.

Relevance to the Program Element: This proposal responds to the core AIST program topic: 2.1.3 Data-Centric-Technologies. DEREChOS specifically addresses the request for big data analytics, including tools and techniques for data fusion and data mining, applied to the substantial data and metadata that result from Earth science observation and the use of other data-centric technologies.

TRL: Although AES will have achieved an exit TRL of 5 by the start date of this proposed project, DEREChOS will have an entry TRL of 3 due to the new innovations that have not previously been implemented within the underlying SciDB database. We expect that DEREChOS will have an exit TRL of 5 corresponding to an end-to-end test of the full system in a relevant environment.



Kamalika Das, Ames Research Center
Uncovering effects of climate variables on global vegetation

The objective of this project is to understand the causal relationships of how ecosystem dynamics, mostly characterized by vegetation changes, in different geographical areas with distinct eco-climatic variability, are affected by regulating climatic factors and other anthropogenic disturbances and extreme events. Although this is a well-studied problem, the state of the art in this area has significant room for improvement. For example, climate variables such as precipitation, solar radiation, and temperature have traditionally been studied as limiting factors affecting vegetation growth. However, a quantitative analysis of the influence of unknown or unexpected driving forces on vegetation anomalies is still missing in the context of abrupt climate change (e.g. persistent drought, heat waves) and human-induced local events (e.g. forest fire, irrigation). Similarly, most prior studies have based their analyses on assumptions of linearity and certain types of nonlinearity in the dependency relationships between observed/modeled climate variables and satellite-derived vegetation indices. We hypothesize that such assumptions may not hold true in practice when scaled over large regions, so that models built on them fail to generalize and the resulting understanding of ecosystem dynamics may be misconstrued. In this study we propose to use a regression technique called ‘symbolic regression’ for learning these complex time-space relationships and their evolution over time. This genetic programming based learning technique has demonstrated the potential to discover new dependency structures among variables that were previously unknown. Using symbolic regression, we not only expect to uncover new relationships among well-studied climate variables, but also to identify latent factors responsible for vegetation anomalies. We will use NASA’s high-end computing and data infrastructure at the NASA Earth Exchange facility in order to scale this evolutionary optimization based regression technique to build global prediction frameworks using hierarchical and ensemble approaches. The benefits of this work will be in improving the understanding of ecosystem dynamics and generalizing those understandings from regional to global scales. The work will leverage the results of previous NASA-funded efforts, both in terms of data sets and computing infrastructure. The work will aim to answer three science questions, starting at the local level and then moving to a global scale: 1) What is the magnitude and extent of ecosystem exposure, sensitivity and resilience to the 2005 and 2010 Amazon droughts? 2) What are the human-induced and other attribution factors that cause vegetation anomalies in certain geographical regions that cannot otherwise be explained by natural climate variability? 3) How does the learned dependency of vegetation on climate variables and other exogenous factors vary across different eco-climatic zones and geographical regions on a global scale? This project will develop algorithms that will answer the three questions listed above with the help of domain scientists’ validation. This modeling exercise based on symbolic regression is the first of its kind for Earth science applications. Therefore, the entry TRL of the project is 2. On successful completion of milestones, the exit TRL is expected to be 4, where the capabilities have been validated at a global scale.
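
For readers unfamiliar with symbolic regression, the toy sketch below conveys the core idea of searching over candidate expression trees and keeping the one that best fits the data. It uses simple random search rather than the genetic-programming evolution the project will employ, and the variables and data are synthetic stand-ins.

```python
# Illustrative symbolic-regression-style search over small expression trees.
import numpy as np

rng = np.random.default_rng(0)

# Toy predictors: precipitation, temperature, solar radiation (standardized).
n = 500
X = rng.normal(size=(n, 3))
precip, temp, rad = X.T
# Hypothetical "true" vegetation response with a nonlinear interaction.
y = 0.8 * precip + 0.3 * np.tanh(temp * rad) + 0.05 * rng.normal(size=n)

UNARY = {"tanh": np.tanh, "neg": np.negative}
BINARY = {"add": np.add, "sub": np.subtract, "mul": np.multiply}
NAMES = ["precip", "temp", "rad"]

def random_expr(depth=3):
    """Build a random expression tree over the input variables."""
    if depth == 0 or rng.random() < 0.3:
        return ("var", int(rng.integers(X.shape[1])))
    if rng.random() < 0.5:
        return ("un", rng.choice(list(UNARY)), random_expr(depth - 1))
    return ("bin", rng.choice(list(BINARY)), random_expr(depth - 1), random_expr(depth - 1))

def evaluate(node):
    if node[0] == "var":
        return X[:, node[1]]
    if node[0] == "un":
        return UNARY[node[1]](evaluate(node[2]))
    return BINARY[node[1]](evaluate(node[2]), evaluate(node[3]))

def show(node):
    if node[0] == "var":
        return NAMES[node[1]]
    if node[0] == "un":
        return f"{node[1]}({show(node[2])})"
    return f"{node[1]}({show(node[2])}, {show(node[3])})"

# Random search: keep the candidate expression with the lowest mean-squared error.
best, best_mse = None, np.inf
for _ in range(5000):
    expr = random_expr()
    mse = float(np.mean((y - evaluate(expr)) ** 2))
    if mse < best_mse:
        best, best_mse = expr, mse

print(f"best expression: {show(best)}  (MSE = {best_mse:.3f})")
```
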
The interdisciplinary team includes expertise in large-scale data mining, symbolic regression, evolutionary optimization, and Earth science, to meet the technical challenges of this project.



Matthew French, University of Southern California Information Sciences Institute
SpaceCubeX: A Hybrid Multi-core CPU/FPGA/DSP Flight Architecture for Next Generation Earth Science Missions

This proposal addresses NASA’s Earth Science missions and climate architecture plan and its underlying needs for high performance, modular, and scalable on-board processing. The decadal survey era missions are oriented not only to provide observations consistent with the previous generation of missions, but also to provide data to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. Missions such as (P)ACE, HyspIRI, GEO-CAPE, ICESat-II, and ASCENDS are specifying instruments with significantly increased temporal, spatial, and frequency resolutions and moving to global, continuous observations. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000x more than previous Earth Science missions for standard processing, compression, storage, and downlink operations. The team proposes to develop SpaceCubeX: a Hybrid Multi-core CPU/FPGA/DSP Flight Architecture for Next Generation Earth Science Missions to address these needs and enable the next generation of NASA Earth Science missions to effectively meet their goals.

Recent studies have shown that in order to realize mission size, weight, area, and power (SWAP) constraints while meeting inter-mission reusability goals, compact heterogeneous processing architectures are needed. In a heterogeneous architecture, general OS support, high level functions, and coarse grained application parallelism are efficiently implemented on multi-core processors, while a co-processor provides mass acceleration of high throughput, fine-grained data parallelism operations, to achieve high performance robustly across many application types. Hybrid architecture development represents a significant departure from traditional homogeneous avionics development, and SpaceCubeX provides a structured approach to fundamentally change the avionics processing architecture development process.

SpaceCubeX leverages substantial research investments from NASA, DARPA, and NRO in space-based computing, multi-core and FPGA architectures, and software/hardware co-design APIs, and focuses them on NASA Earth science missions and applications. The University of Southern California’s Information Sciences Institute (USC/ISI) will oversee the effort, leading the development of the architecture and the common API. USC/ISI is teamed with NASA Goddard Space Flight Center, which will assist in architecture development, space applications, and demonstrations, and NASA Jet Propulsion Laboratory, which will develop performance benchmarks. The team will use its well-established expertise in these areas to develop a simulation-level testbed in year 1 and to implement a breadboard prototype emulation testbed and benchmark performance in year 2, raising the TRL from 3 to 5 in all areas.



Jonathan Gleason, Langley Research Center
Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)

The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS) will extend the prototype Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES) developed at NASA’s Atmospheric Science Data Center (ASDC) to enable users to find and download data variables that satisfy their precise criteria. The ODISEES-OlyMPUS end-to-end system will support both data consumers and data providers, enabling the latter to register their data sets and provision them with the semantically rich metadata that drives ODISEES’ data discovery and access service for data consumers.

A core function of NASA’s Earth Science Division is research and analysis that uses the full spectrum of data products available in NASA archives. Scientists need to perform comprehensive analyses that identify correlations and non-obvious relationships across all types of Earth System phenomena. Comprehensive analytics are hindered, however, by the fact that many data products and climate model products are disparate and hard to synthesize. Variations in how data are collected, processed, averaged, gridded, and stored, create challenges for data interoperability and synthesis, which are exacerbated by the sheer volume of available data. A coordinated approach to data delivery will greatly improve prospects for interoperability and synthesis, better enabling scientists to take advantage of the full range of global data from satellites and aircraft, as well as outputs from numerical models.

Metadata has emerged as a means of improving data delivery and interoperability. Robust, semantically rich metadata can support tools for data discovery and access and facilitate machine-to-machine transactions with services such as data sub-setting, re-gridding, and reformatting. Such capabilities are critical to enabling the research activities integral to NASA’s strategic plans. However, as metadata requirements increase and competing standards emerge, data producers are increasingly burdened with the time-consuming task of provisioning their data products. Adequate tools for metadata provisioning are not commonplace. Although some metadata provisioning tools are available, the metadata they produce typically has little or no semantic framework, is generally coarse-grained, and is frequently inadequate to provide the sort of information required to support data interoperability. If metadata is to provide the means to address interoperability challenges, then tools that support the needs of both data consumers and data providers have to be developed. The OlyMPUS project will:

1. Expand the capabilities of ODISEES to improve existing search capabilities and introduce new features and capabilities, raising the Technology Readiness Level from 3 to 5 over the two-year effort.

2. Leverage the robust semantics and reasoning capabilities of ODISEES to provide data producers with a semi-automated tool – the OlyMPUS Metadata Provisioning System – to produce the robust and detailed semantic metadata needed to support ODISEES’ parameter-level discovery and access services.

3. Integrate ODISEES with select data delivery tools at the ASDC and National Center for Climate Simulation (NCCS) to enable data consumers to create customized data sets and download them directly to their computers or, for registered users of the NASA Advanced Supercomputing (NAS) facility, directly to NAS storage resources for access by applications running on NASA supercomputers.



Milton Halem, University of Maryland Baltimore County
Computational Technologies: Feasibility Studies of Quantum Enabled Annealing Algorithms for Estimating Terrestrial Carbon Fluxes from OCO-2 and the LIS Model

The successful launch of the Orbiting Carbon Observatory 2 (OCO-2) on July 2, 2014 should lead to new opportunities to calculate long-term trends in CO2 fluxes and their regional and global uptake. The sun-synchronous coverage of OCO-2 with its high spatial resolution of 1.29×2.25km at nadir provides the first global dataset of vertical CO2 concentrations with surface spectral resolutions that can provide accurate CO2 flux profiles. We propose algorithms that could be significantly enabled by quantum annealing computing (QAC) technologies over the course of the next decade to leverage data products from this mission and other NASA Earth observing satellites to infer regional carbon sources and sinks.

To evaluate whether quantum computing has the potential to be a disruptive technology in supporting Earth Science missions, we will show that the Dwave QAC housed at NASA Ames can be used to derive value and information from the recently launched OCO-2 satellite. In particular, we propose to explore the use of QAC for (i) satellite image registration to detect canopy and vegetation cover changes, and (ii) variational data assimilation. We will apply these schemes to estimate regional net ecosystem exchange (NEE), a challenging and extremely important measurement for climate prediction. Both of these QAC algorithms are very general in that they are applicable to a large number of Earth Science problems.
We will use QAC to assimilate information from Level 2 vertical profiles from OCO-2 into the Goddard Land Information System at 1° resolution for the following two regions: a high latitude region encompassing Barrows Island and a low latitude region encompassing the Amazon. Separate QAC algorithms will be explored to accommodate the two tasks. Image registration will comprise two QAC algorithms: (i) a binary image thresholding classifier to detect edge-like features will be directly encoded into the Dwave QAC, and (ii) the Dwave Qsage hybrid optimization algorithm will be used to geo-register a time series of OCO-2 images and MODIS images to detect NDVI changes over the two regions. Additionally, a 3D-variational data assimilation algorithm will be directly encoded as a QUBO into the Dwave Chimera graph to perform data assimilation.

These three QAC algorithms will be tested on a distributed hybrid system consisting of the IBM iDataPlex cluster at UMBC with remote access to the NASA 512-qubit Dwave computer at Ames. After initial development, subsequent tests to assess the scalability of algorithmic solutions will be conducted on 1024- and 2048-qubit QACs provided by Dwave Systems (see attached letter). This dual use of image processing and model data assimilation algorithms will not only contribute to a potential OCO-2 capability to provide an NEE product but may also serve as justification for the continued development of the quantum architecture for remote sensing and climate modeling satellite data assimilation, including for other remote sensing products related to such things as soil moisture and ocean carbon uptake. Because the current state of QAC lacks algorithms to address any of the issues proposed here (or most other Earth Science problems), we consider this project to have an entry TRL of 2. Developing the basic QAC algorithms for image registration and variational data assimilation will raise the TRL to 3, and implementing the full proposed hybrid quantum/classical assimilation OSSE will further raise the TRL to 4. The exit level of this approach will be TRL 4 as we develop a prototype of QAC-enabled data assimilation for Level 3 OCO-2 observations on the Dwave 512-, 1024-, and 2048-qubit systems.
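
For context, a quantum annealer of the kind described minimizes a quadratic unconstrained binary optimization (QUBO) objective, x^T Q x over binary vectors x. The toy sketch below formulates a tiny hypothetical QUBO and solves it by classical exhaustive search simply to illustrate the problem form; it does not represent the proposers’ encodings.

```python
# Illustrative only: a tiny QUBO of the kind an annealer minimizes,
# solved here by brute-force enumeration of all binary assignments.
import itertools
import numpy as np

# Hypothetical 4-variable QUBO matrix Q; the goal is min over x in {0,1}^4 of x^T Q x.
Q = np.array([
    [-1.0,  0.5,  0.0,  0.0],
    [ 0.0, -1.0,  0.5,  0.0],
    [ 0.0,  0.0, -1.0,  0.5],
    [ 0.0,  0.0,  0.0, -1.0],
])

best_x, best_e = None, np.inf
for bits in itertools.product([0, 1], repeat=Q.shape[0]):
    x = np.array(bits)
    e = float(x @ Q @ x)
    if e < best_e:
        best_x, best_e = x, e

print("minimizing assignment:", best_x, "energy:", best_e)
```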



Hook Hua, Jet Propulsion Laboratory
Agile Big Data Analytics of High-Volume Geodetic Data Products for Improving Science and Hazard Response

Geodetic imaging is revolutionizing geophysics, but the scope of discovery has been limited by labor-intensive technological implementation of the analyses. The Advanced Rapid Imaging and Analysis (ARIA) project has proven capability to automate SAR image analysis, having processed thousands of COSMO-SkyMed (CSK) scenes collected over California in the last year as part of a JPL/Caltech collaboration with the Italian Space Agency (ASI). The successful analysis of large volumes of SAR data has brought to the forefront the need for analytical tools for SAR quality assessment (QA) on large volumes of images, a critical step before higher level time series and velocity products can be reliably generated. While single interferograms are useful for imaging episodic events such as earthquakes, in order to fully exploit the tsunami of SAR imagery that will be generated by current and future missions, we need to develop more agile and flexible methods for evaluating interferograms and coherence maps.

Our AIST-2011 Advanced Rapid Imaging & Analysis for Monitoring Hazards (ARIA-MH) data system has been providing data products to researchers working on a variety of earth science problems including glacial dynamics, tectonics, volcano dynamics, landslides and disaster response. A data system with agile analytics capability could reduce the amount of time researchers currently spend on analysis, quality assessment, and re-analysis of interferograms and time series analysis from months to hours. A key stage in analytics for SAR is the quality assessment stage, which is a necessary step before researchers can reliably use results for their interpretations and models, and we propose to develop machine learning tools to enable more automated quality assessment of complex imagery like interferograms, which will in turn enable greater science return by expanding the amount of data that can be applied to research problems.

Objectives: We will develop an advanced hybrid-cloud computing science data system for easily performing massive-scale analytics of geodetic data products, improving the quality of the InSAR and GPS data products that are used for disaster monitoring and response. We will focus on the Big Data-scale analytics that are needed to quickly and efficiently assess the quality of the increasing collections of geodetic data products being generated by existing and future missions.

Technology Innovations: Science is an iterative process that requires repeated exploration of the data through various what-if scenarios. By enabling faster turn-around of analytics and analysis processing of the increasing amount of geodetic data, we will enable new science that cannot currently be done. We will adapt machine learning approaches to QA for improving the quality of geodetic data products. Furthermore, analytics such as assessing coherence measures of the InSAR data will be used to improve the quality of data products that are already being used for disaster response. We will develop new approaches enabling users to quickly develop, deploy, run, and analyze their own custom analysis code across entire InSAR and GPS collections.
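
As a purely illustrative example of this kind of machine-learning QA screen, the sketch below trains a simple classifier to flag interferograms as usable from coarse coherence summary statistics. The features, labels, and thresholds are hypothetical and are not the project’s actual algorithm.

```python
# Illustrative QA screen: label interferograms usable/unusable from summary features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 300
mean_coherence = rng.uniform(0.05, 0.95, n)
unwrap_error_frac = rng.uniform(0.0, 0.4, n)
X = np.column_stack([mean_coherence, unwrap_error_frac])
# Hypothetical analyst labels: usable when coherent and few unwrapping errors.
y = ((mean_coherence > 0.4) & (unwrap_error_frac < 0.2)).astype(int)

clf = LogisticRegression().fit(X, y)
new_scenes = np.array([[0.7, 0.05], [0.2, 0.30]])
print("predicted QA flags:", clf.predict(new_scenes))   # 1 = usable, 0 = reject
```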

Expected Significance: To improve the impact of our generated data products for both the science and monitoring user communities, quality assessment (QA) techniques and metrics are needed to automatically analyze PB-scale data volumes to identify both problems and changes in the deformation and coherence time series. Automated QA techniques are currently underdeveloped within the InSAR analysis community, but have already become much more strategically important for supporting the expected high data volumes of upcoming missions such as Sentinel, ALOS-2, and the NASA-ISRO SAR (NISAR) mission, and for delivering high-quality science and applications. The science data system technology will also enable NASA to support the high data volume needs of NISAR in addition to the analysis of the data products.



Thomas Huang, Jet Propulsion Laboratory
OceanXtremes: Oceanographic Data-Intensive Anomaly Detection and Analysis Portal

Anomaly detection is the process of identifying items, events or observations that do not conform to an expected pattern in a dataset or time series. Current and future missions and our research communities challenge us to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we propose to develop an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of ocean science datasets.

A parallel analytics engine will be developed as the key computational and data-mining core of OceanXtremes’ backend processing. This analytic engine will demonstrate three new technology ideas to provide rapid turnaround on climatology computation and anomaly detection:

1. An adaption of the Hadoop/MapReduce framework for parallel data mining of science datasets, typically large 3 or 4 dimensional arrays packaged in NetCDF and HDF.

2. An algorithm profiling service to efficiently and cost-effectively scale up hybrid Cloud computing resources based on the needs of scheduled jobs (CPU, memory, network, and bursting from a private Cloud computing cluster to a public cloud provider like Amazon Web Services).

3. An extension to industry-standard search solutions (OpenSearch and Faceted search) to provide support for shared discovery and exploration of ocean phenomena and anomalies, along with unexpected correlations between key measured variables.

We will use a hybrid Cloud compute cluster (private Eucalyptus on premise at JPL with bursting to Amazon Web Services) as the operational backend.  The key idea is that the parallel data-mining operations will be run near the ocean data archives (a local network hop) so that we can efficiently access the thousands of (say, daily) files making up a three decade time-series, and then cache key variables and pre-computed climatology in a high-performance parallel database.

OceanXtremes will be equipped with both web portal and web service interfaces for users and applications/systems to register and retrieve oceanographic anomaly data. By leveraging technology such as Datacasting (Bingham et al., 2007), users can also subscribe to anomaly or event types of interest and have newly computed anomaly metrics and other information delivered to them by metadata feeds packaged in standard Rich Site Summary (RSS) format. Upon receiving new feed entries, users can examine the metrics and download relevant variables, by simply clicking on a link, to begin further analyzing the event. The OceanXtremes web portal will allow users to define their own anomaly or feature types, for which continuous backend processing will be scheduled to populate the new user-defined anomaly type by executing the chosen data mining algorithm (i.e. differences from climatology or gradients above a specified threshold). Metadata on the identified anomalies will be cataloged, including temporal and geospatial profiles, key physical metrics, related observational artifacts and other relevant metadata, to facilitate discovery, extraction, and visualization.
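
A minimal sketch of the simplest data-mining rule mentioned above (departure from climatology beyond a threshold), applied to a synthetic single-grid-cell time series; the variable names, data, and threshold are illustrative only.

```python
# Illustrative anomaly rule: flag months that depart from the monthly
# climatology by more than 3 standard deviations.
import numpy as np

rng = np.random.default_rng(1)
n_years, n_months = 30, 12
# Hypothetical monthly SST series for one grid cell over three decades.
sst = 20 + 2 * np.sin(np.arange(n_years * n_months) * 2 * np.pi / 12)
sst += rng.normal(scale=0.3, size=sst.size)
sst[200] += 2.5                      # inject an anomalous month

months = np.arange(sst.size) % n_months
clim_mean = np.array([sst[months == m].mean() for m in range(n_months)])
clim_std = np.array([sst[months == m].std() for m in range(n_months)])

z = (sst - clim_mean[months]) / clim_std[months]
anomalies = np.flatnonzero(np.abs(z) > 3.0)
print("anomalous time steps:", anomalies)
```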

Products created by the anomaly detection algorithm will be made exportable and subsettable using Webification (Huang et al., 2014) and OPeNDAP (http://opendap.org) technologies. Using this platform scientists can efficiently search for anomalies or ocean phenomena, compute data metrics for events or over time-series of ocean variables, and efficiently find and access all of the data relevant to their study (and then download only that data).



William Ivancic, Glenn Research Center
Multi-channel Combining for Airborne Flight Research using Standard Protocols

Glenn Research Center (GRC) proposes a 2-year effort for the Core Topic Area: Operations Technologies. The technology specifically supports NASA’s Earth Science airborne systems, including Uninhabited Aerial Vehicles (UAVs) and autonomous operations, and improves the efficiency of real-time communications for airborne science research.

Airborne science research aircraft such as the Global Hawk, DC-8 and P-3 use custom channel combining of Iridium satellite phones to continuously monitor science payload and housekeeping data as well as for science payload power control and reporting. This technology is also used for aircraft control. The current systems have known problems, which result in inefficient data delivery and poor reliability, particularly when attempting to use standard Internet protocols for login and control. The Internet community recently announced an experimental protocol, multipath-TCP. We propose to characterize the current Iridium modems and model those channels. We will then evaluate multipath-TCP as a generic solution to the channel-combining problem when using TCP, and utilize the techniques inherent in multipath-TCP to develop a standardized multipath-UDP. Multipath-UDP would provide a standardized channel-combining technique for applications that rely on UDP-based datagram delivery.
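
The sketch below illustrates the basic channel-combining idea in the simplest possible terms: striping sequence-numbered UDP datagrams across several links so a receiver can reassemble one logical stream. It is an illustration only, using loopback addresses as stand-ins for per-modem links, and is not the multipath-TCP/UDP design the project will develop.

```python
# Illustrative UDP striping across several "channels" (loopback stand-ins).
import itertools
import socket
import struct

# Stand-ins for per-modem links; in reality each would be a separate Iridium channel.
CHANNELS = [("127.0.0.1", 5001), ("127.0.0.1", 5002), ("127.0.0.1", 5003)]

socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in CHANNELS]
next_channel = itertools.cycle(range(len(CHANNELS)))

def send(payload: bytes, seq: int) -> None:
    """Prefix a sequence number so a receiver can reorder and de-duplicate."""
    i = next(next_channel)
    socks[i].sendto(struct.pack("!I", seq) + payload, CHANNELS[i])

for seq, chunk in enumerate([b"housekeeping", b"payload-data", b"telemetry"]):
    send(chunk, seq)
print("sent", seq + 1, "datagrams across", len(CHANNELS), "channels")
```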

The Technology Readiness Level is currently 2 for multipath-TCP over noisy and unreliable links. The system will initially be tested in a laboratory environment but will quickly move to real modems and hardware to obtain a TRL of 6 (system/subsystem model or prototype demonstration in a relevant environment). If we can develop a software-only solution that works with existing hardware, we could reach a TRL of 8 (flight qualified through test and demonstration) by the end of the project.



Kristine Larson, University of Colorado at Boulder
AMIGHO: Automated Metadata Ingest for GNSS Hydrology within OODT

GNSS sites installed by surveyors and geophysicists to measure land motions can also provide valuable and cost-efficient information about three critical hydrologic variables: soil moisture, snow depth/snow water equivalent, and vegetation water content. A pilot project in the western U.S. has demonstrated that these GNSS reflection data can be produced operationally, with data latency necessary for weather forecasting and measurement frequency appropriate for climate studies and satellite validation.

Reflections from coastal GNSS sites can also be used to accurately measure water levels such as tidal motions. Some of the 12,000+ existing continuously-operating GNSS sites globally could be producing hydrology and sea level measurements at minimal cost (GNSS H2O), but information technology limitations preclude inclusion of more than a handful of subnetworks in current systems. To address this problem, we will advance the technology of the software that acquires and processes data from disparate and variable sources.

Our objectives are to
1. Enable operators of GNSS networks to provide current and past data to the GNSS H2O system.
2. Build an infrastructure system to automatically ingest GNSS observations, evaluate station metadata, and produce hydrologic products from the GNSS data.
3. Enable improved understanding of the GNSS water products through development and maintenance of a portal for visualization, mining, and data sharing.

We will leverage the proven Apache OODT (Object Oriented Data Technology) framework. This open-source system currently serves as the infrastructure backbone for various JPL, NASA, and non-NASA science data systems. The Apache OODT framework provides configuration-driven components that can fulfill the GNSS H2O system’s requirements and provide the scalability and extensibility required to ingest and process thousands of new data streams. Although OODT has proven to be a robust suite of software components for developing and operating a diverse array of science data systems, it does require a software developer’s expertise and effort to configure an existing system for new data streams. This project looks to advance the technology of this software suite by designing and developing an automatic configuration layer that incorporates the software developer’s expertise for configuring OODT components. In order to automate this process, we will first define a standard set of metadata for describing a GNSS station and its data, and design and develop a web-based interface for submitting this metadata to the system. We will utilize a portion of this metadata to determine the suitability of a station’s reflection data for generating hydrologic products. For example, we will pre-process the station metadata to determine (using Google Maps and other publicly available geological data) whether roads or other geographical factors in close proximity to the station would render any of the GPS reflection data unusable for hydrologic products or sea level measurements. Our technology will also benefit any current or potential OODT user with variable, heterogeneous data sources. This project will demonstrate technology that enhances NASA’s ability to efficiently provide Earth science data to scientists and the broader water management community, including urban planners that study coastal resilience for sea level rise and storm inundation. It is directly responsive to the Data-Centric Technologies core topic of this opportunity in that it will improve information re-use, facilitate collaboration within the research community, and increase the speed with which scientific results are produced. Airborne or in situ science data systems based on the OODT framework, similar to the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE), will also benefit greatly from our automated configuration technology.
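
As an illustration of the kind of standardized station metadata record described above, the sketch below shows one possible structure; every field name and value is hypothetical and does not reflect a published GNSS H2O or OODT schema.

```python
# Hypothetical GNSS station metadata record (illustrative field names only).
import json

station_record = {
    "station_id": "P038",                        # hypothetical station identifier
    "latitude_deg": 34.147,
    "longitude_deg": -103.407,
    "ellipsoidal_height_m": 1212.9,
    "antenna_type": "TRM59800.80",
    "receiver_type": "TRIMBLE NETR9",
    "data_latency_hours": 24,
    "nearby_obstructions": ["road_within_50m"],  # used to screen reflection data
    "products_enabled": ["soil_moisture", "snow_depth"],
}

print(json.dumps(station_record, indent=2))
```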



Seungwon Lee, Jet Propulsion Laboratory
Climate Model Diagnostic Analyzer

Both the National Research Council (NRC) Decadal Survey and the latest Intergovernmental Panel on Climate Change (IPCC) Assessment Report stressed the need for the comprehensive and innovative evaluation of climate models with the synergistic use of global observations in order to maximize the investments made in Earth observational systems and also to capitalize on them for improving our weather and climate simulation and prediction capabilities. The abundance of satellite observations for fundamental climate parameters and the availability of coordinated model outputs from the Coupled Model Inter-comparison Project Phase 5 (CMIP5) for the same parameters offer a great opportunity to understand and diagnose model biases in climate models. In addition, the Obs4MIPs efforts have created several key global observational datasets that are readily usable for model evaluations.

We propose to develop a novel methodology to diagnose model biases in contemporary climate models and to implement the methodology as a web-service based, cloud-enabled, provenance-supported climate-model evaluation system for the Earth science modeling and model analysis community. The proposed information system is named Climate Model Diagnostic Analyzer (CMDA) and will be built upon the current version of CMDA, which is the product of the research and technology development investments of several current and past NASA ROSES programs led by the proposal team members. We will leverage the current technologies and infrastructure of CMDA and extend the capabilities of CMDA to address several technical challenges that the modeling and model analysis community faces in evaluating climate models by utilizing three technology components: (1) diagnostic analysis methodology; (2) web-service based, cloud-enabled technology; (3) provenance-supported technology.

The proposed diagnostic analysis methodology will help the scientists identify the physical processes responsible for creating model biases and incorporate the understanding into new model representations that reduce the model biases. Potentially, the results of the proposed work can significantly increase the model predictability of climate change because improving the model representations of the current climate system is essential to enhancing confidence in seasonal, decadal, and long-term climate projections.

Additionally, the proposed web-service based, cloud-enabled technology will facilitate a community-wide use and relatively effortless adoption of this novel model-diagnosis methodology. Its web-browser interface and cloud-based computing will allow instantaneous use without the hassle of local installation, compatibility issues, and scalable computational resource issues and offer a low barrier to the adoption of the tool.

Finally, the proposed provenance-supported technology will automatically keep track of processing history during analysis calls, represent the summary of the processing history in a human readable way, and enable provenance-based search capabilities. Scientists currently spend a large portion of their research time on searching previously analyzed results and regenerating the same results when they fail to locate them. The reproducibility of the analysis results by other scientists is also limited for the same reason. The proposed provenance support technology will greatly improve the productivity of the scientists using the analysis tool and enable scientists to share/reproduce the results generated by other scientists.



Jacqueline LeMoigne, Goddard Space Flight Center
Tradespace Analysis Tool for Designing Earth Science Distributed Missions

The ESTO 2030 Science Vision envisions the future of Earth Science to be characterized by “many more distributed observations,” and “formation-flying [missions that] will have the ability to reconfigure on the fly and constellations of complementary satellites with different capabilities will work together autonomously.” All these concepts refer to “Distributed Spacecraft Missions (DSMs)”, i.e., missions that involve multiple spacecraft to achieve one or more common goals, and more particularly to “constellations” or “formations”, i.e., missions designed as distributed missions with specific orbits from inception (in contrast to virtual or ad-hoc DSMs formed after launch). DSMs include multiple configurations such as homogeneous and heterogeneous constellations, formation flying clusters and fractionated spacecraft. They are gaining momentum in all science domains because of their ability to optimize the return on investment in terms of science benefits, aided by the increasing prevalence of small satellites. In Earth science, DSMs have the unique ability to increase observation sampling in spatial, spectral and temporal dimensions simultaneously. Many future missions have studied the possibility of using constellations to satisfy their science goals. However, since DSM architectures are defined by monolithic architecture variables and variables associated with the distributed framework, designing an optimal DSM requires handling a very large number of variables, which increases further in heterogeneous cases. Additionally, DSMs are expected to increase mission flexibility, scalability, evolvability and robustness and to minimize cost risks associated with launch and operations. As a result, DSM design is a complex problem with many design variables and multiple conflicting objectives. There are very few open-access tools available to explore the trade space of variables, minimize cost and maximize performance for predefined science goals, and therefore select the optimal design. Over the last year, our team developed a prototype tool using the MATLAB engine and interfacing with AGI’s Systems Tool Kit. The prototype tool is capable of generating hundreds of DSM architectures using a few pre-defined design variables, e.g., number of spacecraft, number of orbital planes, altitudes, and swath widths, and sizing the architectures’ performance using the limited metrics available in the off-the-shelf components. Currently, only Walker constellations are considered. We have found that off-the-shelf components do not support the necessary functionality to explore and optimize DSMs based on specific science objectives and architecture requirements. In the proposed work, the tool will be generalized to consider a larger number of parameters and metrics and different types of constellations, to enable analysis and design of architectures in terms of pre-defined science, cost, and risk metrics. The product will leverage existing modeling and analysis capabilities available in the General Mission Analysis Tool (GMAT), developed at NASA Goddard. GMAT is an open source trajectory optimization and design system, designed to model and optimize spacecraft trajectories in flight regimes ranging from low Earth orbit to lunar applications, interplanetary trajectories, and other deep space missions.
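
The following sketch illustrates, in greatly simplified form, what tradespace enumeration of this kind looks like: take the cross product of a few design variables, score each candidate constellation, and keep the Pareto-efficient designs. The variables, figures of merit, and cost model are toy placeholders, not the prototype’s actual models.

```python
# Illustrative tradespace enumeration over a few constellation design variables.
import itertools

n_spacecraft = [2, 4, 8]
n_planes = [1, 2, 4]
altitude_km = [500, 600, 700]
swath_km = [100, 200]

architectures = []
for n, p, alt, swath in itertools.product(n_spacecraft, n_planes, altitude_km, swath_km):
    if n % p:                       # require an equal number of spacecraft per plane
        continue
    # Toy figures of merit, stand-ins for the real coverage and cost models.
    coverage = n * swath * (alt / 700.0)
    cost = 10.0 * n + 5.0 * p
    architectures.append({"n_sc": n, "planes": p, "alt_km": alt,
                          "swath_km": swath, "coverage": coverage, "cost": cost})

def dominates(b, a):
    """b dominates a if it is at least as good on both objectives and better on one."""
    return (b["coverage"] >= a["coverage"] and b["cost"] <= a["cost"]
            and (b["coverage"] > a["coverage"] or b["cost"] < a["cost"]))

pareto = [a for a in architectures if not any(dominates(b, a) for b in architectures)]
print(f"{len(architectures)} candidate architectures, {len(pareto)} on the Pareto front")
```
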
The tool will include a user-friendly interface that will enable Earth Scientists to easily perform trade space analyses and to interface with this tool when performing trades on planned instruments and when conducting Observing System Simulation Experiments (OSSEs) for mission design. The software developed under this proposal will enable: (1) better science through distributed missions, (2) better communication between mission designers and scientists, (3) more rapid trade studies, (4) better understanding of the trade space as it relates to science return. The final tool will be offered open source to the Earth Science community.
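
To make the tradespace enumeration concrete, here is a minimal Python sketch that brute-forces a small set of DSM design variables and ranks candidate architectures. The variable ranges, the coverage and cost proxies, and the Walker plane constraint are illustrative assumptions only, standing in for the real GMAT/STK-based performance and cost models.

# A minimal sketch, not the proposed tool: brute-force enumeration of a small
# DSM tradespace over a few design variables, with placeholder cost and
# coverage scores standing in for the real performance models.
from itertools import product

n_spacecraft = [2, 4, 6, 8]       # total satellites in the Walker constellation
n_planes     = [1, 2, 4]          # orbital planes
altitudes_km = [500, 700, 900]
swaths_km    = [150, 300]

def coverage_proxy(n_sc, planes, alt, swath):
    # crude stand-in: more satellites, planes, altitude, and swath -> more coverage
    return n_sc * swath * (1 + 0.1 * planes) * (alt / 700.0)

def cost_proxy(n_sc, alt):
    # crude stand-in: cost grows with fleet size and (weakly) with altitude
    return n_sc * (1.0 + alt / 5000.0)

architectures = []
for n_sc, planes, alt, swath in product(n_spacecraft, n_planes, altitudes_km, swaths_km):
    if n_sc % planes:             # Walker patterns need evenly filled planes
        continue
    architectures.append({
        "spacecraft": n_sc, "planes": planes, "altitude_km": alt, "swath_km": swath,
        "coverage": coverage_proxy(n_sc, planes, alt, swath),
        "cost": cost_proxy(n_sc, alt),
    })

# Rank by coverage per unit cost to shortlist candidates for detailed analysis.
for arch in sorted(architectures, key=lambda a: a["coverage"] / a["cost"], reverse=True)[:5]:
    print(arch)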


Return to Top

Mike Lieber, Ball Aerospace
Model Predictive Control Architecture for Optimizing Earth Science Data Collection

The increasing importance of distributed space-based sensor systems has led to exciting developments in large-scale data extraction software and synthesis of complex data products.  In particular, this has pushed the development of sensor web software to coordinate the data collection process. What is not well developed is the local (or flight) software for fast control of systems with many degrees of freedom. Two examples of such systems are the newly developed Electronically Steerable Flash Lidar (ESFL) and tight formation-flying control of future CubeSat missions.

We propose a local, multi-layered control system architecture that communicates with the higher-level software layers. The local control is based upon an architecture known as Model Predictive Control (MPC). MPC has found use in many complex systems where the controlled system is multivariable, subject to multiple constraints, and possibly nonlinear; applications include robotic vision systems and chemical processing, and MPC has been proposed for quad-rotor and formation-flying spacecraft.  MPC optimizes the data collection at each time step from higher-level constraints and commands and is enabled by the increased computational power now available in FPGA implementations. We propose to develop the MPC architecture for ESFL and use models to verify its capability with synthetic scenes and fusion with other sensors.  ESFL has potentially hundreds of individually steerable laser beamlets and, when combined with other sensors, poses a large optimization problem well suited to the MPC approach. The technology developed under this effort is also applicable to formation-flying systems.
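
The receding-horizon idea behind MPC can be sketched in a few lines of Python. The toy one-dimensional pointing model, the horizon length, and the discrete command set below are illustrative assumptions, not the flight design proposed for ESFL.

# A minimal receding-horizon (MPC) sketch on a toy 1-D pointing model. At each
# step the controller searches a short horizon of candidate command sequences
# and applies only the first command of the best sequence.
import itertools
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # discrete double-integrator dynamics
B = np.array([0.005, 0.1])
u_candidates = [-1.0, 0.0, 1.0]           # constrained actuator commands
horizon = 4

def mpc_step(x, target):
    best_u, best_cost = 0.0, np.inf
    for seq in itertools.product(u_candidates, repeat=horizon):
        xi, cost = x.copy(), 0.0
        for u in seq:
            xi = A @ xi + B * u
            cost += (xi[0] - target) ** 2 + 0.01 * u ** 2   # tracking + effort penalty
        if cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u                         # receding horizon: apply only the first move

x = np.array([0.0, 0.0])                  # [pointing angle, rate]
for _ in range(20):
    u = mpc_step(x, target=1.0)
    x = A @ x + B * u
print("final state:", x)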


Return to Top

Constantine Lukashin, Langley Research Center
NASA Information and Data System (NAIADS) for Earth Science Data Fusion and Analytics

One of the key elements of advancing our understanding of the Earth system via remote sensing is the integration of diverse measurements into the observing system. As remote sensing captures larger volumes of higher-quality data, the demand for advanced data applications and high-performance information processing systems becomes a greater challenge. These challenges are outlined in the OSTP Guidelines for Civil Space Observations (2013), recognized in the NASA Strategic Space Technology Investment Plan (2013), and addressed in NASA Strategic Objective 2.2 (2014) and its implementation by “…developing new technologies and predictive capabilities, and demonstrating innovative and practical uses of the program’s data and results for societal benefit”. In response to these challenges, we propose to develop the NASA Information and Data System (NAIADS), a prototype framework for the next generation of Earth Science multi-sensor data fusion, processing, and analytics.

The concept of maximizing information content by combining multi-sensor data and enabling advanced science algorithms was successfully used by several past and ongoing projects: the CERES experiment (Earth radiation budget), fusion of CERES, MODIS, and MISR observations (for estimating instantaneous shortwave flux uncertainties and multi-instrument calibration comparison), fusion of MODIS and PARASOL observations to enhance cloud and aerosol retrievals, and fusion of data from CALIPSO, CloudSat, CERES, and MODIS (A-Train) for comprehensive aerosol and cloud information, as well as CERES-derived fluxes. These advanced science algorithms have reduced uncertainty in weather and climate parameters. Future satellite constellations and NASA missions (RBI, TEMPO, CLARREO, ACE, and GEO-CAPE) will require tools for efficient data fusion and massive process scaling.

The objective of the proposed effort is to develop a prototype of a conceptually new middleware framework to modernize and significantly improve the efficiency of Earth Science data fusion, big data processing, and analytics. The key components of NAIADS include: a Service Oriented Architecture (SOA) framework, in-memory Data Staging, a multi-sensor coincident Data Predictor, a multi-sensor data-Event Builder, complete data-Event streaming (a workflow with minimized I/O), and on-line data processing control and analytics services. The NAIADS project will leverage the existing CLARA SOA framework, developed at Jefferson Lab and integrated with the ZeroMQ messaging library. The services will be prototyped and incorporated into a system. Data merging and follow-on aerosol retrieval from combined TEMPO, GOES-R ABI simulated, and real VIIRS observations will be used for NAIADS demonstration and performance tests in Compute Cloud and Cluster environments.
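
The data-Event streaming idea can be illustrated with a small ZeroMQ (pyzmq) sketch in Python. The in-process transport, the toy event payloads, and the end-of-stream marker are assumptions chosen to keep the example self-contained; they do not reflect the actual CLARA/NAIADS services.

# A minimal sketch of ZeroMQ-based data-event streaming between two services,
# run as threads in one process. Events stay in memory end to end (minimized I/O).
import threading
import zmq

def analytics_service(ctx):
    """Consumes in-memory data events and runs a placeholder retrieval step."""
    pull = ctx.socket(zmq.PULL)
    pull.connect("inproc://events")
    while True:
        event = pull.recv_json()
        if event["event_id"] is None:      # end-of-stream marker
            break
        print("processed event", event["event_id"],
              "->", sum(event["viirs"] + event["tempo"]))
    pull.close()

ctx = zmq.Context.instance()
push = ctx.socket(zmq.PUSH)                # the "event builder" side of the stream
push.bind("inproc://events")

worker = threading.Thread(target=analytics_service, args=(ctx,))
worker.start()

for i in range(3):                         # toy multi-sensor data events
    push.send_json({"event_id": i, "viirs": [0.1 * i], "tempo": [0.2 * i]})
push.send_json({"event_id": None})

worker.join()
push.close()
ctx.term()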

The proposed NASA Information And Data System (NAIADS) provides a novel approach to significantly improve efficiency in Earth Science multi-sensor big data processing and analysis by deploying a conceptually new workflow and state-of-the-art software technologies. Within the 2-year project, beginning May 2015, the NAIADS technology readiness will increase from TRL 3 to TRL 6.


Return to Top

Christian Mattmann, Jet Propulsion Laboratory
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics for Scientific Data and Analysis

We will construct SciSpark, a scalable system for interactive model evaluation and for the rapid development of climate metrics and analyses. SciSpark directly leverages the Apache Spark technology and its notion of Resilient Distributed Datasets (RDDs).  RDDs represent an immutable data set that can be reused across multi-stage operations, partitioned across multiple machines, and automatically reconstructed if a partition is lost. The RDD notion directly enables the reuse of array data across multi-stage operations, and it ensures data can be replicated, distributed, and easily reconstructed in different storage tiers, e.g., memory for fast interactivity, SSDs for near-real-time availability, and I/O-oriented spinning disk for later operations.  RDDs also allow Spark’s performance to degrade gracefully when there is not sufficient memory available to the system. It may seem surprising to consider an in-memory solution for massive datasets; however, a recent study found that at Facebook 96% of active jobs could have their entire data inputs in memory at the same time. In addition, Spark has been shown to be 100x faster in memory and 10x faster on disk than Apache Hadoop, the de facto industry platform for Big Data. Hadoop scales well and there are emerging examples of its use in NASA climate projects (e.g., Teng et al. and Schnase et al.), but as is being discovered in these projects, Hadoop is most suited for batch processing and long-running operations.

SciSpark contributes a Scientific RDD that corresponds to a multi-dimensional array representing a scientific measurement subset by space or by time. Scientific RDDs can be created in a handful of ways: (1) directly loading HDF and NetCDF data into the Hadoop Distributed File System (HDFS); (2) creating a partition or split function that divides up a multi-dimensional array by space or time; (3) taking the results of a regridding operation or a climate metrics computation; or (4) telling SciSpark to cache an existing Scientific RDD (sRDD), keeping it cached in memory for data reuse between stages. Scientific RDDs will form the basis for a variety of advanced and interactive climate analyses, starting by default in memory and then being cached and replicated to disk when not directly needed. SciSpark will also use the Shark interactive SQL technology that allows structured query language (SQL) to be used to store/retrieve RDDs, and will use Apache Mesos to be a good tenant in cloud environments, interoperating with other data system frameworks (e.g., HDFS, iRODS, SciDB).

One of the key components of SciSpark is interactive sRDD visualization; to accomplish this, SciSpark delivers a user interface built around the Data Driven Documents (D3) framework. D3 is an immersive, JavaScript-based technology that exploits the underlying Document Object Model (DOM) structure of the web to create histograms, cartographic displays, and inspections of climate variables and statistics. SciSpark is evaluated using several topical iterative scientific algorithms inspired by the NASA RCMES project, including machine learning (ML) based clustering of temperature PDFs and other quantities over North America, and graph-based algorithms for searching for Mesoscale Convective Complexes in West Africa.
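
A minimal PySpark sketch of the cached, time-partitioned array idea follows. The synthetic gridded data, the partition count, and the two-stage reuse are illustrative assumptions; the actual SciSpark sRDD is implemented in Scala with NetCDF/HDF loaders.

# A hedged PySpark sketch of an sRDD-like dataset: one time step of a gridded
# variable per element, cached in memory and reused across two analysis stages.
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="sRDD-sketch")

# Pretend each element is one time step of a gridded variable (e.g., temperature).
times = range(12)
srdd = sc.parallelize([(t, np.random.rand(4, 4)) for t in times], numSlices=4).cache()

# Stage 1: per-time spatial mean (computed from the cached partitions).
means = srdd.mapValues(lambda grid: float(grid.mean())).collect()

# Stage 2: a second pass over the same cached data, e.g., anomalies vs. the overall mean.
overall_mean = srdd.map(lambda kv: float(kv[1].mean())).mean()
anomalies = srdd.mapValues(lambda grid: float(grid.mean() - overall_mean)).collect()

print(means[:3], anomalies[:3])
sc.stop()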


Return to Top

Victor Pankratius, Massachusetts Institute of Technology
Computer-Aided Discovery of Earth Surface Deformation Phenomena

Key Objectives: Earth scientists are struggling to extract new insights from a sea of large data sets originating from multiple instruments. The goal of this proposal is to provide enhanced assistance that enables the most beneficial division of labor between human and machine. We propose to create environments for computer-aided discovery that support humans in the search for new discoveries related to Earth surface deformation phenomena.

Methods & Techniques: Earth surface deformation measurements currently rely on two key techniques. Over 4000 continuously operating GPS sites collect global information to study motions of the Earth’s surface with accuracies as low as 0.5 mm per day. Interferometric Synthetic Aperture Radar (InSAR) detects deformations based on imaging and is sensitive to differential motions of a few millimeters over swath widths of up to 100 km. After an earthquake, both GPS and InSAR data are analyzed in detail to model the co-seismic offsets, seismic wave propagation (GPS only), and post-seismic processes. Our understanding of deformation processes can be furthered by including additional data characterizing, for example, volcanic inflation and episodic tremor and slip, or by better characterizing biases such as snow on antennas, antenna failures, or atmospheric delays. To identify and eliminate such anomalies, scientists need to gain insight by cross-comparison of multiple data sets acquired through different techniques and instrumental principles. The data portfolio available to Earth scientists already includes GPS sensor time series, InSAR, MODIS imagery, land temperature, and GRACE gravity field changes, and upcoming missions such as NASA’s NISAR will increase the temporal density of InSAR images by orders of magnitude. Scientists therefore need better automation support and more sophisticated tools.

We propose an approach for computer-aided discovery based on software infrastructure providing key features to facilitate the discovery search: (1) A software environment engaging scientists to programmatically express hypothesized scenarios, constraints, and model variants (e.g., parameters, choice of algorithms, workflow alternatives), so as to automatically explore with machine learning the combinatorial search space of possible model applications in parallel on multiple data sets and identify the ones with better explanatory power. (2) A cloud-based infrastructure realizing a high-performance parallel model evaluation capability for data sets that reside in NASA’s data centers. Various search modes will be provided, including one where the system uses scientist feedback from model evaluations to parameterize the search in new runs and direct the system to identify more analogous features and reduce false positives. Workflows will be stored in a workflow warehouse in the cloud so other scientists can easily rerun them on new data sets.

We will demonstrate our approach with three specific case studies: (1) volcanics at Yellowstone; (2) groundwater phenomena in the Central Valley; (3) atmospheric phenomena and the effects of lee waves on position determination. All demonstrations will use a fusion of data sets consisting of GPS, InSAR, and other data that is archived at UNAVCO and the NASA Earth Exchange. This data contains known phenomena as well as other potentially unknown phenomena, so it can be leveraged for controlled computational experiments to quantify the effectiveness of our techniques.
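
As a toy illustration of exploring a combinatorial space of model variants and ranking them by explanatory power, the Python sketch below fits alternative deformation-model variants to a synthetic GPS-like time series. The variants, the synthetic data, and the RMS-residual criterion are assumptions for illustration, not the proposed system's actual models or search strategy.

# A minimal sketch of a combinatorial model-variant search: each variant is a
# candidate design matrix; variants are fit to the series and ranked by residual.
from itertools import product
import numpy as np

t = np.arange(0, 365)
series = 0.01 * t + 2.0 * np.sin(2 * np.pi * t / 365.0) + np.random.normal(0, 0.3, t.size)

def build_design(trend, seasonal):
    cols = [np.ones_like(t, dtype=float)]
    if trend:
        cols.append(t.astype(float))
    if seasonal:
        cols += [np.sin(2 * np.pi * t / 365.0), np.cos(2 * np.pi * t / 365.0)]
    return np.column_stack(cols)

results = []
for trend, seasonal in product([False, True], repeat=2):   # the model-variant space
    X = build_design(trend, seasonal)
    coef, *_ = np.linalg.lstsq(X, series, rcond=None)
    rms = np.sqrt(np.mean((series - X @ coef) ** 2))
    results.append(((trend, seasonal), rms))

# Lower residual = better explanatory power; top variants are reported to the scientist.
for variant, rms in sorted(results, key=lambda r: r[1]):
    print("trend=%s seasonal=%s  rms=%.3f" % (variant[0], variant[1], rms))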

Significance: Our proposal will advance NASA’s capability for modeling, assessment, and computing of Earth Science data (2.1.2 Computational Technologies; 2.2.1 Innovation Breakthroughs for Modeling, Analysis, and Prediction) and improve technical means to assess, mitigate, and forecast natural hazards. Computer-aided discovery will enhance the productivity and ability of scientists to process big data from a variety of sources and generate new insight.


Return to Top

Rahul Ramachandran, Marshall Space Flight Center
Illuminating the Darkness: Exploiting untapped data and information resources in Earth Science

We contend that Earth science metadata assets are dark resources, information resources that organizations collect, process, and store for regular business or operational activities but fail to utilize for other purposes.  The challenge for any organization is to recognize, identify and effectively utilize the dark data stores in their institutional repositories to better serve their stakeholders. These metadata hold rich textual descriptions and browse imagery that allow users to review search results and preview data, but have not been fully exploited by information systems to serve the research and education communities.  This proposed work looks at these metadata assets in a completely new and innovative light; it will result in a search tool built on semantic technologies to create new knowledge discovery pathways in Earth Science.

This proposal brings together a strong team of informatics experts with a long history of research in data systems, scientific search and semantics, as well as a proven track record of previous collaborations: PI Dr. Rahul Ramachandran (NASA/MSFC), geoinformatics specialist and Manager of the GHRC DAAC; Co-I Dr. Christopher Lynnes (NASA/GSFC), Information Systems Architect at the GES DISC; Co-I Dr. Peter Fox (Rensselaer Polytechnic Institute; RPI), Tetherless World Constellation Chair; and Manil Maskey (University of Alabama in Huntsville; UAH), lead designer and developer for multiple projects in Earth science information systems.

The proposed work addresses the core AIST topic of Data-Centric Technologies, with a particular focus on utilizing semantic technologies to explore, visualize, and analyze representations of semantically identified information in order to discover new useful information, directly addressing the subtopic Alternative Approaches / Disruptive Technologies for Earth Science Data System.  This project will develop a Semantic Middle Layer (SML) consisting of a content-based image retrieval service to provide visual search for events or phenomena in Earth science imagery; an ontology-based data curation service that uses structured metadata and descriptive text to find data relevant to that event, phenomenon, or thematic topic; and a semantic, rule-based processing service to create curated data albums consisting of data bundles and exploratory plots generated on the fly.  Together these components will allow users to identify events of interest in images and assemble a collection of pre-processed data to support scientific investigations focused on these events.  We will design the SML and a demonstration Event Nexus Discovery Client using three science use cases developed in collaboration with Dr. Sundar Christopher, an expert in satellite remote sensing at UAH.
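
To suggest how an ontology-based curation service might link events to relevant data, here is a small rdflib sketch in Python. The vocabulary, the dataset names, and the DustStorm event are entirely hypothetical and only illustrate storing and querying semantic triples, not the project's actual ontology or services.

# A hedged sketch of ontology-based curation: a tiny triple store linking an
# event type to candidate datasets, queried with SPARQL.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/earth#")   # hypothetical vocabulary
g = Graph()

# Two aerosol datasets declared relevant to dust-storm events.
for name in ["MODIS_Aerosol_Optical_Depth", "CALIPSO_Aerosol_Profiles"]:
    g.add((EX[name], EX.relevantTo, EX.DustStorm))
    g.add((EX[name], EX.label, Literal(name.replace("_", " "))))

# Curation query: which datasets are relevant to a dust storm event?
query = """
SELECT ?label WHERE {
  ?ds <http://example.org/earth#relevantTo> <http://example.org/earth#DustStorm> .
  ?ds <http://example.org/earth#label> ?label .
}
"""
for row in g.query(query):
    print("candidate dataset:", row.label)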


Return to Top

Shawn Smith, Florida State University
A Service to Match Satellite and In-situ Marine Observations to Support Platform Inter-comparisons, Cross-calibration, Validation, and Quality Control

We propose to develop a Distributed Oceanographic Match-up Service (DOMS) for the reconciliation of satellite and in situ datasets in support of NASA’s Earth Science mission. The service will provide a mechanism for users to input a series of geospatial references for satellite observations (e.g., footprint location, date, and time) and receive the in situ observations that are matched to the satellite data within a selectable temporal and spatial domain. The inverse, inputting in situ geospatial data (e.g., positions of moorings, floats, or ships) and returning corresponding satellite observations, will also be supported. The DOMS prototype will include several characteristic in situ and satellite observation datasets. For the in situ data, the focus will be surface marine observations from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), the Shipboard Automated Meteorological and Oceanographic System Initiative (SAMOS), and the Salinity Processes in the Upper Ocean Regional Study (SPURS). Satellite products will include JPLv3 QuikSCAT winds, the Aquarius v3.0 L2 orbital/swath dataset, and the high-resolution gridded L4 MUR-SST product. Importantly, although DOMS will be established with these selected datasets, it will be readily extendable to other in situ and satellite collections, which could support additional science disciplines.

DOMS is needed by the marine and satellite research communities to support a range of activities. Use cases include, but are not limited to: (1) iterative calibration/validation of Aquarius sea surface salinity (SSS) retrieval algorithms, in which the user must identify satellite salinity observations co-located with surface salinity from ships or floats; and (2) validation of wind directions measured in situ on ships to ensure they are representative of the large-scale wind flow derived from a scatterometer, in which, for each ship wind measurement, the user must determine whether a satellite wind observation can be matched to the ship data (close enough in space and time). Traditionally, researchers match observations by downloading large volumes of satellite and in situ data to local computers and using independently developed one-off software. Frequent repetition of this process consumes substantial human and machine resources. The proposed service will provide a community-accessible tool that dynamically provides data matchups and allows scientists to work only with the subset of data where matches exist.
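
The core match-up operation can be sketched in a few lines of numpy. The toy arrays, the 50 km / 3 hour tolerances, and the brute-force search below are assumptions for illustration and do not reflect DOMS's distributed architecture or performance optimizations.

# A minimal space-time match-up sketch: for each satellite footprint, find in situ
# reports within a distance and time tolerance and return the matched values.
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Satellite footprints: lat, lon, time (hours since an arbitrary epoch)
sat = np.array([[10.0, -30.0, 0.0], [10.5, -30.5, 1.0]])
# In situ reports (e.g., ship or float observations): lat, lon, time, measured value
insitu = np.array([[10.1, -30.1, 0.2, 35.2], [40.0, 5.0, 0.1, 34.0]])

max_km, max_hours = 50.0, 3.0
for s_lat, s_lon, s_t in sat:
    d = haversine_km(s_lat, s_lon, insitu[:, 0], insitu[:, 1])
    dt = np.abs(insitu[:, 2] - s_t)
    for i in np.where((d <= max_km) & (dt <= max_hours))[0]:
        print(f"satellite ({s_lat},{s_lon},{s_t}h) matched in situ value {insitu[i, 3]} "
              f"at {d[i]:.1f} km, {dt[i]:.1f} h offset")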

Technical challenges to be addressed include (1) ensuring that the match-up algorithms perform with sufficient speed to return the desired information to the user, (2) performing data matches using datasets that are distributed on the network, and (3) returning actual observations for the matches (e.g., salinity) with sufficient metadata so the value difference can be properly interpreted. We will leverage existing technologies (i.e., the Extensible Data Gateway Environment, Webification, OPeNDAP, and SQL and graph/triple-store databases) and cloud computing to develop DOMS. DOMS will be equipped with a web portal interface for web users to browse and to submit match-up requests interactively. DOMS will also provide an underlying web service interface for machine-to-machine match-up operations to enable external applications and services.

The proposal team includes experts from Florida State University’s Center for Ocean-Atmospheric Prediction Studies (COAPS), NASA’s Jet Propulsion Laboratory (JPL), and the National Center for Atmospheric Research (NCAR). DOMS will be hosted at JPL close to the satellite oceanographic data archive (containing the largest volumes of data used by DOMS) and the SPURS in situ data. Other distributed DOMS nodes hosting the project software stack and serving the ICOADS and SAMOS data will be at partnering institutions NCAR and COAPS, respectively.


Return to Top

Tomasz Stepinski, University of Cincinnati
Pattern-based GIS for Understanding Content of very large Earth Science Datasets

The research focus in the field of remotely sensed imagery has shifted from the collection and warehousing of data, tasks for which a mature technology already exists, to the auto-extraction of information and knowledge discovery from this valuable resource, tasks for which technology is still under active development. In particular, intelligent algorithms for the analysis of very large rasters, whether high-resolution images or the increasingly prevalent medium-resolution global datasets, are lacking. We propose to develop the Geospatial Pattern Analysis Toolbox (GeoPAT), a computationally efficient, scalable, and robust suite of algorithms that supports GIS processes such as segmentation, unsupervised/supervised classification of segments, query and retrieval, and change detection in giga-pixel and larger rasters. At the core of the technology that underpins GeoPAT is the novel concept of pattern-based image analysis. Unlike pixel-based or object-based (OBIA) image analysis, GeoPAT partitions an image into overlapping square scenes containing 1,000-100,000 pixels and performs further processing on those scenes using pattern signatures and pattern similarity, concepts first developed in the field of Content-Based Image Retrieval. This fusion of methods from two different areas of research yields an orders-of-magnitude performance boost in application to very large images without sacrificing quality of the output.
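
The following Python sketch illustrates the scene/signature/similarity idea on a toy categorical raster. The class-histogram signature, the histogram-intersection similarity, and the scene size and overlap are simplifying assumptions and are much cruder than GeoPAT's actual signatures and similarity measures.

# A hedged sketch of pattern-based analysis: overlapping square scenes of a
# categorical raster are summarized as class histograms and compared for
# query-by-example retrieval.
import numpy as np

def scene_signature(scene, n_classes):
    """Normalized class histogram of one scene (the pattern signature)."""
    counts = np.bincount(scene.ravel(), minlength=n_classes).astype(float)
    return counts / counts.sum()

def signature_similarity(sig_a, sig_b):
    """Histogram intersection: 1 = identical pattern composition, 0 = disjoint."""
    return float(np.minimum(sig_a, sig_b).sum())

rng = np.random.default_rng(0)
raster = rng.integers(0, 5, size=(256, 256))     # toy land-cover raster, 5 classes

scene_size, step = 64, 32                        # overlapping scenes (50% overlap)
signatures = {}
for i in range(0, raster.shape[0] - scene_size + 1, step):
    for j in range(0, raster.shape[1] - scene_size + 1, step):
        signatures[(i, j)] = scene_signature(raster[i:i + scene_size, j:j + scene_size], 5)

# Query-by-example: find scenes most similar to the scene at the origin.
query = signatures[(0, 0)]
ranked = sorted(signatures.items(), key=lambda kv: signature_similarity(query, kv[1]), reverse=True)
print([pos for pos, _ in ranked[:5]])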

GeoPAT v.1.0 already exists as a GRASS GIS add-on that has been developed and tested on medium-resolution continental-scale datasets, including the National Land Cover Dataset and the National Elevation Dataset. The proposed project will develop GeoPAT v.2.0, a much improved and extended version of the present software. We estimate an overall entry TRL for GeoPAT v.1.0 of 3-4 and a planned exit TRL for GeoPAT v.2.0 of 5-6.  Moreover, several important new functionalities will be added. Proposed improvements include: conversion of GeoPAT from a GRASS add-on to stand-alone software capable of being integrated with other systems; full implementation of a web-based interface; new modules extending its applicability to high-resolution images/rasters and medium-resolution climate data; extension to the spatio-temporal domain; hierarchical search and segmentation; improved pattern signatures and similarity measures; parallelization of the code; and a divide-and-conquer strategy to speed up selected modules.

The proposed technology will contribute to a wide range of Earth Science investigations and missions by enabling the extraction of information from diverse types of very large datasets.

Analyzing the entire dataset, without the need to sub-divide it due to software limitations, offers the important advantages of uniformity and consistency.  We propose to demonstrate the GeoPAT technology in two specific applications. The first application is a web-based, real-time, visual search engine for local physiography utilizing query-by-example on the entire, global-extent SRTM 90 m resolution dataset. The user selects a region where a process of interest is known to occur, and the search engine identifies other areas around the world with similar physiographic character and thus potential for a similar process. The second application is monitoring urban areas in their entirety at high resolution, including mapping of impervious surfaces and identifying settlements for improved disaggregation of census data.


Return to Top

Chaowei Yang, George Mason University
Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

We propose to mine and utilize the combination of Earth Science dataset metadata, usage metrics, and user feedback to objectively extract relevance for improved data discovery and access across a NASA Distributed Active Archive Center (DAAC) and other data centers. As a point of reference, the Physical Oceanography Distributed Active Archive Center (PO.DAAC) aims to provide datasets that help scientists select the Earth observation data that best fit their needs across various aspects of physical oceanography. The data relevance mining technology, currently at TRL 5 and developed by George Mason University (GMU), NASA, and the U.S. Geological Survey (USGS) to support the Geosearch operation and contributed as open source through GeoNetwork, will be improved and tested within PO.DAAC’s metadata-centric discovery system, reaching TRL 7 upon completion of the project. This project will focus on the following objectives and activities:

Integrating and extending the data relevance mining system to include the functionality of a) dataset relevance reasoning based on Jena, an open source semantic reasoning engine; b) dataset similarity calculation; c) recommendations based on dataset metadata attributes and user workflow patterns; and d) ranking results based on similarity between user search terms and dataset usage contexts (a minimal ranking sketch follows this list).

Leveraging PO.DAAC data science expertise and user communities to a) capture the ocean science data context and record dataset relevance metrics as triple stores, b) analyze and mine user search and download patterns, c) test the developed system in an experimental environment, and d) integrate the system into the PO.DAAC testbed and test the feasibility of integration for open usage and feedback.

Laying the groundwork for an objective mining and extraction service for data relevance with other data search and discovery systems, such as ECHO, GEOSS clearinghouse, and Data.gov, for data sharing across NASA and non-NASA data systems.
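
As referenced in objective d) above, the Python sketch below combines text similarity over metadata abstracts with a usage-derived boost to rank datasets. The dataset descriptions, the co-download boost values, and the TF-IDF/cosine approach are illustrative assumptions, not the project's Jena-based reasoning or its actual ranking model.

# A minimal ranking sketch: relevance = query/metadata text similarity plus a
# usage-derived boost (standing in for mined search and download patterns).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

datasets = {
    "GHRSST L4 MUR SST": "global high resolution sea surface temperature analysis",
    "Aquarius L3 SSS": "sea surface salinity monthly gridded fields",
    "QuikSCAT L2B winds": "ocean surface wind vectors from scatterometer",
}
usage_boost = {"GHRSST L4 MUR SST": 0.20, "Aquarius L3 SSS": 0.05, "QuikSCAT L2B winds": 0.10}

names = list(datasets)
query = "sea surface temperature"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(datasets.values()) + [query])
text_scores = cosine_similarity(matrix[len(names)], matrix[:len(names)]).ravel()

ranked = sorted(zip(names, text_scores),
                key=lambda item: item[1] + usage_boost[item[0]], reverse=True)
for name, score in ranked:
    print(f"{name}: text={score:.2f}  usage_boost={usage_boost[name]:.2f}")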

The proposed technology has the potential to enhance the NASA Earth Science data discovery experience by more efficiently and objectively providing scientists with the ability to discover and select the datasets most relevant to their scope of interest. TRL: the existing data relevance mining and usage technology is at TRL 5; we expect the technology to exit at TRL 7 through the two-year research effort. Keywords: PO.DAAC, Data Relevance, Mining, Reasoning, Ranking, and Recommendation.