AIST Mini-NRA Selections Announced

NASA’s Office of Earth Science Awards Six Grants for Advanced Information Systems Technology

08/19/2004 – The National Aeronautics and Space Administration (NASA) has
awarded funding for six new investigations for information systems technology
development, under the Advanced Information Systems Technology (AIST) Program,
which supports NASA’s mission to understand and protect our home planet.
The proposals, selected from a field of 30 submitted proposals, focus on high-priority
information technology areas: tools for warehousing, data mining, and knowledge
discovery; technologies to facilitate queries/access of multi-disciplinary data;
and techniques to facilitate customized data services. The data mining technologies
sought address two challenge areas: ocean biology and biogeochemistry data mining,
and data mining for climate and weather models. The total funding for these
investigations, over a period of two years, is approximately $1.9 million; investigators
hail from seven states.

The main purpose of AIST is to invest in research and development
of new and innovative information technologies to support and enhance the Earth
science capability. AIST focuses on creating mature technologies leading to
smaller, less resource-intensive and less expensive flight systems that can
be built quickly and efficiently, and on more-efficient ground-based processing
and modeling systems that improve the use of Earth science data.

The technologies selected include a statistical data mining and
machine learning toolkit whose development will enable scaling of global data
sets and integration of heterogeneous data sources to evaluate/predict the effects
of varying weather patterns on agricultural crop yields. A spatiotemporal data
mining tool will enable monitoring and modeling of multiple oceanographic objects,
such as river-based plumes and harmful algal blooms.

Technologies to improve the utilization of large heterogeneous
data sets will also be developed. These include the modification of data compression
techniques for use as a data reduction method to create small summary data sets
that are substantially reduced in volume and complexity, and the wavelet analysis
of local information content in a data scene to intelligently select the density
of observations to use for weather and climate modeling.

Climate modeling and prediction techniques will be further enhanced
through the development of data mining and knowledge discovery tools. A suite
of data mining tools based on new information-theoretic techniques will enable
rapid identification, characterization and quantification of causal interactions
among relevant climate variables in large distributed data sets, allowing evaluation
and prediction of climate and climate subsystem changes over time in response
to natural and human-induced changes. Data mining and knowledge discovery techniques
will facilitate analysis, visualization, and modeling of land-surface variables
obtained from the TERRA and AQUA platforms in support of climate and weather
applications to enable better parameterization of the relevant processes in
forecast models for weather and inter-annual climate prediction.

The investigations selected by NASA’s Earth Science Technology
Office are:

Braverman, Amy (Jet Propulsion Laboratory (JPL), Pasadena, CA):
Mining Massive Earth Science Data Sets for Climate and Weather Forecast Models

Cai, Yang (Carnegie Mellon University, Pittsburgh, PA):
Data Mining System for Tracking and Modeling Ocean Object Movement

Hoffman, Ross (Atmospheric and Environmental Research (AER) Incorporated, Lexington, MA):
Selection Technique for Thinning Satellite Data for Numerical Weather Prediction

Knuth, Kevin (NASA Ames Research Center, Moffett Field, CA):
Rapid Characterization of Causal Interactions among Climate/Weather
System Variables: An Advanced Information-Theoretic Technique

Kumar, Praveen (University of Illinois, Urbana, IL):
Data Mining for Understanding the Dynamic Evolution of Land-Surface
Variables: Technology Demonstration Using the D2K Platform

Wagstaff, Kiri (JPL, Pasadena, CA):
Interactive Analysis of Heterogeneous Data to Determine the Impact
of Weather on Crop Yield

Title: Mining Massive Earth Science Data Sets for Climate
and Weather Forecast Models
Full Name: Amy Braverman
Institution Name: JPL
Proposal #: AIST-QRS-04-3014
In this proposal we address the technology
objectives specified in Section I.1. of the Mini-AIST NRA announced May
5, 2004. Specifically, we will provide tools and support for data warehousing,
data mining, and knowledge discovery for the ESE science challenge posed
in Section I.2.2.b: Data Preparation for Medium Range Weather Forecasts.
The sheer volume of Earth science data precludes the interactive, real-time
scientific exploration required to characterize and understand features
that can inform and improve physical models. We propose to solve this
problem by creating small summary data sets of reduced volume and complexity
that can be used in place of the original data as input to models, or
for comparisons to model output. We propose using data compression techniques,
modified for use as data reduction methods, to create summary data sets
of small size and high accuracy for observational data from AIRS, MISR, and
ISCCP (International Satellite Cloud Climatology Project), together with
model data from NCAR’s CAM3 and GFDL’s AM2 atmospheric models.
These summary data sets can be thought of as “thinned” in the sense
of retaining representative observations which, taken together, preserve
the statistical and distributional character of the original data. The
summary data can therefore also be used to create customized data products
that estimate features of modelers’ choice. Our technology is currently
at TRL 4, and we expect to achieve TRL 6 in the 24-month performance period.
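
To illustrate how a compression-style method can double as data reduction, the following minimal sketch replaces a one-dimensional sample with a few representative values and their counts, using Lloyd's (k-means) quantizer. This is an assumption made for illustration only; the abstract does not specify the proposal's algorithm, and the function name `summarize` and its parameters are hypothetical.

```python
def summarize(values, k, iterations=20):
    """Reduce a non-empty list of numbers to at most k (representative, count) pairs.

    Illustrative Lloyd/k-means quantizer; not the proposal's actual method.
    """
    data = sorted(values)
    n = len(data)
    # Initialise the representatives at evenly spaced quantiles of the data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assign every observation to its nearest representative.
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Drop empty clusters and move each representative to its cluster mean.
        clusters = [c for c in clusters if c]
        centers = [sum(c) / len(c) for c in clusters]
    return [(center, len(c)) for center, c in zip(centers, clusters)]


# Six observations collapse to two representatives, each standing for three points.
summary = summarize([1.0, 1.1, 0.9, 5.0, 5.2, 4.8], k=2)
```

Because each representative is the mean of the observations it stands for, the count-weighted mean of the summary matches the mean of the original sample, which is one simple sense in which a summary can preserve the statistical character of the data.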

Title: Spatiotemporal Data Mining System for Tracking
and Modeling Ocean Object Movement
Full Name: Yang Cai
Institution Name: Carnegie Mellon University
Proposal #: AIST-QRS-04-3031
Tracking and modeling the spatiotemporal
dynamics of ocean objects is essential to ESE missions in oceanographic
studies, such as monitoring and predicting harmful algal blooms along
the coastline or river-based plumes discharged to the open ocean.

In this project, we propose a spatiotemporal data mining system with the following
objectives: 1) tracking the movement of ocean objects that have been identified;
2) discovering the correlations between the object attributes and satellite
readings from multiple databases; 3) predicting the movement of identified
objects.

This generalized spatiotemporal data mining tool enables monitoring and
modeling of multiple oceanographic objects, such as plumes and harmful
algal blooms. The tool may also be applied to other spatiotemporal problems,
such as monitoring dust storms.

We will use the SeaWiFS database as our main data source. We will also explore
the use of other remote sensing databases such as MODIS.

The technology would be based on our lab prototypes of a multi-sensor data
mining framework, with an entry Technology Readiness Level (TRL) of 4. The project
deliverable would reach TRL 5 to 6. The total duration of the project is
two years.

The Co-PI, Dr. Richard P. Stumpf, an oceanographer at NOAA, will specify
the requirements for the data mining tool and validate the product with
field data. Dr. Han-Shou Liu, a geophysicist at GSFC, will support computational
models for data mining.

Title: Selection Technique for Thinning Satellite Data
for Numerical Weather Prediction
Full Name: Ross Hoffman
Institution Name: Atmospheric and Environmental Research, Inc.
Proposal #: AIST-QRS-04-3019
Operational weather prediction centers
use only a fraction of the observations of the atmosphere and the earth’s
surface that are made by satellite, in situ, and ground-based instruments.
In many cases satellite data are selected by regular decimation, i.e.,
every nth observation. The objective of this proposed project is to develop
a more intelligent selection method that uses the local information content
in a data scene to determine the density of observations to use. The method
will be based on a wavelet analysis of the satellite data. Tests using
QuikSCAT scatterometer wind observations in analysis and forecast systems
will compare results based on ALL of the data to results from the REGULAR
and WAVELET selections. A two-year level of effort is proposed to advance
the TRL of the method from 4 to 6.
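
The contrast between regular decimation and information-driven selection can be sketched in a few lines. The example below is a simplified one-dimensional illustration that assumes a single-level Haar wavelet and a fixed threshold. The abstract does not specify the wavelet family, the dimensionality, or the selection rule, so the function names and the `threshold` parameter are hypothetical.

```python
def regular_thinning(obs, n):
    """Regular decimation: keep every nth observation."""
    return obs[::n]


def haar_details(obs):
    """Single-level Haar detail coefficients for consecutive, non-overlapping pairs."""
    return [(obs[i] - obs[i + 1]) / 2.0 for i in range(0, len(obs) - 1, 2)]


def adaptive_thinning(obs, threshold):
    """Keep both observations of a pair where the Haar detail is large
    (locally variable signal) and only the first where it is small (smooth signal).

    Illustrative assumption only; not the proposal's actual selection rule.
    """
    kept = []
    for pair_index, detail in enumerate(haar_details(obs)):
        i = 2 * pair_index
        kept.append(obs[i])
        if abs(detail) > threshold:
            kept.append(obs[i + 1])
    if len(obs) % 2 == 1:
        kept.append(obs[-1])  # unpaired trailing observation is always kept
    return kept


# Hypothetical wind speeds with one sharp feature between the 5th and 6th values.
winds = [5.0, 5.1, 5.0, 5.2, 9.0, 14.0, 6.0, 5.9]
print(regular_thinning(winds, 2))     # [5.0, 5.0, 9.0, 6.0]
print(adaptive_thinning(winds, 0.5))  # [5.0, 5.0, 9.0, 14.0, 6.0]
```

In this toy case regular decimation drops the 14.0 peak, while the detail-based rule retains it and still thins the smooth stretches.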

Title: Rapid Characterization of Causal Interactions
among Climate/Weather System Variables: An Advanced Information-Theoretic
Approach
Full Name: Kevin Knuth
Institution Name: NASA Ames Research Center
Proposal #: AIST-QRS-04-3010
The NASA Earth Science Enterprise
is focused on obtaining a better understanding of our home planet. While
it is clear that the Earth’s climate changes over time, it is not
known how this change occurs, what the primary causes of change are, or
how climate subsystems respond to natural and human-induced changes. Vast
amounts of data are being collected on the Earth climate system and it
is increasingly important to rapidly discover relevant climate variables,
and qualify and quantify their causal interactions. We will develop a
suite of data-mining tools based on new information-theoretic techniques
to rapidly identify, characterize, and quantify causal interactions among
relevant climate variables in large distributed datasets. This information-theoretic
approach relies on established quantities such as mutual information in
addition to novel quantities called co-informations and derived quantities
such as transfer entropy, which enable us to quantify complex causal interactions
over different spatiotemporal scales. We have demonstrated these techniques
at TRL 4, and during the period of performance from 10/01/04 through 9/30/06
the development of these tools will take them to TRL 6. In addition to
quantifying causal interactions, these tools will also quantify the errors
in the estimates, thus quantifying inherent uncertainties in the results.
These uncertainties are crucial to accurately evaluating our state of
knowledge about the climate system, which is an element of key interest
to the US Climate Change Science Program. These measures will be demonstrated
using important climate datasets including MODIS, TRMM, and the International
Satellite Cloud Climatology Project (ISCCP) dataset.
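
As a concrete reference for the simplest quantity named above, mutual information between two discrete variables can be estimated from paired samples with a plug-in (frequency-count) estimator. The sketch below is a generic textbook estimator, not the proposal's tooling. It assumes the variables have already been discretized, and it does not cover co-information, transfer entropy, or the error estimates the abstract describes.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    I(X;Y) = sum over (x, y) of p(x, y) * log2( p(x, y) / (p(x) * p(y)) ),
    with all probabilities replaced by observed relative frequencies.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Perfectly dependent binary variables share one bit of information.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
# Empirically independent variables share none.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

Plug-in estimates are biased upward for small samples, which is one reason quantifying the error of such estimates matters.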

Title: Data Mining for Understanding the Dynamic Evolution
of Land-Surface Variables: Technology Demonstration using the D2K Platform
Full Name: Praveen Kumar
Institution Name: University of Illinois
Proposal #: AIST-QRS-04-3015
The objective of this research proposal
is to develop data mining and knowledge discovery in databases (KDD) techniques,
using the “Data to Knowledge” (D2K) platform developed by the
National Center for Supercomputing Applications (NCSA), to facilitate analysis,
visualization and modeling of land-surface variables obtained from the
TERRA and AQUA platforms in support of climate and weather applications.
The specific technology objective addressed is: Tools and support for
data mining and knowledge discovery. The ESE science challenge addressed
is: Data mining for climate and weather applications. The specific science
questions that this project will focus on are:
1. How are evolving surface variables such as vegetation indices, temperature,
and emissivity, as obtained from the TERRA and AQUA platforms, dynamically
linked?
2. How do they evolve in response to climate variability such as ENSO
(El Niño Southern Oscillation)? and
3. How are they dependent on temporally invariant factors such as topography
(and derived variables such as slope, aspect, nearness to streams), soil
characteristics, land cover classification, etc.?
Answers to these questions, at the continental to global scales, using
data mining, will enable us to develop better parameterization of the
relevant processes in forecast models for weather, and inter-seasonal
to inter-annual climate prediction.

The entry Technology Readiness Level (TRL) for the project is 4, and we
expect that after the two-year project duration the exit TRL will be 6.

Title: Interactive Analysis of Heterogeneous
Data to Determine the Impact of Weather on Crop Yield
Full Name: Kiri Wagstaff
Institution Name: JPL
Proposal #: AIST-QRS-04-3004
We will develop a versatile toolkit
for statistical data mining and machine learning that will enable (1)
rapid, in-depth analysis of subtle relationships between multiple, different
science data products, and (2) efficient testing of competing scientific
hypotheses. The toolkit will feature advanced methods that are optimized
for the analysis of data with spatial dependencies. We will include technologies
for classification (support vector machines), clustering (using spatial
constraints), and prediction (multivariate spatial models) that currently
exist only as standalone methods (TRL 4). These techniques will be refined
and demonstrated in a critical scientific investigation: a study of the
fine-scale effects of varying weather patterns on agricultural crop yields.
The final system will be an integrated toolkit with an easy-to-use graphical
interface, demonstrated on full-scale science data (TRL 6). As a result
of this work, scientists will be able to easily answer questions such
as, “What is the impact on corn yield if Kansas receives only 75%
of its normal rainfall?” Benefits over the state of the art include
a) analysis methods that scale to global data sets and b) the ability
to integrate heterogeneous data sources for improved prediction accuracy.
The anticipated period of performance is October 2004 to September 2006.