Title: Illuminating the Darkness: Exploiting Untapped Data and Information Resources in Earth Science
Presenting Author: Rahul Ramachandran
Organization: NASA MSFC

Abstract:
One of the continuing challenges in any Earth science investigation is the time and effort required for data preparation before analysis can begin. Data preparation covers the discovery, access, and preprocessing of useful science content from the increasingly large volumes of Earth science data and related information available. Current Earth science data and information systems are designed to support discrete steps within the data-to-knowledge process, and each of these disparate systems has its own shortcomings. For example, current data search systems are designed on the assumption that researchers find data primarily through metadata searches on instrument or geophysical keywords, which presumes that users know the domain vocabulary well enough to use the search catalogs effectively. These systems work well for those who know exactly which data sets they need, but they offer little support to new or interdisciplinary researchers who may be unfamiliar with the domain vocabulary or with the breadth of relevant data available. There is a clear need to evolve current data and information systems so that data discovery and exploration become an integrated process rather than a series of discrete steps, substantially reducing data preparation time and effort for users of Earth science data. We contend that Earth science metadata assets are dark resources: information resources that organizations collect, process, and store for regular business or operational activities but fail to use for other purposes. The challenge for any organization is to recognize, identify, and effectively use the dark data stores in its institutional repositories to better serve its stakeholders. NASA Earth science metadata catalogs contain dark resources consisting of structured information, free-form descriptions of data, and pre-generated images.
For example, the NASA EOS Clearing House (ECHO) operational catalog serves the Earth science community worldwide by allowing users to search for and discover data sets available at the twelve distributed NASA data archives. As of Aug 4, 2013 (ECHO 2013), the ECHO catalog holds metadata for 3,666 data collections and over 127 million metadata records for individual data files. It also holds information for 67 million browse images. With the addition of emerging semantic technologies, such catalogs can be used beyond their original design intent of supporting current search functionality, providing novel data discovery and exploration pathways to the science and education communities.

In this contribution, we present the initial design of a Semantic Middleware Layer (SML) that exploits these metadata resources to provide novel data discovery and exploration capabilities, with the goal of significantly reducing data preparation time. SML uses a varied set of semantic web and image mining technologies. This 'middle layer' supports new data discovery pathways using imagery and structured and free-form text, and enables automation of commonly required preprocessing and exploratory analysis tasks. SML follows a Service Oriented Architecture (SOA), allowing individual components to be reused and easily integrated into NASA's data and information systems. SML works with existing Earth science data system resources, rather than requiring the significant new work or system redesign that a linked data approach (http://linkeddata.org) would need. We also discuss plans for a prototype Event Nexus Discovery Client to link data, imagery, and information resources around Earth science phenomena and specific events.
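To make the vocabulary problem concrete, the sketch below contrasts an exact controlled-keyword search with one that first maps a lay term to domain keywords and also scans the free-form descriptions that keyword search ignores. It is a minimal, hypothetical illustration, not the SML design: the record type, the toy catalog, the collection names (`COLL_A`, `COLL_B`), and the term mapping are all invented, and a real semantic layer would draw on curated vocabularies and ontologies rather than a hand-written dictionary.

```python
"""Hypothetical sketch: keyword-only catalog search vs. a search that maps
lay terms to controlled keywords and also scans free-form descriptions.
All records, keywords, and mappings are illustrative, not real ECHO content."""

from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionRecord:
    short_name: str
    keywords: frozenset   # controlled-vocabulary science keywords
    description: str      # free-form text that keyword search ignores


# Hypothetical toy catalog.
CATALOG = [
    CollectionRecord(
        "COLL_A",
        frozenset({"aerosol optical depth"}),
        "Daily aerosol retrievals useful for dust storm and smoke studies.",
    ),
    CollectionRecord(
        "COLL_B",
        frozenset({"precipitation rate"}),
        "Gridded rainfall estimates.",
    ),
]

# Hypothetical mapping from lay terms to controlled keywords.
TERM_MAP = {
    "dust storm": {"aerosol optical depth"},
    "smoke": {"aerosol optical depth"},
    "rain": {"precipitation rate"},
}


def keyword_search(term, catalog):
    """Return collections whose controlled keywords contain the exact term."""
    t = term.lower()
    return [r.short_name for r in catalog if t in r.keywords]


def expanded_search(term, catalog, term_map):
    """Expand the term via term_map, then match keywords or description text.

    The description check is a naive substring test, kept simple on purpose.
    """
    t = term.lower()
    targets = {t} | term_map.get(t, set())
    return [
        r.short_name
        for r in catalog
        if r.keywords & targets or t in r.description.lower()
    ]


if __name__ == "__main__":
    print(keyword_search("dust storm", CATALOG))             # []
    print(expanded_search("dust storm", CATALOG, TERM_MAP))  # ['COLL_A']
```

A newcomer who searches for "dust storm" gets nothing from the exact keyword match, while the expanded search surfaces the aerosol collection. This is the kind of gap between user vocabulary and catalog vocabulary that the abstract argues a semantic middle layer should close.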