Title of Presentation: Solution Service Composition for Analysis of Online Science Data

Primary (Corresponding) Author: Sara Graves

Organization of Primary Author: University of Alabama in Huntsville

Co-Authors: Rahul Ramachandran, Ken Keiser, Manil Maskey, Christopher Lynnes and Long Pham


Abstract: This NASA ACCESS project is creating a suite of specialized, deployable data mining web services designed specifically for science data. The project uses the Algorithm Development and Mining (ADaM) toolkit for its base data analysis components. ADaM is a robust, mature, and freely available science data mining toolkit used by research organizations and educational institutions worldwide. These deployable services will give the scientific community a powerful and versatile data analysis capability for creating higher-order products from current and future NASA satellite data records, using methods that are not currently available. Many of the data mining, pattern recognition, image processing, and data preparation algorithms in ADaM are specifically geared towards satellite imagery, which makes these tools well suited to NASA satellite data. The deployable package of mining and related services has been developed using web services standards so that community-based measurement processing systems can access and interoperate with them. The maturation of web services standards and technology sets the stage for a distributed “Service-Oriented Architecture” (SOA) for NASA's next-generation science data processing. This architecture will allow members of the scientific community to create and combine persistent, distributed data processing services and make them available to other users over the Internet. The project team has adapted workflow tools that support the construction of Business Process Execution Language (BPEL) definitions. These definitions are then deployed so that the workflows can be executed at a distributed data repository. This allows remote users to define and execute analysis processes against the large volumes of Earth science data available at NASA data archives.
The Goddard Earth Sciences Data and Information Services Center (GES DISC) is collaborating on this project as an operational data repository.