Title: Semi-Automatic Science Workflow Synthesis for High-End Computing on the NASA Earth Exchange
Presenting Author: Petr Votava
Organization: NASA Ames
Co-Author(s): R. Nemani

Abstract:
The NASA Earth Exchange (NEX) is a collaboration platform for the Earth science community that provides mechanisms for scientific collaboration and for knowledge and data sharing, together with direct access to almost 1 PB of Earth science data and a 10,000-core processing system. As one way of managing scientific processes within NEX, we have been working with the VisTrails scientific workflow management system, an open-source solution that also provides a fairly comprehensive provenance infrastructure. To improve VisTrails integration for scientists and researchers, we began by developing a tool that automatically identifies processes within user science codes and scripts and converts them into VisTrails workflow components. One benefit of this tool was an immediate improvement in transparent data provenance capture for NEX users, with only minimal impact on their work environment. This presentation describes this provenance capture and management in the context of the NEX provenance architecture, as well as its application in the Web-enabled Landsat Data (WELD) project, a multistage petabyte-scale Landsat processing pipeline that is currently deployed on NEX at the NASA Advanced Supercomputing (NAS) Division. We will describe our handling of multiple internal and external Landsat metadata components as well as data, process, and executable provenance.