Title: Next-Generation Hybrid-cloud Science Data System (HySDS) In Support of Big Data Analytics, Processing, Monitoring, and Hazards Response
Presenting Author: Hook Hua
Organization: NASA Jet Propulsion Laboratory / Caltech

Abstract:
The Hybrid-cloud Science Data System (HySDS) takes a cloud computing fabric approach to science data system capabilities, providing data ingestion, metadata extraction, cataloging/indexing, and high-volume processing of Big Data streams. It has been infused into the Advanced Rapid Imaging & Analysis for Monitoring Hazards (ARIA-MH) project for continuous processing of geodetic InSAR data streams, and into atmospheric processing of merged A-Train data records. By also leveraging a faceted approach to interfacing with the science data system, it has enabled situational awareness of both the data products and the health of the science data system. Furthermore, a novel faceted rules approach lets users define custom monitoring and user-driven actions that automatically watch for specific conditions in the data stream and trigger processing of subsequent data products. Our ARIA-MH science data system has enabled both science and decision-making communities to monitor areas of interest with derived geodetic data products via seamless data preparation, processing, discovery, and access. We will present our findings on the use of hybrid-cloud computing to improve the timely processing, monitoring, and delivery of geodetic data products; provisional scaling to meet computing and storage needs; process migration; and faceted browse results that provide quick looks and work with other tools for integrative analysis. Additionally, the sheer data volume of a continuous stream of InSAR data sets presents a bottleneck. It has been estimated that continuous processing of InSAR coverage of California alone over three years would reach PB-scale data volumes. Early estimates for the upcoming NISAR mission anticipate upwards of ~70 TB of data products generated per day. We will also show some results on how this hybrid-cloud architecture can handle these expected data volumes.
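
To give a rough sense of scale for the NISAR estimate, the following sketch extrapolates the ~70 TB/day figure to cumulative volume. It is only back-of-the-envelope arithmetic, not part of HySDS; the function names are hypothetical, and it assumes decimal units (1 PB = 1000 TB), a constant daily rate, and a 365-day year, none of which are stated in the abstract.

```python
# Back-of-the-envelope extrapolation of the ~70 TB/day NISAR estimate.
# Assumptions (not from the abstract): decimal units (1 PB = 1000 TB),
# a constant daily production rate, and a 365-day year.
TB_PER_DAY = 70.0
TB_PER_PB = 1000.0


def cumulative_pb(days: float, tb_per_day: float = TB_PER_DAY) -> float:
    """Return the accumulated product volume in PB after `days` days."""
    return days * tb_per_day / TB_PER_PB


def days_to_reach_pb(target_pb: float, tb_per_day: float = TB_PER_DAY) -> float:
    """Return how many days of production it takes to accumulate `target_pb` PB."""
    return target_pb * TB_PER_PB / tb_per_day


if __name__ == "__main__":
    print(f"Volume after one year: {cumulative_pb(365):.2f} PB")
    print(f"Days to reach 1 PB:    {days_to_reach_pb(1):.1f}")
```

Under these assumptions, a single day-rate of this size crosses the 1 PB mark in roughly two weeks, which is why storage and compute need to scale elastically rather than be sized once up front.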