Title: Empowering Data Management, Diagnostics, and Visualization of Cloud-Resolving Models (CRMs) via a Cloud Library based upon Spark and Hadoop
Presenting Author: Wei-Kuo Tao
Organization: NASA GSFC
Co-Author(s): Shujia Zhou, Northrop Grumman Information Technology; Xiaowen Li, Morgan State University; Xian-He Sun, Illinois Institute of Technology; Toshihisa Matsui, University of Maryland, College Park

Abstract:
In the AIST14 project, we have developed a Super Cloud Library (SCL) that supports management (I/O control and compression), distribution, visualization, subsetting, and evaluation of high-resolution cloud-resolving model (CRM) databases. The SCL architecture is built upon a Hadoop framework. The Hadoop Distributed File System (HDFS) is a stable, distributed, scalable, and portable file system. The Hadoop framework supports Python, which enables 2D and 3D visualization via IDL code wrappers. Furthermore, Hadoop R enables various standard and non-standard statistics and their visualization. Within the Hadoop framework, a CRM’s diagnostic capabilities are further enhanced with Spark, built on top of HDFS, which accelerates the Hadoop MapReduce process by ~100 times. The SCL is built on the NCCS Discover system, which directly stores various CRM simulations, including those from the NASA-Unified Weather Research and Forecasting (NU-WRF) and Goddard Cumulus Ensemble (GCE) models. Thus, SCL users can conduct large-scale on-demand tasks automatically, without the need to download voluminous CRM datasets, together with various observations from NASA field campaigns and satellite data, to a local computer.
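
The subsetting and diagnostic pattern described above can be illustrated with a minimal map/reduce sketch. It is not the SCL interface: the record layout, the function names (`subset`, `map_record`, `reduce_pairs`, `mean_by_level`), and the sample values are all hypothetical, and plain Python stands in for Spark or Hadoop so that the example runs anywhere.

```python
from functools import reduce

# Hypothetical CRM output records: (time_step, level_index, cloud_water).
# The values are made up for illustration; they are not model output.
records = [
    (0, 0, 0.10), (0, 1, 0.30),
    (1, 0, 0.20), (1, 1, 0.50),
    (2, 0, 0.60), (2, 1, 0.70),
]

def subset(recs, t_min, t_max):
    """Keep only the records whose time step lies in [t_min, t_max]."""
    return [r for r in recs if t_min <= r[0] <= t_max]

def map_record(rec):
    """Map step: emit (level, (value, count)) for one record."""
    _, level, value = rec
    return level, (value, 1)

def reduce_pairs(acc, pair):
    """Reduce step: accumulate the running sum and count for each level."""
    level, (value, count) = pair
    total, n = acc.get(level, (0.0, 0))
    acc[level] = (total + value, n + count)
    return acc

def mean_by_level(recs):
    """Diagnostic: mean value per vertical level, computed map/reduce style."""
    sums = reduce(reduce_pairs, map(map_record, recs), {})
    return {level: total / n for level, (total, n) in sums.items()}

# Subset to time steps 0..1, then compute the per-level mean.
level_means = mean_by_level(subset(records, 0, 1))
```

In a distributed setting the same three stages (filter, map to key/value pairs, reduce by key) would run in parallel across the nodes that hold the data, which is why the datasets would not need to be downloaded first.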