Title: Empowering Data Management, Diagnosis, and Visualization of Cloud-Resolving Models by Cloud Library upon Spark and Hadoop
Presenting Author: Wei-Kuo Tao
Organization: NASA GSFC
Co-Author(s):
Shujia Zhou (Northrop Grumman Information Technology, MD); Xian-He Sun (Illinois Institute of Technology, Chicago, IL); Toshihisa Matsui (University of Maryland, College Park, MD); Xiaowen Li (Morgan State University, Baltimore, MD)

Abstract:
A Super Cloud Library (SCL) has been developed that supports cloud-resolving model (CRM) database management (I/O control and compression), distribution, visualization, subsetting, and evaluation. The SCL architecture is built upon the Hadoop framework, whose Hadoop Distributed File System (HDFS) is a stable, distributed, scalable, and portable file system. The Hadoop framework supports Python, which enables 2D and 3D visualization by wrapping IDL codes. In addition, Hadoop R enables various standard and non-standard statistics and their visualization. Within the Hadoop framework, the CRM diagnostic capability is further enhanced with Spark, built on top of HDFS, which accelerates the Hadoop MapReduce process by ~100 times. SCL is built on the NCCS Discover system, which directly stores various CRM simulations, including those from the NASA-Unified Weather Research and Forecasting (NU-WRF) and Goddard Cumulus Ensemble (GCE) models. Thus, SCL users can conduct large-scale on-demand tasks automatically, without downloading voluminous CRM datasets and observations from NASA field campaigns and satellites to a local computer.
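
To illustrate the kind of subsetting and diagnostic task described above, the following is a minimal sketch of the map/reduce pattern in plain Python. It is not the SCL implementation: the record layout, the sample values, and the names map_record, reduce_pairs, and mean_by_key are all hypothetical, chosen only to show how a per-level mean could be computed from key/value pairs. On a Spark cluster the same steps would roughly correspond to the RDD operations filter, map, and reduceByKey.

```python
# Hypothetical CRM output records (not real data):
# (model name, height level in km, cloud water content in g/kg).
records = [
    ("GCE", 1, 0.25),
    ("GCE", 1, 0.75),
    ("GCE", 2, 0.5),
    ("NU-WRF", 1, 0.125),
    ("NU-WRF", 2, 0.5),
    ("NU-WRF", 2, 1.0),
]


def map_record(record):
    """Map step: emit a key and a (sum, count) pair for one record."""
    model, level, value = record
    return (model, level), (value, 1)


def reduce_pairs(a, b):
    """Reduce step: combine two (sum, count) pairs that share a key."""
    return (a[0] + b[0], a[1] + b[1])


def mean_by_key(records, model=None):
    """Subset by model (optional), then compute the mean value per (model, level)."""
    # Subsetting: keep only the requested model, or everything if none is given.
    subset = (r for r in records if model is None or r[0] == model)
    partial = {}
    for key, pair in map(map_record, subset):
        # Merge each mapped pair into the running total for its key.
        partial[key] = reduce_pairs(partial[key], pair) if key in partial else pair
    return {key: total / count for key, (total, count) in partial.items()}


gce_means = mean_by_key(records, model="GCE")
all_means = mean_by_key(records)
```

Because the reduce step only combines (sum, count) pairs, partial results from different nodes can be merged in any order, which is what lets this kind of diagnostic run where the data is stored rather than on a local computer.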