Title: Minerva: Data Intensive Analysis and Visualization on NASA's NEX platform using Spark
Presenting Author: Aashish Chaudhary
Organization: Kitware Inc.
Co-Author(s):
Petr Votava, Dr. Rama Nemani, Chris Kotfila, Michael Grauer, Jonathan Beezley

Abstract:
Kitware (http://resonant.kitware.com) and NASA have been working together to build data-driven workflows targeting big data for NASA's OpenNEX initiative using cutting-edge technologies such as Spark. NEX (https://nex.nasa.gov/nex) is a collaborative platform that brings together a state-of-the-art computing facility with large volumes (hundreds of terabytes) of NASA satellite and climate data, as well as a number of modeling and data analysis tools and services. Minerva (https://github.com/kitware/minerva), our web-based data analytics toolkit under ongoing development, interfaces seamlessly with both HPC and cloud environments, providing data management, analysis, and visualization. The main goal of the project is to develop and deploy large-scale data analysis and visualization pipelines that enable sharing provenance-enhanced results with the wider scientific community. Minerva is built on top of our open-source tools: Girder for data management, Romanesco for workflow execution, and GeoJS for web visualization of large climate and other related scientific datasets. Our ongoing efforts include enabling the use of Spark in a workflow execution model, developing and providing support for scientific data in Spark, and engaging with the community to create scientific tools and frameworks that can leverage Spark to its fullest potential. In this talk, we will present our work on Minerva, describe the improvements we have made to enable running Spark for scientific computing, and discuss some of the challenges and opportunities in the domain of HPC and data science.