Title: Reproducible Containers for Process-oriented Collaborative Analytics
Presenting Author: Kwo-Sen Kuo
Organization: Bayesics, LLC.
Co-Author(s): Tanu Malik, Moaz Reyad, Ashish Gehani, Michael Rilee, Jiun-Dar Chern, Rohan Tikmany, Aniket Modi, Krishna Kamath, Niklas Griessbaum, Mike Bauer, That Dai Hai Ton

Abstract:
Reproducibility is paramount for digital twins (DTs): without measures to enforce reproducible behavior, the veracity of DTs and the trust placed in them come into question. Our AIST project, "Reproducible Containers for Process-Oriented Collaborative Analytics," provides a data-efficient containerization technology for reproducibility and demonstrates it with a process-oriented model diagnostics (POMD) use scenario. The project aims to give scientists the libraries, tools, and container interfaces they need to conduct POMD, for which we have designed an exercise using precipitation features (PFs) derived from model outputs and satellite observations. In the exercise, we use the STARE technology to enable efficient comparisons between PFs derived from GEOS5 outputs (simulation) and PFs derived from the Global Precipitation Measurement (GPM) mission's IMERG data product (observation). We have developed containers for the exercise with I/O specialization: byte-level and object-level carver libraries maximize container storage efficiency without jeopardizing reproducibility. We are currently developing the more compute- and memory-intensive capabilities, which require parallel and distributed technologies. Scientists can build their experiments on their local machines or in the cloud, for example in the Science Managed Cloud Environment (SMCE), and use our technology to capture, store, and reproduce their analysis results. We have successfully tested our system with other projects, and the efficiency and benefits observed so far are encouraging.
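
Illustrative sketch: The abstract does not describe how the carver libraries work internally. As a rough illustration of the general idea behind byte-level carving, the Python sketch below assumes that carving means keeping only the byte ranges of an input file that an analysis actually read, so that a re-run inside the container can be served from the retained bytes alone. The function names (merge_ranges, carve, read_carved) and the data layout are hypothetical and are not taken from the project's code.

```python
from typing import List, Tuple

# Half-open [start, end) byte offsets into an input file (assumed convention).
Range = Tuple[int, int]


def merge_ranges(reads: List[Range]) -> List[Range]:
    """Coalesce overlapping or adjacent read ranges into a minimal sorted list."""
    merged: List[Range] = []
    for start, end in sorted(reads):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def carve(data: bytes, reads: List[Range]) -> List[Tuple[int, bytes]]:
    """Keep only the bytes that were read, tagged with their original offset."""
    return [(start, data[start:end]) for start, end in merge_ranges(reads)]


def read_carved(carved: List[Tuple[int, bytes]], offset: int, length: int) -> bytes:
    """Serve a read from the carved subset; fail if the bytes were never captured."""
    for start, chunk in carved:
        if start <= offset and offset + length <= start + len(chunk):
            return chunk[offset - start : offset - start + length]
    raise KeyError("byte range was not captured during the original run")


# Hypothetical example: a 100-byte file of which the analysis read three ranges.
data = bytes(range(100))
observed_reads = [(10, 20), (15, 30), (50, 60)]
carved = carve(data, observed_reads)
stored_bytes = sum(len(chunk) for _, chunk in carved)
```

In this toy case the carved subset holds 30 of the original 100 bytes, and any read that falls inside a captured range returns the same bytes as the full file would. A read outside the captured ranges raises an error, which is the trade-off such an approach makes: storage shrinks, but only the recorded access pattern is guaranteed to be reproducible.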