Title: An Environment for Systemizing Data Preparation and Machine/Deep Learning
Presenting Author: Kwo-Sen Kuo
Organization: Bayesics, LLC
Co-Author(s): Thomas L Clune, Khoa Doan, Amy Lin, Amidu O Oloso, Michael L Rilee

Abstract:
We have developed the essential components of an efficient data-intensive environment for systemizing data preparation and, in turn, machine learning and deep learning. There is little doubt that machine/deep learning can greatly boost productivity in scientific research and applications. However, data preparation is reported to often take a disproportionate share of the time and effort in a machine learning endeavor. This unfortunate consequence is inevitable with our present two-step data practice, i.e., first packaging data in files and then cataloging the metadata for discovery. Our data-intensive environment preloads and fully structures the data using an innovative indexing scheme that homogenizes diverse datasets. This indexing scheme supports data placement alignment for the most common data analysis operations, optimizing performance. In addition, we have incorporated a suite of re-gridding/re-mapping tools that further facilitates integrative analyses and consistent interpretations across diverse datasets. This data environment promises not only to ease researchers' day-to-day data-intensive analysis demands but also to systemize data preparation and thus machine learning and deep learning operations.