Title: Hyperspectral Image Analysis in the Cloud via a Functional Data Model
Presenting Author: Anne Wilson
Organization: Laboratory for Atmospheric and Space Physics (LASP)
Co-Author(s): Odele Coddington, Peter Pilewskie, Doug Lindholm

Abstract:
Researchers in the Earth remote sensing community need tools to effectively handle the massive data volumes presented by hyperspectral imagery. A single analysis could demand 100 TB of storage. The Hylatis project is building a cloud-based toolset to support interactive analysis of the very large data volumes that hyperspectral analysis requires. Through this toolset development, Hylatis explores the deeper informatics question of how to model scientific data. Because data are foundational in any analysis system, how they are represented in the software affects how easy or hard it is to perform arbitrary operations on them. In particular, embedding scientific domain semantics at the data model level limits which datasets can be integrated into a system, because any dataset that does not fit those semantics creates a data model mismatch. This occurs particularly when integrating data from disparate domains. For example, many Earth data access and analysis tools provide the semantics of geolocated data looking at Earth, but lack the semantics, and thus the capabilities, for integrating spectral data or time series data from space. Resolving these data model mismatches requires additional coding effort and complexity, which in turn increases project cost and the potential for errors in translation. Hylatis leverages the LaTiS middleware, which uses a functional data model, in the mathematical sense, to represent a dataset as a domain-agnostic, algebraic function of independent and dependent variables. Scientific domain semantics are removed and reapplied at other layers in the LaTiS framework. At the domain-neutral, pure data layer, mathematical semantics are available for any computational purpose, supporting server-side computations of any kind. Because nearly any dataset can be modeled this way, the problem of data model mismatch is avoided. LaTiS is written in a functional programming style. Of particular value to science, pure functional programming better supports both parallelization and proofs of correctness.
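
As an illustration of the functional data model described above, the following minimal Python sketch represents two datasets from different domains, a small hyperspectral cube and a time series, with one domain-agnostic structure. This is not LaTiS code and does not reflect the LaTiS API: the class FunctionDataset, its methods, the variable names, and all sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Tuple

# Hypothetical sketch: a dataset is a function from a tuple of independent
# variables (the domain) to a tuple of dependent variables (the range).
# No Earth-science or spectral semantics live at this layer.
@dataclass(frozen=True)
class FunctionDataset:
    domain_names: Tuple[str, ...]
    range_names: Tuple[str, ...]
    samples: Tuple[Tuple[tuple, tuple], ...]  # (domain value, range value) pairs

    def __call__(self, *point):
        """Evaluate the function at one domain point."""
        for domain_value, range_value in self.samples:
            if domain_value == point:
                return range_value
        raise KeyError(point)

    def select(self, predicate: Callable[..., bool]) -> "FunctionDataset":
        """Return a new dataset restricted to domain points passing the predicate."""
        kept = tuple(s for s in self.samples if predicate(*s[0]))
        return FunctionDataset(self.domain_names, self.range_names, kept)

    def map_range(self, f: Callable[..., tuple]) -> "FunctionDataset":
        """Return a new dataset with f applied to every range value."""
        mapped = tuple((d, f(*r)) for d, r in self.samples)
        return FunctionDataset(self.domain_names, self.range_names, mapped)


# A tiny hyperspectral cube: (x, y, wavelength_nm) -> radiance.
# The radiance values are made up for illustration.
cube = FunctionDataset(
    ("x", "y", "wavelength_nm"),
    ("radiance",),
    tuple(((x, y, w), (x + 10 * y + w,))
          for x, y, w in product((0, 1), (0, 1), (400, 500))),
)

# A time series: time -> irradiance. Values are made up for illustration.
series = FunctionDataset(
    ("time",),
    ("irradiance",),
    (((0,), (1360.5,)), ((1,), (1361.0,)), ((2,), (1360.8,))),
)

# The same generic operations apply to both datasets.
band_500 = cube.select(lambda x, y, w: w == 500)
late = series.select(lambda t: t >= 1)
doubled = series.map_range(lambda irradiance: (irradiance * 2,))
```

Because every operation returns a new immutable dataset and has no side effects, operations like these can in principle be applied to independent subsets of samples in parallel, which is the property the abstract attributes to pure functional programming.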