NSTC2007

Title of Presentation: Information Extraction and Knowledge Discovery from High-Dimensional and High-Volume Complex Data Sets through Precision Manifold Learning

Primary (Corresponding) Author: Erzsébet Merényi

Organization of Primary Author: Rice University

Co-Authors: William H. Farrand, Robert H. Brown, Thomas Villmann, Colin Fyfe

Abstract: We have been developing HyperEye, an algorithm and data analysis environment for “precision” learning of complicated manifold structures. Its purpose is fast, accurate and detailed identification of relevant or critical information from large and complex data sets, such as hyperspectral imagery, from space and Earth science missions.

The core of HyperEye consists of biologically inspired (Hebbian, self-organizing type) neural learning machines and hybrid architectures thereof. The salient properties are:

a) Non-standard (for example, information theoretically motivated) features compared to basic unsupervised clustering with self-organizing maps, and compared to commonly used supervised neural learning for classification;
b) Reliability and robustness, including principled assessment of the quality of extracted information (such as the assessment of quality of unsupervised clustering without external information), which can be used as feedback for improvement of the learning;
c) Potential for implementation in massively parallel hardware, for (near-)real-time onboard processing.

These innovations result from our research and enable precise determination of the cluster structure of high-dimensional data manifolds without having to compromise discovery potential with prior dimension reduction, a task at which many other methods fail or severely underperform. (We have handled spectral data with up to 2300 channels.) This allows effective discovery of small, interesting, surprising clusters in complicated high-dimensional data, as well as subsequent accurate classification of many classes, and provides benchmark performance for feature extraction and compression methods. We have also developed data compression with non-linear neural “relevance learning”, which, for a given classification, retains more information with fewer selected spectral bands than PCA, wavelets, or a human expert. One aspect driving our research is automation, with onboard decision support or the processing of large archives in mind.
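
For orientation, the basic self-organizing map that serves as the point of comparison above can be sketched in a few lines. This is a minimal textbook Kohonen SOM under stated assumptions, not HyperEye or any of its non-standard variants: the function names (`train_som`, `best_matching_unit`, `make_two_class_spectra`), the 3 x 3 grid, the linear decay schedules, and the synthetic 4-channel "spectra" are all hypothetical choices made for illustration only.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the prototype closest to x (squared Euclidean distance)."""
    best_pos, best_d2 = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
            if d2 < best_d2:
                best_pos, best_d2 = (r, c), d2
    return best_pos


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a basic SOM and return a rows x cols grid of prototype vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Assumption: initialize every prototype from a randomly chosen sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 + (0.5 - sigma0) * frac   # neighborhood width moves toward 0.5
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for k in range(dim):
                        # Pull the prototype toward the sample, weighted by
                        # its grid distance to the winning unit.
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def make_two_class_spectra(n_per_class=20, channels=4, seed=1):
    """Synthetic stand-in data: two well-separated groups of short 'spectra'."""
    rng = random.Random(seed)
    low = [[0.1 + rng.uniform(-0.05, 0.05) for _ in range(channels)]
           for _ in range(n_per_class)]
    high = [[0.9 + rng.uniform(-0.05, 0.05) for _ in range(channels)]
            for _ in range(n_per_class)]
    return low + high


data = make_two_class_spectra()
som = train_som(data, rows=3, cols=3)
# Samples from the two groups should be mapped to different grid units.
print(best_matching_unit(som, [0.1] * 4), best_matching_unit(som, [0.9] * 4))
```

The sketch operates directly on the full input vectors, with no prior dimension reduction; real hyperspectral data would simply use longer vectors, at a correspondingly higher computational cost.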

We will present current capabilities through results from analyses of spectral and hyperspectral Earth and space science imagery.