Title of Presentation: Cross-Matching Very Large Datasets

Primary (Corresponding) Author: Maria A. Nieto-Santisteban

Organization of Primary Author: Johns Hopkins University

Co-Authors: Ani Thakar, Alex Szalay


Abstract: The primary mission of the National Virtual Observatory (NVO) is to make distributed digital archives accessible and interoperable so that astronomers can maximize their potential for scientific discovery by cross-matching multi-wavelength data between multiple catalogs on the fly. While cross-matches of small datasets are possible at present, large-scale cross-matches that involve all or a large fraction of the sky cannot be performed on demand. This is a serious deficiency that prevents the NVO from realizing its mission and hinders its widespread acceptance by the community. In this paper, we analyze the issues and requirements that an environment aiming to enable large-scale astronomical science needs to consider, and describe a workbench environment where astronomers can cross-correlate, analyze, and compare vast amounts of data and share their results with others. We focus on catalog data managed by commercial Database Management Systems and on analysis tasks expressed in SQL. We describe an indexing algorithm named Zones and its crucial role in enabling parallelization and cross-matching of very large datasets.
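
As a rough illustration of the zone idea mentioned above, the following Python sketch assumes that a zone is a declination stripe of fixed height and that a cross-match only needs to compare objects whose zones overlap the search radius. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration: the function names (zone_id, build_zones, cross_match), the (id, ra, dec) tuple layout, and the choice of zone height equal to the search radius. The work described in the paper expresses such tasks in SQL inside a database; this in-memory version only shows the bucketing logic.

```python
import math
from collections import defaultdict


def zone_id(dec_deg, zone_height_deg):
    """Map a declination (degrees) to an integer stripe number (assumed scheme)."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    r1, d1, r2, d2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def build_zones(catalog, zone_height_deg):
    """Bucket (id, ra, dec) rows by zone so that lookups touch few rows."""
    zones = defaultdict(list)
    for obj_id, ra, dec in catalog:
        zones[zone_id(dec, zone_height_deg)].append((obj_id, ra, dec))
    return dict(zones)


def cross_match(cat_a, cat_b, radius_deg, zone_height_deg=None):
    """Return (id_a, id_b) pairs separated by at most radius_deg."""
    height = radius_deg if zone_height_deg is None else zone_height_deg
    zones_b = build_zones(cat_b, height)
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        # Only zones that overlap [dec - radius, dec + radius] can hold a match.
        z_lo = zone_id(dec_a - radius_deg, height)
        z_hi = zone_id(dec_a + radius_deg, height)
        for z in range(z_lo, z_hi + 1):
            for id_b, ra_b, dec_b in zones_b.get(z, ()):
                # Exact distance test on the reduced candidate set.
                if angular_sep_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                    matches.append((id_a, id_b))
    return matches


if __name__ == "__main__":
    two_arcsec = 2.0 / 3600.0
    a = [(1, 10.0, 20.0), (2, 200.0, -45.0)]
    b = [(101, 10.0003, 20.0002), (102, 10.5, 20.0), (103, 200.0, -45.0001)]
    print(cross_match(a, b, two_arcsec))
```

This sketch filters candidates by declination zone only. A fuller version would presumably also restrict right ascension within each zone, with a window widened by 1/cos(dec) and handled across the 0/360 degree boundary. Because each zone can be processed independently of distant zones, the same bucketing suggests a natural way to split the work across servers.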