Title: Mining the scientific user knowledge from log files to improve data discovery
Presenting Author: Chaowei Yang
Organization: George Mason University
Co-Author(s):
Ed Armstrong, Dave Moroni, Thomas Huang, Chris Finch, JPL; Yongyao Jiang, Yun Li, JPL

Abstract:
Accurate data discovery is a challenge in Earth science and in other domains. This is especially true for scientific data sets, which are used far less widely than data from services such as Amazon or Google. This presentation will introduce the collaborative 2014 AIST project on mining oceanographic knowledge from the PO.DAAC user log files to improve the end-user data discovery experience at PO.DAAC. The research has three steps: a) extracting oceanographic semantics from three resources: SWEET, the GCMD ontology, and the keywords end users enter when searching for PO.DAAC datasets; b) mining the linkage among the different vocabularies from user data discovery sessions; and c) building the linkage among vocabularies with a comprehensive approach that considers both de facto domain standards, e.g., SWEET and GCMD, and the knowledge mined from the log files. The resulting semantics are used to improve data discovery by ranking results, navigating among vocabularies, and recommending data based on user searches. The project results are being tested by the collaborative GMU and JPL team and will be released as open source and integrated into PO.DAAC and other data discovery systems.
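As a rough illustration of step b), the sketch below scores pairs of search terms by how often they appear in the same user session. It is not the project's method: the Jaccard measure, the function names term_linkage and related_terms, and the input shape (a list of sessions, each a list of query strings already extracted from the logs) are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def term_linkage(sessions):
    """Score pairs of search terms by how often they share a session.

    `sessions` is an iterable of iterables of query strings. This input
    shape is hypothetical; real log files would first need to be parsed
    and split into sessions.
    Returns {(term_a, term_b): jaccard_score} with term_a < term_b.
    """
    sessions_with = defaultdict(set)  # term -> ids of sessions containing it
    for session_id, queries in enumerate(sessions):
        for query in queries:
            term = query.strip().lower()  # naive normalization (assumption)
            if term:
                sessions_with[term].add(session_id)

    scores = {}
    for a, b in combinations(sorted(sessions_with), 2):
        shared = sessions_with[a] & sessions_with[b]
        if shared:
            union = sessions_with[a] | sessions_with[b]
            scores[(a, b)] = len(shared) / len(union)
    return scores


def related_terms(scores, term, top_n=3):
    """Return up to `top_n` (term, score) pairs linked to `term`, best first."""
    term = term.strip().lower()
    neighbours = []
    for (a, b), score in scores.items():
        if a == term:
            neighbours.append((b, score))
        elif b == term:
            neighbours.append((a, score))
    # Highest score first; ties broken alphabetically for stable output.
    neighbours.sort(key=lambda pair: (-pair[1], pair[0]))
    return neighbours[:top_n]


if __name__ == "__main__":
    # Made-up sessions for illustration only, not real PO.DAAC log data.
    example_sessions = [
        ["sea surface temperature", "SST"],
        ["SST", "ocean temperature"],
        ["ocean wind", "sea surface temperature"],
    ]
    linkage = term_linkage(example_sessions)
    print(related_terms(linkage, "SST"))
```

Scores of this kind could then be combined with relations already defined in SWEET and GCMD, as step c) describes; how the two sources are weighted is left open here.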