Title of Paper: WebTheme: Visual Text Mining for the World-Wide Web


Principal Author: Mr. Mark Whiting


Abstract: WebTheme™ is an interactive, visual text mining prototype being developed for the NASA Goddard Space Flight Center’s Earth Science Technology Office. The system uses text visualization approaches to enhance users’ access to NASA data and supports researchers seeking relevant documents across the Internet. WebTheme employs software agents to perform user-directed document harvesting from the World Wide Web. The results are presented as visual representations that help users quickly interpret the contents of the documents in relation to one another. The tool allows users to explore themes and concepts across the document collection. WebTheme can be used to gain a comprehensive understanding of a collection of web documents chosen from disparate locations, or to understand the organization and content of a web site. WebTheme also interfaces with search engines to visualize the results of a query. The visualization displays include a 2-D representation called the “Galaxies” view, which uses a starfield paradigm, and a 3-D representation known as “ThemeView”, which employs a landscape depiction. Analysis tools developed as part of the system include a Document Viewer, a Gisting tool, a Query-by-Example and Query-by-Keyword tool, and a Grouping (subsetting) tool. Hyperlinks among harvested documents may also be shown, resulting in a layered information display. WebTheme is a client-server system, so users can operate it and review results on desktop machines. Java-enabled web-browser clients run on PC, Macintosh, and Unix systems. The WebTheme server runs on Solaris, Irix, or Linux systems. One application of WebTheme has been to improve user access to the textual data of the Center for International Earth Science Information Network (CIESIN) Socio-economic Data and Applications Center (SEDAC) through the use of visualization techniques.
Our accomplishments this year in the evolution of the WebTheme prototype include support for the Z39.50 protocol, an enhanced login and password protection scheme, and updated client installation support.