

Navigating Information Space
Dr. Greg Newby, ARSC/UAF, Assistant Research Professor
Dr. Buck Sharpton, UAF President’s Professor for Remote Sensing and GINA Program Director

Story by Jenn Wagaman


A GINA satellite image shows smoke over the state of Alaska on June 30, one of the peak days of the 2004 fire season in the state. (Image courtesy of Dr. Buck Sharpton)

Data is everywhere, and getting the right information quickly is a challenge for everyone. But perhaps more important than satisfying everyday needs, like finding information about a particular subject online, is getting large quantities of important data to decision makers in times of crisis or emergency.

During the summer of 2004, Alaska experienced a severe wildfire season. By September, 655 fires had burned more than 6.5 million acres of land in the state. The Geographic Information Network of Alaska (GINA) satellite reception ground station at the University of Alaska Fairbanks (UAF) was able to receive images of the fires, and of the smoke covering much of the landscape, at times when aerial fly-overs were impossible because of unsafe conditions and poor visibility. NASA satellites pass over Alaska many times each day and send their data to GINA’s ground station, where it is processed and delivered to fire management agencies within 30 minutes of acquisition. This information was particularly helpful this year in discovering fires in Alaska’s remote wilderness and in helping agencies decide how to deploy their firefighting assets. But this is only one of the many ways that satellite imagery and other geospatial data can help meet the challenges our state faces each year. Scientists at ARSC and UAF are working together to realize this potential.

Dr. Buck Sharpton, UAF President’s Professor for Remote Sensing and GINA Program Director, and Dr. Greg Newby, ARSC, are using Information Retrieval (IR) theory and leading-edge predictive modeling to explore ways to make geospatial data more accessible and more powerful. The application of automated document indexing and other statistical IR techniques can more precisely “tune” spatial metadata search results to the searcher’s needs. The integration of high-precision satellite information products with adaptive agent-based modeling techniques could provide better capabilities for understanding natural system components and forecasting change associated with climate variability or human development.

The Field of Information Retrieval

Although the birth and growth of the World Wide Web have increased demand for IR, the field itself has existed for many years. Forty years ago, researchers had already invented methods of organizing and searching information that, some argue, today’s IR specialists are often reinventing. At the same time, the sheer volume of information available through current technology means that better ways to search, sort, find and access that information are still needed.

Today, the average human receives more information in a day than the average human a century ago received in a lifetime, but our ability to process that information has not changed. Thus, IR researchers are constantly searching for ways to make finding exactly what is needed easier and more accurate.

Imagine that you are planning a trip to Alaska and want to find out where you can learn to mush dogs during your stay. To look up this information with a web search engine, you must first decide on a query. By its very nature, that query restricts which documents can match. The search engine must then pinpoint the particular documents that match your query and rank the results in a meaningful way. This process has many flaws that can inadvertently lead you down the wrong path or keep you from the most pertinent information. IR researchers are working toward new ways to search and present information so that users get the right information quickly and on the first try.

Newer information retrieval systems focus on tailoring queries to the individual user. These systems build a model of the user’s long-term interests and refine that profile over time. When the user presents a query, the system can combine the immediate query with that user’s long-term profile to narrow the search results.
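One simple way to picture this is to treat both the query and the long-term profile as weighted term lists and blend them. The sketch below is only an illustration under our own assumptions: a linear blend with a hypothetical weight `alpha`, a profile updated as an exponential moving average with a hypothetical `rate`, and dot-product scoring. The article does not say how any particular system combines the two.

```python
from collections import Counter

def to_vector(text):
    """Term counts for a lowercase, whitespace-tokenized string."""
    return Counter(text.lower().split())

def update_profile(profile, doc_text, rate=0.2):
    """Nudge the long-term profile toward a document the user found useful.

    Old weights decay by (1 - rate); the document's term counts are added
    with weight `rate` (an exponential moving average). `rate` is hypothetical.
    """
    new = {t: (1 - rate) * w for t, w in profile.items()}
    for term, count in to_vector(doc_text).items():
        new[term] = new.get(term, 0.0) + rate * count
    return new

def personalized_query(query_text, profile, alpha=0.7):
    """Blend the immediate query (weight alpha) with the profile (1 - alpha)."""
    blended = {t: alpha * c for t, c in to_vector(query_text).items()}
    for term, w in profile.items():
        blended[term] = blended.get(term, 0.0) + (1 - alpha) * w
    return blended

def score(query_vec, doc_text):
    """Dot product of the weighted query with a document's term counts."""
    counts = to_vector(doc_text)
    return sum(w * counts[t] for t, w in query_vec.items())

docs = {
    "grooming": "dog grooming tips for city apartments",
    "mushing": "dog mushing lessons near fairbanks",
}
# The user previously read about Fairbanks; that interest persists in the profile.
profile = update_profile({}, "fairbanks winter tours and aurora viewing")
query = personalized_query("dog", profile)
ranking = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
print(ranking)  # ['mushing', 'grooming']
```

With an empty profile the one-word query "dog" scores both documents equally; the remembered interest in Fairbanks is what breaks the tie.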

Information visualization is another approach researchers are investigating to help people access, process and understand information. How information is arranged on a screen or other device strongly affects how the user processes it. Eliminating or de-emphasizing unimportant information can give a user quick, accurate access to what matters.

Each of these aspects of IR is being considered in Newby and Sharpton’s work. By taking organization, retrieval and visualization into account, the researchers can facilitate easy access to geospatial and other data.

Creating Toolkits and Visualizing Data


IRTools query sample page from Dr. Greg Newby’s toolkit.

ARSC faculty member Greg Newby is also working on several other IR applications. His main research is currently the development of the IRTools toolkit (http://sourceforge.net/projects/irtools), a programmer’s toolkit built for IR research and experimentation. The kit encompasses several major IR models, including the vector space model (VSM), Boolean retrieval, and variations on latent semantic indexing (LSI). It supports both interactive use through a web-based front end and batch-oriented retrieval for experiments.
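To make the first two of those models concrete, here is a minimal sketch in plain Python of textbook vector space ranking (tf-idf weights compared by cosine similarity) and Boolean AND retrieval. It is not IRTools’ code or API; the whitespace tokenizer, the log(N/df) weighting, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """Per-document term counts plus document frequency (df) of each term."""
    counts = {d: Counter(tokenize(t)) for d, t in docs.items()}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    return counts, df

def tfidf(counter, df, n_docs):
    """Weight raw counts by log(N / df); terms unseen in the collection are dropped."""
    return {t: c * math.log(n_docs / df[t]) for t, c in counter.items() if df[t]}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def vsm_search(query, docs):
    """Vector space model: rank every document by cosine similarity to the query."""
    counts, df = build_index(docs)
    n = len(docs)
    q = tfidf(Counter(tokenize(query)), df, n)
    scores = {d: cosine(q, tfidf(c, df, n)) for d, c in counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def boolean_and(query, docs):
    """Boolean retrieval: the set of documents containing every query term, unranked."""
    terms = set(tokenize(query))
    return sorted(d for d, t in docs.items() if terms <= set(tokenize(t)))

docs = {
    "d1": "sled dog mushing tours near fairbanks",
    "d2": "dog sled races in anchorage",
    "d3": "northern lights viewing near fairbanks",
}
print(vsm_search("dog mushing fairbanks", docs)[0][0])  # d1
print(boolean_and("dog sled", docs))                    # ['d1', 'd2']
```

The contrast is the point: the Boolean query simply returns the matching set, while the vector space model returns every document with a graded score.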

IRTools aims to look at a document beyond the “bag of words” level. Treating a document as a bag of words ignores its structure, yet that structure can provide important insights into the document’s content and relevance. IRTools takes document structure into account, enabling new types of queries that search within structural elements such as HTML tags (including headings, paragraph text, and titles). Newby is also exploring new ways to weight individual term occurrences within a document. IRTools is one of several systems being adopted as a reference system for Grid Information Retrieval, a working group under the Global Grid Forum.
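A rough idea of what structure-aware indexing can look like is sketched below with Python’s standard `html.parser`. It records which tag each term appears in, so a query can be restricted to, say, titles, and it weights each occurrence by its enclosing tag. The tag weights and function names are hypothetical choices for illustration; the article does not describe IRTools’ actual weighting scheme.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical per-tag weights, chosen only for illustration.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "p": 1.0}
# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class FieldCounter(HTMLParser):
    """Count terms separately for each innermost enclosing HTML tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.fields = defaultdict(Counter)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        field = self.stack[-1] if self.stack else "body"
        self.fields[field].update(data.lower().split())

def parse_fields(html):
    parser = FieldCounter()
    parser.feed(html)
    parser.close()
    return parser.fields

def field_search(term, html, field):
    """Structure query: does `term` occur directly inside the given tag?"""
    return parse_fields(html)[field][term.lower()] > 0

def weighted_score(query, html):
    """Sum term occurrences, each scaled by the weight of its enclosing tag."""
    fields = parse_fields(html)
    return sum(
        TAG_WEIGHTS.get(field, 1.0) * counts[t]
        for field, counts in fields.items()
        for t in query.lower().split()
    )

page1 = ("<html><head><title>Dog Mushing</title></head>"
         "<body><h1>Tours</h1><p>Learn dog mushing near Fairbanks</p></body></html>")
page2 = ("<html><head><title>City Guide</title></head>"
         "<body><p>Dog mushing demos downtown</p></body></html>")
print(weighted_score("dog mushing", page1), weighted_score("dog mushing", page2))  # 8.0 2.0
print(field_search("mushing", page1, "title"))  # True
```

Both pages mention dog mushing once in body text, but only the first puts it in the title, and structure-aware weighting reflects that difference where a plain bag of words would not.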

Newby is also developing the Yavi system to visualize data. Yavi lets users navigate through an information space in three dimensions: the information seeker selects terms, then chooses from the documents containing those terms.

To achieve this, Yavi starts from a term-by-document occurrence matrix, which is simply a list of all terms and their counts in each document. Because most terms do not occur in most documents, this matrix is quite sparse. A document correlation matrix is derived from the occurrence matrix, and the information space is then generated by performing an eigensystem analysis on that correlation matrix. The resulting multidimensional information space is a geometric representation of the statistical relations among terms and documents. Human subjects research conducted by Newby has shown that this statistical representation approximates the cognitive space of human information seekers. A new version of Yavi, now under development, will be able to generate the information space interactively from a set of terms and documents produced by IRTools.
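The pipeline just described can be sketched in plain Python. This is a minimal sketch under our own assumptions, not Yavi’s code: raw counts, Pearson correlation between document columns, and power iteration with deflation standing in for whatever eigensolver Yavi actually uses. Only the top three eigenvectors are kept, matching Yavi’s three-dimensional display, and only documents (not terms) are placed in the space. Dense lists are used for clarity, although a real system would store the sparse matrix sparsely.

```python
import math
import random

def term_document_matrix(docs):
    """Rows are terms, columns are documents; entries are raw counts."""
    vocab = sorted({t for text in docs for t in text.lower().split()})
    row = {t: i for i, t in enumerate(vocab)}
    m = [[0] * len(docs) for _ in vocab]
    for j, text in enumerate(docs):
        for t in text.lower().split():
            m[row[t]][j] += 1
    return vocab, m

def document_correlation(m):
    """Pearson correlation between every pair of document columns."""
    n_terms, n_docs = len(m), len(m[0])
    centered = []
    for j in range(n_docs):
        col = [m[i][j] for i in range(n_terms)]
        mean = sum(col) / n_terms
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    corr = [[0.0] * n_docs for _ in range(n_docs)]
    for a in range(n_docs):
        for b in range(n_docs):
            if norms[a] and norms[b]:
                dot = sum(x * y for x, y in zip(centered[a], centered[b]))
                corr[a][b] = dot / (norms[a] * norms[b])
    return corr

def top_eigenpairs(a, k, iters=1000, tol=1e-12, seed=0):
    """Largest k eigenpairs of a symmetric positive semidefinite matrix,
    via power iteration with deflation. Returns [(eigenvalue, unit vector)]."""
    n = len(a)
    a = [r[:] for r in a]  # deflation below modifies this copy
    rng = random.Random(seed)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # fresh random start
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = matvec(v)
            s = math.sqrt(sum(x * x for x in w))
            if s < tol:  # nothing left to extract
                break
            w = [x / s for x in w]
            done = max(abs(x - y) for x, y in zip(w, v)) < tol
            v = w
            if done:
                break
        lam = sum(x * y for x, y in zip(v, matvec(v)))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove this component from the matrix
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs

docs = [
    "dog sled mushing trail",
    "dog sled mushing race",
    "aurora northern lights sky",
    "aurora lights winter sky",
]
vocab, m = term_document_matrix(docs)
pairs = top_eigenpairs(document_correlation(m), k=3)
# Document d's coordinate on axis k is v_k[d] * sqrt(lambda_k).
coords = [[v[d] * math.sqrt(max(lam, 0.0)) for lam, v in pairs]
          for d in range(len(docs))]
for text, xyz in zip(docs, coords):
    print(text, [round(x, 3) for x in xyz])
```

The signs of the axes are arbitrary, but the geometry is not: the two mushing documents land close together and far from the two aurora documents.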


 

Arctic Region Supercomputing Center | PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600 | email:
