Stanford University News Service



CONTACT: Stanford University News Service (650) 723-2558

Creating a Dewey decimal system for the data superhighway

STANFORD -- Imagine trying to find information in a library where the books are organized differently on every shelf.

It's much like that for people searching for information on the Internet. A growing number of electronic libraries are available over the data highway, and they employ a variety of methods to index and search for information.

To address that problem, Stanford's integrated digital library project is attempting to invent the electronic equivalent of the Dewey decimal system: a simple and consistent interface that enables people to find the electronic information that they seek, regardless of where the information is stored and how it is organized.

Hector Garcia-Molina, professor of computer science and electrical engineering, directs the $3.6 million project. He provided an overview of it at the annual meeting of the American Association for the Advancement of Science in Atlanta on Sunday, Feb. 19.

The Stanford project was one of six digital library projects begun last September as a result of a joint $24.4 million initiative by the National Science Foundation, the Department of Defense Advanced Research Projects Agency and the National Aeronautics and Space Administration. The other five projects involve setting up prototype digital libraries of various sorts. But the four-year Stanford effort is designed to create a "virtual library" by providing Internet users with a seamless interface to the wide variety of information sources and collections becoming available on the network.

The Stanford project is a joint effort with researchers from Dialog Information Services, Hewlett-Packard Co., NASA/Ames Research Center, the Association for Computing Machinery, Interconnect Technologies Corp., Enterprise Integration Technologies, Bell Communications Research, Interval Research Corp., O'Reilly and Associates, WAIS Inc. and Xerox Palo Alto Research Center.

"Today, digital libraries come in a number of different architectures and file structures. This variety makes it very difficult for people to find the information they are looking for," said Garcia-Molina. "So we intend to develop a common environment that links everything from personal information to library collections to large research databases."

The basis of this environment will be something that the researchers call an "information bus." It will consist of basic concepts, language and protocols that can tie together the materials, services and users of information. Special programs, called protocol machines, will be developed for specific digital libraries. These will translate between the library and the information bus. At the other end, special client interfaces will be developed that connect the user to the information bus.
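To make the idea concrete, the arrangement can be pictured as a small program. The following is only an illustrative sketch and not the project's design: the class names (InformationBus, FlatListMachine, IndexedMachine, Record) and the single search operation are assumptions made up for this example. It shows two libraries that store their holdings differently, each wrapped by its own protocol machine, so that a client asks the bus one question and never sees the difference.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Common result format carried on the (hypothetical) bus."""
    title: str
    source: str


class FlatListMachine:
    """Protocol machine for a library that keeps a plain list of titles."""

    def __init__(self, name: str, titles: List[str]) -> None:
        self.name = name
        self.titles = titles

    def search(self, term: str) -> List[Record]:
        # Translate the bus query into a case-insensitive substring scan.
        needle = term.lower()
        return [Record(t, self.name) for t in self.titles if needle in t.lower()]


class IndexedMachine:
    """Protocol machine for a library that keeps a keyword -> titles index."""

    def __init__(self, name: str, index: Dict[str, List[str]]) -> None:
        self.name = name
        self.index = index

    def search(self, term: str) -> List[Record]:
        # Translate the bus query into a lookup in the library's own index.
        return [Record(t, self.name) for t in self.index.get(term.lower(), [])]


class InformationBus:
    """Forwards one query to every attached protocol machine."""

    def __init__(self) -> None:
        self.machines = []

    def attach(self, machine) -> None:
        self.machines.append(machine)

    def search(self, term: str) -> List[Record]:
        results: List[Record] = []
        for machine in self.machines:
            results.extend(machine.search(term))
        return results


if __name__ == "__main__":
    bus = InformationBus()
    bus.attach(FlatListMachine("Campus Library", ["Digital Library Design"]))
    bus.attach(IndexedMachine("Research Database", {"library": ["Library Protocols"]}))
    for record in bus.search("library"):
        print(record.source, "->", record.title)
```

In this sketch, adding a new kind of library means writing one more protocol machine; the client interface and the other libraries are left unchanged.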

One of the interfaces they will be experimenting with is an information map. This will be like a street map, except that it will map information structures, and will allow users to move about by pointing and clicking at different parts of the map. Another approach is to use animation: for example, moving down a data highway and passing road signs that tell users what information is located in each block.

In addition to the technical problems it will address, the project will directly tackle concerns about the cost of information and such critical issues as protection of intellectual property rights, privacy and security of personal information.

Because society has not yet agreed on how to apply these basic principles to the digital realm, the researchers will be attacking these problems analytically and experimentally. They will develop possible solutions, build them into demos and see how well they work.

In the area of intellectual property rights, for example, the researchers will develop a "copy detection service." This will be a registry where people can send their documents. Once the documents are registered, the service can compare them with questionable documents to determine the extent of duplication. "In some preliminary tests, we find that we get about a 5 percent random match," said Garcia-Molina.
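A registry of this kind can be sketched in a few lines. The release does not say how the service compares documents, so the method below is an assumption chosen for illustration: each document is broken into overlapping three-word chunks, and the score is the fraction of a questionable document's chunks that also appear in a registered one. The names CopyDetectionRegistry, register, overlap and chunks are hypothetical.

```python
import re
from typing import Dict, Set


def chunks(text: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-word chunks in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


class CopyDetectionRegistry:
    """Toy registry: stores chunk sets for registered documents."""

    def __init__(self, k: int = 3) -> None:
        self.k = k  # chunk length in words (an assumption, not from the release)
        self.registered: Dict[str, Set[str]] = {}

    def register(self, doc_id: str, text: str) -> None:
        self.registered[doc_id] = chunks(text, self.k)

    def overlap(self, text: str) -> Dict[str, float]:
        """Fraction of the questionable text's chunks found in each registered document."""
        suspect = chunks(text, self.k)
        if not suspect:
            # Too short to form even one chunk: report no overlap.
            return {doc_id: 0.0 for doc_id in self.registered}
        return {
            doc_id: len(suspect & known) / len(suspect)
            for doc_id, known in self.registered.items()
        }


if __name__ == "__main__":
    registry = CopyDetectionRegistry()
    registry.register("original", "the quick brown fox jumps over the lazy dog")
    print(registry.overlap("the quick brown fox sleeps"))
```

Because unrelated documents can share some chunks by chance, a service built along these lines would presumably report a score and leave the judgment to a person, rather than declare a copy outright.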

Another type of service the scientists will develop is one that allows users to create their own automated agents for library services. Agents are a general facility for providing help. They can automate tasks, navigate to specific locations, notify users when certain conditions occur and exchange information with other agents.

For example, a user might tell an agent to find all the information on the network involving digital libraries. The agent would then search all the libraries on the network for files including the term, arrange payment for the information in cases where there are charges, and return all the information it finds to the user's computer.
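The steps in that example can be written out as a short sketch. Everything named here is hypothetical: the Library and Agent classes, the per-document fee and the spending budget are assumptions added for illustration, since the release does not describe how payment would be arranged. The agent visits each library, searches for the term, pays the fee where one applies and it can afford it, and returns what it collected.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Library:
    """A toy library: titles mapped to text, with an optional per-document fee."""
    name: str
    documents: Dict[str, str]
    fee_per_document: float = 0.0

    def search(self, term: str) -> List[str]:
        needle = term.lower()
        return [title for title, text in self.documents.items() if needle in text.lower()]


@dataclass
class Agent:
    """A toy agent that searches libraries and pays fees within a budget."""
    budget: float  # spending limit set by the user (an assumption of this sketch)
    spent: float = 0.0

    def collect(self, term: str, libraries: List[Library]) -> List[Tuple[str, str]]:
        found: List[Tuple[str, str]] = []
        for library in libraries:
            for title in library.search(term):
                fee = library.fee_per_document
                if self.spent + fee > self.budget:
                    continue  # cannot afford this document; skip it
                self.spent += fee
                found.append((library.name, title))
        return found


if __name__ == "__main__":
    free = Library("Open Archive", {"Report A": "A survey of digital libraries."})
    paid = Library("Commercial Index", {"Paper B": "Digital libraries at scale."}, 2.0)
    agent = Agent(budget=5.0)
    print(agent.collect("digital libraries", [free, paid]), agent.spent)
```

A spending limit is one simple way to keep an automated agent from running up charges on the user's behalf; other policies, such as asking the user before each purchase, would fit the same structure.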



This is an archived release.

© Stanford University. All Rights Reserved. Stanford, CA 94305. (650) 723-2300.