CONTACT: Stanford University News Service (650) 723-2558
Beyond browsing on the Internet
STANFORD -- People "browse" and "surf" the World Wide Web. These words accurately portray both a strength of the Web (the chance to discover unexpected information) and one of its major flaws (the difficulty of getting straight answers to simple questions).
Stanford computer scientists now have developed a system, called Infomaster, that they hope will overcome this limitation.
"The underlying idea is that you should be able to ask a single question and get a meaningful answer, even when the information required is stored in several different places," said Michael Genesereth, associate professor of computer science, who heads Infomaster's development.
Infomaster technology makes this possible by extending the Web beyond its current base of text and documents into the realm of databases: highly structured collections of data that are specifically designed to make information easy to search for and locate.
The effectiveness of the Infomaster approach will be tested over the next two years on the Stanford campus through a project called the Stanford Information Network, which will be used to link a broad range of administrative and academic databases. On the academic side, the project will integrate some of the 135 databases now found in Folio and the Libraries' reference database. These include scientific tables, geologic data, stock prices and data about the human genome, among many other subjects. On the administrative side, the program will be applied to data about employees and students, organizations, courses, grades, events, room reservations and contractors. One of its first applications will be to create a much more powerful electronic campus directory.
The Stanford Information Network is a joint effort of the Computer Science Department, the University Libraries, and Information Technology Systems and Services. Principal investigators are Genesereth, University Director of Libraries Michael Keller and Glen Mueller, Stanford's chief information officer. With an estimated cost of $600,000, it is the first major project approved by the new Commission on Technology, Teaching and Learning.
Infomaster "is the first technology we've seen with the potential for allowing people to get answers to questions phrased in a natural way that can be readily applied to numerous databases," Keller said.
Take the question, "How many children under 5 lived in Cleveland before 1923?" The computer must find and interpret many pieces of information to answer this question, such as where Cleveland is and when it was founded; what children are; and Cleveland's past population.
"Infomaster has the capability to break down such a query into its basic elements, find the pieces of information it needs and reassemble them into a meaningful answer. It can become a search engine that we will be proud to use for academic resources," Keller said.
The technology has a similar potential on the administrative side of the university, according to Dennis Rayer, a project manager at Information Technology Systems and Services and the first staff member assigned full time to the project.
"This is very exciting stuff," he said. "It combines new elements with conventional databases. No one can anticipate all the uses that it will be put to."
Prototype system already in use
The computer scientists have a prototype of Infomaster up and running that illustrates some of its capabilities. The demonstration service includes:
Creating a super-Whois
One of the first campus systems that the researchers will upgrade is the electronic directory, the system called Whois. Currently, Whois provides an employee's name, title, department, address, phone numbers and e-mail address. By linking the existing Whois with other databases such as publications and the course catalog, Infomaster will allow users to locate individuals and groups of people who meet a variety of criteria.
For example, by asking a single question a user should be able to get a list of all the associate professors who teach introductory computer science and who have published in the journal Artificial Intelligence. The current Whois can be searched only by an individual's name. The researchers already have developed a slightly enhanced version of the directory that can be searched on a number of fields, including name, department, title, and address.
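Answering the associate-professor question means combining three sources. The Python below is a minimal sketch, assuming three hypothetical in-memory tables (a directory, a course catalog and a publication list) that share a person ID; the table layouts, the function `matching_people` and every record are invented for illustration and do not describe Infomaster's actual design:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    name: str
    title: str


# Hypothetical sample records standing in for Whois, the course catalog
# and a publications database.
DIRECTORY = [
    Person("p1", "A. Smith", "Associate Professor"),
    Person("p2", "B. Jones", "Associate Professor"),
    Person("p3", "C. Lee", "Professor"),
]
COURSES = [  # (instructor ID, course title)
    ("p1", "Introduction to Computer Science"),
    ("p2", "Databases"),
    ("p3", "Introduction to Computer Science"),
]
PUBLICATIONS = [  # (author ID, journal)
    ("p1", "Artificial Intelligence"),
    ("p2", "Artificial Intelligence"),
    ("p3", "Artificial Intelligence"),
]


def matching_people(title, course, journal):
    """Answer one question by intersecting conditions drawn from three sources."""
    teaches = {pid for pid, c in COURSES if c == course}
    published = {pid for pid, j in PUBLICATIONS if j == journal}
    return sorted(
        p.name
        for p in DIRECTORY
        if p.title == title and p.pid in teaches and p.pid in published
    )


print(matching_people("Associate Professor",
                      "Introduction to Computer Science",
                      "Artificial Intelligence"))  # ['A. Smith']
```

Each source contributes one condition, and the answer is whatever survives all three; a real broker would send each sub-query to a separate database rather than scan lists in memory.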
Forming a different kind of network
Infomaster will be accessible through World Wide Web browsers such as Mosaic, Netscape and Explorer. But the network that it forms has a number of important differences from the Web.
The Web consists of millions of linked documents, images, and audio and video clips, along with search engines designed to help people locate text information. The documents can be organized into hierarchical structures, such as the Yahoo directory. Web search engines generally look for a sequence of characters and return a list of the documents that contain it. These documents then must be searched one by one to find the information the user is seeking.
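For contrast, the keyword matching described above can be sketched in a few lines of Python, assuming a hypothetical dictionary of page texts (`DOCUMENTS` and `keyword_search` are illustrative names, not any real search engine's interface):

```python
# Hypothetical page texts standing in for documents on the Web.
DOCUMENTS = {
    "page1.html": "Rooms for rent near campus, two bedrooms, $900 a month.",
    "page2.html": "Campus events calendar for the spring quarter.",
    "page3.html": "Studio for rent in Palo Alto.",
}


def keyword_search(query, documents):
    """Return the names of documents whose text contains the query string."""
    needle = query.lower()
    return [name for name, text in documents.items() if needle in text.lower()]


print(keyword_search("for rent", DOCUMENTS))  # ['page1.html', 'page3.html']
```

The result is only a list of page names; the rent, for instance, still has to be read out of each page by hand.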
The basic components of the Infomaster information network, by contrast, are databases and other kinds of structured information sources. The system also can handle certain kinds of text information that can be automatically translated into a database format, such as the rental ad information included in the demonstration.
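Translating text into a database format might look like the following minimal sketch. It assumes a hypothetical, rigidly formatted ad of the form "2 BR, Palo Alto, $1200/mo"; the pattern, field names and sample ads are invented, and real ads would need far more robust extraction than one regular expression:

```python
import re
from typing import Optional

# Hypothetical ad format: "<bedrooms> BR, <city>, $<rent>/mo".
AD_PATTERN = re.compile(
    r"(?P<bedrooms>\d+)\s*BR,\s*(?P<city>[^,]+),\s*\$(?P<rent>\d+)/mo"
)


def parse_ad(text: str) -> Optional[dict]:
    """Turn one free-text ad into a structured row, or None if it doesn't match."""
    m = AD_PATTERN.search(text)
    if m is None:
        return None
    return {
        "bedrooms": int(m.group("bedrooms")),
        "city": m.group("city").strip(),
        "rent": int(m.group("rent")),
    }


ads = [
    "Sunny 2 BR, Palo Alto, $1200/mo, call evenings",
    "1 BR, Menlo Park, $850/mo",
    "Roommate wanted, no details",
]
rows = [r for r in (parse_ad(a) for a in ads) if r is not None]

# Once ads are rows, a structured query replaces reading each ad by hand.
print([r["city"] for r in rows if r["rent"] < 1000])  # ['Menlo Park']
```

Ads that do not fit the assumed format are simply skipped here; a production translator would have to handle them somehow.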
Infomaster serves as an automated "information broker" that can use information stored in a number of different databases to answer a client's questions.
To do this, the system must be able to translate between different databases. Take the case of two databases, one that contains student information and the other that contains information about faculty members. The student database might list students by name, with their faculty advisers in a field called "advisers." The faculty database, on the other hand, might list faculty members by name, with their students in a field called "advisees." To translate effectively, Infomaster must know that the "advisers" field in the student database refers to the same people as the "name" field in the faculty database, and that the "advisees" field in the faculty database refers to the same people as the "name" field in the student database.
To provide Infomaster with this information, the system uses a "meta-programming" procedure that requires the person adding a new database to define the terms that database uses. These semantic connections allow the system to handle complex queries. According to Keller, this is very similar to the process the library uses to catalog new acquisitions.
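The adviser example can be sketched in Python as follows, assuming that whoever adds a database supplies a small table of definitions saying which of its fields plays each role in a shared relation, here a hypothetical `advises(faculty, student)`. The registry, the role names and the records are all illustrative assumptions; the release does not describe Infomaster's actual meta-programming notation:

```python
# Hypothetical records; field names follow the example in the text.
STUDENT_DB = [
    {"name": "Ann", "advisers": ["Prof. Gray"]},
    {"name": "Bob", "advisers": ["Prof. Gray", "Prof. Hale"]},
]
FACULTY_DB = [
    {"name": "Prof. Gray", "advisees": ["Ann", "Bob"]},
    {"name": "Prof. Hale", "advisees": ["Bob", "Cal"]},
]

SHARED_ROLES = ("faculty", "student")  # hypothetical shared relation advises(faculty, student)
sources = []  # (rows, definitions) pairs supplied by whoever adds a database


def as_list(value):
    """Treat single values and lists of values uniformly."""
    return value if isinstance(value, list) else [value]


def register(rows, definitions):
    """Add a database only if every shared role is defined by one of its fields."""
    missing = [role for role in SHARED_ROLES if role not in definitions]
    if missing:
        raise ValueError(f"undefined roles: {missing}")
    sources.append((rows, definitions))


def advises():
    """Translate every registered database into (faculty, student) pairs."""
    pairs = set()
    for rows, defs in sources:
        for row in rows:
            for f in as_list(row[defs["faculty"]]):
                for s in as_list(row[defs["student"]]):
                    pairs.add((f, s))
    return pairs


# The student database's "advisers" field names faculty members;
# the faculty database's "advisees" field names students.
register(STUDENT_DB, {"faculty": "advisers", "student": "name"})
register(FACULTY_DB, {"faculty": "name", "student": "advisees"})

print(sorted(s for f, s in advises() if f == "Prof. Hale"))  # ['Bob', 'Cal']
```

Because the translation lives in the definitions rather than in the query, a question about Prof. Hale's students draws on both databases, and Cal, who appears only in the faculty database, is still found.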
"One of the questions we have to answer is how much work will be required to add and maintain new databases," Rayer said.
A number of non-technical issues also are involved in setting up such a network. The most demanding of these include:
Such decisions must be made by the university, but according to the researchers, the Infomaster technology has been specifically designed to handle a wide variety of authorization and payment procedures.
The development team is creating a detailed action plan for the two-year pilot project. This will be followed by a two-year "open enrollment" period during which campus information owners will be helped and encouraged to incorporate their databases into the network.
Infomaster's development was supported in part by CommerceNet, Inc., a non-profit subsidiary of Smart Valley, Inc. CommerceNet was set up to develop the tools required to use the Internet for electronic commerce. As part of this effort, a group of Stanford computer scientists, headed by senior research scientist Arthur Keller, has been developing "virtual catalog" software that will allow participants to search companies' on-line catalogs, even when they give the same products different names and organize their information in different ways. The virtual catalog is a specialized application of Infomaster technology.
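The virtual-catalog idea, matching products that different companies name and organize differently, can be illustrated with a minimal Python sketch. It assumes a hand-built synonym table and per-vendor field definitions; the vendors, product names, prices and helper names are hypothetical, and the sketch is not CommerceNet's or Stanford's software:

```python
# Hypothetical catalogs: two vendors name the same item differently
# and use different field names for it.
VENDOR_A = [
    {"item": "Blue widget (large)", "price_usd": 12},
    {"item": "Red widget (small)", "price_usd": 7},
]
VENDOR_B = [{"product": "Widget, lg., blue", "cost": 11}]

# Hypothetical synonym table mapping each vendor's wording to one shared product ID.
CANONICAL = {
    "Blue widget (large)": "widget-blue-large",
    "Widget, lg., blue": "widget-blue-large",
    "Red widget (small)": "widget-red-small",
}

# Hypothetical per-vendor definitions: which local field holds the name and the price.
FIELDS = {"A": ("item", "price_usd"), "B": ("product", "cost")}


def normalize(vendor, row):
    """Translate one vendor-specific row into a common record."""
    name_field, price_field = FIELDS[vendor]
    name = row[name_field]
    return {"product": CANONICAL.get(name, name), "price": row[price_field], "vendor": vendor}


catalog = [normalize("A", r) for r in VENDOR_A] + [normalize("B", r) for r in VENDOR_B]


def offers(product):
    """Every vendor's offer for one shared product ID, cheapest first."""
    return sorted((o for o in catalog if o["product"] == product), key=lambda o: o["price"])


print([(o["vendor"], o["price"]) for o in offers("widget-blue-large")])  # [('B', 11), ('A', 12)]
```

A shopper asks about one product and sees both vendors' offers, even though neither catalog uses the other's wording or field names.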