A more flexible architecture for parallel processors

1/30/96

CONTACT: Stanford University News Service (650) 723-2558

COMMENT: Prof. Mark A. Horowitz, E.E. & Computer Science (415) 725-3707
e-mail: horowitz@ee.stanford.edu
Prof. Mendel Rosenblum, Computer Science (415) 723-0474
e-mail: mendel@cs.stanford.edu
Prof. Anoop Gupta, E.E. & Computer Science (415) 725-3716
e-mail: ag@pepper.stanford.edu
Prof. John Hennessy, E.E. & Computer Science (415) 725-3712
e-mail: jlh@vsop.stanford.edu


STANFORD -- Imagine connecting all the computers in a building so that they can share resources, allowing a computer that is running a really big program to borrow memory and even processors from idle computers nearby. Now imagine that such a system would be comparable in cost and reliability to a normal network.

That is one of the possible results of a large project called FLASH being conducted at Stanford's Computer Systems Laboratory. FLASH stands for Flexible Architecture for Shared Memory. Its basic goal is to develop a new and more flexible type of parallel processor - a computer that uses more than one central processing unit at once.

The research team is headed by John Hennessy, professor of electrical engineering and computer science, and is made up of six faculty members and more than 40 researchers, staff and students. The project is a cooperative effort with the companies LSI Logic and Silicon Graphics, and is supported by the Department of Defense's Advanced Research Projects Agency.

The researchers expect to have a prototype of this new machine built and running by the end of the year. The project also includes the development of an operating system called HIVE that should allow the computer to run many existing programs.

"We want to create a large machine that is equally as good at supporting a large number of users running moderately sized programs as it is at supporting a small number of users running very large programs," said Mark A. Horowitz, associate professor of electrical engineering and computer science, who helps administer the project.

To people running small programs, like word processors or spreadsheets, working on a FLASH machine would be about the same as using a single workstation. But those with large programs, like complex graphics programs, large databases and engineering simulations, would benefit by receiving substantially more resources than a single workstation can provide.

Such a computer could be constructed in a single box, or it could be broken up into individual processors, or nodes, and spread around a network. This would differ from current networks because the nodes would continue to work together as a single computer. The distributed configuration would be slightly slower than the single machine, however, because of speed-of-light delays in transmitting information over the longer distances involved.

Today, most computers contain a single microprocessor. That means they can do only one thing at a time, although they can do it very rapidly, performing millions of operations per second. Parallel processing largely has been limited to supercomputer-class machines containing thousands of processors. But that is changing. A number of scientific workstations now can run two or more processors. And many file servers - computers designed to run office networks - also utilize several processors.

FLASH incorporates a radically new approach to what has been one of the thorniest problems facing parallel processing: how to handle memory.

The architecture used by the first generation of parallel computers, pioneered at the California Institute of Technology, divided memory up equally among all its processors. This meant that when a processor needed some information stored at a different node, it had to specifically request it by sending a message to the other processor.

An architecture that is easier to program is called shared memory, which is used in today's file servers. In this approach, all the memory is accessible from all the processors. Initially, this was accomplished by placing the memory and all the processors on a common communication line called a data bus. The only tricky issue was dealing with caches, the small, fast memory chips located next to the processor, where it stashes frequently used data. To ensure that the cache data remains valid, the cache must "snoop" on the bus to see if any of its data must be updated.
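To make snooping concrete, here is a minimal Python sketch. It assumes a simple write-invalidate scheme with write-through to memory, and the class names Bus and SnoopingCache are illustrative rather than a description of any real machine. Every write travels over the one shared bus, and each of the other caches drops its copy of that address, so its next read fetches the fresh value.

```python
# Toy model of bus snooping (write-invalidate, write-through); the class
# names are illustrative assumptions, not any real machine's design.


class Bus:
    """A single shared bus connecting main memory and every cache."""

    def __init__(self):
        self.memory = {}   # address -> value held in main memory
        self.caches = []   # every cache attached to (and watching) the bus

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value        # write-through keeps memory current
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(addr)        # every other cache sees the write


class SnoopingCache:
    """A processor's private cache that snoops on the shared bus."""

    def __init__(self, bus):
        self.lines = {}    # address -> locally cached value
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch from memory over the bus
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop(self, addr):
        # Another processor wrote this address, so our copy is stale: drop it.
        self.lines.pop(addr, None)


bus = Bus()
a, b = SnoopingCache(bus), SnoopingCache(bus)
print(b.read(0x40))   # 0: b now caches the old value
a.write(0x40, 7)      # b snoops the write and invalidates its copy
print(b.read(0x40))   # 7: b misses and re-reads memory
```

Because every write is announced to every cache over a single bus, traffic grows with the number of processors.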

But, as processors got faster, the bus became a serious bottleneck. In 1989, Hennessy put together a team to develop a shared memory machine that did not require a single, global bus. They called this design DASH, Directory Architecture for Shared Memory.

"We ended up building a machine that looked a lot like a distributed memory machine. Instead of requiring that the programmer remember to send messages to get remote data, however, we put in some hardware called a directory that figured out what messages were needed and took care of them," Horowitz said.
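The directory idea can be sketched in a few lines of Python. This is a hypothetical, simplified model, not the DASH protocol itself: for each block of memory it records which nodes hold a copy, so that when one node writes, invalidation messages go only to those nodes instead of being broadcast to every processor.

```python
# Hypothetical, simplified directory; real directory protocols track more
# states (for example, whether a block is dirty) than this sketch does.
from collections import defaultdict


class Directory:
    """For each memory block, record which nodes' caches hold a copy."""

    def __init__(self):
        self.sharers = defaultdict(set)  # block address -> ids of caching nodes

    def on_read(self, node, addr):
        # A node fetched a copy; remember it so it can be invalidated later.
        self.sharers[addr].add(node)

    def on_write(self, node, addr):
        """Return the point-to-point invalidation messages this write needs."""
        stale = sorted(n for n in self.sharers[addr] if n != node)
        self.sharers[addr] = {node}      # the writer now has the only valid copy
        return [("invalidate", n, addr) for n in stale]


d = Directory()
d.on_read(1, 0x100)
d.on_read(5, 0x100)
d.on_read(9, 0x200)
print(d.on_write(1, 0x100))   # [('invalidate', 5, 256)]; node 9 is never contacted
```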

DASH was completed in 1991. The machine ran programs effectively and exhibited substantial improvements in performance as additional microprocessors were added. The approach it pioneered, called distributed shared memory, is now finding its way into commercial products, according to Horowitz.

But the Stanford researchers were not satisfied. For one thing, the DASH design was inflexible. For another, its maximum size was 64 processors: It could not be scaled up into very large sizes. In 1992, the researchers embarked on FLASH.

The key innovation in the FLASH design is a special controller that sits between the processor and the memory and acts something like an electronic bureaucrat. Called MAGIC, this circuitry controls the flow of information between the processor, the memory and the network. Mostly, it passes requests for data, and the data itself, back and forth between the processor and the memory. But while a request or its data is being moved, it also does whatever bookkeeping is needed. For example, to ensure that caches are consistent, MAGIC must track which chunks of memory are being stored in each cache.

"The best way to explain this is to think about what the controller chip does. Mostly [it] moves information around. When a request comes in, the request says, in hardware, move these data here, and, by the way, I am this kind of a request. So the processor wakes up and says, for this kind of request, I have to do this kind of bookkeeping," Horowitz said.

What is unique about MAGIC is that the controller is a little computer of its own. The rules for each operation are set by programs that run on this processor. That makes it easy to change the basic communications protocols that the machine follows.
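A minimal Python sketch shows why a programmable controller is flexible. The request types, the handler functions and the ProgrammableController class are all hypothetical and say nothing about MAGIC's actual instruction set or protocol code. Each incoming request names its type, and the controller runs whatever handler software has installed for that type. Swapping a single handler switches the machine from invalidating stale copies to updating them with the new value, with no change to the "hardware."

```python
# Hypothetical request types and handlers; this does not model MAGIC's
# actual hardware or protocol code.
from collections import defaultdict


class ProgrammableController:
    """A controller whose response to each request type is set by software."""

    def __init__(self):
        self.handlers = {}               # request type -> handler function
        self.sharers = defaultdict(set)  # bookkeeping the handlers maintain

    def install(self, request_type, handler):
        # Changing the protocol means loading new code, not building new chips.
        self.handlers[request_type] = handler

    def receive(self, request_type, node, addr, value=None):
        # The request announces its type; run the matching handler.
        return self.handlers[request_type](self, node, addr, value)


def read_handler(ctrl, node, addr, value):
    ctrl.sharers[addr].add(node)         # remember who now caches the block
    return []                            # no protocol messages needed


def write_invalidate(ctrl, node, addr, value):
    # Tell every other sharer to discard its stale copy.
    stale = sorted(ctrl.sharers[addr] - {node})
    ctrl.sharers[addr] = {node}
    return [("invalidate", n, addr) for n in stale]


def write_update(ctrl, node, addr, value):
    # Alternative policy: send the new value to every other sharer instead.
    others = sorted(ctrl.sharers[addr] - {node})
    ctrl.sharers[addr].add(node)
    return [("update", n, addr, value) for n in others]


ctrl = ProgrammableController()
ctrl.install("read", read_handler)
ctrl.install("write", write_invalidate)
ctrl.receive("read", 2, 0x80)
ctrl.receive("read", 3, 0x80)
print(ctrl.receive("write", 2, 0x80, 42))   # [('invalidate', 3, 128)]

ctrl.install("write", write_update)         # swap protocols in software
ctrl.receive("read", 3, 0x80)
print(ctrl.receive("write", 2, 0x80, 43))   # [('update', 3, 128, 43)]
```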

The MAGIC controller has been designed and is currently being made into a single integrated circuit. Once it is completed, the researchers will combine it with Silicon Graphics boards and processors to make the prototype computer.

"Because we are using Silicon Graphics processors, I have insisted that our operating system be compatible with programs written for Silicon Graphics workstations," said Mendel Rosenblum, assistant professor of computer science, who is in charge of developing the operating system.

The operating system (a special version of UNIX) was named HIVE because of its cellular structure, analogous to that of a beehive. Essentially, it uses the MAGIC controllers to divide the system up into a number of cells, and instructs each controller to check the validity of information coming from outside its cell. The cells thus serve as internal fire walls and are the basis of a protection scheme that the researchers call "fault containment." This isolates each user's area in the system so that, if a hardware or software failure occurs in one cell, the rest of the system can continue to operate.
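The fault-containment check can be pictured with a small Python sketch. It assumes, purely for illustration, that each cell keeps a record of which other cells may write each page of its memory; the class and method names are hypothetical, and HIVE's actual checks are not detailed in this release. A write from a cell that was never granted access is refused, so a failing cell cannot corrupt its neighbors.

```python
# Hypothetical per-page write permissions; the names and mechanism are
# illustrative assumptions, not HIVE's documented design.


class CellMemory:
    """Memory owned by one cell, guarded against writes from other cells."""

    def __init__(self, cell_id):
        self.cell_id = cell_id
        self.pages = {}     # page number -> contents
        self.allowed = {}   # page number -> ids of other cells allowed to write

    def grant(self, page, other_cell):
        self.allowed.setdefault(page, set()).add(other_cell)

    def write(self, from_cell, page, value):
        """Apply the write and return True, or refuse it and return False."""
        outsider = from_cell != self.cell_id
        if outsider and from_cell not in self.allowed.get(page, set()):
            return False    # contained: an unauthorized cell cannot touch this page
        self.pages[page] = value
        return True


cell0 = CellMemory(0)
cell0.grant(7, 2)
print(cell0.write(2, 7, "shared data"))  # True: cell 2 was granted page 7
print(cell0.write(3, 7, "garbage"))      # False: cell 3 was not
print(cell0.pages[7])                    # shared data
```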

"For a big, expensive resource of this kind, it wouldn't be acceptable if we had a thousand users and one user could crash the whole system," Rosenblum said. "If a FLASH machine carried a much greater risk of failure than a workstation, people would use the slower machine. So we have tried to keep the risk of failure proportionate to the amount of resources that an individual is using."

To get the advantages of a shared memory machine, the operating system allows cells to "borrow" memory from other cells. When a program suddenly needs more space to perform a function, it can readily get the resources it needs.
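Borrowing might look something like the following Python sketch. The policy shown, using local memory first and then taking pages from the cells with the most free memory, is an assumption made for illustration and is not a description of HIVE's allocator; the Cell class and allocate function are hypothetical.

```python
# Hypothetical borrowing policy; not HIVE's actual memory allocator.
from dataclasses import dataclass


@dataclass
class Cell:
    cell_id: int
    free_pages: int


def allocate(cells, requester, pages):
    """Return (cell_id, pages) grants totalling `pages`, or raise MemoryError."""
    # Prefer local memory, then borrow from the cells with the most free pages.
    donors = [requester] + sorted(
        (c for c in cells if c is not requester),
        key=lambda c: c.free_pages, reverse=True)
    if sum(c.free_pages for c in donors) < pages:
        raise MemoryError("not enough free memory in the whole machine")
    grants, need = [], pages
    for cell in donors:
        if need == 0:
            break
        take = min(need, cell.free_pages)
        if take:
            cell.free_pages -= take
            grants.append((cell.cell_id, take))
            need -= take
    return grants


cells = [Cell(0, 10), Cell(1, 50), Cell(2, 30)]
print(allocate(cells, cells[0], 70))   # [(0, 10), (1, 50), (2, 10)]
```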

The operating system's cellular structure is also the key to getting hundreds or thousands of processors to work simultaneously, Rosenblum said. Each of the cells can act as an independent operating system. Because the vast majority of activity takes place within cells, not between them, the addition of more processors does not overload the system.

Rosenblum has the basic HIVE code running on a sophisticated simulator that emulates both the hardware and software environment of FLASH. Although a lot of refinements still must be made, the operating system will be ready to go when the hardware is completed, he said.

In addition to the operating system, a group of researchers, directed by Anoop Gupta, associate professor of electrical engineering and computer science, is developing scientific application programs for the new machine. These programs range from computer circuit simulations to models of how water behaves at the molecular level, and from how astronomical objects behave when moving under the influence of their collective gravitational attractions to computer graphic techniques like ray tracing.

The other faculty members involved in the project are Monica Lam, assistant professor of computer science, and Kunle Olukotun, assistant professor of electrical engineering.

-dfs-
