September 30, 2015

Stanford computer scientist Christopher Ré named MacArthur fellow

Stanford's Christopher Ré, an assistant professor of computer science, has been awarded a "genius grant" from the John D. and Catherine T. MacArthur Foundation. He was recognized for his work in developing a data-inference system that analyzes data with a high degree of certainty.

By Bjorn Carey

Christopher Ré, an assistant professor of computer science at Stanford, is one of this year’s recipients of a MacArthur Foundation “genius grant.” (Image credit: Don Feria)

Christopher Ré, an assistant professor of computer science at Stanford, has been named one of the 2015 fellows of the John D. and Catherine T. MacArthur Foundation.

The fellowships, popularly known as “genius grants,” are awarded to scholars for their achievement and potential, and include a $625,000 stipend over five years. The honors rank among the most prestigious prizes in academia and the creative arts.

The MacArthur Foundation recognized Ré for excellence in a breadth of computer science disciplines, describing him as “democratizing big-data analytics through open source data-processing products that have the power of machine learning algorithms but can be integrated into existing and applied database systems.”

Some of these efforts have helped Ré develop the DeepDive data inference system, which can analyze large batches of genetic and medical studies to advance drug development, and which DARPA uses to comb data on the “dark web” to identify and break up human trafficking rings. The MacArthur Foundation hailed these as examples of how Ré is revolutionizing researchers’ ability to make big data truly accessible and widely useful.

Ré was informed of the honor a few weeks ago, but was sworn to secrecy.

“They tell you that you can only tell one person, so I told my wife,” said Ré, who is also a Robert N. Noyce Family Faculty Scholar. “It’s been an interesting three or four weeks, and I’m very excited to tell the rest of my family.”

One of the major challenges of crunching the types of data sets that DeepDive analyzes is the sheer number of decisions that must be computed, Ré said.

For instance, if the system is analyzing academic studies regarding fossils in a particular type of rock formation, it must determine if a particular section of text or table is referring to a given rock formation, and whether that rock formation is the same as a particular site in the real world, or another formation with the same name. It needs to parse out the species name, and decide whether that name refers to the creature, or the rock. Is the seam where the fossil was discovered the same seam as the location where another fossil was found and described in another paper?

For each of these decisions, it needs to consider both the “yes” and “no” options, and then weigh how likely either option is to produce the ultimate recommendation.
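As a rough illustration of weighing the “yes” and “no” options for a single decision, the sketch below adds up weighted pieces of evidence for “yes,” scores “no” at a fixed baseline, and normalizes the two scores into a probability. It is not DeepDive's code: the function name `decision_probability`, the feature names and the weights are all hypothetical, chosen only to mirror the fossil example above.

```python
import math

def decision_probability(evidence, weights):
    """Return the probability of "yes" for one binary decision.

    evidence: dict mapping a (hypothetical) feature name to True/False
    weights:  dict mapping the same feature names to a number;
              positive weights favor "yes", negative weights favor "no"
    """
    # Score for "yes": sum the weights of every feature that is present.
    score_yes = sum(weights[name] for name, present in evidence.items() if present)
    # Score for "no" is fixed at zero, so only relative evidence matters.
    score_no = 0.0
    # Normalize both options into a probability for "yes".
    total = math.exp(score_yes) + math.exp(score_no)
    return math.exp(score_yes) / total

if __name__ == "__main__":
    # Hypothetical evidence about whether a name refers to a rock formation.
    evidence = {"name_matches_known_formation": True, "appears_in_species_table": True}
    weights = {"name_matches_known_formation": 1.5, "appears_in_species_table": -0.5}
    print(decision_probability(evidence, weights))
```

With no evidence either way the function returns 0.5, and each additional piece of evidence shifts the balance toward one option or the other.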

“All those choices that a human makes when reading the paper, the machine has to do, too,” Ré said. “When you are looking at millions of journal articles, and thousands of these types of inference choices in each, those choices add up very quickly in the realm of billions. The approaches that we’ve built can do this task better than human volunteers.”

A key to DeepDive is answering this overwhelming number of questions efficiently, Ré said, and doing so required a couple of feats of programming acrobatics.

The first involved getting the system to run well on modern hardware. One way computers have gained speed in recent years is by harnessing multiple processors that crunch data in parallel. Ré and his colleagues found that those parallel threads spend a lot of time communicating with each other, a crosscheck system intended to reduce errors.

So they shut down communications and let the processors run independently, an approach they call “Hogwild!”

“Sequential operation is something everyone is trying to use, but we decided to just let it run wild,” Ré said. “This was stunning, and it allowed us to take advantage of all the hardware. We found that it would still compute the right answer.”
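The idea of letting processors update shared results without coordinating can be sketched in a few lines. The example below is a minimal illustration, not the researchers' implementation: several threads run stochastic gradient descent on a small, made-up linear regression problem and write to one shared list of weights with no locks. The names `make_data`, `worker` and `hogwild_sgd`, the learning rate and the step counts are all assumptions made for the sketch. Because standard Python threads do not run fully in parallel, it shows the unsynchronized update pattern rather than any speedup.

```python
import random
import threading

def make_data(n_samples, true_w, seed=0):
    """Build noise-free samples (x, y) with y = true_w . x (synthetic data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))
    return data

def worker(w, data, steps, lr, seed):
    """Run SGD steps against the shared weight list w without any locking."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        # Read the shared weights; another thread may be changing them.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            # Unsynchronized write: no lock, no coordination between threads.
            w[j] -= lr * err * xj

def hogwild_sgd(data, dim, n_threads=4, steps=5000, lr=0.1):
    """Start n_threads workers that all update the same weight list."""
    w = [0.0] * dim
    threads = [
        threading.Thread(target=worker, args=(w, data, steps, lr, t))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

if __name__ == "__main__":
    true_w = [2.0, -1.0, 0.5]
    data = make_data(200, true_w)
    print(hogwild_sgd(data, dim=len(true_w)))
```

Even though threads may occasionally overwrite one another's updates, the learned weights in this toy problem still settle very close to the true values, which is the behavior Ré describes.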

Ré is also devising ways to improve the logic side of the problem to produce answers in crisp, formal language very efficiently. He and his colleagues applied a theory from geometry to their programming and derived a worst-case guarantee for their analysis: the system performs no more computation than the worst case requires.

“The theory is nice and interesting, but now we are trying to build a system around it to use it,” Ré said. “The initial results suggest that it can be orders of magnitude faster than classical approaches.”

Ré said he’s still overwhelmed by the MacArthur honor, but is excited to put the winnings and new recognition to work.

“Every academic has ideas in his or her drawer that can’t get funding because maybe it’s too crazy, even though the outcome will be big,” Ré said. “I’m hoping to undertake some of those in the next couple of months, and maybe staff up some other side impacts that could have a huge impact. I’m really excited about the opportunity.”

Ré joined the Stanford faculty in 2013 and is currently an assistant professor in the Department of Computer Science in the Stanford School of Engineering. Before Stanford, he was an assistant professor at the University of Wisconsin at Madison. Ré earned his bachelor of science degree from Cornell University in 2001, and received his doctorate from the University of Washington at Seattle in 2009. A video interview is on the MacArthur Foundation site.