“This is the microscope and the telescope and the linear accelerator of the 21st century all in one,” said Emmanuel Candès, faculty director of Stanford Data Science, describing his team’s latest installation. Marlowe, a state-of-the-art GPU-based computational instrument, will begin accepting applications from the entire Stanford research community on its website on Jan. 15, 2025.
Candès can barely contain his excitement. In its ability to churn through computations as never before, Marlowe will put Stanford at the forefront of data science. This “superpod” comprises 248 Nvidia H100 graphics processing units (GPUs), the chips fueling the research and development of AI innovators like OpenAI.
Under the stewardship of the Office of the Vice Provost and Dean of Research, the university will invest $30 million to purchase hardware, recruit a team of research data scientists, support Marlowe’s operational needs, and facilitate collaboration opportunities – all important to get the most out of Marlowe in a rapidly evolving field. Marlowe is now being tested at the Stanford Research Computing data center, where it will be housed for the foreseeable future.
Since Marlowe’s installation this summer, a technical team has partnered with beta testers to refine the system's performance. Even in the early days, one of those beta testers, Professor Gordon Wetzstein, said, “Marlowe has already turbocharged my research and made possible something that wasn't just three months ago.”
“I fully expect Marlowe will be oversubscribed on day one,” Candès added.
Broad horizons
While the hardware is impressive, Candès is the first to admit that it is table stakes in the rapidly evolving field of computational data science. For him, nothing less than Stanford’s continued place at the forefront of research in biology, chemistry, physics, engineering, cosmology, medicine, artificial intelligence, and other fields is riding on Marlowe’s success.
“For Stanford’s research output in almost every field, computation is the future,” he said. “That is how much Marlowe means to Stanford’s continued leadership in research. I think it’s going to have a huge impact.”
Candès hopes Marlowe will not just empower, but embolden Stanford faculty to broaden their research horizons. Beyond pure data science, Candès thinks Marlowe will also help attract and retain world-class faculty, postdocs, and students hoping to work on the data-intensive models that are its calling card.
“Without an instrument like Marlowe, there's no way Stanford can simulate how the universe works. There's no way we can discover that next breakthrough drug. There’s no way to understand the mysteries of human life,” Candès says. “With Marlowe, however, we become computational explorers.”
Get to know a few of the computational explorers who will soon be using Marlowe and how they intend to put it to work.
Jure Leskovec: AI virtual cell
Jure Leskovec is a computer scientist who, among other aims, is interested in developing AI for large, interconnected systems in areas such as the social sciences, human biology, and drug discovery. Large AI models of this kind are known as foundation models. Leskovec’s latest focus is on creating AI models to accurately simulate individual human cells.
The vision is for scientists to one day conduct experiments on computers rather than on living cells – in silico versus in vivo. This would not only be safer for humans and far less costly, but much faster, too. The long-term goal would be to model myriad cell types and combine them into computational tissues or even whole organs.
“One way to view a cell is to represent it as a bag of molecules. To create a virtual cell you need to represent how all these types of molecules interact mathematically,” Leskovec said. “Our immediate aim is to create a single virtual cell – some call it a ‘digital twin’ of a real biological cell. We are building up from foundational models of biological molecules like DNA, RNA, and the many proteins that make cells work.”
Such capabilities, Leskovec said, extend to modeling disease systems, like cancer, multiple sclerosis, or Alzheimer’s, to speed research into what goes wrong when biological systems fail. Then, these same computational models could be used to develop drugs and therapies to slow, or perhaps even cure, disease.
Marlowe’s computational powers will allow Leskovec’s team to scale their AI foundation models, addressing challenges in traditional, labor-intensive, lab-based experimental biology. He hopes to use Marlowe to harmonize data from many different sources and labs across the world and translate these data into a universal model that is the basis for future models of molecules, cells, and organs.
“This type of work involves massive, massive datasets of all the various proteins and other biomolecules that coexist in the cell and then to build computational models that capture biological variability,” he said. “If we can do this in silico rather than in a wet lab, it would speed research by orders of magnitude while also making it orders of magnitude cheaper. That’s the real promise of Marlowe.”
Jennifer Pan: Mapping social media’s reach
Jennifer Pan is a political scientist who studies political communication and how it is used to advance authoritarian politics in the age of global, instantaneous digital media. She uses computational methods to explore huge and intricate datasets on political communication – what the messages are, where they originate, and how they spread and evolve over time to shape political preferences and behaviors.
“We’re very interested in how information transmits across borders and across time in different modalities,” Pan says. “Not only text data but image, audio, and video data as well.”
She’s excited for Marlowe to come online to help her crunch the vast amounts of data she has collected to analyze how text, images, and video spread across U.S.- and China-based social media platforms such as YouTube, Weibo, and Douyin (Chinese TikTok).
“Analyzing where and how messages originate and travel is just one aspect of our research,” Pan explained. “The other aspect is to try to model how governments are proactively trying to manipulate the information environment, whether it's through censorship or through content injections.”
These are large-scale data collections, Pan notes, involving networks of billions of users and giant collections of historical content tracked over time. For instance, she has been collecting data from the Chinese social platform Weibo since 2009. And that is just one platform and one type of data.
“Our current work is often quite focused and limited in scope and time – for instance, the period after Russia first invaded Ukraine. By not being able to look wider and deeper, we might be missing key insights,” Pan said of a limitation on her current work that Marlowe might remove. Marlowe, she believes, would allow a broader analysis, across geography, time scales, and data types, to understand patterns of information transmission and manipulation.
“I think having a cluster located at Stanford will be a great boon to my research and, I’m sure, many others in the computational social sciences,” Pan said.
Susan Clark: Computing the cosmos
Susan Clark is an astrophysicist who studies the mysteries of the Milky Way. She uses computational methods to calculate the dispersion of matter across the galaxy and to project how stars will form and behave. Clark and her team study things like the interstellar medium – the gases and other matter between stars that will one day form new stars – and the circumgalactic medium – the diffuse matter that surrounds the disk of the Milky Way.
She takes advantage of data from missions like the European Space Agency’s Gaia Project, which is creating an “extraordinarily precise” three-dimensional map of more than a billion stars in the Milky Way, tabulating their trajectories, brightness, temperature, and atomic makeup.
And yet, a billion stars is only a tiny fraction of the visible universe – like a single dot in a pointillist painting. The Milky Way, a single galaxy, may contain some 400 billion stars, while the universe may hold as many as 2 trillion galaxies. The data are overwhelming and the computational models, like those Clark creates, are hard-pressed to make sense of it all.
“It’s staggering really, the scales at which we work,” Clark said. “We are using the models to understand processes like gas dynamics, star formation, and the flow of energy throughout the galaxy.”
Current computational models, therefore, only work on very small portions of the Milky Way.
“This model is just one percent of the Milky Way,” said Philipp Frank, a postdoc in Clark’s group, demonstrating one of the models he and his team have created. “It was produced by a single GPU and took weeks to process.”
That’s where Marlowe could help. With its almost 250 state-of-the-art GPUs, Marlowe will enable faster and larger-scale calculations, potentially reducing weeks of computation to days or even hours.
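The "weeks to days or even hours" estimate can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only and is not drawn from Clark's group: it assumes a hypothetical three-week single-GPU run (the article says only "weeks"), uses the 248 GPUs cited above, and applies Amdahl's law, a standard rule of thumb for how much a job speeds up when only part of it can run in parallel. The function names and the parallel fractions tried are assumptions chosen for illustration.

```python
def amdahl_speedup(n_gpus: int, parallel_fraction: float) -> float:
    """Amdahl's law: overall speedup when only a fraction of the work parallelizes."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_gpus)


def estimated_hours(single_gpu_hours: float, n_gpus: int, parallel_fraction: float) -> float:
    """Estimated wall-clock hours after spreading the job across n_gpus."""
    return single_gpu_hours / amdahl_speedup(n_gpus, parallel_fraction)


N_GPUS = 248                    # GPU count stated in the article
SINGLE_GPU_HOURS = 3 * 7 * 24   # hypothetical: "weeks" taken to mean three weeks

# Try a perfectly parallel job and two less ideal (assumed) cases.
for fraction in (1.0, 0.99, 0.95):
    hours = estimated_hours(SINGLE_GPU_HOURS, N_GPUS, fraction)
    print(f"parallel fraction {fraction:.2f}: about {hours:.1f} hours")
```

Under these assumptions, a perfectly parallel job would finish in roughly two hours, while a job with 5 percent serial work would take a little over a day. Real workloads also pay communication and memory costs that this simple model ignores.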
“We are already preparing methodologies for upcoming datasets that will allow us to explore the intricacies of the interstellar environment and the gas way out in the Milky Way’s halo,” Clark said. “That’s what Marlowe will mean to our work.”
Sherri Rose: The math of medicine
Sherri Rose is a statistician and an expert in health policy who uses advanced computer modeling to analyze the financial and social effects of health care policies. For example, she develops AI that predicts health care costs and is able to identify financial inefficiencies – even potential fraud – and point out inequities in the American health care system.
“You can learn a lot about the American health care system by following the money – looking at how and on whom we spend can improve health equity by improving the way that we make payments in the system,” Rose said.
Rose notes that Marlowe has the potential to substantially speed up the development of algorithms trained on simulated and synthetic data, tools that can ultimately help her team root out ways the current health care system is shortchanging marginalized groups on needed services.
On the fraud front, Rose is developing AI auditing tools in projects led by Biomedical Data Science PhD student Oana Enache to detect unethical practices. “Marlowe will support more complex, larger-scale algorithms, providing deeper insights into health systems that might help us detect fraud, improve equity, and manage costs over the long term,” Rose says.
For more information
Emmanuel Candès is also the Barnum-Simons Chair in Math and Statistics in the School of Humanities and Sciences.