In brief
- The “reproducibility crisis” in science is the concern that it’s very difficult to reproduce results, given the same data and methods. Stanford experts say this is a mischaracterization of the challenges of reproducing science.
- They argue that science is noisier and more complex than we’ve realized. The solution, then, is to report details of procedures and analyses thoroughly, accurately, and transparently.
- Stanford has a long history of addressing these challenges, from early innovations in data documentation and sharing to CORES and SPORR – two groups at Stanford now leading the way in supporting research rigor and reproducibility, and open science.
In April, at the yearly symposium of Stanford’s Center for Open and REproducible Science (CORES), Stanford cognitive neuroscientist Russell Poldrack put up a slide that captured one of science’s thorniest problems. A colleague in his department, psychology professor Michael Frank, had tried to re-run 25 published analyses using openly available data. He could reproduce the results 60 percent of the time – but often only with help from the original authors. Without that assistance, the rate fell to 36 percent.
The outcome was not unusual. Similar attempts in cancer biology, ecology, geology, and the social sciences have been just as dismal. The pattern they describe has, over the past 15 years, acquired its own shorthand: the reproducibility crisis.
Steven Goodman, who directs the Stanford Program on Research Rigor & Reproducibility (SPORR) at the School of Medicine, has spent much of that period trying to address the problem, even as he’d like to retire the phrase itself.
“It’s not a crisis of reproducibility – that’s a misnomer,” said Goodman, who is a professor of epidemiology and population health at Stanford Medicine. “Instead, it’s a phenomenon of there being more variability than we ever expected – in analyses, in preparations of datasets, you name it. Framing it as a reproducibility crisis – suggesting studies are ‘right’ or ‘wrong’ or often contradictory – slowed us down in recognizing that.”
Three words, three concepts
The work of making science more reliable depends on three ideas that often get bundled together but mean different things:
- Rigor is the quality of the work itself – adequate sample sizes, bias control, preregistered hypotheses, transparent analytical choices.
- Reproducibility is whether another scientist, starting from the same data and methods, can arrive at the same result.
- Openness – sometimes called open science – is the infrastructure that enables verification: shared research materials, shared data, shared code, open-access publication.
What the data show, in Goodman’s view, is that the processes that affect scientific results are noisier and more complex than scientists once thought, making it critical to report details of procedures and analyses thoroughly, accurately, and transparently. “That helps us get to the truth faster, making our research efforts more productive,” said Goodman.
Poldrack, the Albert Ray Lang Professor in Psychology in the School of Humanities and Sciences, adds that the focus on reproducing individual studies misses how science actually works: Knowledge emerges from the consensus of many studies over time. “People outside the sciences often think of an idea going from tentative to proven,” he said. “But science isn’t about proof – we’re not mathematicians. We’re providing evidence. We calibrate our faith in a particular claim based on what the evidence looks like, and that’s always changing.”
Better science, over time
CORES and SPORR build on a long Stanford tradition of grappling with why even high-quality scientific findings can be hard to replicate. This work often focuses on documentation, orderly data, and making the details of science accessible and transparent.
In 1978, Stanford computer scientist Donald Knuth created TeX, an open-source typesetting system for technical text and mathematical formulas that helps researchers share their work more easily and remains popular today. In 1992, Stanford geophysicist Jon Claerbout wrote about the potential of electronic documents to merge publications with their underlying analyses and to share detailed data (via CD-ROM) between different labs. In 1995, Stanford statistician David Donoho and his co-authors wrote that “an article about computational science in a scientific publication is not the scholarship itself, it is merely the advertising of the scholarship.” The actual scholarship, they argued, was the code, the data, and the instructions that produced the figures.
In 2014, epidemiologist John Ioannidis – who had joined Stanford after publishing the famous 2005 PLOS Medicine paper “Why Most Published Research Findings Are False” – co-founded the Meta-Research Innovation Center at Stanford (METRICS) with Goodman to improve the validity and transparency of science through “meta-research,” or research on research. Poldrack runs another effort in the same lineage: OpenNeuro, a public archive holding nearly 80,000 brain-imaging datasets from more than 1,700 studies.
Friendly cousins
SPORR and CORES cover different territories and scales: SPORR operates within the School of Medicine, aiming to improve the whole research process; CORES works across the broader university and emphasizes open science.
“We’re very friendly cousins,” Goodman said. He sits on the CORES advisory board, and Poldrack sits on his. “They go wide and broad. We go narrow and deep. In either case, you’ve got to get the science right to start, and that’s where openness really helps.”
Most public discussion of open science focuses on openness externally – letting scientists outside your lab view your data and code. Goodman and SPORR are more concerned with internal openness.
“Within the group, there are often people doing absolutely critical things that other people can’t understand,” Goodman said. “A senior faculty member attests to the integrity of the process, and then – surprise, surprise – there’s a problem. Does that mean the faculty member was dishonest? No. They’re attesting to their faith in whoever handled the data. But the data has gotten so big and procedures so complex that the PI may not have fully understood how the results were produced.”
An audit trail and other tools that let a PI, or principal investigator, and other lab members critique and verify each other’s work can surface honest errors before they propagate – and, in rarer cases, expose intentional misconduct while corrections are still possible.
The fix, Goodman argues, has to be systemic. He likens it to patient safety. No medical professional wants to operate on the wrong leg or give the wrong medication, but it happens. Hospitals reduce harm by addressing structural issues: communication, staffing, handoffs, training, and equipment standards. Research, he says, is in the same place; getting it right must be a team effort – even an institutional one – with tools and incentives built to support it.
“If we feel like the scientific endeavor isn’t working quite right, we have to look at the system that’s producing it,” Goodman said. “We need to align our incentives with our priorities – rewarding getting it right instead of just getting it published. The School of Medicine has eliminated publication counts from its promotion criteria, an important step in that direction.”
Other parts of the university build openness and transparency into their day-to-day operations. The Graduate School of Business’s Behavioral Lab guides researchers using its participation pool through preregistration – publicly recording a study’s plan before data collection, so a hypothesis can’t later be reshaped to fit results. Stanford Libraries runs the Stanford Digital Repository, the university’s public archive for research outputs, and an annual Data Sharing Prizes program for researchers who exemplify FAIR (Findable, Accessible, Interoperable, and Reusable) practices. The Stanford Research Computing office provides managed computing environments and secure storage to enable analyses to be reproduced and data to be preserved.
The reproducibility conversation that began in psychology and biomedicine now extends to AI. Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) publishes the annual AI Index, whose Responsible AI chapter documents how sparsely frontier model developers report on responsible AI benchmarks – covering safety, fairness, transparency, and governance – compared to capability benchmarks.
Appreciating unglamorous work
A significant challenge, both directors agree, is that the processes required to make research clear and sharable – cleaning data, sharing code – take time and require extra work, although AI may help.
“People are incentivized to have lots of papers in fancy journals,” Poldrack said. “When you do things more rigorously, it’s harder to have splashy findings without blemishes. There’s rarely a real payoff for it, at least in the short term.”
In an effort to change that calculus, SPORR and CORES together developed a new academic CV template for the School of Medicine that highlights how faculty ensure the rigor and transparency of their research. Other Stanford schools are considering adopting it.
“That’s never been in an academic CV before,” Goodman said. “It makes rewarding faculty for how they do research possible by making that process more visible. But visible and valued are not the same thing, so that’s our next challenge.”
CORES is partnering with MIT Press to develop openly available, “living” textbooks that lend publication clout to evolving science.
Both centers also give out annual awards recognizing exemplary work in rigor, reproducibility, and open science. Last year, School of Medicine Dean Lloyd Minor asked Ruth O’Hara, senior associate dean for research, to impanel a task force on Research Practice and Culture, now chaired by Goodman, to recommend changes across the school to support research reproducibility. And CORES recently joined HAI, placing the open science conversation alongside one of the most active research areas on campus.
A new wave
As Poldrack sees it, CORES has ridden three successive waves of attention: reproducibility in the 2010s, open science around 2023, and now AI.
At the symposium, he noted that AI could improve code review, code sharing, and the detection of mismatches between methods and written descriptions. He also showed what happens when the tools fail: Working with an AI coding assistant a few weeks earlier, Poldrack had to feed his project’s code through a second instance of the assistant to catch a fatal flaw the first one missed. AI tools need trust gates, he argued – the same scrutiny scientists already apply to each other’s results.
“Becoming known as a rigorous scientist and doing work that you know is going to stand up in the future is one thing,” said Poldrack. “But every field is increasingly embracing open science, so being among the first movers in that better direction will position you to be a leader in your field.”
For more information
CORES has been supported by Stanford Data Science, the Vice Provost and Dean of Research, and the Alfred P. Sloan Foundation. SPORR is supported by the Clinical and Translational Science Award from the National Center for Advancing Translational Sciences at the National Institutes of Health, with the support of principal investigator Ruth O’Hara, senior associate dean for research at the Stanford School of Medicine.
Writers
Taylor Kubota
Ker Than

