Stanford pilots data science fellowship program

Students in the Data Science for Social Good fellowship program develop data-driven solutions with social impact.

A fellowship program piloted last summer trained Stanford students to find data-driven solutions to pressing societal challenges.

Students who participated developed skills ranging from coding to public speaking. They also gained real-world experience by partnering with an organization that needed help with data science research. Because the pilot was successful, the program is expected to continue in 2020.

Data Science for Social Good (DSSG) is a nine-week, full-time, paid, on-campus summer program open to all students. It was inspired by a similar program at the University of Chicago and was brought to Stanford to advance the university's Long-Range Planning initiative to expand the scope of data science research.

Mathematical and computational science major Emily Guthrie, ’20, discusses findings from her team’s data science project. (Image credit: Farrin Abbott)

Ben Stenhaug, a PhD student at Stanford Graduate School of Education, completed the fellowship at the University of Chicago last year and is now a mentor at Stanford. He said DSSG is an experience that goes beyond what a traditional classroom setting can offer.

“It’s sort of the opposite of a teacher giving you a really well-curated data set and then you running a model,” he said. “Rather, students collaborate with a project partner that has a need that they’re trying to meet, and students work to service that need.”

The seven fellows who recently completed the pilot were divided into two teams. One team worked on a problem faced by the Stanford Blood Center, the other on a problem faced by the Department of Veterans Affairs.

Predicting platelet need

Emily Guthrie, a senior majoring in mathematical and computational science, was on the team that partnered with the Stanford Blood Center to predict future demand for platelets, the tiny cell fragments in the blood that help the body form clots to stop bleeding. She explained that once platelets are harvested from a donor, they can be transfused for only five days. Two of those days are spent in lab testing, which leaves the center just three days to use them.

“They’re wasting about $400,000 a year just by virtue of these platelets expiring,” Guthrie said. To solve this problem, the center gave her team lab test, hospitalization and transfusion data to analyze.

“We had to use this data to figure out how much we think the hospital is going to need today, tomorrow, the day after that and the day after that – so a four-day prediction,” she said. The long-term goal of the project is to get as close to zero waste as possible.

Guthrie said the program enabled her to work with data in a way she couldn’t in a classroom setting.

“With coursework, there are usually beautiful datasets and instructions of what to do and something close to a ‘right’ or ‘wrong’ answer,” she said. “In this case, though, we have to spend a lot of time cleaning messy data and thinking about a strategy with which to approach the problem. It’s not obvious what decisions to make when you find that there’s an error in the data or what modeling technique to use.”

Veterans and the opioid epidemic

Blanca Villanueva, a graduate student in biomedical informatics, participated on a team that partnered with the Department of Veterans Affairs to understand how the opioid epidemic is affecting the veteran population, particularly minorities. Team members spent weeks analyzing the VA’s data, looking for trends over time and across demographics. Throughout the process, they worked closely with VA officials.

“I think a lot of the novelty [of this fellowship] comes from the support we get from the community partner,” she said. “We get to talk to people who are very entrenched in the stats work that they do in-house, like statisticians, practicing physicians and people who are higher up on the administrative side of the VA.”

During the program, fellows received technical guidance from faculty and graduate student mentors experienced in data science. They heard from guest speakers from other parts of the university, such as Stanford Law School, and took field trips to local companies and organizations, including Google and IDEO, to meet data scientists working on similar projects. Fellows also developed team-building and communication skills, putting the latter to use in a final presentation to their partner organization during the program's last week.

Ethics of data

The program also gave students the chance to consider the ethical implications of using data and to question what “social good” means when someone’s data is involved.

“We’ve done a lot of work on data ethics,” Guthrie said. “We’ve looked at things like algorithmic ethics and fairness, data privacy and security.”

DSSG is open to undergraduate and graduate students from any discipline. The first cohort of fellows came from a wide range of academic backgrounds – from sociology to computer science – and brought with them a variety of complementary skill sets.

"We want this to be super-inclusive," Stenhaug said. There are no formal prerequisites, but participants should have some experience working with data, such as having completed a statistics course or having written code.

The fellowship is part of Stanford’s broader goal of expanding its scope of data science research and education while also serving the larger community.

“The DSSG program is a perfect example of this,” said Chiara Sabatti, professor of biomedical data science and of statistics. “Undergraduate and graduate students learn the challenges of analyzing data, increasing knowledge and deploying it effectively. They leverage a lot of technical skills, and they realize that in order for the results to be impactful and effective, one needs to be mindful of a lot of other aspects, from a legal and ethical framework to persuasive communication.”

More information is available from the Data Science Institute.