The NSF-DOE Vera C. Rubin Observatory houses the largest digital camera ever built for astrophysics. Its mission: To conduct a 10-year “Legacy Survey of Space and Time,” resulting in the widest, fastest, and deepest views of the night sky ever observed. SLAC National Accelerator Laboratory, which built the LSST Camera, also houses the U.S. Data Facility, which will handle 15 terabytes of data per night and generate astronomical catalogs that are thousands of times larger than any previously compiled dataset. This effort is one of many examples of how data science is shaping our world, shedding new light on the human experience and the vast world around us.
In other fields, data scientists are developing novel AI technology for predicting protein structures, discovering new areas of the brain in neuroscience, and studying molecular data in biomedical statistics to learn about infectious disease patterns. When you put large datasets in the hands of capable scientists, incredible discoveries and innovations start to unfold.
To share and celebrate achievements in this space, Stanford Data Science will host Women in Data-Driven Discovery (WiD3), a one-day conference on March 6, 2025, to highlight the inspiring work of women researchers, practitioners, and industry leaders in data-intensive science fields. The conference is open to all Stanford affiliates and members of the public. Whether experts themselves or still learning about data-driven science, attendees will have a rare opportunity to connect and converse with leading data scientists from both academia and industry, at all stages of their careers.
“We want to celebrate the amazing discoveries scientists have uncovered that wouldn’t be possible without big data and sophisticated algorithms to process it,” says Laura Gwilliams, assistant professor of psychology, Stanford Data Science faculty, and Wu Tsai Neurosciences Institute scholar, who is a co-organizer of the event.
Engaging all ideas and voices in data science
WiD3 builds on the Women in Data Science (WiDS) initiative, which launched at Stanford in 2015 with a long-term vision of full and equal representation in decision-making, economic prosperity, and opportunities. In its annual conferences, WiDS has covered diverse topics such as health frontiers, Earth systems, and generative AI, uniting data science enthusiasts both in person and virtually and fostering knowledge-sharing, community-building, and empowerment for all women in the field.
Once WiDS became an independent organization, Stanford Data Science hatched the idea of hosting a data-driven discovery conference highlighting women in the field. To expand the conference’s impact, it invited fellow CoDa residents – Stanford Impact Labs and the Stanford Institute for Human-Centered Artificial Intelligence (HAI) – to collaborate. “Having a diversity of voices clearly contributes to success in data science projects, so we’re bringing together women from multiple disciplines, such as education, healthcare, and psychology,” says Elizabeth Wilsey, director of engagement and partnerships for Stanford Data Science.
Accordingly, this week’s event highlights the many impactful contributions women have made to data science discovery while also celebrating the breadth of discoveries scientists have uncovered – progress that wouldn’t be possible without big data and algorithms to process it. In addition to guest speakers from Johnson & Johnson Innovative Medicine, Netflix, The Walt Disney Company, and OpenAI, six lightning talks will feature Stanford PhD students and postdocs presenting data-driven insights from their research.
Data science defined
For those unfamiliar with it, data science is a fast-moving field that enables much of today’s emerging technology and research. It is fundamental to large language models and generative AI, but it also plays a key role in causal science, which measures the cause-and-effect relationships between variables, often using smaller datasets.
In her field of language neuroscience, Gwilliams practices data science in two primary ways. She uses large language models such as GPT and Llama, which were created from vast stores of general data. She also collects and analyzes human neural data to understand how humans process and produce language. “In the ideal world, we put those two strategies together, so we can interpret the big neuroscience datasets using sophisticated large language models, as well as develop our own algorithms that can be applied across domains.”
In another example of data science at work, Julia Palacios, associate professor of statistics and of biomedical data science, uses machine learning approaches to recover the past infectious disease dynamics of pathogens like HIV from present-day molecular sequence data.
Through keynotes and panel discussions, the conference organizers hope to address some of the common misconceptions about data science. For example, some people wonder whether data science does away with the scientific method. But Gwilliams explains, “Big data doesn’t mean throwing away hypotheses; instead, we’re bringing big data together with the scientific method, leveraging tools and insights that we can only get with access to large datasets.”
Data quality is important, too. “People may think that a large enough dataset will eliminate issues like confounding or distribution shifts, but this is not always the case. Extrapolating results to more general populations remains a challenge,” Palacios says.
Why attend WiD3
Whether you are curious about data science, beginning a career in the field, or an expert with insights to share, the Stanford Data Science team welcomes you to join WiD3 for a day of learning and networking. “It’s a special opportunity to get outside your own discipline and make new connections – with other people and within your own mind,” says Susan Clark, assistant professor of physics. “I spend a lot of my time in physics, but it’s wonderful to make connections across fields.”
Women in Data-Driven Discovery takes place at the Simonyi Conference Center in the new Computing and Data Science (CoDa) building and is open to the entire Stanford community, including alumni/friends, faculty/staff, the general public, members, postdocs, and students. Register today to attend in person on March 6.