New Stanford Libraries division supports scholars’ data needs

Stanford Libraries has created a new division called Research Data Services (RDS) that will collaborate with other campus units, such as the Stanford Research Computing Center, to provide services for Stanford scholars across the university working with data.

Stace Maples, head of the RDS Geospatial Center, and former Stanford medical student, Hannah Wild, look through satellite data. (Image credit: Courtesy Hannah Wild)

RDS was established to support the data life cycle and digital scholarship by convening research support in digital humanities, geospatial analysis, computational social science and statistical support, and data management across all disciplines including STEM.

Peter Leonard is the inaugural assistant university librarian for research data services. Before joining Stanford, he served as director of the Digital Humanities Lab at Yale Library at Yale University. Prior to that, he oversaw humanities research computing at the University of Chicago.

Here, Leonard discusses why Stanford Libraries created RDS, how data is evolving, and how this new division will serve scholars across fields and disciplines at Stanford.

Why has Stanford Libraries created the division of Research Data Services?

Scholars across the disciplines contend with enormous amounts of research data from diverse sources: their own experiments or observations, commercial vendors, government agencies, and cultural heritage institutions. This research data could be mortgage lending records, satellite imagery of the Amazon rainforest, or thousands of digitized books. Entire research imperatives, such as the drive to ensure a sustainable planet, are predicated on access to, and facility with, complex and large-scale datasets.

The Stanford Libraries are committed to helping researchers with the entire research data lifecycle. From discovery or acquisition, to curation and cleaning, to algorithmic and quantitative analysis, to storage in a repository. We want to build on our tradition of service – and learn from researchers on campus what we need to do next.

What services inside Stanford Libraries contributed to the formation of RDS?

The Stanford Geospatial Center, which supports data, methods, and technology in spatial data science; the Center for Interdisciplinary Research (CIDR), which designs and develops data tools and methods; Stanford Libraries’ science data librarians; and Stanford Libraries’ digital research architect. We hope to build on the excellent work these teams are already doing – including data acquisition, licensing, curation, preservation, and sharing – and make the services we offer even more clear and accessible, both physically and virtually.

What are some examples of how these units in RDS support scholars?

The Stanford Geospatial Center is the university’s main support center for GIS (geographic information system) and spatial data science. Spatial techniques and approaches are shared across many disciplines. For example, they can help a marine biologist map out current patterns of marine life on the California coast and let a historian studying the Salem witch trials explore the geography of accusers and the accused in the 17th century.

“We should be mindful of the ways data can be used for harm.”

—Peter Leonard

CIDR consists of academic technology specialists who support digital scholarship in a specific program or department. CIDR also has developers who build software for faculty digital research projects. And CIDR’s Software & Services for Data Science (SDSS) group helps scholars learn languages such as Python and R, which are important in text and data analysis.

RDS also has staff who convert and enhance datasets with millions of records into useable formats for researchers, and then place them on Stanford Libraries servers. We also have science librarians who catalog results of research projects so that information can be shared with others.

Who can utilize RDS and are there any costs?

RDS serves all Stanford scholars free of charge. Whether you’re a student with questions about multilingual text analysis, a postdoc interested in geospatial tools and data, or a faculty member looking for data for your next project, RDS can help.

Are there any challenges or concerns that come with working with data?

It’s useful to consider the entire lifecycle of research data – whether from its acquisition or discovery, or to its cleaning or transformation into forms better suited to computation, or to the actual work of analyzing it (in either traditional or deep learning contexts), and finally to the ways that research products derived from it can be safely stored long term for discovery and re-use. All of these steps will differ by disciplinary, departmental, and individual contexts – and it’s a particular responsibility of Stanford Libraries to understand as many of these as we can. For that reason, our subject specialist colleagues are a vital resource.

Additionally, many discussions of “big data” today are increasingly informed by questions related to bias, algorithmic harm, and unrepresentative or incomplete training data. We should be mindful of the ways data can be used for harm, and we hope Stanford Libraries will continue to be a place where these questions and concerns are discussed and solutions are proposed.

Is RDS the only place at Stanford that provides help with data and digital scholarship?

No. RDS exists in an ecosystem of department and school libraries across campus, some of which report directly to those academic units and provide valuable resources for researchers through their disciplinary expertise. For example, colleagues at the Lane Medical Library are very knowledgeable about research data issues in health research, such as the new NIH data management and sharing requirements. The Research Hub and Library at the Graduate School of Business also has a knowledgeable team that provides many data and research-related services to the GSB community. We benefit greatly from their close connection to their client base and knowledge of their respective fields. And finally, the Stanford Research Computing Center (SRCC) is a hub for high-performance computing, as well as expertise, advice, and experience for computation and storage at scale. They play a key role in advancing research at Stanford.

For more information visit the Research Data Services website.