Community of scientists spins web of genetic data
BY TRACIE WHITE
Members of the Stanford Genomic Resources team celebrate a milestone: their 100-millionth Web hit on June 3. First row: Stuart Miyasato, Qing Dong, Anand Sethuraman and Mike Cherry. Second row: Marek Skrzypek, Martha Arnaud, Catherine Beauheim, Dianna Fisk, Cathy Ball, Karen Christie, Jodi Hirschman, Nick Stover, Eurie Hong, Farrell Wymore, Gail Binkley, Zac Zachariah, Maria Costanzo and Rob Nash. Third row: Shuai Weng, Gavin Sherlock, Ben Hitz, Rama Balakrishnan, Julie Park, Heng Jin, Don Maier, Michael Nitzberg and Janos Demeter.
The team behind the model organism databases at Stanford, which provides bioinformatic resources via the Web, continues to be at the forefront of the data-crunching revolution in genetics, reaching a new milestone June 3 with its eye-popping 100-millionth Web hit.
"Ten years ago it was basically three of us on this project," said Mike Cherry, PhD, associate research professor of genetics who now leads a team of about 30 scientists. The group collects, transfers and interprets genetic research culled primarily from published studies and then places the data on the Stanford Genomic Resources Web site. "I certainly didn't perceive how large this would become," Cherry said. "And it's still growing."
The overwhelming increase in traffic to the Web site over the past decade provides a glimpse into just how essential databases have become in the field of genetic research. Universities, government agencies and industries throughout the world all use the Stanford databases, which cater to specific research communities looking for information on the genetic sequencing of Saccharomyces, Candida and Tetrahymena. In the past week, the Web site received 89,000 hits from Asia alone, one-quarter of the total number for that period.
"It's just indicative of the amount that people have come to rely on the existence of these types of resources," said Gavin Sherlock, PhD, a research assistant professor of genetics who has been one of the leaders of the database team since the late 1990s. He frequently uses the databases both for his own genetic research and for writing grant proposals. "If they were lost, or if they suddenly disappeared, it would set research back years," Sherlock said.
The project consists of four databases: the Saccharomyces (budding yeast) Genome Database, the Stanford Microarray Database, the Candida Genome Database and the Tetrahymena Genome Database. A team of about 30 "biocurators," all with doctorates in biological studies, collects and interprets data on comparative genome sequencing between organisms and posts them on the Web site. Their effort has yielded new insights into the relationships between genes and illness, as well as fundamental biological discoveries.
Comparing available genetic sequencing data in budding yeast to human genes of unknown function can provide a wide array of useful information for medical researchers, Cherry said.
"Evolution is real," Cherry said. "All life is descended from a common ancestor."
Not all scientific databases are as rigorous in sifting through relevant information, making connections between pertinent findings and constantly incorporating new research as it appears, said Peter Good, PhD, program director for the Genome Informatics Program at the National Institutes of Health, which funds several of the databases.
"It's a very high quality database," said Good. "The NIH has invested a fair amount of money in genome sequencing beginning with yeast, up to humans, and beyond. These resources provide a way to find out what information is known about the genes and any sequence features that are on the genome."
"It's kind of like a library," added Dianna Fisk, PhD, a molecular biologist and senior scientific curator for the Saccharomyces Genome Database. "Instead of a building we have the Internet. Instead of a book, we have a page. There is one Web page for every gene. Our reason for being is to make all that information as accessible as possible. We do that on a gene-by-gene basis."
These databases were created to help researchers share information as easily and quickly as possible. Other than the Stanford Microarray Database, which includes some private data reserved for the Stanford community, the Web site is entirely free and open to the public. Anyone can use it at any time, and many users reciprocate by providing updates on their own research to the Web site and sending corrections when inaccuracies are discovered.
"The community has been very nice," Cherry said. "They share and help us make sure the information that gets in our database is right. We get several thousands of people sending us e-mails. They understand the importance of these databases."