


COVER STORY SIDEBAR

Library-managed 'arXiv' spreads scientific advances rapidly and worldwide

Publishing in a scientific journal can be a slow process, so for decades scientists circulated "preprints" of their research papers to a few colleagues. As a young physicist at Los Alamos National Laboratory in the early '90s, Paul Ginsparg, Ph.D. '81, realized that this gave him an unfair advantage.

Paul Ginsparg


"I was receiving preprints long before graduate students further down the food chain," Ginsparg recalls. "When we have success we like to think it was because we worked harder, not just because we happened to have access."

So he created a service where physicists could post their preprints as "e-prints" accessible to anyone with an Internet connection. The idea caught on, submissions multiplied, and subject matter expanded to include mathematics, astrophysics, computer science and, most recently, biology and statistics.

Eleven years ago Ginsparg joined the Cornell faculty, bringing what is now known as arXiv.org with him. (Pronounce it "archive." The X represents the Greek letter chi.) It is managed by Cornell University Library, allowing Ginsparg to devote more time to his research. As a theoretical physicist, he has made substantial contributions in quantum field theory, string theory, conformal field theory and quantum gravity.

"The arXiv is an information system, and it's the library's role to manage scholarly information systems," says Oya Rieger, arXiv program director for the library. "Although his role is changing, Paul is still a very important part of the work."

"The goal has always been to convert it to a long-term sustainable resource, epitomizing the expanded global role of libraries and universities in the online environment," Ginsparg adds.

Along with his physics research, Ginsparg holds a professorship in information science and works with that department's Digital Library Group to develop enhancements to the arXiv's capabilities. Electronic publishing can make supporting data available alongside a paper and offers new ways to manage information, including searches, data mining and detection of plagiarism.

What began on a single workstation under a desk at Los Alamos now runs on three high-powered servers in Rhodes Hall – and 15 mirror sites around the world – storing more than 770,000 papers. In 2011, more than 75,000 new articles were submitted and more than 70 million full-text articles were downloaded.

The arXiv server in the early 1990s


The annual budget is approaching $500,000, and much of Rieger's effort has gone into creating a system to ensure permanent financial support by evolving the arXiv from an exclusive Cornell initiative to a collaboratively governed, community-supported world resource. In 2011 arXiv received contributions from 133 institutions representing 18 countries. For the new organization, 123 universities, libraries, research laboratories and foundations have pledged to become supporting members, each contributing $2,300 to $4,000 per year based on how much their researchers use the service. Their contributions will be matched with up to $300,000 per year from the Simons Foundation over the next five years.

Members elect a member advisory board, chaired by Rieger, which advises on business and technical matters. A scientific advisory board, including Ginsparg, advises on content policies. The arXiv has no peer-review process, although it does restrict submissions to those with scientific credentials. A team of volunteer moderators around the world screens submissions for appropriate topics.

Besides leveling the playing field for a few graduate students, the system has spread the field around the world: Scientists in developing countries have the same instant online access to new research materials as faculty members at Ivy League schools, and the same opportunity to distribute their own work. However wide the new field grows, it is still level.
