arXiv After Cornell: What Independence Means for the Infrastructure of Science

Source: hackernews

For thirty-five years, arXiv has been the place where science actually happens before it officially happens. Papers land there first. Researchers cite preprints from arXiv before journal versions exist. In fields like high-energy physics and machine learning, the journal publication is almost a formality. The real record is on arXiv.

So the announcement that arXiv is declaring independence from Cornell University is not a minor administrative restructuring. It is a change to the governance of infrastructure that the global scientific community depends on every day, infrastructure that was built under one institutional arrangement and will now have to sustain itself under a different one.

How arXiv Got Here

Paul Ginsparg launched what would become arXiv on August 14, 1991, while working at Los Alamos National Laboratory. The original system was an email-based preprint distribution server for high-energy physics, running on a machine called xxx.lanl.gov. The design was simple: researchers could submit TeX source files by email and others could retrieve them. No peer review, no editorial gatekeeping, no subscription paywall. Just papers.

The timing was right. The web was emerging, physics had a culture of sharing preprints by mail anyway, and the community was large enough that a centralized server provided real value over the informal circulation that preceded it. Within a few years, the system had expanded beyond hep-th to cover most of physics, then mathematics, then computer science.

When Ginsparg moved to Cornell University in 2001, arXiv moved with him. Cornell took on institutional hosting, technical infrastructure, and a substantial share of the operational costs. For over two decades, arXiv has been housed within Cornell’s library system, which provided organizational stability and a degree of insulation from the funding pressures that would otherwise fall directly on arXiv’s small team.

That arrangement made sense in 2001, when arXiv was receiving tens of thousands of submissions per year. The system now handles over 200,000 new submissions annually and has indexed more than 2.3 million papers. The user base, the operational complexity, and the stakes have all grown well beyond what a university library program was designed to support.

The Sustainability Problem arXiv Has Been Trying to Solve Since 2010

The independence announcement did not come out of nowhere. arXiv launched a formal sustainability initiative around 2010 specifically because it recognized that dependence on a single institution, even a well-resourced one like Cornell, was a structural vulnerability.

The model that emerged from that initiative asks member institutions, primarily research universities with high submission volumes, to pay annual fees on a tiered scale based on how many papers their affiliated researchers submit. The Simons Foundation has provided multi-million dollar grants to support the transition toward this diversified funding model. As of the early 2020s, arXiv was drawing funding from over 200 member institutions worldwide alongside Cornell’s direct contribution.

The sustainability initiative was, in effect, a rehearsal for independence. It showed that arXiv could maintain operations with a distributed funding base rather than relying on a single institutional patron. What independence now requires is extending that logic to governance: not just funding the organization independently, but running it independently, with its own board, its own legal structure, and its own decision-making authority.

What Independence Actually Requires

Becoming an independent organization is harder than it sounds for something that has spent its entire existence inside a university. Cornell has not merely been providing servers and a domain name. The university has provided payroll, benefits administration, legal services, compliance infrastructure, procurement, and the institutional credibility that comes with being part of a major research university.

Building those capabilities from scratch, or contracting them out, costs money and management attention. An independent arXiv will need a board with genuine governance authority rather than an advisory structure that sits beneath Cornell’s institutional hierarchy. It will need audited financials, legal incorporation as a nonprofit, and relationships with funders that are predicated on its own institutional standing rather than Cornell’s.

The parallel that comes to mind is the Software Freedom Conservancy model, where projects that started inside or adjacent to universities eventually need their own homes to grow past what an academic institution can reasonably accommodate. The Linux Foundation, the Apache Software Foundation, the Wikimedia Foundation, and similar organizations all represent cases where infrastructure that outgrew its original host had to formalize into something that could operate at scale independently.

arXiv is not a software project in the traditional sense, but the governance problem is structurally similar. The question of who gets to make decisions about submission policies, content moderation, technical direction, and fee structures matters enormously to the research community, and those decisions should be made by an entity that is accountable to that community in a direct way.

The Technical Dimension

The governance transition coincides with an ongoing technical modernization that arXiv has been pursuing for several years. The original system accumulated decades of technical debt; much of it ran on Perl and LaTeX tooling that was, charitably, showing its age.

The arXiv-NG project, which began in earnest around 2017-2018, has been rewriting the submission and processing pipeline as a set of microservices. More recently, arXiv has been generating HTML versions of papers automatically from LaTeX source, which improves accessibility substantially and opens up possibilities for machine-readable scientific content that the static PDF format forecloses.

These technical investments are exactly the kind of thing that is easier to pursue under independent governance. A Cornell library program answers to Cornell’s priorities. An independent organization answers to its member institutions, its funders, and the research community. That is a meaningful difference when deciding whether to spend engineering resources on a new submission interface versus accessibility improvements versus API capabilities for programmatic access.
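To make "programmatic access" concrete, here is a minimal sketch of what a client of arXiv's existing public query API might look like. It assumes the documented endpoint at export.arxiv.org/api/query, which returns Atom feeds. The function names, the sample feed, and the placeholder paper ID are illustrative and are not taken from arXiv's own code. The sketch makes no network call; it only builds a query URL and parses a hand-written feed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint of arXiv's public query API (returns Atom XML).
ARXIV_API = "http://export.arxiv.org/api/query"
# Atom namespace, in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(search_query, start=0, max_results=5):
    """Build a query URL; parameter names follow the public API docs."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urlencode(params)


def parse_titles(atom_xml):
    """Return the title of every <entry> in an Atom feed string."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.iter(ATOM + "entry"):
        raw = entry.findtext(ATOM + "title", default="")
        # Titles may be wrapped across lines; collapse the whitespace.
        titles.append(" ".join(raw.split()))
    return titles


# Hand-written placeholder feed, not real arXiv output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>A placeholder
      title</title>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query_url("cat:hep-th"))
    print(parse_titles(SAMPLE_FEED))
```

Whether an interface like this gets richer (bulk access, structured full text, stable rate limits) is exactly the kind of prioritization call that the governance change puts in arXiv's own hands.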

Why This Matters Beyond arXiv

The scientific community has spent decades building shared infrastructure that exists in an ambiguous institutional space. arXiv, JSTOR, Crossref, ORCID, and similar services are not quite government agencies, not quite universities, and not quite commercial businesses. (PubMed Central, often grouped with them, is the exception in that a US government agency, the National Library of Medicine, operates it.) They occupy a kind of liminal zone where the public goods argument for their existence is strong but the institutional form for sustaining them is unclear.

The commercial alternative to this kind of independent infrastructure is a preprint server run by Elsevier or Springer Nature, with all the incentive misalignment that implies. The fact that arXiv has remained free to submit to and free to read, with no institutional author fees, for thirty-five years is not a natural outcome of market forces. It is the result of specific funding choices and institutional commitments that could have gone differently.

Independence, if structured well, should make arXiv more durable against the pressures that have eroded other open infrastructure. A university changes presidents, changes strategic priorities, changes budget allocations. An independent organization with a diverse funding base and clear governance can, in principle, maintain continuity through changes that would destabilize an institutionally dependent project.

The key phrase is “in principle.” The history of nonprofit infrastructure in the software world includes enough independent organizations that ran into funding crises or governance failures to warrant caution about the transition. The Perl Foundation, various XML standards bodies, and a long list of domain registries and protocol stewards have all demonstrated that independence is not automatically stability.

What to Watch For

The details that will determine whether this transition goes well are not the ones likely to make headlines. The board composition matters: who has governance authority, and what constituencies they represent. The fee structure for member institutions matters: whether smaller institutions in less wealthy countries can remain members. The relationship with funders like the Simons Foundation matters, and whether that relationship transfers cleanly to an independent entity.

arXiv has the advantage of a strong brand, a genuinely captive audience, and a track record of operational competence. Thirty-five years of continuous operation across multiple platform generations is not a trivial achievement. The community it serves is motivated to ensure it survives; no researcher wants to watch the canonical record of their field disappear or become inaccessible.

The separation from Cornell is the right structural move. The question is whether the independent arXiv will be built for the next thirty-five years or just for the next five.