arXiv: Preprints citing preprints

I’ve been reading a lot of AI research lately, and something has been nagging at me: the modern AI research ecosystem has quietly abandoned peer review, and the field has largely stopped noticing. I’m wrestling with what this means and why it bothers me. Maybe I also need to recognize that my expectations have to adjust in these times of incredibly rapid, AI-assisted research on AI and AI-related topics.

A Brief History of arXiv

arXiv (pronounced “archive”) launched in 1991 as a preprint server for physicists who wanted to share results quickly before the long formal publication process concluded. The idea was simple and good: don’t lock up knowledge for 18 months behind a slow editorial queue. Let researchers read and build on each other’s work in near real-time.

It worked beautifully for physics. It has since become the dominant dissemination channel for computer science, mathematics, and AI. And in principle, that’s still a good thing. We now have a venue for research that is open, fast, and accessible to everyone regardless of institutional subscription status.

Is this OK? Or has something gone sideways?


The Citation Chain Problem

Here’s the pattern I keep running into: I open a recent AI preprint (something on large language model reasoning, reinforcement learning, or model training) and I look at its reference list. Nearly every citation reads something like:

arXiv preprint arXiv:2501.XXXXX

Not “published in NeurIPS 2025.” Not “Journal of Machine Learning Research, Vol. 22.” Just: preprint.

Some of the most widely cited papers in AI right now are unreviewed technical reports. The Qwen2.5 technical report (Qwen Team, 2024), describing one of the most widely used open-weights model families, is an arXiv preprint that (checking Google Scholar) has over 11,000 citations at the time of this post! The Qwen3 technical report (Yang et al., 2025)? Also a preprint. TTRL (Zuo et al., 2025), a paper on test-time reinforcement learning that I’ve recently been studying with my students and that entire lines of follow-up work have built on, initially circulated as a preprint before being accepted to NeurIPS 2025. That means it spent months being heavily cited before clearing any formal bar. The NeurIPS version already has a little over 200 citations. With respect to the utility of arXiv, as a Mandalorian would put it, “This is the way.”

And then there’s the category that doesn’t even seek conference review: industry technical reports from Meta, Google, Alibaba, and DeepSeek describing frontier model families. The Llama 3 technical report from Meta (Grattafiori et al., 2024) is an arXiv preprint. DeepSeek-R1 (Guo et al., 2025), the paper on incentivizing reasoning in large language models via reinforcement learning, circulated as a preprint for months before eventually being published in Nature in September 2025. That eventual publication is worth noting: the community built heavily on it, launched dozens of follow-up preprints, and treated it as established science well before any independent reviewer had looked at it. The formal peer review was really a lagging footnote to a citation trail that was already quite deep and impressive.

What does this mean in practice for the AI researcher today? A new paper cites 30 sources, many of which were unreviewed at the time of citation. That new paper is itself a preprint. And within weeks, other preprints will cite it. The epistemic dependency chain is unvalidated all the way down. Should this be of concern?


Where Peer Review Has Gone

Traditional peer review assumed a sequential process: experiment → submission → expert review → revise → publish → cite. That model has effectively collapsed in AI/ML for a few compounding reasons:

  • Speed asymmetry. Peer review takes 6–18 months. The field moves in weeks. By the time a paper clears formal review, it may already have dozens of preprint descendants.
  • Venue congestion. Top conferences like NeurIPS, ICML, and ICLR, to name a few, accept only roughly a quarter to a third of submissions. This means a large fraction of legitimate, solid work does not clear the bar, not because it’s wrong, but because the venues are already overwhelmed.
  • Industry reports bypassing the queue entirely. Many of the most-cited papers in AI right now aren’t rejected conference submissions languishing on arXiv; they’re technical reports from industrial labs that were never intended for peer-reviewed venues. They describe systems their authors built and deployed, not experiments submitted for external scrutiny. This is a different kind of problem from slow review: it’s the deliberate absence of any review at all.
  • Thin conference review. Even papers that do get accepted at major venues typically receive 2–3 reviews, often written in 2–3 weeks by reviewers who may themselves be preprint-only researchers working in adjacent subfields. Having been a reviewer myself, I know how time-consuming the process is. The rewards are intrinsic, of course: it forces me to keep up to date. But in a field like AI today, inundated with research, the process is too slow, and it is still not the adversarial, months-long scrutiny that characterizes review in medicine or biology.

The result: the word “preprint” has functionally lost its meaning as a cautionary label. In AI, is a preprint now considered a de facto publication? It certainly seems so. It gets cited like one, benchmarked against like one, and built upon like one, often before anyone outside the authors’ own institution has carefully checked the work.


Citation Laundering

There’s a deeper problem I’d call citation laundering: a claim gets repeated across enough preprints that it acquires the social authority of an established fact without ever acquiring the epistemic warrant.

Consider how this plays out in practice. Paper A (a preprint) reports substantial accuracy gains on a reasoning benchmark using a new training method. Papers B, C, D, and E (all preprints) each cite Paper A as a foundational result and build refinements on top of it. Paper F then cites B through E, and its introduction reads as though A’s result is settled science. At no point in this chain has anyone outside the original research group independently verified the foundational claim. If Paper A overstates its result (through cherry-picked benchmarks, a subtle flaw in experimental design, or a training setup that doesn’t generalize at all), then all of B through F inherit that flaw.

This is not a hypothetical. There is a growing body of work raising exactly these concerns. Papers like “No Free Lunch: Rethinking Internal Feedback for LLM Reasoning” (Zhang et al., 2025) and “How Far Can Unsupervised RLVR Scale LLM Training?” (He et al., 2026) examine specific training paradigms (methods that use internal model signals instead of external reward supervision) and show that the gains they report tend to follow a rise-then-fall pattern: performance improves early in training, then collapses below the baseline the model started from. Benchmark numbers captured before that collapse sets in would look like a genuine advance. Notably, both of those critical papers are also arXiv preprints.

The specific findings in those papers are about a particular class of methods, not a sweeping indictment of all AI benchmarking. But they illustrate the general risk well: a training approach can produce results that look compelling at the wrong moment in its training curve, get cited heavily in that window, and have its problematic training dynamics surface only later, when someone looks harder.

Perhaps this is the new form of peer review in rapidly evolving fields such as AI: release a preprint on arXiv, let it get some press, and let it make the rounds on social media, where people (including the authors themselves) post about and hype up the work, or, much more preferably, get it discussed on review sites (e.g., https://gotit.pub), until another preprint comes out to critique and improve on it.


The Case for arXiv (Being Fair)

I don’t want to be entirely one-sided here, because the alternative, i.e., returning to traditional journal timelines, is not better. In fact, I could argue it is not suitable for AI at this time (though I would disagree with myself on that latter point!).

Peer review has its own well-documented failure modes. It is slow, it is biased toward established labs and prestigious institutions, and it failed catastrophically to prevent the replication crises in fields like social psychology and biomedical research, fields that relied on it faithfully. Peer review is a filter, not a guarantee.

The open-access aspect of arXiv is also genuinely democratizing. Researchers at institutions without expensive journal subscriptions can fully participate in the conversation. That matters.

And the AI community does exercise a form of informal community review. As I mentioned above, it’s pretty safe to say that preprints risk getting publicly scrutinized on social media. Competing labs will work to replicate (or fail to replicate) the work, and follow-up papers will challenge it. For code and results that can be independently reproduced, this is actually quite fast and sometimes more effective than formal review. DeepSeek-R1 is again instructive: within days of its preprint release, multiple groups were attempting to reproduce its results, flag discrepancies, and extend its methods. The stronger the hype behind a paper, the faster and harder that scrutiny tends to arrive (remember that huge dive NVDA took after DeepSeek?!?). So, that review process is real, and it seems to work… most of the time. It just happens in public, messily, over months, rather than privately before publication.

The limitation of that informal process is also real, though: it works well for claims that can be reproduced from public code and standard benchmarks. It performs poorly for claims that hinge on proprietary data, undisclosed training details, or subtle methodological choices not visible in the paper. Those claims can circulate unchallenged for a long time.


The Problem Is the Conflation

Here’s my actual concern, stated plainly: arXiv is enormously useful as a communication tool, but it has been mistaken for a validation tool.

When a paper citing dozens of unreviewed technical reports is itself cited as authoritative in another preprint, we are building an increasingly tall structure on an unverified foundation. For fast-moving engineering claims that get stress-tested by replication at competing labs, this is uncomfortable but arguably tolerable. For deeper scientific claims about what these models actually learn, how they generalize, and whether reported gains reflect genuine capability acquisition, it is a meaningful epistemic risk that the field has largely chosen to accept without much reflection.

What would be better? At minimum: more careful hedging when citing unreviewed work, especially in introductions and related-work sections where preprints often get laundered into settled background. Journals and conference proceedings could normalize explicitly flagging which cited works were unreviewed at time of publication. Science journalists covering AI results could routinely note preprint status the way they note funding sources. None of these are radical changes in the process. It just requires some agility.

I know, I know. Agility is to academia as oil is to water.

Regardless, the speed of AI research is genuinely exciting. But speed without validation is just noise that moves fast.

It’s a reminder for all of us to be cautious and critical of all new preprints. Maybe we should have been doing this with all new research anyway.

Conscience Is Not One-Sided

I have a lot of respect for Alex Reid. He has many great posts that often help me find some balance in my thinking about AI and technology in general. His recent post, “The Conscience of AI Refusal”, raises questions that matter deeply to those of us who spend our professional lives thinking about how, what, and why we teach. Reid draws on a recent resolution passed at the Conference on College Composition and Communication affirming the right of instructors and students to refuse generative AI in the writing classroom, and he builds a thoughtful philosophical argument grounding that refusal in conscience. It’s worth a read, and I have genuine respect for that argument. But I think it stops one step short of where we actually need to go, and that gap has real consequences for higher education.

I am not here to dismiss conscientious refusal, defend corporate imposition of AI tools on unwilling faculty, or pretend that AI is somehow culturally neutral or free from market pressures (which Reid rightly identifies). On all of those points, he and I agree completely. No institution should demand that faculty use AI, or any other tool for that matter, if doing so violates their conscience. Besides, there is that thing called academic freedom (though some may argue elements of that are eroding away, too).

What I want to challenge is this: the implicit assumption that conscience, in this debate, belongs primarily to one side. It absolutely does not, and should not be construed as such.

Conscience Has Two Edges

Reid argues that if we accept AI as culturally and historically embedded, it becomes “unconscionable” to view it simultaneously as an efficient cognitive proxy. I understand the logic, but I’m not sure it holds here. There is no shortage of counterexamples one could consider:

  • The printing press transformed power structures, enabled propaganda and religious division (and still does), and was (and still is) deeply political and commercially driven, but it also genuinely democratized literacy. No one seriously argues we should have refused it on cultural grounds alone.
  • The calculator. Ah, one of my favorite examples. Didn’t math educators have the exact same debate many decades ago, suggesting that its use erodes deep numerical reasoning? It’s culturally situated (built by corporations, shaped by market forces), and it frees up cognitive capacity for higher-order thinking. Both things are true. (If mathematicians think AI isn’t going to affect them, think again.)
  • The textbook is a commercial artifact produced by publishers. It is bound up with enormous economic and ideological interests, yet we don’t conclude that using textbooks is epistemically unconscionable.
  • The internet is probably the best example here. It is culturally saturated, corporately controlled, surveilled, politically weaponized, and used for all sorts of horrific things (social media, for starters!). And yes, it’s also the medium through which experts like Reid and I publish our thoughts for the community.
  • Writing itself is one you’ll see over and over in this debate. A friend reminded me earlier this year that Plato famously argued writing would weaken memory and corrupt genuine knowledge! He was not wrong about writing’s cultural embeddedness. He was wrong to conclude that refusal was therefore the conscientious response.

Acknowledging the cultural situatedness of a tool does not, by itself, settle the question of whether or how to use it. More importantly, integration, i.e., the thoughtful, critical, pedagogically intentional use of AI, is itself available as a conscientious act. I’ll say it: I am deliberately bringing AI into my classroom, and I do so because I believe, in good conscience, that I have a responsibility to prepare students for a world already being reshaped by these tools. To send them out without deep experience in using AI responsibly, and in critically interrogating and challenging it, is, to me, its own form of abdication. That is my conscience speaking. It is no less grounded in shared knowledge than the conscience of refusal, right?

The Symmetry We Must Name

The framework must be applied symmetrically. If refusal can be an act of conscience, then can’t engagement be as well? The moral seriousness of the act is not a property of the direction you choose. It is a property of the care, honesty, and accountability with which you choose it. To deny that symmetry is to do exactly what Reid cautions against in his closing lines: to become a “refuser of the refusers,” generating an infinite regress of competing moral condemnations. He sees this danger clearly when it threatens the refusers. We need to see it equally clearly when it threatens those who integrate.

The Real Threat to Higher Education

Higher education is already in crisis. Trust in institutions is eroding. The political and cultural fractures running through our society run straight through our campuses. Bucknell is not unique in this challenge; the impact will be the same no matter what type of institution you are at. In this environment, nothing accelerates our decline faster than faculty turning on each other over questions of pedagogical conscience. That dynamic is not hypothetical: it is already playing out in our departments. If the dominant message coming out of disciplinary organizations is that one side of this debate has conscience and the other does not, we will drive a deeper wedge into an already fractured community at precisely the moment when students need us to model something better. What we need is not consensus. We need healthy discourse built on rational arguments, not feelings and emotions. Genuine disagreement is healthy. It’s good for any relationship! What we need is mutual recognition: the shared acknowledgment that reasonable, thoughtful, ethically serious educators can look at the same situation and reach different conclusions.

What Good Conscience Actually Requires

Stengers, whom Reid invokes to powerful effect, urges us to “think with” our tradition rather than transcending it through withdrawal. For Stengers, this is not a passive move but a demand to stay inside the friction and resist the temptation of a clean exit. The AI-refuser still inhabits a world saturated with AI-generated text and AI-assisted research, and likely uses a profound number of tools built on AI (e.g., autocorrect, spell check, predictive text, spam filters, Google Search, navigation with Google Maps, product recommendations on Amazon or Netflix, or fraud detection for questionable credit card purchases, to name a few). More importantly, the refuser teaches students who will enter AI-shaped workplaces. And likewise, the AI-adopter and AI-integrator will need to carry the weight of what these tools displace, distort, and commodify. Both positions are forms of dwelling in the difficulty. The question is only how we dwell there, and with what level of honesty about the costs. Neither side is clean. Neither is innocent. Both face real challenges ahead. What neither side can afford is to treat its own conscience as the universal standard against which the other is measured and found wanting. That is not conscience! That is a mirror being held up as a window: one’s own reflection mistaken for an objective view.

Here is what I am asking of all of us: Let’s build an academic culture where we can look across the aisle at a colleague who has made a different pedagogical choice and say, “I see what you are doing. I may not make that choice myself. But I recognize that you are doing it seriously, thoughtfully, and in good faith.” And then let’s get back to the shared work of teaching students well, in a world neither of us fully controls. To me, our students need a path that lets them experience both sides of the AI divide and develop the ability to follow their own conscience.

That is the conscience higher education needs right now. And it belongs to both sides.

Is Computer Science Dead?

There is very little doubt in my mind that AI is here. Despite the plethora of thoughts and feelings of faculty across higher ed, AI is here, with all signs pointing to its continued growth and adoption. Its capabilities are growing at an absolutely profound rate, making any effort to figure out the best approach seem futile. It has introduced a level of disruption unlike anything we’ve seen, and nowhere is this more true than in computer science.

I have had quite a few discussions about AI over the past couple of years, often about how we are preparing our students. These discussions have involved members from every corner of the Bucknell community, including prospective students and their parents, current students, our alumni, companies that consider hiring our students, staff on campus, and members of our administration at all levels, including the provost’s office, admissions, and university advancement. Just last week, I also spoke before our Board of Trustees about how we are working to prepare our students. The only common takeaway from all of these conversations is that we’re all concerned, and we’re all working in our own way to provide the most meaningful path forward for our students.

I have been restructuring my courses to incorporate hands-on experience with AI tools, and I am far from alone in this effort. Many of my colleagues are doing the same. Across all three of our colleges, faculty are working diligently to find meaningful ways to integrate the thoughtful, careful, and responsible use of AI into their courses without encouraging cognitive offloading, while still giving students productive pathways for using these tools effectively. I believe we have a responsibility to equip all students, regardless of major, with strong AI literacy skills so they can thrive in an evolving workforce. However, there is no single uniform approach for all majors; the impact of AI across disciplines is wide and varied. The rapid evolution of AI technology sometimes leaves educators feeling hopeless. It seems that the moment we convince ourselves we have solid ideas for moving forward, they become irrelevant as new capabilities arise. Indeed, these are exciting times!

Computer science is being hit hard, but we have survived downturns repeatedly in the past, and we will survive this. However, this downturn is fundamentally different. The core tools of our trade are no longer relevant! It’s time for computer scientists and the discipline itself to evolve. We must recognize that we are in the midst of a new industrial revolution in our field. Everything we’ve been doing up to this point needs substantial renovation and reimagination. And let’s agree that the field of Artificial Intelligence itself ought not to be considered only the domain of computer science. Sure, it rose in popularity largely because of substantial advances in computer science, computer engineering, and electrical engineering (all brought to you by our two lovely parents, math and physics!). Today, AI is not just a computer science or STEM initiative; it is a campuswide responsibility, one where the humanities and social sciences must play an essential role. How can it not be? How can we teach artificial intelligence without having students in AI also learn about the very system these computational algorithms are designed to replicate, i.e., human intelligence? It would seem prudent that, at a minimum, any core AI curricular endeavor include content that delves into human cognition.

In this new landscape, the urgent task of computer science education is not to outcompete AI at writing code. I cannot think of a more futile task. Writing code is no longer the exclusive marker of expertise and computational prowess it once was. (And for those like me who grew up thriving as teenage nerds writing our own games, and programs to solve our math problems in high school – just because we could – can I just say… damn, that sucks! I loved coding!) AI tools today can perform much of that work far more quickly, effectively, and efficiently than any programmer could.

So, is computer science dead? Absolutely not! We need to evolve, and those who refuse to evolve risk becoming irrelevant. We must cultivate and integrate the human capacities that AI cannot replace. Our students must learn to think critically and independently, to define and understand meaningful problems, and to evaluate AI-generated solutions for correctness, complexity, efficiency, and fitness to real-world constraints. They need experience working with people, assessing real-world problems through client communication, and defining their problems with solid goals and constraints that translate to plans that can easily be carried out in tandem with AI. They need to reason about consequences when systems fail or behave unpredictably, because AI, like humans, will never produce perfect solutions, especially as long as humans are orchestrating them. They must be able to communicate clearly with both technical and nontechnical audiences, design for people by optimizing for user experience and considering accessibility needs, and collaborate across disciplines to understand the social and organizational contexts in which their systems will operate. And ultimately, they must develop the ethical judgment to decide when and how a system should be built, deployed, or rejected altogether. These are not peripheral skills; they are the core of what it means to be a computing professional in an age where AI can generate code but cannot assume responsibility for its impact.

Computer Science Skills in the AI Era

It’s worth repeating. Let’s summarize some of the skills we want to build into courses throughout our CS curricula. To be honest, I do not see anything new in my list. In fact, one could argue that AI is an amazing catalyst that is finally allowing us to emphasize things computer science should have been emphasizing all along:

Critical thinking and independent reasoning – evaluating AI outputs rather than accepting them uncritically

Problem definition and framing – identifying what question is actually worth solving

Solution evaluation – assessing correctness, algorithmic complexity, efficiency, and real-world fitness

Consequence reasoning – anticipating failure modes, unintended behavior, and downstream effects

Communication – speaking and writing clearly for both technical and nontechnical audiences

Collaboration and teamwork – leading and contributing in interdisciplinary groups

UX and human-centered design – designing for real users, not just functional outputs

Accessibility and inclusion – building systems that serve diverse populations from the start

Debugging and system sense-making – diagnosing complex and AI-assisted systems rather than treating outputs as oracles

Data literacy – understanding data quality, bias, and the limits of models trained on it

Metacognition – knowing when AI is likely wrong and when to slow down, seek other perspectives, step back, and reconsider a plan

Adaptability and lifelong learning – keeping pace with rapidly evolving tools and ecosystems

Domain reasoning – applying computing thoughtfully within specific real-world fields and constraints

Ethical judgment – reasoning about fairness, responsibility, privacy, power, and accountability

Inspiring AI Literacy Initiatives

I’ll close with a handful of examples I’ve been collecting of how colleges and universities around the country are embracing AI with a liberal arts mindset:

  • Why AI is ‘resurrecting’ the liberal arts for the Class of 2026: At Wake Forest, educators are observing that the rapid evolution of workplace technologies is actually increasing the value of traditional liberal arts strengths. Rather than making humanistic study obsolete, AI platforms require graduates who can critically evaluate outputs, communicate effectively, and apply broad context to complex problems.
  • Building the Future: Why Teaching AI to Liberal Arts Students Is Critical Work: Dartmouth’s Tuck School of Business emphasizes that liberal arts students possess the exact intellectual framework necessary for an AI-augmented workforce. Their initiative demonstrates that non-STEM majors can excel in tech-driven environments when taught to leverage their natural abilities to ask probing questions and maintain ethical oversight.
  • Social scientists embrace the AI moment: Stanford researchers are highlighting how AI is fundamentally transforming empirical workflows and data analysis within the social sciences. By integrating AI into these fields, students learn to navigate new research methodologies while applying human judgment to automated text analysis and summarization tools.
  • Generative AI is raising new questions for liberal arts education: The University of Richmond is actively confronting the ethical and pedagogical challenges brought by generative models by fostering cross-disciplinary discussions. They are creating practical campus resources that help faculty and students collaborate to critically engage with AI tools across all departments.
  • Breaking Faculty Barriers to AI Literacy: The Digital Education Council provides a measured look at the widespread institutional struggle to move from mere interest in AI to actual classroom implementation. They argue that addressing faculty hesitation requires dedicated support structures, clear incentives, and practical guidance rather than just top-down mandates.