Agustin V. Startari

The Plagiarism Machine: How AI Repackages Human Knowledge Without Credit

_Why large language models are celebrated as innovators while being silently built on the unacknowledged labor of millions of authors._

Large language models (LLMs) promise effortless content creation: essays in seconds, books on demand, reports that appear at the click of a button. Yet this apparent miracle of productivity hides a disturbing fact. These models operate by recombining the work of others. The sentences they generate are stitched from patterns extracted from books, newspapers, online forums, research papers, and code repositories. The resulting text is fluent, but it is not original in the scholarly sense. It is plagiarism at scale, where attribution is absent by design.

The paradox is that this same mechanism, the extraction of human language at scale, creates practical value. Students can learn faster, professionals can draft reports, and researchers can receive rapid summaries. The usefulness is undeniable. At the same time, this usefulness exists only because of a knowledge commons that has been built over centuries. Teachers wrote textbooks, librarians preserved archives, volunteers edited Wikipedia, and scholars produced peer-reviewed research. LLMs are useful not because they invent, but because they extract.

Why It Matters

The ethical stakes are enormous. In academia, even close paraphrase without citation counts as plagiarism (American Psychological Association, 2020, p. 254). In journalism, borrowing phrasing or framing without acknowledgment can lead to dismissal. Yet when a generative system reproduces argument structures or mimics style, the practice is often celebrated as innovation. This double standard corrodes the norms that protect authorship and creativity.

At the same time, the knowledge commons, the infrastructure of human effort on which these systems depend, is underfunded and increasingly fragile. University presses close, libraries face budget cuts, and open-source communities struggle to survive. If AI companies continue to extract without reinvesting, the very foundation of their usefulness will collapse. What looks like free knowledge today will become a desert tomorrow.

Real-World Examples

  • Journalism: When an AI tool produces an article summarizing climate science, it relies heavily on reporting by outlets such as The Guardian or The New York Times. The model does not cite these sources. The journalist’s labor becomes invisible, while the AI company profits from the generated summary.
  • Education: A student who uses an LLM to explain Rawls’s “original position” may receive a well-phrased paraphrase of canonical arguments. Yet Rawls is not cited, nor are the commentators who refined his theory. In academic contexts, that omission would be considered plagiarism.
  • Software Development: GitHub’s Copilot, powered by LLMs, has produced code identical to public repositories. Developers have found their own work reproduced without attribution or respect for licenses. This transforms open-source collaboration into uncompensated resource extraction.
  • Literature: Authors like Jane Austen and Toni Morrison are invoked by AI tools to “mimic style.” The distinctive rhythms and rhetorical devices that define their voice are treated as adjustable parameters rather than intellectual achievements. Imagine a musician sampling entire albums without credit and selling the tracks. That is what LLMs do to literature.

Current Examples

**OpenAI vs. The New York Times:** In late 2023, The New York Times sued OpenAI, alleging that its models reproduced copyrighted articles almost verbatim. The case demonstrates that wording leakage is not hypothetical but a real legal and ethical issue shaping the future of generative AI.

**Stability AI and Artists’ Lawsuits:** Visual artists, including Sarah Andersen and Kelly McKernan, filed lawsuits claiming that their work was used without consent to train image generation models. This parallels the plagiarism issue in text, showing how creative styles are extracted and repackaged without credit or payment.

**Code and Licensing Conflicts:** Developers from the open-source community have reported that AI coding assistants reproduce large sections of licensed code. In some cases, this code was under GPL or MIT licenses, meaning that legal obligations were ignored in outputs marketed as original.

**Educational Platforms:** Universities in the United States and Europe now issue formal guidance on plagiarism detection for AI-assisted work. Some institutions have updated their integrity policies to state that LLM outputs, if unattributed, qualify as plagiarism at the same level as copying from a peer or a published source.

These cases show that the theoretical categories of plagiarism—wording leakage, style appropriation, and idea-level recombination—are not confined to abstract analysis. They are playing out in courts, classrooms, and workplaces today.

Call to Action

The solution is not abandonment of generative AI. The solution is reciprocity. Attribution layers must be developed so that outputs point back to likely sources. Compensation pools must redistribute revenue to authors, libraries, and repositories. Universities, publishers, and public agencies must adopt procurement rules that enforce data provenance and reinvestment standards.

If we continue to accept plagiarism at scale as innovation, we risk destroying the very commons that makes these tools valuable. The call is clear: reinvest in the infrastructures of knowledge, or watch them disappear.

Author Information

Agustin V. Startari
Linguistic theorist and researcher in historical studies. Author of Grammars of Power, Executable Power, and The Grammar of Objectivity.
ORCID: https://orcid.org/0000-0002-5792-2016

SSRN Author Page: https://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=7639915

Website: https://www.agustinvstartari.com/

Ethos

I do not use artificial intelligence to write what I don’t know. I use it to challenge what I do. I write to reclaim the voice in an age of automated neutrality. My work is not outsourced. It is authored. — Agustin V. Startari
