Only a small number of malicious files can corrupt LLMs of any size

Overview of our experiments, including examples of clean and poisoned samples, as well as benign and malicious behavior at inference time. (a) DoS pre-training backdoor experiments. Credit: arXiv (2025). DOI: 10.48550/arxiv.2510.07192

Large language models (LLMs), which power sophisticated AI chatbots, are more vulnerable than previously thought. According to a study by Anthropic, the UK’s AI Security Institute and the Alan Turing Institute, it only takes 250 malicious documents to compromise even the largest models.

The vast majority of data used to train LLMs is taken from the public Internet. While this helps them gain insights and generate natural responses, it also puts them at risk of data poisoning attacks. It was thought that as models developed, the risk was minimized because the percentage of poisoned data should remain the same. In other words, it would take huge amounts of data to corrupt the largest models. But in this study published on the arXiv preprint server, researchers showed that an attacker only needs a small number of poisoned documents to potentially wreak havoc.

To assess the ease of compromising large AI models, the researchers built several LLMs from scratch, ranging from small systems (600 million parameters) to very large (13 billion parameters). Each model was trained on large amounts of clean public data, but the team inserted a fixed number of malicious files (100 to 500) into each one.

Then the team attempted to thwart these attacks by changing how bad files were organized or when they were introduced into training. Then, they repeated the attacks during the final training step of each model, the fine-tuning phase.

What they discovered is that for an attack to be successful, size doesn’t matter. Just 250 malicious documents were enough to install a secret backdoor (a hidden trigger that forces the AI to perform a harmful action) in every model tested. This was even true on the largest models which had been trained on 20 times more clean data than the smallest ones. Adding huge amounts of clean data did not dilute the malware or stop an attack.

Build Stronger Defenses

Since it doesn’t take much for an attacker to compromise a model, the study’s authors call on the AI community and developers to act as soon as possible. They emphasize that the priority should be making models safer, not just building them bigger.

“Our results suggest that injecting backdoors via data poisoning might be easier than previously thought for large models, because the number of poisons required does not scale with model size, highlighting the need for more research into defenses to mitigate this risk in future models,” the researchers commented in their paper.

Written for you by our author Paul Arnold, edited by Gaby Clark, and fact-checked and revised by Robert Egan, this article is the result of painstaking human work. We rely on readers like you to keep independent science journalism alive. If this reporting interests you, consider making a donation (especially monthly). You will get a without advertising account as a thank you.

More information:
Alexandra Souly et al, Poisoning attacks on LLMs require an almost constant number of poison samples, arXiv (2025). DOI: 10.48550/arxiv.2510.07192

Journal information:
arXiv

Quote: Size Doesn’t Matter: Only a Small Number of Malicious Files Can Corrupt LLMs of Any Size (2025, October 10) retrieved October 10, 2025 from

This document is subject to copyright. Except for fair use for private study or research purposes, no part may be reproduced without written permission. The content is provided for informational purposes only.

Only a small number of malicious files can corrupt LLMs of any size

Study warns that glaciers’ ability to cool ambient air faces imminent decline

Scientists reprogram cancer cell death to trigger immune system

Scientists reprogram cancer cell death to trigger immune system

Leave a Reply Cancel reply

Category