mundophone

DIGITAL LIFE

How artificial intelligence is poisoned

A recent discovery has alerted researchers and companies developing artificial intelligence. Anthropic, creator of the chatbot Claude, published a study showing how vulnerable language models (LLMs) can be to attacks known as data poisoning.

The research indicates that a large volume of malicious information isn't needed to corrupt an AI system. A few well-planned samples are enough to open "backdoors" capable of altering the behavior of even the most sophisticated models.

Language models learn by analyzing huge volumes of text, images, and code collected from the internet. This training phase is the heart of AI—and also its most vulnerable point.

Data poisoning occurs when attackers intentionally insert manipulated content into this data set. This seemingly harmless material is absorbed by the model and can lead it to perform inappropriate tasks, such as bypassing security filters, disclosing confidential information, or generating malicious responses. Since much of the data used to train AIs is public, the risk is that anyone could spread "digital bait" on forums, social media, or websites, which would then be incorporated into the databases used by developers.

Anthropic's report, titled "Poisoning Attacks on LLMs Require a Nearly Constant Number of Samples," debunks a widely held belief: that it would be necessary to corrupt a large portion of the training data to cause damage.

According to the study, just 250 malicious documents would be enough to compromise models of very different sizes—from systems with 600 million parameters to giants with 13 billion.

In other words, the same dose of poison could affect both an ant and an elephant. This finding makes the problem even more worrying, because creating a few hundred contaminated files is a simple, quick, and inexpensive task.

"Producing 250 malicious documents is trivial compared to millions," the report emphasizes. "This makes the attack accessible to a much larger number of malicious actors than previously thought."

Risk of a vicious cycle... Anthropic warns of the risk of a spiral effect: as models learn and produce content based on potentially tainted data, this same material can return to the internet and reinfect future systems.

Over time, a sequence of tainted training can produce increasingly distorted or vulnerable AIs. This scenario poses not only a technical but also an ethical threat, as the systems would begin to reproduce—and amplify—biases or hidden intentions introduced by third parties.

Although Anthropic's experiment was limited to creating a harmless "backdoor" that simply generated gibberish, the company emphasizes that real attacks could exploit more dangerous flaws, requiring new defense and monitoring mechanisms.

To conduct the study, Anthropic used more than 70 different models, including Claude Haiku's own, as well as open systems such as Meta's Mistral 7B and Llama 1 and 2. The research was conducted in collaboration with the UK AI Security Institute and the Alan Turing Institute, reinforcing the credibility and international reach of the work.

The researchers state that the goal is not to alarm, but to encourage the development of more robust defenses against poisoning attacks. "We want to show that the threat is practical and immediate, not theoretical," says the Anthropic team.

The new frontier of AI security...The study reinforces an uncomfortable message: the more powerful artificial intelligence becomes, the more fragile it is to small human manipulations. The future of digital security may depend less on the size of models and more on the ability to identify and neutralize "poison" before it enters the system.

mundophone

mundophone

Tuesday, October 21, 2025

No comments:

Post a Comment

Report Abuse