AI models can acquire backdoors from surprisingly few malicious documents

Scraping the open web for AI training data can have its drawbacks. On Thursday, researchers from Anthropic, the UK AI Security Institute, and the Alan Turing Institute released a preprint research paper suggesting that large language models like the ones that power ChatGPT, Gemini, and Claude can develop backdoor vulnerabilities from as few as 250 corrupted documents inserted into their training data.

That means an attacker who slips crafted documents into training data could potentially manipulate how the resulting model responds to prompts, although the finding comes with significant caveats.

The researchers trained AI language models ranging from 600 million to 13 billion parameters, with each model's dataset scaled to its size. Although the largest models processed more than 20 times as much total training data, every model learned the same backdoor behavior after encountering roughly the same small number of malicious documents.
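To put that in perspective, the sketch below is a rough, hypothetical back-of-the-envelope calculation, not code or data from the paper. It estimates what fraction of a model's training tokens 250 poisoned documents would occupy if training data scales with model size. The 20-tokens-per-parameter rule, the assumed average document length, and the intermediate model sizes are illustrative assumptions only; the article states just the 600-million and 13-billion parameter endpoints.

```python
# Back-of-the-envelope sketch (not the paper's code): how large a share of the
# training data 250 poisoned documents would represent as model size grows.
# The intermediate model sizes, the ~20-tokens-per-parameter scaling rule, and
# the average document length are illustrative assumptions, not paper figures.

POISONED_DOCS = 250        # number of corrupted documents reported in the study
AVG_DOC_TOKENS = 1_000     # assumed average length of one poisoned document, in tokens

model_params = {
    "600M": 600e6,   # smallest model mentioned in the article
    "2B": 2e9,       # assumed intermediate size
    "7B": 7e9,       # assumed intermediate size
    "13B": 13e9,     # largest model mentioned in the article
}

for name, params in model_params.items():
    train_tokens = params * 20                       # assumed tokens ~ 20 x parameters
    poison_tokens = POISONED_DOCS * AVG_DOC_TOKENS   # poisoned token count stays fixed
    share = poison_tokens / train_tokens
    print(f"{name}: {POISONED_DOCS} poisoned docs = roughly {share:.5%} of training tokens")
```

Under these assumptions, the poisoned share shrinks by more than a factor of 20 from the smallest model to the largest, which is why a roughly constant count of about 250 documents producing the same backdoor across all model sizes is the notable result.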
