Security News > 2024 > May > Anthropic’s Generative AI Research Reveals More About How LLMs Affect Security and Bias

With this map, the researchers can explore how neuron-like data points, called features, affect a generative AI's output.
Some of these features are "safety relevant," meaning that reliably identifying them could help tune generative AI to avoid potentially dangerous topics or actions.
How manipulating features affects bias and cybersecurity
These features might activate in conversations that do not involve unsafe code; for example, the backdoor feature activates for conversations or images about "hidden cameras" and "jewelry with a hidden USB drive." But Anthropic was able to experiment with "clamping" these specific features (put simply, increasing or decreasing their intensity), which could help tune models to avoid or tactfully handle sensitive security topics.
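As a rough illustration of what clamping means, the toy sketch below pins one feature's intensity to a fixed value and recomputes the activation that the features add up to. It is not Anthropic's implementation: the feature names, the three-dimensional "directions," and the helper functions clamp_feature and decode are all hypothetical, and a real model would have millions of features in a far larger activation space.

```python
def clamp_feature(features, name, value):
    """Return a copy of the feature intensities with one feature pinned to `value`."""
    clamped = dict(features)
    clamped[name] = value
    return clamped


def decode(features, directions):
    """Rebuild an activation vector as the intensity-weighted sum of feature directions."""
    size = len(next(iter(directions.values())))
    activation = [0.0] * size
    for name, intensity in features.items():
        for i, component in enumerate(directions[name]):
            activation[i] += intensity * component
    return activation


# Hypothetical dictionary: each feature is a direction in a tiny activation space.
directions = {
    "other_topic": [1.0, 0.0, 0.0],
    "backdoor": [0.0, 1.0, 0.0],
}

# Hypothetical feature intensities observed for some prompt.
features = {"other_topic": 0.5, "backdoor": 2.0}

baseline = decode(features, directions)

# Clamp the feature low (suppress it) or high (amplify it), then decode again.
suppressed = decode(clamp_feature(features, "backdoor", 0.0), directions)
amplified = decode(clamp_feature(features, "backdoor", 10.0), directions)

print(baseline, suppressed, amplified)
```

In this sketch only the component tied to the clamped feature changes, which is the intuition behind turning a single concept up or down without touching the rest.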
Identifying some of the features used by an LLM to connect concepts could help tune an AI to prevent biased speech or to prevent or troubleshoot instances in which the AI could be made to lie to the user.
Anthropic plans to use some of this research to further pursue topics related to the safety of generative AI and LLMs overall, such as exploring what features activate or remain inactive if Claude is prompted to give advice on producing weapons.
News URL
https://www.techrepublic.com/article/anthropic-claude-large-language-model-research/