Anthropic’s Generative AI Research Reveals More About How LLMs Affect Security and Bias (May 2024)
With a map of how its Claude model represents concepts internally, Anthropic's researchers can explore how neuron-like data points, called features, affect a generative AI's output.
Some of these features are "safety relevant," meaning that reliably identifying them could help tune generative AI to avoid potentially dangerous topics or actions.
How manipulating features affects bias and cybersecurity
Some security-related features might activate in conversations that do not involve unsafe code; for example, the backdoor feature activates for conversations or images about "hidden cameras" and "jewelry with a hidden USB drive." Anthropic was nonetheless able to experiment with "clamping" (put simply, increasing or decreasing the intensity of) these specific features, which could help tune models to avoid or tactfully handle sensitive security topics.
Identifying some of the features an LLM uses to connect concepts could help tune an AI to prevent biased speech, or to prevent or troubleshoot instances in which the AI could be made to lie to the user.
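To make "clamping" concrete, here is a minimal toy sketch in Python. It is not Anthropic's implementation: it assumes that features are directions in a model's activation space that can be read out and recombined linearly, and every name and number in it (the two-feature `DICTIONARY`, `clamp_feature`, the example activation) is hypothetical and chosen only for illustration.

```python
# Toy illustration of "clamping" a feature. All names and values are hypothetical.

def encode(activation, dictionary):
    """Feature intensities: ReLU of the dot product with each feature direction."""
    return [max(0.0, sum(a * d for a, d in zip(activation, direction)))
            for direction in dictionary]

def decode(features, dictionary):
    """Rebuild the activation as a weighted sum of feature directions."""
    dim = len(dictionary[0])
    return [sum(f * direction[i] for f, direction in zip(features, dictionary))
            for i in range(dim)]

def clamp_feature(activation, dictionary, index, value):
    """Pin one feature to a fixed intensity and return the edited activation."""
    features = encode(activation, dictionary)
    features[index] = value
    return decode(features, dictionary)

# Hypothetical orthonormal feature directions in a 3-dimensional activation space.
DICTIONARY = [
    [1.0, 0.0, 0.0],  # feature 0: stand-in for a "backdoor" feature
    [0.0, 1.0, 0.0],  # feature 1: some unrelated feature
]

activation = [0.8, 0.3, 0.0]
suppressed = clamp_feature(activation, DICTIONARY, index=0, value=0.0)
amplified = clamp_feature(activation, DICTIONARY, index=0, value=5.0)
```

In this toy setup, clamping feature 0 to zero removes its contribution from the activation while leaving feature 1 untouched, and clamping it to a large value exaggerates it. A real model has far more features, and their directions are learned rather than written by hand.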
Anthropic plans to use some of this research to further pursue topics related to the safety of generative AI and LLMs overall, such as exploring what features activate or remain inactive if Claude is prompted to give advice on producing weapons.
News URL
https://www.techrepublic.com/article/anthropic-claude-large-language-model-research/