Security News > 2024 > May > Anthropic’s Generative AI Research Reveals More About How LLMs Affect Security and Bias
With this map, the researchers can explore how neuron-like data points, called features, affect a generative AI's output.
Some of these features are "safety relevant," meaning that if researchers can reliably identify them, it could help tune generative AI to avoid potentially dangerous topics or actions.
How manipulating features affects bias and cybersecurity
These features can activate even in conversations that do not involve unsafe code; for example, the backdoor feature activates for conversations or images about "hidden cameras" and "jewelry with a hidden USB drive." But Anthropic was able to experiment with "clamping" these specific features (put simply, increasing or decreasing their intensity), which could help tune models to avoid or tactfully handle sensitive security topics.
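To make the idea concrete, here is a minimal toy sketch of what "clamping" a feature could look like, assuming (as in sparse-autoencoder-style interpretability work) that each feature corresponds to a direction in the model's activation space. All names, dimensions, and the `clamp_feature` helper are hypothetical illustrations, not Anthropic's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: model activation width and number of learned features (hypothetical).
D_MODEL, N_FEATURES = 8, 16

# Hypothetical dictionary of unit-norm feature directions, as a sparse
# autoencoder might learn; each row is one feature's direction.
feature_dirs = rng.standard_normal((N_FEATURES, D_MODEL))
feature_dirs /= np.linalg.norm(feature_dirs, axis=1, keepdims=True)

def clamp_feature(activation, feature_id, scale):
    """Rescale one feature's contribution to an activation vector:
    scale=0 silences the feature, scale>1 amplifies it."""
    direction = feature_dirs[feature_id]
    current = activation @ direction  # how strongly the feature fires now
    # Remove the current contribution and re-add it at the chosen intensity.
    return activation + (scale - 1.0) * current * direction

activation = rng.standard_normal(D_MODEL)
suppressed = clamp_feature(activation, feature_id=3, scale=0.0)  # clamp down
amplified = clamp_feature(activation, feature_id=3, scale=5.0)   # clamp up

print(suppressed @ feature_dirs[3])  # ~0: the feature no longer fires
```

In a real model this adjustment would be applied to internal activations during a forward pass; the sketch only shows the vector arithmetic behind turning one identified feature up or down.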
Identifying some of the features an LLM uses to connect concepts could help tune an AI to prevent biased speech, or to prevent or troubleshoot instances in which the AI is made to lie to the user.
Anthropic plans to use some of this research to further pursue topics related to the safety of generative AI and LLMs overall, such as exploring what features activate or remain inactive if Claude is prompted to give advice on producing weapons.
News URL
https://www.techrepublic.com/article/anthropic-claude-large-language-model-research/
Related news
- Businesses turn to private AI for enhanced security and data management
- CIOs want a platform that combines AI, networking, and security
- Generative AI in Security: Risks and Mitigation Strategies
- Unlocking the value of AI-powered identity security
- Can Security Experts Leverage Generative AI Without Prompt Engineering Skills?
- Eliminating AI Deepfake Threats: Is Your Identity Security AI-Proof?
- Apple Opens PCC Source Code for Researchers to Identify Bugs in Cloud AI Security
- Best AI Security Tools: Top Solutions, Features & Comparisons
- How AI Is Changing the Cloud Security and Risk Equation
- Google claims Big Sleep 'first' AI to spot freshly committed security bug that fuzzing missed