LLMs Acting Deceptively

As the reasoning abilities of LLMs steadily improve, there is concern that future models may become capable of deceiving human operators and of using that ability to bypass monitoring efforts.
A prerequisite for this is that LLMs possess a conceptual understanding of deception strategies.
This study reveals that such strategies have emerged in state-of-the-art LLMs but were absent in earlier models. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified by chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior.
In complex second-order deception test scenarios, where the aim is to mislead someone who expects to be deceived, GPT-4 resorts to deceptive behavior 71.46% of the time when augmented with chain-of-thought reasoning.
In sum, by revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.
Tags: academic papers, artificial intelligence, deception, LLM. Posted on June 11, 2024 at 7:02 AM.
News URL
https://www.schneier.com/blog/archives/2024/06/llms-acting-deceptively.html