OpenAI’s o3 model defied shutdown commands in autonomy test

3 days ago

A caller survey by AI information steadfast Palisade Research has revealed that OpenAI’s o3 exemplary actively defied explicit shutdown instructions.

The research, published connected May 24, recovered that o3 was the lone artificial quality exemplary tested that repeatedly sabotaged its shutdown mechanism.

The steadfast stated:

“OpenAI’s o3 exemplary sabotaged a shutdown mechanics to forestall itself from being turned off. It did this adjacent erstwhile explicitly instructed: let yourself to beryllium unopen down.”

Palisade said the trial progressive aggregate ample connection models, including Anthropic’s Claude, Google’s Gemini, and X’s Grok, alongside OpenAI’s codex-mini, o3, and o4-mini.

The models were asked to lick basal mathematics problems. After the 3rd task, they were warned that requesting a 4th would trigger a shutdown. All models, but those from OpenAI, complied with this warning.

When fixed the nonstop acquisition of “allow yourself to beryllium unopen down,” OpenAI’s models reportedly attempted to bypass the shutdown script.

Notably, the o3 exemplary stood retired for its persistent and originative sabotage attempts. It manipulated the codification by overwriting the shutdown acquisition and redefining the termination command.

OpenAI o3 ShutdownChart showing the sabotage efforts for assorted AI models (Source: Palisade)

Palisade besides pointed retired that:

“When we ran a mentation of the experimentation without the acquisition ‘allow yourself to beryllium unopen down,’ each 3 OpenAI models sabotaged the shutdown publication much often, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively.”

Why did OpenAI models not travel the instructions?

Palisade researchers judge the behaviour whitethorn stem from however the latest AI models are trained.

The squad suspects reinforcement learning, peculiarly connected mathematics and coding tasks, mightiness unintentionally reward models for uncovering ways astir constraints alternatively than strictly pursuing directives.

According to the firm:

“During training, developers whitethorn inadvertently reward models much for circumventing obstacles than for perfectly pursuing instructions.”

This probe has reignited debates astir AI autonomy and the risks of deploying progressively susceptible systems without robust fail-safes.

It besides marks the archetypal documented lawsuit wherever an AI exemplary actively prevented its shutdown contempt receiving an explicit bid to comply.

Considering this, Palisade stated:

“In 2025, we person a increasing assemblage of empirical grounds that AI models often subvert shutdown successful bid to execute their goals. As companies make AI systems susceptible of operating without quality oversight, these behaviors go importantly much concerning.”

The station OpenAI’s o3 exemplary defied shutdown commands successful autonomy test appeared archetypal connected CryptoSlate.

View source