Research Sabotage in ML Codebases

blog.redwoodresearch.org

One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to:

3 pages link to this URL

AI #167: The Prior Restraint Era Begins

Don't Worry About the Vase Zvi Mowshowitz May 7, 2026

The era of training frontier models and then releasing them whenever you wanted?

2 inbound links article en

Pioneering threat assessment and mitigation for AI systems

redwoodresearch.org Apr 28, 2026

0 inbound links en

AI #167: The Prior Restraint Era Begins

Don't Worry About the Vase May 7, 2026

The era of training frontier models and then releasing them whenever you wanted? That was fun while it lasted. It looks likely to be over now. The White House wants to get an advance look and have …

0 inbound links article en Uncategorized ai artificial-intelligence chatgpt llm technology