AI for AI safety - Joe Carlsmith

Redwood Research blog Eric Gan Apr 29, 2026

One of the main hopes for AI safety is using AIs to automate AI safety research. However, if the models are misaligned, they may sabotage that research. For example, misaligned AIs may try to:
