GeistHaus

AI for AI safety - Joe Carlsmith

joecarlsmith.com

We should try extremely hard to use AI labor to help address the alignment problem.

1 page links to this URL
Research Sabotage in ML Codebases

One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to:

3 inbound links · article · en