Claude knows it's being tested almost one-quarter of the time, but almost never says anything about it, according to new Anthropic research. This is, or should be, unsettling, given that non-disclosure masks poor alignment, which may re-emerge in another context This is best understood in the context of work showing
No pages have linked to this URL yet.