Arcane curation from the IndieWeb, Fediverse and Cybersecurity realms
An open-source benchmark by 1Password that tests whether AI agents can handle real security threats during everyday tasks. View the leaderboard, watch replays, and try the security skill.
SCAM (Security Comprehension Awareness Measure): an open-source benchmark that tests AI agents' security awareness during realistic, multi-turn workplace tasks. GitHub repository: 1Password/SCAM
AI agents are remarkably poor at avoiding threats. The SCAM benchmark puts AI agents in realistic scenarios, confronts them with common security threats, and uses a novel framework to help them improve.
Extensive guide on being a Claude Code power user; tracking threat actors on GitHub; open-source AI-powered pentesting tools