GitHub - clouatre-labs/llm-agent-experiments: Benchmarking open-weight LLM coding agents as SCOUT delegates: model comparison experiments with pre-registered protocols, blind scoring, and full data.