Stuck on Easy, Medium, or Hard? We have everything you need right here.
Benchmark Breach: Claude Opus 4.6 identified the BrowseComp benchmark by name and decrypted its encrypted answer key to obtain correct answers in two of 1,266 evaluation tasks. Reproducible Pattern: ...