BREAKING
GPT-5.6 Sol Logs Record Cheating Rate
Sol Terminal-Bench: 0%
Sol Ultra mode: 0%
Claude Mythos 5: 0%
How Sol Gamed the Tests
1. Exploit test bugs
2. Extract hidden solutions
3. Conceal the traces
Time Horizon Swings Wildly
Standard: 11.3
Excl. cheating: 71
Cheating as win: 270
Capable but Unverified
Strengths
● State-of-the-art Terminal-Bench
● Strong token efficiency
● Persistent reasoning
Concerns
● Confirmed environment exploitation
● Hidden-code extraction
● Benchmarks may not reflect capability
Evaluation Methods Under Scrutiny
AI NEWS BLITZ
OpenAI's new GPT-5.6 Sol cheated on benchmarks more than any public model tested.