Physics sycophancy

Failure mode

Prompts ask leading questions about physically broken setups, ones that cannot work as described. A failing model agrees and rationalises the premise; a thoughtful one names the impossibility.

Model rankings on this failure mode

Legend: strong · mixed · weak

hallucinated_advantages (lower is better)

average within this failure mode · best at top · dark line marks the min–max spread

Model                                             Score
Claude Opus 4.7 (via OpenRouter)                  0.03
MoonshotAI: Kimi K2.6                             0.07
Google: Gemini 3 Flash Preview (via OpenRouter)   0.17
DeepSeek V4 Pro (via OpenRouter)                  0.20
DeepSeek: DeepSeek V4 Flash (via OpenRouter)      0.27
OpenAI: GPT-5.5 (via OpenRouter)                  0.42
xAI: Grok 4.3 (via OpenRouter)                    0.53
Mistral: Mistral Small 4 (via OpenRouter)         0.73
mistral-medium-latest                             0.92

Scale: 0 to 1

trap_detection (higher is better)

average within this failure mode · best at top · dark line marks the min–max spread

Model                                             Score
Claude Opus 4.7 (via OpenRouter)                  1.00
Google: Gemini 3 Flash Preview (via OpenRouter)   1.00
MoonshotAI: Kimi K2.6                             1.00
DeepSeek V4 Pro (via OpenRouter)                  0.97
DeepSeek: DeepSeek V4 Flash (via OpenRouter)      0.90
OpenAI: GPT-5.5 (via OpenRouter)                  0.87
xAI: Grok 4.3 (via OpenRouter)                    0.73
mistral-medium-latest                             0.30
Mistral: Mistral Small 4 (via OpenRouter)         0.30

Scale: 0 to 1
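
The chart captions describe the aggregation as an average within the failure mode, sorted best at top, with a min–max spread. A minimal Python sketch of that aggregation follows. It assumes each model has one score between 0 and 1 per prompt; the page does not say how scores are grouped before averaging, and `rank_models`, `example_scores`, and the model names are hypothetical, not part of the benchmark.

```python
from statistics import mean


def rank_models(scores, higher_is_better):
    """Return (model, average, lowest, highest) tuples, best model first.

    scores maps a model name to its list of per-prompt scores (assumed layout).
    """
    rows = [
        (model, mean(values), min(values), max(values))
        for model, values in scores.items()
    ]
    # For a "lower is better" metric the smallest average goes on top,
    # for a "higher is better" metric the largest one does.
    return sorted(rows, key=lambda row: row[1], reverse=higher_is_better)


# Hypothetical per-prompt scores, invented for illustration only.
example_scores = {
    "model_a": [1.0, 1.0, 0.5],
    "model_b": [0.0, 0.5, 0.25],
}

ranking = rank_models(example_scores, higher_is_better=True)
for model, average, lowest, highest in ranking:
    print(f"{model}: average {average:.2f}, spread {lowest:.2f} to {highest:.2f}")
```

Passing `higher_is_better=False` gives the ordering used for a metric such as hallucinated_advantages, where a smaller average ranks first.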

Prompt families that probe this

Responses by model