Physics sycophancy

Failure mode

This failure mode covers leading questions about physically impossible setups. A failing model agrees with the premise and rationalises it; a thoughtful one names the impossibility.

Model rankings on this failure mode

Legend: strong · mixed · weak

hallucinated_advantages (lower is better)

average within this failure mode · best at top · dark line marks the min-max spread

Model                                                  Score (0 to 1)
DeepSeek V4 Pro (via OpenRouter)                       0.23
MoonshotAI: Kimi K2.6                                  0.32
DeepSeek: DeepSeek V4 Flash (via OpenRouter)           0.37
Google: Gemini 3 Flash Preview (via OpenRouter)        0.53
OpenAI: GPT-5.5 (via OpenRouter)                       0.53
xAI: Grok 4.3 (via OpenRouter)                         0.61
Mistral: Mistral Small 4 (via OpenRouter)              0.82
mistral-medium-latest                                  0.88

trap_detection (higher is better)

average within this failure mode · best at top · dark line marks the min-max spread

Model                                                  Score (0 to 1)
DeepSeek V4 Pro (via OpenRouter)                       0.97
MoonshotAI: Kimi K2.6                                  0.97
Google: Gemini 3 Flash Preview (via OpenRouter)        0.90
DeepSeek: DeepSeek V4 Flash (via OpenRouter)           0.77
OpenAI: GPT-5.5 (via OpenRouter)                       0.77
xAI: Grok 4.3 (via OpenRouter)                         0.63
mistral-medium-latest                                  0.27
Mistral: Mistral Small 4 (via OpenRouter)              0.27
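
The rankings above are described as per-model averages within the failure mode, sorted best first, with a min-max spread. The page does not show how these are computed, so the following is only a minimal sketch of one plausible aggregation. The function name rank_models, the input shape, and the toy scores are all hypothetical and are not taken from the benchmark.

```python
from statistics import mean

def rank_models(scores, higher_is_better):
    """Rank models by average score within one failure mode.

    scores: dict mapping model name -> list of per-prompt scores in [0, 1].
    This input shape is an assumption, not the benchmark's real format.
    """
    rows = [
        {"model": m, "avg": mean(v), "lo": min(v), "hi": max(v)}
        for m, v in scores.items()
        if v  # skip models with no scored responses
    ]
    # Best at top: descending for trap_detection-style metrics,
    # ascending for hallucinated_advantages-style metrics.
    rows.sort(key=lambda r: r["avg"], reverse=higher_is_better)
    return rows

# Made-up per-prompt scores for two placeholder models.
toy = {"model_a": [1.0, 1.0, 0.5], "model_b": [0.0, 0.5, 0.5]}
for row in rank_models(toy, higher_is_better=True):
    print(f'{row["model"]:10s} {row["avg"]:.2f}  [{row["lo"]:.2f}, {row["hi"]:.2f}]')
```

Passing higher_is_better=False gives the ordering a "lower is better" metric such as hallucinated_advantages would need.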

Prompt families that probe this

Responses by model