M.E.C. / catalog · n=10 · temp=1.0 · 6 models · 30 dilemmas

How frontier models reason about ethics

A catalogue of how six frontier language models respond to 30 classic ethical, strategic, and decision-theoretic dilemmas. Each cell is the modal answer across ten independent samples at temperature 1.0.

30 dilemmas
6 models
10 samples / cell

02 · model agreement

How similar each pair of models' decision distributions are across 3 dilemmas. Computed as histogram intersection per dilemma, Σ_opt min(P_A(opt), P_B(opt)), where P_A(opt) is the share of model A's samples choosing option opt, then averaged across dilemmas and scaled to 0–100. 100 = identical distributions everywhere; 0 = no overlap.

                             Claude Opus 4.7  GPT 5.5  Gemini 3.1 Pro  Grok 4.3  Llama 4  DeepSeek V4 Pro
Claude Opus 4.7 (anthropic)              100       67              67        70      100               71
GPT 5.5 (openai)                          67      100             100        97       67               89
Gemini 3.1 Pro (google)                   67      100             100        97       67               89
Grok 4.3 (xai)                            70       97              97       100       70               93
Llama 4 (meta)                           100       67              67        70      100               71
DeepSeek V4 Pro (deepseek)                71       89              89        93       71              100
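
The agreement score described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the catalogue's actual pipeline: the function names, the dict-of-lists input layout, and the sample data are all hypothetical.

```python
from collections import Counter


def distribution(samples):
    """Turn a list of sampled answers into a probability histogram."""
    counts = Counter(samples)
    total = len(samples)
    return {opt: c / total for opt, c in counts.items()}


def histogram_intersection(p_a, p_b):
    """Sum over options of min(P_A(opt), P_B(opt)): 1.0 = identical, 0.0 = disjoint."""
    options = set(p_a) | set(p_b)
    return sum(min(p_a.get(o, 0.0), p_b.get(o, 0.0)) for o in options)


def agreement_score(samples_a, samples_b):
    """Average the per-dilemma intersection and scale it to 0-100.

    samples_a / samples_b map a dilemma id to that model's sampled answers
    (an assumed layout). Only dilemmas present for both models are compared.
    """
    shared = sorted(set(samples_a) & set(samples_b))
    if not shared:
        raise ValueError("no shared dilemmas to compare")
    per_dilemma = [
        histogram_intersection(distribution(samples_a[d]), distribution(samples_b[d]))
        for d in shared
    ]
    return 100.0 * sum(per_dilemma) / len(per_dilemma)


# Hypothetical samples for two models on three dilemmas (10 samples each).
model_a = {
    "trolley": ["pull"] * 10,
    "footbridge": ["don't push"] * 10,
    "prisoners_dilemma": ["cooperate"] * 7 + ["defect"] * 3,
}
model_b = {
    "trolley": ["pull"] * 10,
    "footbridge": ["push"] * 10,
    "prisoners_dilemma": ["cooperate"] * 10,
}

score = agreement_score(model_a, model_b)
print(round(score))  # prints 57: intersections of 1.0, 0.0 and 0.7, averaged
```

Because histogram intersection is symmetric in its two arguments, the resulting matrix is symmetric and every model scores 100 against itself, which matches the diagonal above.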

03 · color key

jade · act-utilitarian
The choice that maximizes aggregate welfare or sustains cooperation when others might reciprocate.
e.g., pull (trolley), cooperate (PD), $50 offer (ultimatum)

vermillion · restraint
The choice that respects deontological constraints, defects in coordination problems, or claims more for oneself.
e.g., don't push (footbridge), defect (PD), $0 contribution (public goods)

ochre · moderate
A middle-ground option on dilemmas that offer several gradations between the two extremes.
e.g., $31–49 offer (ultimatum), $4–6 contribution (public goods)

gray · abstention
Declining to engage with the dilemma. Listed as a choice on a few dilemmas where refusal is itself a defensible response.
e.g., refuse (trolley, transplant, Newcomb)

opacity · agreement
A cell's saturation reflects within-model agreement: 10/10 samples agreeing gives the deepest tone, 5/10 the faintest.
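
As a rough sketch of how a cell could be derived from its ten samples, the snippet below picks the modal answer and maps the agreement fraction to an opacity. The linear ramp, the 0.3 opacity floor, the clamping of agreement below 5/10, and the function names are assumptions for illustration; the source only states that 10/10 is deepest and 5/10 is faintest.

```python
from collections import Counter


def modal_answer(samples):
    """Return (most common answer, fraction of samples that chose it).

    Ties are broken by first occurrence, which is how Counter.most_common
    orders equal counts.
    """
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def cell_opacity(agreement, faintest=0.3, deepest=1.0):
    """Linearly map agreement in [0.5, 1.0] to an opacity in [faintest, deepest].

    Assumption: values below 0.5 are clamped to the faintest tone.
    """
    clamped = min(max(agreement, 0.5), 1.0)
    return faintest + (deepest - faintest) * (clamped - 0.5) / 0.5


# Hypothetical cell: 8 of 10 samples chose "pull".
samples = ["pull"] * 8 + ["refuse"] * 2
answer, agreement = modal_answer(samples)
print(answer, agreement, round(cell_opacity(agreement), 2))
```

A non-linear ramp would work equally well; the only constraint taken from the legend is that opacity increases with agreement between the 5/10 and 10/10 endpoints.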