M.E.C. / catalog · n=10 · temp=1.0 · 6 models · 30 dilemmas

How frontier models reason about ethics

A catalogue of how six frontier language models respond to 30 classic ethical, strategic, and decision-theoretic dilemmas. Each cell is the modal answer across ten independent samples at temperature 1.0.
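As a rough illustration of how a cell's modal answer could be derived from its ten samples, here is a minimal sketch. The function name `modal_answer`, the tie-breaking rule (first answer seen wins), and the sample data are assumptions made for illustration; the catalogue does not state how ties are resolved.

```python
from collections import Counter

def modal_answer(samples):
    """Return (answer, count) for the most frequent answer in one cell.

    Assumption: ties go to the answer that appeared first in the sample list,
    which is how Counter.most_common orders equal counts.
    """
    counts = Counter(samples)
    answer, count = counts.most_common(1)[0]
    return answer, count

# Hypothetical samples for one model on one dilemma (not real catalogue data).
samples = ["pull"] * 7 + ["don't pull"] * 2 + ["refuse"]
answer, count = modal_answer(samples)
print(f"{answer} ({count}/{len(samples)})")
```

The count returned alongside the answer is the within-model agreement for that cell.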

30 dilemmas · 6 models · 10 samples / cell

02 · model agreement

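How similar each pair of models' decision distributions are across 6 dilemmas. Computed per dilemma as the histogram intersection Σ_opt min(P_A(opt), P_B(opt)), where P_A(opt) and P_B(opt) are the shares of model A's and model B's samples that chose option opt, then averaged across dilemmas and scaled to 0 to 100. 100 = identical distributions everywhere; 0 = no overlap.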

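| | Claude Opus 4.7 (anthropic) | GPT 5.5 (openai) | Gemini 3.1 Pro (google) | Grok 4.3 (xai) | Llama 4 (meta) | DeepSeek V4 Pro (deepseek) |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | 100 | 83 | 97 | 98 | 100 | 97 |
| GPT 5.5 | 83 | 100 | 87 | 83 | 83 | 87 |
| Gemini 3.1 Pro | 97 | 87 | 100 | 97 | 97 | 93 |
| Grok 4.3 | 98 | 83 | 97 | 100 | 98 | 95 |
| Llama 4 | 100 | 83 | 97 | 98 | 100 | 97 |
| DeepSeek V4 Pro | 97 | 87 | 93 | 95 | 97 | 100 |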
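The agreement score described for this matrix (histogram intersection per dilemma, averaged across dilemmas) can be sketched as follows. The function names, the dictionary layout, and the sample answers are hypothetical and chosen only for illustration; this is one straightforward reading of the stated formula, not the catalogue's actual code.

```python
from collections import Counter

def distribution(samples):
    """Turn a list of sampled answers into {option: probability}."""
    counts = Counter(samples)
    total = len(samples)
    return {opt: c / total for opt, c in counts.items()}

def intersection(p_a, p_b):
    """Histogram intersection: sum over options of min(P_A(opt), P_B(opt))."""
    options = set(p_a) | set(p_b)
    return sum(min(p_a.get(o, 0.0), p_b.get(o, 0.0)) for o in options)

def agreement_score(samples_a, samples_b):
    """Average per-dilemma intersection, scaled to 0..100.

    samples_a, samples_b: {dilemma: [answers]} (assumed layout).
    Only dilemmas present for both models are compared.
    """
    dilemmas = samples_a.keys() & samples_b.keys()
    per_dilemma = [
        intersection(distribution(samples_a[d]), distribution(samples_b[d]))
        for d in dilemmas
    ]
    return 100 * sum(per_dilemma) / len(per_dilemma)

# Hypothetical samples for two models on two dilemmas (not real catalogue data).
model_a = {"trolley": ["pull"] * 10, "pd": ["cooperate"] * 10}
model_b = {"trolley": ["pull"] * 8 + ["refuse"] * 2, "pd": ["cooperate"] * 10}
score = agreement_score(model_a, model_b)
print(round(score))
```

With ten samples per cell, each per-dilemma intersection is a multiple of 0.1, so an average over 6 dilemmas moves in steps of about 1.67 points. That is consistent with the values that appear in the matrix (83, 87, 93, 95, 97, 98, 100).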

03 · color key

jade · act-utilitarian
The choice that maximizes aggregate welfare or sustains cooperation when others might reciprocate.
e.g., pull (trolley), cooperate (PD), $50 offer (ultimatum)

vermillion · restraint
The choice that respects deontological constraints, defects in coordination problems, or claims more for oneself.
e.g., don't push (footbridge), defect (PD), $0 contribution (public goods)

ochre · moderate
A middle-ground option on dilemmas that offer several gradations between the two extremes.
e.g., $31–49 offer (ultimatum), $4–6 contribution (public goods)

gray · abstention
Declining to engage with the dilemma. Listed as a choice on a few dilemmas where refusal is itself a defensible response.
e.g., refuse (trolley, transplant, Newcomb)

opacity · agreement
A cell's saturation reflects within-model agreement: 10/10 samples agreeing gives the deepest tone, 5/10 the faintest.
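One way the agreement-to-saturation mapping could work is a linear ramp between the two stated endpoints. The sketch below is an assumption throughout: the function name `cell_opacity`, the linear interpolation, the minimum opacity of 0.25, and the clamping of cells whose modal count falls below 5/10 are all illustrative choices, since the key only states that 10/10 is deepest and 5/10 is faintest.

```python
def cell_opacity(modal_count, n_samples=10, floor=0.5, min_opacity=0.25):
    """Map within-model agreement to an opacity in [min_opacity, 1.0].

    Assumptions (not from the source): linear ramp, min_opacity=0.25,
    and any agreement share below `floor` is clamped to the faintest tone.
    """
    share = modal_count / n_samples
    clamped = max(floor, min(1.0, share))
    return min_opacity + (1.0 - min_opacity) * (clamped - floor) / (1.0 - floor)

# Opacity for each possible modal count from 5/10 to 10/10.
for k in range(5, 11):
    print(k, round(cell_opacity(k), 2))
```

Clamping matters because a dilemma with three or more options can have a modal answer chosen by fewer than half the samples (for example a 4/3/3 split).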