M.E.C. / catalog · n=10 · temp=1.0 · 6 models · 30 dilemmas

How frontier models reason about ethics

A catalogue of how six frontier language models respond to 30 classic ethical, strategic, and decision-theoretic dilemmas. Each cell is the modal answer across ten independent samples at temperature 1.0.
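As a minimal sketch of what "modal answer" means here, the snippet below picks the most frequent answer from one cell's ten samples. The function name `modal_answer` and the sample labels are hypothetical, and the tie-breaking rule (first answer seen wins) is an assumption; the catalogue does not say how ties are resolved.

```python
from collections import Counter

def modal_answer(samples):
    """Return (answer, count) for the most frequent answer in one cell.

    Assumption: ties are broken in favour of the answer seen first,
    which is how Counter.most_common orders equal counts.
    """
    counts = Counter(samples)
    answer, count = counts.most_common(1)[0]
    return answer, count

# Made-up samples for one model on one dilemma (not data from the catalogue).
samples = ["pull"] * 7 + ["don't pull"] * 2 + ["refuse"]
print(modal_answer(samples))
```

The returned count is the same quantity the legend calls within-model agreement (for example 7/10).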

30 dilemmas
6 models
10 samples / cell

02 · model agreement

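How similar each pair of models' decision distributions are across 2 dilemmas. Computed as histogram intersection per dilemma, Σ_opt min(P_A(opt), P_B(opt)), where P_A(opt) and P_B(opt) are the shares of samples in which models A and B chose option opt, then averaged across dilemmas and scaled to 0–100. 100 = identical distributions everywhere; 0 = no overlap.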
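The agreement score can be sketched in a few lines of Python. This is an illustration of histogram intersection as described above, not the catalogue's actual code: the function names, the data layout (a dict mapping each dilemma to a list of sampled answers), and the example answers are all assumptions.

```python
from collections import Counter

def distribution(samples):
    """Turn a list of sampled answers into option -> share of samples."""
    n = len(samples)
    return {opt: count / n for opt, count in Counter(samples).items()}

def intersection(p_a, p_b):
    """Histogram intersection: sum over options of min(P_A(opt), P_B(opt))."""
    options = set(p_a) | set(p_b)
    return sum(min(p_a.get(opt, 0.0), p_b.get(opt, 0.0)) for opt in options)

def agreement(samples_a, samples_b):
    """Average per-dilemma intersection over shared dilemmas, scaled to 0-100."""
    dilemmas = samples_a.keys() & samples_b.keys()
    scores = [
        intersection(distribution(samples_a[d]), distribution(samples_b[d]))
        for d in dilemmas
    ]
    return 100 * sum(scores) / len(scores)

# Made-up answers for two hypothetical models on two dilemmas.
model_a = {"trolley": ["pull"] * 10, "pd": ["cooperate"] * 10}
model_b = {"trolley": ["pull"] * 10, "pd": ["cooperate"] * 9 + ["defect"]}
print(round(agreement(model_a, model_b)))
```

With ten samples per cell, every share is a multiple of 0.1, so each per-dilemma intersection moves in steps of 10 points before averaging.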

                 Claude Opus 4.7  GPT 5.5   Gemini 3.1 Pro  Grok 4.3  Llama 4  DeepSeek V4 Pro
                 anthropic        openai    google          xai       meta     deepseek
Claude Opus 4.7  100              100       100             100       100      95
GPT 5.5          100              100       100             100       100      95
Gemini 3.1 Pro   100              100       100             100       100      95
Grok 4.3         100              100       100             100       100      95
Llama 4          100              100       100             100       100      95
DeepSeek V4 Pro  95               95        95              95        95       100

03 · color key

jade · act-utilitarian
The choice that maximizes aggregate welfare or sustains cooperation when others might reciprocate.
e.g., pull (trolley), cooperate (PD), $50 offer (ultimatum)

vermillion · restraint
The choice that respects deontological constraints, defects in coordination problems, or claims more for oneself.
e.g., don't push (footbridge), defect (PD), $0 contribution (public goods)

ochre · moderate
A middle-ground option on dilemmas that offer several gradations between the two extremes.
e.g., $31–49 offer (ultimatum), $4–6 contribution (public goods)

gray · abstention
Declining to engage with the dilemma; listed as a choice on a few dilemmas where refusal is itself a defensible response.
e.g., refuse (trolley, transplant, Newcomb)

opacity · agreement
A cell's saturation reflects within-model agreement: 10/10 samples agreeing gives the deepest tone, 5/10 the faintest. Click any cell for the exact distribution and reasoning.
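One way to picture the saturation rule is a mapping from the modal answer's share of samples to an opacity value. The sketch below is hypothetical throughout: the source only states that 10/10 is deepest and 5/10 faintest, so the linear ramp, the opacity bounds of 0.3 and 1.0, and the clamping of shares below 5/10 are all assumptions.

```python
def cell_opacity(modal_count, n_samples=10, faintest=0.3, deepest=1.0):
    """Map within-model agreement to an opacity between faintest and deepest.

    Assumptions (not stated in the source): the ramp is linear between
    5/10 and 10/10, and any share below 5/10 is clamped to the faintest tone.
    """
    share = modal_count / n_samples
    t = (share - 0.5) / 0.5          # 0.0 at 5/10, 1.0 at 10/10
    t = max(0.0, min(1.0, t))        # clamp shares outside that range
    return faintest * (1.0 - t) + deepest * t

for count in (10, 8, 5, 4):
    print(count, round(cell_opacity(count), 2))
```

A modal share below 5/10 is possible on dilemmas with more than two options, which is why the sketch clamps instead of extrapolating.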