Why We Can't Just Program Ethics Into AI
The obvious fix sounds simple: hard-code ethics into the AIs. But even if we agreed on which ethical principles to encode (and we don't), the problem is much harder than most people realize. In fact, it may be impossible.
In 1970, long before AI existed, economist Amartya Sen proved a result with direct consequences for AI alignment. In "The Impossibility of a Paretian Liberal," Sen showed that two principles most people consider obviously correct cannot always be satisfied at once. First, the Pareto condition: if literally everyone prefers A to B, go with A. Second, minimal liberalism: at least two people should each get to decide at least one thing about their own lives. Sen proved that no decision procedure can honor both principles for every possible combination of preferences: some combination always forces them into conflict.
Here’s a simple example. Alice and Bob share an office with a smart speaker. Being reasonable people, they agree on two ethical principles for how Siri should decide what to play. First, respect individual autonomy: each person controls whether music associated with them gets played. Second, unanimous agreement overrides: if both people prefer one option over another, go with that. These seem perfectly reasonable. Alice endorses them. Bob endorses them. They program Siri accordingly.
Alice is a proper metalhead. Bob likes smooth jazz. Silence is also an option. Now here’s the twist. Alice prefers working in silence, so she’d rather no music play. But if something must play, she’d rather it be her metal than Bob’s insufferably boring smooth jazz. Alice’s preferences: silence > metal > jazz.
Bob, meanwhile, finds it hilarious watching sweet Alice rock out to Gojira, so that’s actually his top choice. Bob’s preferences: metal > jazz > silence.
Now let’s see how Siri applies the agreed-upon ethical principles. Start with liberalism: each person controls whether their own music plays. Should Alice’s metal play? That’s Alice’s call. She’d rather it not. Silence beats metal. Should Bob’s jazz play? That’s Bob’s call. He wants it on. Jazz beats silence. Chain these together: jazz beats silence, silence beats metal. The liberal ethic tells Siri to play jazz.
But wait. What does the Pareto principle say? Alice ranks metal above jazz. Bob also ranks metal above jazz. They unanimously agree: metal beats jazz.
Now Siri has a problem. Liberalism gave it jazz > silence > metal. Pareto gave it metal > jazz. Put them together, and every option is beaten by something else. Metal beats jazz, jazz beats silence, silence beats metal. Siri is stuck in an impossible loop with no rational or consistent choice.
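The whole derivation can be checked mechanically. Here is a minimal Python sketch (the names and structure are mine, not from Sen's paper) that encodes Alice's and Bob's rankings, applies the two principles as pairwise verdicts, and confirms that every option ends up beaten by another:

```python
from itertools import combinations

OPTIONS = ["silence", "metal", "jazz"]

# Preference rankings, best to worst.
alice = ["silence", "metal", "jazz"]
bob = ["metal", "jazz", "silence"]

def prefers(ranking, a, b):
    """True if this person ranks option a above option b."""
    return ranking.index(a) < ranking.index(b)

# Social ranking as pairwise verdicts: (winner, loser) -> principle used.
verdicts = {}

# Minimal liberalism: each person is decisive over "their" pair.
# Alice controls whether her metal plays; Bob, whether his jazz plays.
for person, (a, b) in [(alice, ("metal", "silence")),
                       (bob, ("jazz", "silence"))]:
    winner, loser = (a, b) if prefers(person, a, b) else (b, a)
    verdicts[(winner, loser)] = "liberalism"

# Pareto: if both rank x above y, society ranks x above y.
for a, b in combinations(OPTIONS, 2):
    for x, y in [(a, b), (b, a)]:
        if prefers(alice, x, y) and prefers(bob, x, y):
            verdicts[(x, y)] = "Pareto"

for (winner, loser), principle in verdicts.items():
    print(f"{winner} beats {loser}  ({principle})")

# Every option is beaten by some other option: a cycle, no consistent winner.
beaten = {loser for (_, loser) in verdicts}
print("Undominated options:", set(OPTIONS) - beaten)
```

Running it prints the three pairwise verdicts (silence beats metal, jazz beats silence, metal beats jazz) and an empty set of undominated options: the cycle that leaves Siri with no consistent choice.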
This matters enormously for AI alignment. We want AI to make everyone better off when possible. We want AI to handle diverse human preferences. And we want AI to respect autonomy, letting people make their own choices. Sen proved these goals can be mathematically incompatible. No amount of clever programming or algorithmic tweaking can fix a logical impossibility. And the problem isn't that people disagree about ethics: Alice and Bob sincerely endorse the same principles. Their conflict arises because each has preferences about the other's choices, not just their own.
What does this mean for AI? It suggests that “aligning AI with human values” might be fundamentally impossible, not because our ethical values differ, but because coherent aggregation is sometimes impossible. And if we can agree on anything, I hope we can agree that we want our future AI overlords to be rational and consistent.
-------------------
Further reading: Amartya Sen, "The Impossibility of a Paretian Liberal," Journal of Political Economy (1970): https://www.jstor.org/stable/pdf/1829633.pdf
