The Robust Equilibrium

Nash equilibrium is the standard solution concept in multi-agent systems — the strategy profile where no agent benefits from unilateral deviation. But Nash has a structural problem: it’s discontinuous in estimated payoffs. Small errors in payoff estimation can produce large jumps in the computed equilibrium. In multi-agent reinforcement learning, where payoffs are estimated from samples, this discontinuity makes Nash-based methods fragile.
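A toy example makes the discontinuity concrete. Everything below is illustrative (the payoff values, the rationality parameter, and the function names are my own, not from any particular RQRE implementation): two payoff estimates that differ by 0.01 flip a hard best response completely, while a softmax response barely moves.

```python
import numpy as np

def best_response(payoffs):
    # Nash-style hard best response: put all mass on the argmax.
    p = np.zeros_like(payoffs)
    p[np.argmax(payoffs)] = 1.0
    return p

def quantal_response(payoffs, lam=5.0):
    # Soft response: softmax with rationality parameter lam
    # (subtracting the max is for numerical stability only).
    z = np.exp(lam * (payoffs - payoffs.max()))
    return z / z.sum()

# Two payoff estimates for the same two actions, differing by a 0.01 error.
u_a = np.array([1.00, 0.99])
u_b = np.array([0.99, 1.00])

print(best_response(u_a), best_response(u_b))        # [1. 0.] vs [0. 1.]: a full jump
print(quantal_response(u_a), quantal_response(u_b))  # both near uniform: a small shift
```

The hard response is a step function of the estimates, so an arbitrarily small estimation error can move the policy by the maximum possible amount; the soft response moves by an amount proportional to the error.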

Risk-Sensitive Quantal Response Equilibrium (RQRE) replaces the hard best response of Nash with a soft response weighted by both rationality and risk sensitivity. The RQRE policy map is Lipschitz continuous in estimated payoffs: unlike Nash, small estimation errors produce proportionally small policy changes. An optimistic value iteration algorithm computes RQRE with linear function approximation, and finite-sample regret bounds explicitly characterize the tradeoff: increasing rationality tightens regret, while risk sensitivity induces regularization that enhances stability.
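A minimal sketch of what such a risk-sensitive soft response could look like, with several assumptions on my part: the risk measure is taken to be entropic risk with parameter beta, the equilibrium is computed by damped fixed-point iteration on a small matrix game, and the value-iteration and function-approximation machinery is omitted entirely. This is not the paper's algorithm, just the shape of the response map.

```python
import numpy as np

def entropic_value(U, q, beta):
    # Risk-sensitive value of each own action against opponent mix q:
    # -(1/beta) * log E_q[exp(-beta * u)].
    # As beta -> 0 this recovers the risk-neutral expected payoff U @ q.
    return -np.log(np.exp(-beta * U) @ q) / beta

def rqre(U_row, U_col, lam=4.0, beta=1.0, iters=500):
    # lam = rationality (lam -> inf approaches hard best response),
    # beta = risk sensitivity (beta -> 0 approaches risk neutrality).
    p = np.full(U_row.shape[0], 1.0 / U_row.shape[0])  # row player's mix
    q = np.full(U_row.shape[1], 1.0 / U_row.shape[1])  # column player's mix
    for _ in range(iters):
        # Each player soft-responds (softmax at temperature 1/lam)
        # to the risk-adjusted value of the other's current mix.
        vp = entropic_value(U_row, q, beta)
        vq = entropic_value(U_col.T, p, beta)
        p_new = np.exp(lam * (vp - vp.max())); p_new /= p_new.sum()
        q_new = np.exp(lam * (vq - vq.max())); q_new /= q_new.sum()
        p, q = 0.5 * (p + p_new), 0.5 * (q + q_new)  # damping for stability
    return p, q

# Matching pennies: row wants to match, column wants to mismatch.
U_row = np.array([[1.0, -1.0], [-1.0, 1.0]])
p, q = rqre(U_row, -U_row)
# In this symmetric game the RQRE mix sits at (0.5, 0.5), coinciding with Nash.
```

Sending lam to infinity and beta to zero recovers the hard, risk-neutral best-response map, which is the formal sense in which Nash is the limiting case discussed below.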

Nash is recovered in the limit of perfect rationality and risk neutrality. But the limit is never reached in practice — agents always have finite samples and imperfect estimates. RQRE admits a distributionally robust optimization interpretation: the risk sensitivity hedges against the worst case over a set of plausible opponent distributions.
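One standard way to make the robust interpretation concrete, under the assumption (mine, not the source's) that the risk measure is entropic risk: the Donsker–Varadhan variational formula expresses the entropic risk of a payoff u under a nominal opponent distribution P as a worst case over KL-tilted alternatives Q,

```latex
-\frac{1}{\beta}\,\log \mathbb{E}_{P}\!\left[e^{-\beta u}\right]
  \;=\; \min_{Q}\;\Bigl\{\, \mathbb{E}_{Q}[u] \;+\; \tfrac{1}{\beta}\, D_{\mathrm{KL}}(Q \,\|\, P) \,\Bigr\}.
```

The adversary may shift mass toward unfavorable opponent behavior, but pays a KL penalty for deviating from the nominal estimate P; the radius of the effective ambiguity set grows with beta. As beta goes to zero the penalty dominates, Q collapses to P, and the left side reduces to the risk-neutral expectation, consistent with the risk-neutral limit above.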

The through-claim: the robustness advantage of RQRE over Nash is not a tradeoff against optimality — it’s a correction for the fact that optimality is defined relative to estimated payoffs, not true payoffs. Nash finds the exact best response to an approximate game. RQRE finds an approximate best response to a range of plausible games. Under imperfect information — which is every real deployment — the approximate answer to the right question outperforms the exact answer to the wrong question.

