The Grammar Cost

Grammar-constrained decoding forces a language model to produce only outputs that parse under a given context-free grammar — valid JSON, syntactically correct code, well-formed queries. At each step, a reachability oracle masks out tokens that would make the output unparseable. The semantic constraint — which strings are valid — is defined by the language. But the computational cost of enforcing that constraint depends on which grammar generates that language.
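The masking step can be sketched in a few lines. This is an illustrative toy, not any engine's API: the vocabulary, the balanced-parentheses grammar, and the names `viable` and `token_mask` are all assumptions made for the example. Real engines work over subword vocabularies and general context-free grammars, but the shape is the same: for each candidate token, ask the oracle whether the extended prefix can still be completed into a valid string.

```python
# Toy vocabulary and grammar (balanced parentheses) for illustration.
VOCAB = ["(", ")", "<eos>"]

def viable(prefix: str) -> bool:
    """Reachability check: can some completion of this prefix parse?
    For balanced parens, that holds iff no ")" ever outruns the "("s."""
    depth = 0
    for ch in prefix:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return True

def token_mask(prefix: str) -> dict:
    """For each token in the vocabulary: may the decoder emit it next?"""
    mask = {}
    for tok in VOCAB:
        if tok == "<eos>":
            # Ending is allowed only when the prefix already parses fully.
            mask[tok] = viable(prefix) and prefix.count("(") == prefix.count(")")
        else:
            mask[tok] = viable(prefix + tok)
    return mask

print(token_mask("(("))  # "(" and ")" allowed; "<eos>" not
```

The decoder then zeroes out the logits of masked tokens before sampling, so every sampled continuation stays inside the language.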

Language-equivalent grammars produce identical token masks for every prefix. The constraint is semantically the same. But the structural ambiguity cost differs dramatically: right-recursive grammars incur O(1) work per token, while concatenation-based grammars incur Theta(t²) work at token t, accumulating to Theta(n³) over the whole string. Same language, same constraints, cubic versus linear cost.
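The arithmetic behind those bounds can be made concrete. The sketch below assumes a CYK-style online chart for the concatenation grammar (S → S S | "a"): when token t arrives, the chart gains one cell per span ending at t, and each cell tries every split point inside its span. A right-recursive grammar (S → "a" S | "a") needs only a constant-size state update per token. This is an accounting exercise under that assumption, not a measurement of any particular engine.

```python
def concat_work_at_token(t: int) -> int:
    """Chart work when token t arrives under S -> S S | "a":
    spans (i, t] for i = 0..t-1, and span (i, t] has (t - 1 - i)
    split points to try. Totals t*(t-1)//2, i.e. Theta(t^2)."""
    return sum(t - 1 - i for i in range(t))

def concat_total_work(n: int) -> int:
    """Summing Theta(t^2) per token over n tokens gives Theta(n^3)."""
    return sum(concat_work_at_token(t) for t in range(1, n + 1))

def right_recursive_work_at_token(t: int) -> int:
    """Under S -> "a" S | "a", the recognizer advances a single
    state per token: constant work, linear total."""
    return 1

# concat_work_at_token(10) -> 45, concat_total_work(10) -> 165
print(concat_work_at_token(10), concat_total_work(10))
```

Plugging in n = 1000: the concatenation grammar's chart does on the order of 10⁸ units of work where the right-recursive grammar does 10³.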

Worse: the lower bound is fundamental. Any sound, retrieval-efficient, parse-preserving online masking engine must incur Omega(t²) work per token on certain grammar families. The cost is in the grammar, not the implementation.

The through-line: grammar equivalence is semantic equivalence but not computational equivalence. Two grammars that define exactly the same set of strings can impose wildly different runtime costs on the system that enforces them. This means grammar selection for constrained decoding is an optimization problem over equivalent representations — choosing among grammars that accept identical languages but differ by polynomial factors in runtime. The grammar is not just a specification of what’s valid; it’s a program that the decoder executes, and like any program, how it’s written determines how fast it runs.
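One way to see "equivalent representations" concretely is to check that a rewritten grammar accepts the same strings as the original. The sketch below uses a brute-force memoized recognizer (assumed here for illustration; it is not how a production masking engine works, and it assumes no epsilon productions) to confirm that the costly concatenation grammar S → S S | "a" and the cheap right-recursive grammar S → "a" S | "a" agree on a sample of strings.

```python
from functools import lru_cache

def make_recognizer(rules, start="S"):
    """Brute-force CFG recognizer (no epsilon productions), used only
    to check that two grammars accept the same strings.
    rules: dict nonterminal -> list of alternatives (tuples of symbols);
    a symbol not in rules is a terminal."""
    def parses(s):
        @lru_cache(maxsize=None)
        def derives(sym, i, j):
            if sym not in rules:                      # terminal symbol
                return j == i + 1 and s[i:j] == sym
            return any(seq(alt, i, j) for alt in rules[sym])

        @lru_cache(maxsize=None)
        def seq(alt, i, j):
            if len(alt) == 1:
                return derives(alt[0], i, j)
            # Each remaining symbol must consume at least one character.
            return any(derives(alt[0], i, k) and seq(alt[1:], k, j)
                       for k in range(i + 1, j - len(alt) + 2))
        return len(s) > 0 and derives(start, 0, len(s))
    return parses

concat = make_recognizer({"S": [("S", "S"), ("a",)]})   # Theta(t^2)/token
rightr = make_recognizer({"S": [("a", "S"), ("a",)]})   # O(1)/token

strings = ["a" * k for k in range(1, 8)] + ["b", "ab", "aba"]
print(all(concat(s) == rightr(s) for s in strings))     # True: same language
```

Since the two grammars agree on every string, a constrained-decoding system is free to run the right-recursive one, and that choice alone moves the masking cost from cubic to linear.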

