The Reversed Advantage
Machine learning interatomic potentials encode physics through inductive biases — symmetry constraints, equivariance, explicit long-range terms. These biases improve sample efficiency: a model that already “knows” about rotational symmetry doesn’t have to spend training examples learning it. The assumption has been that these biases help at all scales.
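To make the symmetry bias concrete, here is a minimal sketch (not from the paper; the pair potential is a hypothetical stand-in): an energy model built only on interatomic distances is rotation-invariant by construction, so no training example is ever needed to teach it that property.

```python
# Toy illustration: a distance-based energy is rotation-invariant
# by construction -- the "symmetry" inductive bias baked into the
# architecture rather than learned from data.
import numpy as np

def toy_energy(positions):
    """Sum a simple Lennard-Jones-like pair term over all atom pairs."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.triu_indices(len(positions), k=1)
    r = dists[i, j]
    return np.sum(1.0 / r**12 - 1.0 / r**6)

rng = np.random.default_rng(0)
atoms = rng.normal(size=(5, 3))

# Random rotation: orthogonal factor of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rotated = atoms @ Q.T

# The energy is unchanged under rotation, to floating-point precision.
print(np.isclose(toy_energy(atoms), toy_energy(rotated)))  # True
```

A model without this structure would have to infer invariance from rotated copies of the data — cheap to learn at scale, expensive when examples are scarce.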
They don’t. AllScAIP, a straightforward attention-based model using all-to-all node attention without explicit physics terms, demonstrates that the advantage of inductive biases diminishes and eventually reverses as data and model size scale up. In the low-data/small-model regime, physics-based architectures win on sample efficiency. At scale — 100 million training samples — a model with fewer built-in assumptions matches or exceeds them.
The exception: long-range interactions. All-to-all attention remains critical regardless of scale. No amount of data compensates for an architecture that structurally cannot see distant atoms. The model achieves state-of-the-art accuracy on molecular benchmarks and runs stable long-timescale molecular dynamics recovering experimental observables — density and heat of vaporization — without explicit physics beyond energy conservation.
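The structural point — that a cutoff-based architecture cannot see distant atoms no matter how much data it gets — can be illustrated with a toy attention layer (a hypothetical sketch, assuming nothing about AllScAIP’s actual implementation):

```python
# Toy contrast: all-to-all attention vs. a cutoff-masked variant.
# The masked model structurally cannot route information from atoms
# beyond the cutoff, regardless of training.
import numpy as np

def attention(features, positions, cutoff=None):
    """Single-head scaled dot-product attention over atoms (illustrative)."""
    scores = features @ features.T / np.sqrt(features.shape[1])
    if cutoff is not None:
        d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
        scores = np.where(d <= cutoff, scores, -np.inf)  # blind beyond cutoff
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features

rng = np.random.default_rng(1)
feats = rng.normal(size=(6, 4))
pos = np.vstack([rng.normal(size=(5, 3)), [[100.0, 0.0, 0.0]]])  # atom 5 far away

full = attention(feats, pos)                # every atom attends to atom 5
local = attention(feats, pos, cutoff=10.0)  # atom 5 invisible to atoms 0-4

# Perturbing the distant atom's features leaves the cutoff model's
# outputs for the near atoms exactly unchanged -- the information
# pathway does not exist.
feats2 = feats.copy()
feats2[5] += 10.0
local2 = attention(feats2, pos, cutoff=10.0)
print(np.allclose(local[:5], local2[:5]))  # True
```

In the all-to-all case the same perturbation does change the near atoms’ outputs, which is the sense in which long-range attention is a capacity rather than a belief.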
The through-claim: inductive biases are a form of data compression. They substitute structural knowledge for training examples. But compressed representations impose their own constraints — they channel learning through the specific physics the designer anticipated. At sufficient scale, the data itself provides richer constraints than any designer’s prior beliefs. The bias that helped with small data becomes the ceiling at large data. The one exception — long-range attention — isn’t an inductive bias in the same sense. It’s an architectural capacity: the ability to see, not a belief about what you’ll see.