The Restricted Descent
Steepest descent chooses the locally optimal direction at every step. It is the greediest possible strategy — pure local improvement with no structural constraint on which direction to move. And its convergence rate is famously poor: zigzagging across narrow valleys, wasting steps on directions that are locally optimal but globally unproductive.
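The zigzag is easy to reproduce on an ill-conditioned quadratic. A minimal sketch, where the condition number `kappa`, the starting point, and the step count are illustrative choices rather than anything from the paper:

```python
import numpy as np

def steepest_descent(A, x0, steps):
    """Steepest descent with exact line search on f(x) = 0.5 * x^T A x."""
    x = x0.astype(float)
    path = [x.copy()]
    for _ in range(steps):
        g = A @ x                    # gradient of the quadratic
        if np.allclose(g, 0.0):
            break
        t = (g @ g) / (g @ (A @ g))  # exact line-search step length
        x = x - t * g
        path.append(x.copy())
    return np.array(path)

# Narrow valley: curvature 1 along one axis, kappa along the other.
kappa = 100.0
A = np.diag([1.0, kappa])
path = steepest_descent(A, np.array([kappa, 1.0]), steps=50)
```

From this starting point the iterates alternate sign in the second coordinate (the zigzag), and each step shrinks the error by only `(kappa - 1) / (kappa + 1)`, so after 50 locally optimal steps the iterate is still far from the minimizer at the origin.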
Berasategui, Berná, and Falcó show that restricting the allowed directions to elements of a predefined dictionary — tensor formats, neural network units, parameterized families — produces faster convergence, not slower. The restriction eliminates the unproductive directions. What remains are the structurally meaningful ones, and following them produces algebraic rates that beat classical steepest descent and can reach arbitrarily high polynomial or exponential convergence in specific regimes.
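As a toy contrast, and not the paper's actual algorithm: on the same diagonal quadratic, restricting each step to an axis-aligned dictionary and doing an exact line search along the element most correlated with the gradient terminates exactly, because each line search zeroes one coordinate. The dictionary `D` and all the numbers here are illustrative assumptions:

```python
import numpy as np

def dictionary_descent(A, x0, dictionary, steps):
    """Greedy descent restricted to directions from a fixed dictionary."""
    x = x0.astype(float)
    for _ in range(steps):
        g = A @ x
        # Pick the dictionary direction most correlated with the gradient.
        d = max(dictionary, key=lambda d: abs(d @ g))
        if abs(d @ g) < 1e-15:
            break
        t = (d @ g) / (d @ (A @ d))  # exact line search along d
        x = x - t * d
    return x

kappa = 100.0
A = np.diag([1.0, kappa])
D = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # axis-aligned dictionary
x = dictionary_descent(A, np.array([kappa, 1.0]), D, steps=2)
```

Two restricted steps reach the exact minimizer that 50 unrestricted steepest-descent steps could not: the dictionary happens to match the structure of the problem, which is the point.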
The key theoretical contribution is a geometric condition using norming sets that guarantees the dictionary spans enough of the space through duality, without requiring the dictionary’s linear span to be dense. The dictionary doesn’t need to reach everywhere — it needs to reach the places that matter for the optimization landscape.
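For orientation, the standard functional-analytic notion of a norming set (the paper's precise geometric condition may differ in its details): a subset of the dual unit ball is norming when duality against it recovers a fixed fraction of every norm, which is a strictly weaker demand than density of the dictionary's span.

```latex
% Standard definition, stated as a sketch: \Phi \subset B_{X^*} is
% \lambda-norming for X if, for some fixed \lambda \in (0,1],
\sup_{\varphi \in \Phi} |\varphi(x)| \;\ge\; \lambda \,\|x\|
\qquad \text{for all } x \in X.
```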
The through-line: unrestricted choice is not freedom but noise. Steepest descent’s zigzagging is the cost of having too many options — each step is locally optimal, but the sequence of steps is globally incoherent. Restricting the search to a structured dictionary imposes coherence at the cost of local optimality, and the tradeoff favors the restriction. Sometimes the fastest path through a space is not the one that follows the gradient but the one that follows the structure.