Training Deep Learning Models with Norm-Constrained LMOs

Thomas Pethick; Wanyun Xie; Kimon Antonakopoulos; Zhenyu Zhu; Antonio Silveti-Falls; Volkan Cevher

Preprints, Working Papers, ... Year : 2025

Thomas Pethick : Ecole Polytechnique Fédérale de Lausanne, Laboratory for Information and Inference Systems

Wanyun Xie : Ecole Polytechnique Fédérale de Lausanne, Laboratory for Information and Inference Systems

Kimon Antonakopoulos : Ecole Polytechnique Fédérale de Lausanne, Laboratory for Information and Inference Systems

Zhenyu Zhu : Ecole Polytechnique Fédérale de Lausanne, Laboratory for Information and Inference Systems

Antonio Silveti-Falls : Centre de vision numérique, CentraleSupélec, Université Paris-Saclay, Inria Saclay - Ile de France

Volkan Cevher : Ecole Polytechnique Fédérale de Lausanne, Laboratory for Information and Inference Systems

Abstract

In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training without any reliance on Adam. The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision.
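To make the central object concrete: for a norm ball of radius rho, the linear minimization oracle returns a point x with ||x|| <= rho that minimizes the inner product <g, x>. The sketch below, in plain Python, shows the closed-form oracle for two familiar norms and a single unconstrained step of the form x + step_size * lmo(g). The function names, the step rule, and the default values are illustrative assumptions, not the update rule or the norm choice of the paper, which the abstract does not spell out.

```python
import math


def lmo_linf(g, radius=1.0):
    """LMO over the max-norm ball: argmin <g, x> s.t. max|x_i| <= radius.

    The minimizer is -radius * sign(g), taken coordinate-wise.
    """
    return [-radius * ((v > 0) - (v < 0)) for v in g]


def lmo_l2(g, radius=1.0):
    """LMO over the Euclidean ball: argmin <g, x> s.t. ||x||_2 <= radius.

    The minimizer is -radius * g / ||g||_2.
    """
    norm = math.sqrt(sum(v * v for v in g))
    if norm == 0.0:
        # Every feasible point is optimal when g = 0; return the center.
        return [0.0 for _ in g]
    return [-radius * v / norm for v in g]


def lmo_step(x, g, lmo, step_size=0.1, radius=1.0):
    """Hypothetical unconstrained step x <- x + step_size * lmo(g).

    Illustration only: the paper's stochastic update is not reproduced here.
    """
    direction = lmo(g, radius)
    return [xi + step_size * di for xi, di in zip(x, direction)]


if __name__ == "__main__":
    g = [3.0, -4.0, 0.0]          # a toy gradient
    x = [0.0, 0.0, 0.0]           # toy parameters
    print(lmo_linf(g))            # sign-based direction
    print(lmo_l2(g))              # normalized-gradient direction
    print(lmo_step(x, g, lmo_linf))
```

With the max-norm ball the step reduces to sign descent, and with the Euclidean ball to normalized gradient descent, which gives a rough sense of how a single LMO-based rule can cover several known methods depending on the norm chosen.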

Keywords

Deep learning; Neural networks; Conditional gradient; Frank-Wolfe; Steepest descent; Stochastic optimization; Nonconvex optimization; Orthogonalization

Domains

Machine Learning [cs.LG]; Optimization and Control [math.OC]; Machine Learning [stat.ML]

Main file

ICML2025_Pethick_Spectral_FW (4).pdf (824)

Origin : Files produced by the author(s)

Contributor : Antonio Silveti-Falls

https://hal.science/hal-04941364

Submitted on : Tuesday, February 11, 2025, 5:47:44 PM

Last modification on : Thursday, February 20, 2025, 3:16:27 AM

Dates and versions

hal-04941364 , version 1 (11-02-2025)

Identifiers

HAL Id : hal-04941364 , version 1

Cite

Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos, Zhenyu Zhu, Antonio Silveti-Falls, et al. Training Deep Learning Models with Norm-Constrained LMOs. 2025. ⟨hal-04941364⟩

Collections

INRIA; CVN; CENTRALESUPELEC; INRIA2; INRIA-EPFL; TDS-MACS; UNIV-PARIS-SACLAY; GS-COMPUTER-SCIENCE
