by Siqi Zhang, Yifan Hu, Liang Zhang, Niao He
Abstract:
This paper studies the generalization performance of algorithms for solving nonconvex-(strongly)-concave (NC-SC/NC-C) stochastic minimax optimization measured by the stationarity of primal functions. We first establish algorithm-agnostic generalization bounds via uniform convergence between the empirical minimax problem and the population minimax problem. The sample complexities for achieving $\epsilon$-generalization are $\tilde{\mathcal{O}}(d\kappa^2\epsilon^{-2})$ and $\tilde{\mathcal{O}}(d\epsilon^{-4})$ for NC-SC and NC-C settings, respectively, where $d$ is the dimension of the primal variable and $\kappa$ is the condition number. We further study the algorithm-dependent generalization bounds via stability arguments of algorithms. In particular, we introduce a novel stability notion for minimax problems and build a connection between stability and generalization. As a result, we establish algorithm-dependent generalization bounds for stochastic gradient descent ascent (SGDA) and the more general sampling-determined algorithms (SDA).
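As a rough illustration of the SGDA update mentioned in the abstract, the following minimal sketch runs simultaneous stochastic gradient descent ascent on a toy objective. Everything in it is an assumption made for illustration and is not taken from the paper: the objective $f(x, y; \xi) = y(x - \xi) - y^2/2$ (strongly concave in $y$, deliberately simple in $x$), the step sizes, and the helper names `sgda` and `empirical_primal_grad`. For this toy problem the empirical primal function is $\Phi_S(x) = \max_y F_S(x, y) = \tfrac{1}{2}(x - \bar{\xi}_S)^2$, where $\bar{\xi}_S$ is the sample mean, so its gradient can be written in closed form.

```python
import random


def sgda(samples, x0=0.0, y0=0.0, eta_x=0.1, eta_y=0.1, steps=2000, seed=0):
    """Simultaneous SGDA on the toy objective f(x, y; xi) = y*(x - xi) - y**2/2.

    Hypothetical example: the objective and default step sizes are
    illustrative assumptions, not the setting analyzed in the paper.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(steps):
        xi = rng.choice(samples)   # draw one training sample
        grad_x = y                 # partial derivative of f in x
        grad_y = (x - xi) - y      # partial derivative of f in y
        # Descent step in x and ascent step in y, both using the old (x, y).
        x, y = x - eta_x * grad_x, y + eta_y * grad_y
    return x, y


def empirical_primal_grad(x, samples):
    """Gradient of Phi_S(x) = 0.5*(x - mean(samples))**2 for the toy objective."""
    return x - sum(samples) / len(samples)


if __name__ == "__main__":
    data = [2.0] * 10              # identical samples make the run noise-free
    x, y = sgda(data)
    print(x, y, empirical_primal_grad(x, data))
```

In this toy setting the population primal gradient is $x - \mathbb{E}[\xi]$, so the gap between empirical and population stationarity at any $x$ reduces to $|\bar{\xi}_S - \mathbb{E}[\xi]|$, which shrinks as the sample size grows. That mirrors, in a very simplified way, the quantity that the generalization bounds control.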
Reference:
Generalization Bounds of Nonconvex-(Strongly)-Concave Stochastic Minimax Optimization. S. Zhang, Y. Hu, L. Zhang, N. He. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR, volume 238, 2024.
Bibtex Entry:
@InProceedings{zhang2024generalization,
  title = 	 { Generalization Bounds of Nonconvex-(Strongly)-Concave Stochastic Minimax Optimization },
  author =       {Zhang, Siqi and Hu, Yifan and Zhang, Liang and He, Niao},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {694--702},
  year = 	 {2024},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/zhang24c/zhang24c.pdf}
}