🧑‍🎓 About Me
I am Congliang Chen. I received my B.S. from the School of Electronics Engineering and Computer Science at Peking University, and my Ph.D. from The Chinese University of Hong Kong, Shenzhen, where I was advised by Prof. Zhi-Quan (Tom) Luo. I am now a Research Assistant Professor at the Shenzhen Loop Area Institute. My research focuses on numerical computation, optimization algorithms for large language models, and kernel generation and optimization.
My work on distributed Adam establishes theoretical acceleration in multi-worker settings, and I developed a communication-efficient Adam variant that enables neural network training with only 1-bit communication per parameter. I also contributed to Adam-mini, a lightweight, practical optimizer variant tailored for efficient large-scale training. In addition, I worked on GEM, which studies how to preserve output diversity during SFT to mitigate mode collapse and improve generalization. My research has been published in journals such as JMLR and IEEE TSP, and at top-tier conferences including NeurIPS and ICLR.
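To give a feel for the 1-bit idea, here is a minimal sketch of sign-based compression with error feedback, the general mechanism behind this line of work; the function name, the mean-absolute-value scaling, and the toy loop are my own illustrative assumptions, not the exact algorithm from the papers.

```python
import numpy as np

def one_bit_compress(update, residual):
    """Compress an update to one bit per entry (its sign) plus one shared
    scale, carrying the quantization error into the next round."""
    corrected = update + residual          # fold in last round's leftover error
    scale = np.mean(np.abs(corrected))     # single float sent with the sign bits
    compressed = scale * np.sign(corrected)
    residual = corrected - compressed      # kept locally (error feedback)
    return compressed, residual

# Toy check: averaging many compressed copies of a fixed update recovers it,
# because error feedback keeps the accumulated quantization error bounded.
rng = np.random.default_rng(0)
u = rng.standard_normal(8)
residual = np.zeros_like(u)
total = np.zeros_like(u)
T = 1000
for _ in range(T):
    c, residual = one_bit_compress(u, residual)
    total += c
print(np.max(np.abs(total / T - u)))  # gap shrinks like O(1/T)
```

In a multi-worker setting, each worker would transmit only the sign vector and the single scalar scale, i.e., roughly 1 bit per parameter per round.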
Recruiting: We're recruiting Research Assistants and Ph.D. students to work on LLM optimization and computational acceleration.
Topics include:
- Optimization Algorithms for Large Language Models
- Model Adaptation and Computational Acceleration
If you're interested, please email me with (1) your CV, (2) a short summary of your research/engineering experience, and (3) links to papers/code (if any).
📝 Publications
(* indicates equal contribution, † indicates corresponding author)
Journal
- Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
  Congliang Chen*, Li Shen*, Fangyu Zou*, and Wei Liu. Journal of Machine Learning Research 23, no. 229 (2022): 1-47.
- Efficient-Adam: Communication-Efficient Distributed Adam
  Congliang Chen, Li Shen, Wei Liu, and Zhi-Quan Luo. IEEE Transactions on Signal Processing 71 (2023): 3257-3266.
- Quantized Adam with Error Feedback
  Congliang Chen, Li Shen, Haozhi Huang, and Wei Liu. ACM Transactions on Intelligent Systems and Technology (TIST) 12, no. 5 (2021): 1-26.
- A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration
  Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, and Wei Liu. IEEE Transactions on Neural Networks and Learning Systems 35, no. 10 (2023): 14482-14490.
Conference
- Communication Efficient Primal-Dual Algorithm for Nonconvex Nonsmooth Distributed Optimization
  Congliang Chen, Jiawei Zhang, Li Shen, Peilin Zhao, and Zhi-Quan Luo. In International Conference on Artificial Intelligence and Statistics, pp. 1594-1602. PMLR, 2021.
- Adam-mini: Use Fewer Learning Rates to Gain More
  Yushun Zhang*, Congliang Chen*, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P. Kingma, Yinyu Ye, Zhi-Quan Luo, and Ruoyu Sun. In The Thirteenth International Conference on Learning Representations.
- Preserving Diversity in Supervised Fine-Tuning of Large Language Models
  Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Zhi-Quan Luo, and Ruoyu Sun. In The Thirteenth International Conference on Learning Representations.
- Why Transformers Need Adam: A Hessian Perspective
  Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, and Zhi-Quan Luo. Advances in Neural Information Processing Systems 37 (2024): 131786-131823.
- Adam Can Converge Without Any Modification on Update Rules
  Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, and Zhi-Quan Luo. Advances in Neural Information Processing Systems 35 (2022): 28386-28399.
🎓 Education
- 2018.08 - 2025.03, Ph.D., The Chinese University of Hong Kong (Shenzhen), Shenzhen, China.
- 2014.09 - 2018.06, B.S., Peking University, Beijing, China.
💻 Internships
- 2019.07 - 2023.07, Tencent AI Lab, Shenzhen, China.
🏫 Services
- Reviewer for ICML, NeurIPS, ICLR, ICCV, and CVPR.