Parallel and Federated Algorithms for Large-scale Machine Learning Problems
Digital Document

Handle: http://hdl.handle.net/11134/20002:860727348

Persons:
Creator (cre): Tong, Qianqian
Major Advisor (mja): Bi, Jinbo
Associate Advisor (asa): Rajasekaran, Sanguthevar
Associate Advisor (asa): Song, Dongjin

Title: Parallel and Federated Algorithms for Large-scale Machine Learning Problems

Digital Origin: born digital

Description:
Stochastic optimization algorithms have become the main workhorse of scalable machine learning owing to their low per-iteration complexity, and stochastic gradients are the most widely used ingredient in these algorithms. To exploit curvature information beyond gradients, one may wish to capture second-order (Hessian) information; however, the high computational cost and memory footprint of the Hessian matrix render classical second-order methods prohibitive for large-scale problems. In this work, we examine two approaches that capture second-order information efficiently.

The first approach uses quasi-Newton methods, which approximate the Hessian matrix. To learn from large datasets, modern machine learning applications rely on scalable training algorithms that typically employ stochastic updates, parallelism, or both. We propose an asynchronous parallel stochastic quasi-Newton (AsySQN) method, the first to truly parallelize a quasi-Newton method with a convergence guarantee. Empirical evaluations demonstrate its speedup over the non-parallel SQN method and its effectiveness on ill-conditioned problems.

The second approach uses adaptive gradient methods (AGMs), which store Hessian information as a second-order momentum term. AGMs have been widely used to optimize nonconvex problems in deep learning, and we improve them in two respects. First, we observe that the adaptive learning rate (A-LR) varies significantly across the dimensions of the optimization problem and over epochs. Second, we theoretically prove that the convergence rate of AGMs depends on their hyper-parameters. We then propose new AGMs that calibrate the A-LR with an activation function, and we show that the proposed methods outperform existing AGMs and generalize better on multiple deep learning tasks.

Out of concern for privacy, we further design AGMs in the federated setting, which allows a large number of edge computing devices to collaboratively learn a global model without sharing data. We propose a family of effective federated AGMs with calibrated learning rates to alleviate the deterioration in generalization performance caused by dissimilar data populations across devices. Our analysis shows that the proposed methods converge to a first-order stationary point for nonconvex optimization under non-IID and unbalanced data settings. To further improve test performance, we compare different calibration schemes for the adaptive learning rate against the most advanced federated algorithms and evaluate the benefits of AGMs over current federated learning methods.
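To make the calibration idea above concrete, here is a minimal NumPy sketch of an Adam-style update whose adaptive learning rate is passed through an activation function. The abstract does not name the activation or any hyper-parameter values; the softplus choice and the constants below are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

def softplus(x, beta=50.0):
    # Numerically stable softplus: log(1 + exp(beta * x)) / beta.
    # With a large beta it tracks x for moderate values while staying
    # bounded away from zero, which tames the extreme A-LR values that
    # arise when the second-order momentum v is near zero.
    return np.logaddexp(0.0, beta * x) / beta

def calibrated_agm_step(param, grad, m, v, t,
                        lr=1e-3, beta1=0.9, beta2=0.999):
    """One Adam-style step with an activation-calibrated A-LR
    (illustrative sketch; all hyper-parameter values are assumptions)."""
    m = beta1 * m + (1 - beta1) * grad       # first-order momentum
    v = beta2 * v + (1 - beta2) * grad**2    # second-order momentum
    m_hat = m / (1 - beta1**t)               # bias corrections
    v_hat = v / (1 - beta2**t)
    # Plain Adam divides by sqrt(v_hat) + eps; the calibrated variant
    # feeds sqrt(v_hat) through an activation function instead.
    param = param - lr * m_hat / softplus(np.sqrt(v_hat))
    return param, m, v
```

Similarly, a rough sketch of the federated pattern the abstract describes, assuming a FedAvg-style protocol: each device computes a local update on its own (possibly non-IID, unbalanced) data, and the server aggregates the model deltas into a pseudo-gradient before applying the calibrated adaptive step above. `client.local_train` and `client.num_samples` are hypothetical helpers used only for illustration.

```python
def federated_agm_round(global_param, clients, state, lr=1e-2):
    # One communication round. `state` carries the server-side optimizer
    # buffers, e.g. {"m": zeros_like(w), "v": zeros_like(w), "t": 0}.
    deltas, weights = [], []
    for client in clients:
        # Hypothetical helper: a few epochs of local SGD starting from
        # the current global model, returning the locally trained weights.
        local_param = client.local_train(global_param)
        deltas.append(global_param - local_param)
        weights.append(client.num_samples)   # unbalanced local data sizes
    # The weighted average of client deltas serves as a pseudo-gradient.
    pseudo_grad = np.average(deltas, axis=0, weights=np.asarray(weights, float))
    state["t"] += 1
    global_param, state["m"], state["v"] = calibrated_agm_step(
        global_param, pseudo_grad, state["m"], state["v"], state["t"], lr=lr)
    return global_param, state
```

In this sketch the calibration enters only through the server-side step; where the dissertation actually places the calibration (server, client, or both) is not specified in this record.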
Organizations:
Degree granting institution (dgg): University of Connecticut

Use and Reproduction: These Materials are provided for educational and research purposes only.

Degree Name: Doctor of Philosophy
Degree Level: Ph.D.
Degree Discipline: Computer Science and Engineering
Local Identifier: S_34169538