Parallel and Federated Algorithms for Large-scale Machine Learning Problems
Digital Document

Handle: http://hdl.handle.net/11134/20002:860727348

Persons:
Creator (cre): Tong, Qianqian
Major Advisor (mja): Bi, Jinbo
Associate Advisor (asa): Rajasekaran, Sanguthevar
Associate Advisor (asa): Song, Dongjin

Title: Parallel and Federated Algorithms for Large-scale Machine Learning Problems

Digital Origin: born digital

Description:
Stochastic optimization algorithms have become the main workhorse of scalable machine learning owing to their low per-iteration complexity, and stochastic gradients are the most widely used ingredient in these algorithms. To exploit curvature information beyond gradients, one may wish to capture second-order (Hessian) information; however, the high computational cost and memory footprint of the Hessian matrix render classical second-order methods prohibitive for large-scale problems. In this work, we examine two approaches that capture second-order information efficiently.

The first approach uses quasi-Newton methods, which approximate the Hessian matrix. To learn from large datasets, modern machine learning applications rely on scalable training algorithms that typically employ stochastic updates, parallelism, or both. We propose an asynchronous parallel stochastic quasi-Newton (AsySQN) method, the first to truly parallelize a quasi-Newton method with a convergence guarantee. Empirical evaluations demonstrate its speedup over the non-parallel SQN method and its effectiveness on ill-conditioned problems.

The second approach uses adaptive gradient methods (AGMs), which store Hessian information as a second-order momentum term. AGMs have been widely used to optimize nonconvex problems in deep learning, and we improve them in two respects. First, we observe that the adaptive learning rate (A-LR) varies significantly across the dimensions of the optimization problem and over epochs. Second, we theoretically prove that the convergence rate of AGMs depends on their hyper-parameters. We then propose new AGMs that calibrate the A-LR with an activation function, and we show that the proposed methods outperform existing AGMs and generalize better on multiple deep learning tasks.

Out of concern for privacy, we further design AGMs in the federated setting, which allows a large number of edge computing devices to collaboratively learn a global model without sharing data. We propose a family of effective federated AGMs with calibrated learning rates to alleviate the deterioration in generalization performance caused by dissimilar data populations across devices. Our analysis shows that the proposed methods converge to a first-order stationary point for nonconvex optimization under non-IID and unbalanced data settings. To further improve test performance, we compare different calibration schemes for the adaptive learning rate against the most advanced federated algorithms and evaluate the benefits of AGMs over current federated learning methods.
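To make the calibration idea above concrete, here is a minimal NumPy sketch of an Adam-style update whose adaptive learning rate is passed through an activation function. The abstract does not name the activation or any hyper-parameter values; the softplus choice and the constants below are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

def softplus(x, beta=50.0):
    # Numerically stable softplus: log(1 + exp(beta * x)) / beta.
    # With a large beta it tracks x for moderate values while staying
    # bounded away from zero, which tames the extreme A-LR values that
    # arise when the second-order momentum v is near zero.
    return np.logaddexp(0.0, beta * x) / beta

def calibrated_agm_step(param, grad, m, v, t,
                        lr=1e-3, beta1=0.9, beta2=0.999):
    """One Adam-style step with an activation-calibrated A-LR
    (illustrative sketch; all hyper-parameter values are assumptions)."""
    m = beta1 * m + (1 - beta1) * grad       # first-order momentum
    v = beta2 * v + (1 - beta2) * grad**2    # second-order momentum
    m_hat = m / (1 - beta1**t)               # bias corrections
    v_hat = v / (1 - beta2**t)
    # Plain Adam divides by sqrt(v_hat) + eps; the calibrated variant
    # feeds sqrt(v_hat) through an activation function instead.
    param = param - lr * m_hat / softplus(np.sqrt(v_hat))
    return param, m, v
```

Similarly, a rough sketch of the federated pattern the abstract describes, assuming a FedAvg-style protocol: each device computes a local update on its own (possibly non-IID, unbalanced) data, and the server aggregates the model deltas into a pseudo-gradient before applying the calibrated adaptive step above. `client.local_train` and `client.num_samples` are hypothetical helpers used only for illustration.

```python
def federated_agm_round(global_param, clients, state, lr=1e-2):
    # One communication round. `state` carries the server-side optimizer
    # buffers, e.g. {"m": zeros_like(w), "v": zeros_like(w), "t": 0}.
    deltas, weights = [], []
    for client in clients:
        # Hypothetical helper: a few epochs of local SGD starting from
        # the current global model, returning the locally trained weights.
        local_param = client.local_train(global_param)
        deltas.append(global_param - local_param)
        weights.append(client.num_samples)   # unbalanced local data sizes
    # The weighted average of client deltas serves as a pseudo-gradient.
    pseudo_grad = np.average(deltas, axis=0, weights=np.asarray(weights, float))
    state["t"] += 1
    global_param, state["m"], state["v"] = calibrated_agm_step(
        global_param, pseudo_grad, state["m"], state["v"], state["t"], lr=lr)
    return global_param, state
```

In this sketch the calibration enters only through the server-side step; where the dissertation actually places the calibration (server, client, or both) is not specified in this record.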
Organizations:
Degree granting institution (dgg): University of Connecticut

Use and Reproduction: These Materials are provided for educational and research purposes only.

Degree Name: Doctor of Philosophy
Degree Level: Ph.D.
Degree Discipline: Computer Science and Engineering
Local Identifier: S_34169538