LARC for Large batch size training

What is LARC

LARC (Large Batch Training of Convolutional Networks) is a technique proposed by Yang You, Igor Gitman, and Boris Ginsburg for improving the convergence of large batch size training. LARC uses the ratio between the magnitude of a parameter and the magnitude of its gradient to calculate an adaptive local learning rate for each parameter tensor (that is, separately for each layer's weights and biases).

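See the LARC paper for how the local learning rate is calculated. In practice, the implementation does not change the optimizer's learning rate directly: it rescales the gradient of each parameter tensor by that tensor's local learning rate before the optimizer step, which acts as a proxy for modifying the tensor's learning rate.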
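The per-tensor rescaling can be sketched in plain Python. This is a minimal sketch that assumes the update rule used by Apex's LARC wrapper, local_lr = trust_coefficient * ||w|| / (||g|| + weight_decay * ||w|| + eps), with weight decay folded into the gradient before rescaling. Tensors are represented as flat lists of floats, and the names `larc_scale_gradient` and `sgd_step` are hypothetical; they are not part of VISSL or Apex.

```python
import math
from typing import List


def l2_norm(values: List[float]) -> float:
    """Euclidean norm of a flattened tensor."""
    return math.sqrt(sum(v * v for v in values))


def larc_scale_gradient(
    param: List[float],
    grad: List[float],
    lr: float,
    weight_decay: float = 0.0,
    trust_coefficient: float = 0.001,
    clip: bool = False,
    eps: float = 1e-8,
) -> List[float]:
    """Hypothetical helper: return the LARC-rescaled gradient of one parameter tensor."""
    param_norm = l2_norm(param)
    grad_norm = l2_norm(grad)
    if param_norm == 0.0 or grad_norm == 0.0:
        # Assumed to mirror Apex: tensors with a zero norm are left untouched.
        return list(grad)
    # Local learning rate from the ratio of parameter norm to gradient norm.
    local_lr = trust_coefficient * param_norm / (grad_norm + weight_decay * param_norm + eps)
    if clip:
        # Express the local rate relative to lr so that lr * scale == min(local_lr, lr).
        local_lr = min(local_lr / lr, 1.0)
    # Fold weight decay into the gradient, then rescale it by the local rate.
    return [(g + weight_decay * p) * local_lr for p, g in zip(param, grad)]


def sgd_step(param: List[float], grad: List[float], lr: float) -> List[float]:
    """Plain SGD update (no momentum) applied to the rescaled gradient."""
    return [p - lr * g for p, g in zip(param, grad)]


if __name__ == "__main__":
    w = [3.0, 4.0]  # ||w|| = 5
    g = [0.6, 0.8]  # ||g|| = 1
    scaled = larc_scale_gradient(w, g, lr=0.1)
    print(scaled)  # each entry is about 0.005 times the original gradient
    print(sgd_step(w, scaled, lr=0.1))
```

The `clip` flag changes how the local rate interacts with the global one: without clipping, the effective step is lr * local_lr times the (weight-decayed) gradient, so the local rate scales the global rate; with clipping, the step becomes min(local_lr, lr) times that gradient, so the local rate replaces the global rate but never exceeds it.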

How to enable LARC

VISSL supports the LARC implementation from NVIDIA’s Apex library. To use LARC, set the config option OPTIMIZER.use_larc=True. VISSL also exposes the LARC parameters so that users can tune them. The full list of LARC-related options exposed by VISSL, with their default values, is:

  OPTIMIZER:
    name: "sgd"
    use_larc: False  # supported for SGD only for now
    larc_config:
      clip: False
      eps: 1e-08
      trust_coefficient: 0.001


LARC is currently supported only for the SGD optimizer.
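The link between these options and the SGD-only restriction can be sketched as a small validation step. This is a hypothetical sketch, not VISSL's code: the config is modeled as a plain dict, the nesting of `clip`, `eps` and `trust_coefficient` under a `larc_config` key is an assumption, and `larc_kwargs` is an illustrative helper name.

```python
from typing import Any, Dict

# Hypothetical plain-dict copy of the OPTIMIZER section; the nesting of the
# three LARC options under "larc_config" is an assumption for illustration.
OPTIMIZER_CONFIG: Dict[str, Any] = {
    "name": "sgd",
    "use_larc": True,
    "larc_config": {"clip": False, "eps": 1e-08, "trust_coefficient": 0.001},
}


def larc_kwargs(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: validate the optimizer config and return the keyword
    arguments for a LARC wrapper, or an empty dict when LARC is disabled."""
    if not cfg.get("use_larc", False):
        return {}
    if cfg.get("name") != "sgd":
        # Mirrors the documented restriction: LARC is only supported with SGD.
        raise ValueError(f"LARC requires the 'sgd' optimizer, got {cfg.get('name')!r}")
    larc = cfg["larc_config"]
    return {
        "trust_coefficient": float(larc["trust_coefficient"]),
        "clip": bool(larc["clip"]),
        "eps": float(larc["eps"]),
    }


if __name__ == "__main__":
    print(larc_kwargs(OPTIMIZER_CONFIG))
```

With Apex installed, a dictionary like this could be passed on as `LARC(base_sgd_optimizer, **kwargs)`, where `LARC` is the `apex.parallel.LARC.LARC` wrapper class (its constructor takes `trust_coefficient`, `clip` and `eps`) and `base_sgd_optimizer` is a hypothetical `torch.optim.SGD` instance. How VISSL performs this step internally is not shown here.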

Using Apex

To make Apex easy to use, VISSL provides conda and pip packages of Apex, compiled with the optimized C++ extensions and CUDA kernels. The Apex packages are provided for CUDA 9.2, 10.0, 10.1, 10.2 and 11.0, PyTorch >= 1.4, and Python >= 3.6 and <= 3.9.

To install Apex, follow VISSL’s instructions for installing it with pip or with conda.
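Before setting OPTIMIZER.use_larc=True, it may help to confirm that the Python version falls in the supported range and that Apex can be found. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the check only looks for an `apex` package on the import path, so it cannot tell whether the optimized C++/CUDA extensions were actually compiled.

```python
import importlib.util
import sys
from typing import Optional, Tuple


def python_version_supported(version: Optional[Tuple[int, int]] = None) -> bool:
    """True if (major, minor) is within the Python 3.6-3.9 range listed for the Apex packages."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    return (3, 6) <= version <= (3, 9)


def apex_installed() -> bool:
    """True if a top-level `apex` package can be found (it is not imported)."""
    return importlib.util.find_spec("apex") is not None


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    print("apex found:", apex_installed())
```

Tuples compare element by element, so (3, 10) correctly falls outside the range even though 10 > 9 would look odd as a decimal version string.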