Train SimCLR model¶
VISSL reproduces the self-supervised approach A Simple Framework for Contrastive Learning of Visual Representations proposed by Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton in this paper.
How to train SimCLR model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including loss, data augmentations, collators etc required for this approach.
To train ResNet-50 model on 4-machines (8-nodes) on ImageNet-1K dataset with SimCLR approach using MLP-head, loss temperature of 0.1 and feature projection dimension 128:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet
Using Synchronized BatchNorm for training¶
For training SimCLR models, we convert all the BatchNorm layers to Global BatchNorm. For this, VISSL supports PyTorch SyncBatchNorm
module and NVIDIA’s Apex SyncBatchNorm layers. Set the config params MODEL.SYNC_BN_CONFIG.SYNC_BN_TYPE
to apex
or pytorch
.
If you want to use Apex, VISSL provides anaconda
and pip
packages of Apex (compiled with Optimzed C++ extensions/CUDA kernels). The Apex
packages are provided for all versions of CUDA (9.2, 10.0, 10.1, 10.2, 11.0), PyTorch >= 1.4 and Python >=3.6 and <=3.9
.
To use SyncBN during training, one needs to set the following parameters in configuration file:
MODEL:
SYNC_BN_CONFIG:
CONVERT_BN_TO_SYNC_BN: True
SYNC_BN_TYPE: apex
# 1) if group_size=-1 -> use the VISSL default setting. We synchronize within a
# machine and hence will set group_size=num_gpus per node. This gives the best
# speedup.
# 2) if group_size>0 -> will set group_size=value set by user.
# 3) if group_size=0 -> no groups are created and process_group=None. This means
# global sync is done.
GROUP_SIZE: 8
Using LARC for training¶
SimCLR training uses LARC from NVIDIA’s Apex LARC. To use LARC, users need to set config option
OPTIMIZER.use_larc=True
. VISSL exposed LARC parameters that users can tune. Full list of LARC parameters exposed by VISSL:
OPTIMIZER:
name: "sgd"
use_larc: False # supported for SGD only for now
larc_config:
clip: False
eps: 1e-08
trust_coefficient: 0.001
Note
LARC is currently supported for SGD optimizer only.
Vary the training loss settings¶
Users can adjust several settings from command line to train the model with different hyperparams. For example: to use a different temperature 0.2 for logits and different output projection dimension of 256:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.LOSS.simclr_info_nce_loss.temperature=0.2 \
config.LOSS.simclr_info_nce_loss.buffer_params.embedding_dim=256
The full set of loss params that VISSL allows modifying:
simclr_info_nce_loss:
temperature: 0.1
buffer_params:
embedding_dim: 128
world_size: 64 # automatically inferred
effective_batch_size: 4096 # automatically inferred
Training different model architecture¶
VISSL supports many backbone architectures including ResNe(X)ts, wider ResNets. Some examples below:
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.RESNETS.DEPTH=101
Train ResNet-50-w2 (2x wider):
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.RESNETS.DEPTH=101 \
config.MODEL.TRUNK.RESNETS.WIDTH_MULTIPLIER=2
Training with Multi-Crop data augmentation¶
The original SimCLR approach is proposed for 2 positives per image. We expand the SimCLR approach to work for more positives following the multi-crop augmentation proposed in SwAV paper. See SwAV paper https://arxiv.org/abs/2006.09882 for the multi-crop augmentation details.
Multi-crop augmentation can allow using more positives and also positives of different resolutions for SimCLR. VISSL provides
a version of SimCLR loss for multi-crop training multicrop_simclr_info_nce_loss
. In order to train SimCLR with multi-crop
augmentation say crops 2x160 + 4x96
i.e. 2 crops of resolution 160 and 4 crops of resolution 96, the training command looks like:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
+config/pretrain/simclr/transforms=multicrop_2x160_4x96
The multicrop_2x160_4x96.yaml
configuration file changes 2 things:
Transforms: Simply replace the
ImgReplicatePil
transform (which creates 2 copies of image) withImgPilToMultiCrop
which creates multi-crops of multiple resolutions.Loss: Use the loss
multicrop_simclr_info_nce_loss
instead which inherits fromsimclr_info_nce_loss
and modifies the loss to work for multi-crop input.
Varying the multi-crop augmentation settings¶
VISSL allows modifying the crops to use. Full settings exposed:
TRANSFORMS:
- name: ImgPilToMultiCrop
total_num_crops: 6 # Total number of crops to extract
num_crops: [2, 4] # Specifies the number of type of crops.
size_crops: [160, 96] # Specifies the height (height = width) of each patch
crop_scales: [[0.08, 1], [0.05, 0.14]] # Scale of the crop
Varying the multi-crop loss settings¶
The full set of loss params that VISSL allows modifying:
multicrop_simclr_info_nce_loss:
temperature: 0.1
num_crops: 2 # automatically inferred from data transforms
buffer_params:
world_size: 64 # automatically inferred
embedding_dim: 128
effective_batch_size: 4096 # automatically inferred
Training with different MLP head¶
Original SimCLR approach used 2-layer MLP head. VISSL allows attaching any different desired head. In order to modify the MLP head (more layers, different dimensions etc), see the following examples:
3-layer MLP head: Use the following head (example for ResNet model)
MODEL:
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 128]}],
]
Use 2-layer MLP with hidden dimension 4096: Use the following head (example for ResNet model)
MODEL:
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 4096], "use_relu": True}],
["mlp", {"dims": [4096, 128]}],
]
Vary the number of epochs¶
In order to vary the number of epochs to use for training SimCLR models, one can achieve this simply
from command line. For example, to train the SimCLR model for 100 epochs instead, pass the num_epochs
parameter from command line:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.OPTIMIZER.num_epochs=100
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example: to train the SimCLR model on 8-gpus or 1gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 config.DISTRIBUTED.NUM_NODES=1
Training on 8-gpus:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=1
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL for SimCLR and the benchmarks.
Citations¶
@misc{chen2020simple,
title={A Simple Framework for Contrastive Learning of Visual Representations},
author={Ting Chen and Simon Kornblith and Mohammad Norouzi and Geoffrey Hinton},
year={2020},
eprint={2002.05709},
archivePrefix={arXiv},
primaryClass={cs.LG}
}