Train DINO model
VISSL reproduces DINO, the self-supervised approach presented in the paper "Emerging Properties in Self-Supervised Vision Transformers" by Mathilde Caron, Hugo Touvron, Ishan Misra, Herve Jegou, Julien Mairal, Piotr Bojanowski and Armand Joulin.
How to train DINO model
VISSL provides a YAML configuration file containing the exact hyperparameter settings needed to reproduce the model. VISSL also implements all the components this approach requires, including the loss, data augmentations and collators.
To train a DeiT-S/16 model on 16 GPUs on the ImageNet-1K dataset:
python tools/run_distributed_engines.py config=pretrain/dino/dino_16gpus_deits16
Vary the training loss settings
Users can override several settings from the command line to train the model with different hyperparameters. For example, to use a student temperature of 0.2, the training command would be:
python tools/run_distributed_engines.py config=pretrain/dino/dino_16gpus_deits16 \
    config.LOSS.dino_loss.student_temp=0.2
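Several overrides can be appended to the command in the same way. As a minimal sketch, the helper below assembles such a command from a dictionary of overrides. The script path, config name and the config.LOSS.dino_loss.* prefix are taken from this page; build_command is a hypothetical name used only for illustration and is not part of VISSL.

```python
import shlex


def build_command(overrides, config="pretrain/dino/dino_16gpus_deits16"):
    """Hypothetical helper (not part of VISSL): build the training command
    shown above, with one command-line override per dino_loss key."""
    cmd = ["python", "tools/run_distributed_engines.py", f"config={config}"]
    for key, value in overrides.items():
        cmd.append(f"config.LOSS.dino_loss.{key}={value}")
    return cmd


# Override the student temperature and the centering momentum together.
cmd = build_command({"student_temp": 0.2, "ema_center": 0.9})
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run to launch one run per setting, for example when sweeping over student temperatures.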
The full set of loss parameters that VISSL allows you to modify:
dino_loss:
  momentum: 0.996                   # base momentum parameter used for teacher model
  student_temp: 0.1                 # student temperature
  teacher_temp_min: 0.04            # warmup teacher temperature
  teacher_temp_max: 0.07            # base teacher temperature
  teacher_temp_warmup_iters: 37500  # 30 epochs
  crops_for_teacher: [0, 1]         # crops used by the teacher
  ema_center: 0.9                   # momentum parameter used for updating the teacher center
  normalize_last_layer: true        # should we l2-normalize the last layer
  output_dim: 65536                 # automatically inferred from model HEAD settings
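To make the roles of the temperature and centering parameters concrete, here is a minimal pure-Python sketch of how they enter a DINO-style loss as described in the paper. It is an illustration under stated assumptions, not VISSL's implementation: all function names are hypothetical, the linear shape of the teacher temperature warmup is assumed, and the inputs are tiny hand-written vectors rather than model outputs.

```python
import math


def teacher_temp(iteration, t_min=0.04, t_max=0.07, warmup_iters=37500):
    """Teacher temperature at a given iteration (ASSUMPTION: linear warmup
    from teacher_temp_min to teacher_temp_max, then constant)."""
    if iteration >= warmup_iters:
        return t_max
    return t_min + (t_max - t_min) * iteration / warmup_iters


def log_softmax(logits, temp):
    """Numerically stable log-softmax of logits / temp."""
    scaled = [x / temp for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def dino_pair_loss(student_out, teacher_out, center, iteration, student_temp=0.1):
    """Cross-entropy between the centered, sharpened teacher distribution
    and the student distribution for a single pair of crops."""
    centered = [t - c for t, c in zip(teacher_out, center)]
    t_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp(iteration))]
    s_logp = log_softmax(student_out, student_temp)
    return -sum(p * lp for p, lp in zip(t_probs, s_logp))


def update_center(center, teacher_batch_mean, ema_center=0.9):
    """Exponential moving average of the teacher outputs (ema_center)."""
    return [ema_center * c + (1 - ema_center) * m
            for c, m in zip(center, teacher_batch_mean)]


# Toy 3-dimensional outputs; real heads use output_dim (65536 above).
center = [0.0, 0.0, 0.0]
teacher_out = [0.3, 0.0, -0.2]
loss = dino_pair_loss([0.2, 0.1, -0.3], teacher_out, center, iteration=0)
center = update_center(center, teacher_out)
print(loss, center)
```

With the defaults listed above, the warmup spans 37500 iterations for 30 epochs, which is 1250 iterations per epoch. If you change the batch size or dataset, teacher_temp_warmup_iters likely needs to be rescaled to keep the warmup at the same number of epochs.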