Train DINO model


VISSL reproduces DINO, the self-supervised approach proposed by Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski and Armand Joulin in the paper Emerging Properties in Self-Supervised Vision Transformers.

How to train the DINO model

VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components (loss, data augmentations, collators, etc.) required for this approach.

To train the DeiT-S/16 model on 16 GPUs on the ImageNet-1K dataset:

python tools/ config=pretrain/dino/dino_16gpus_deits16

Vary the training loss settings

Users can adjust several settings from the command line to train the model with different hyperparameters. For example, to use a student temperature of 0.2 instead of the default 0.1, the training command would look like:

python tools/ config=pretrain/dino/dino_16gpus_deits16 \
  config.LOSS.dino_loss.student_temp=0.2

The full set of loss parameters that VISSL allows modifying:

  momentum: 0.996 # base momentum parameter used for teacher model
  student_temp: 0.1 # student temperature
  teacher_temp_min: 0.04 # initial teacher temperature (start of warmup)
  teacher_temp_max: 0.07 # final teacher temperature (after warmup)
  teacher_temp_warmup_iters: 37500 # warmup duration (30 epochs)
  crops_for_teacher: [0, 1] # crops used by the teacher
  ema_center: 0.9 # momentum parameter used for updating the teacher center
  normalize_last_layer: true # should we l2-normalize the last layer
  output_dim: 65536  # automatically inferred from model HEAD settings
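For intuition, the mechanics these parameters control can be sketched in a few lines of NumPy. This is a simplified illustration, not VISSL's implementation: it shows how the teacher output is centered and sharpened with the teacher temperature, how the student is trained with a cross-entropy loss against those targets, how the center is updated with an exponential moving average (the ema_center parameter), and a linear warmup of the teacher temperature from teacher_temp_min to teacher_temp_max. The function names below are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def teacher_temperature(step, temp_min=0.04, temp_max=0.07, warmup_iters=37500):
    # Linear warmup of the teacher temperature over warmup_iters steps
    # (teacher_temp_min, teacher_temp_max, teacher_temp_warmup_iters above).
    if step >= warmup_iters:
        return temp_max
    return temp_min + (temp_max - temp_min) * step / warmup_iters

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    # Teacher targets: centered, then sharpened with a low temperature.
    targets = softmax((teacher_logits - center) / teacher_temp)
    # Student predictions: cross-entropy against the teacher targets.
    log_student = np.log(softmax(student_logits / student_temp))
    return float(-(targets * log_student).sum(axis=-1).mean())

def update_center(center, teacher_logits, ema_center=0.9):
    # EMA update of the teacher center, used to avoid collapse
    # (ema_center in the config above).
    return ema_center * center + (1.0 - ema_center) * teacher_logits.mean(axis=0)
```

Centering and sharpening pull in opposite directions: centering alone pushes the teacher toward a uniform distribution, sharpening alone toward a one-hot one, and applying both keeps the teacher outputs informative without collapsing.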