Train NPID (and NPID++) model

VISSL reproduces the self-supervised approach Unsupervised Feature Learning via Non-Parametric Instance Discrimination proposed by Zhirong Wu, Yuanjun Xiong, Stella Yu, and Dahua Lin in this paper. The NPID baselines were further improved by Misra et al. in Self-Supervised Learning of Pretext-Invariant Representations, proposed in this paper.

How to train NPID model

VISSL provides a yaml configuration file containing the exact hyperparam settings to reproduce the model. VISSL implements all the components required for this approach, including the loss, data augmentations, collators, etc.

To train a ResNet-50 model on 8 gpus on the ImageNet-1K dataset with the NPID approach, using 4,096 negatives selected randomly and a feature projection dimension of 128:

python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet
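
For intuition, NPID scores each image's embedding against a memory bank that holds one (lagged) embedding per training image: the image's own entry serves as the positive, randomly sampled entries serve as the negatives, and a temperature scales the similarities. Below is a minimal sketch of that scoring under hypothetical names (memory, emb, idx); it is illustrative only, not VISSL's implementation:

  import torch
  import torch.nn.functional as F

  # Hypothetical sketch of NPID's scoring, not VISSL's actual code.
  # memory: (num_train_samples, 128) bank of lagged, L2-normalized embeddings
  # emb:    (B, 128) L2-normalized embeddings of the current batch
  # idx:    (B,) dataset indices of the batch images
  def npid_logits(emb, idx, memory, num_negatives=4096, temperature=0.07):
      # positive score: each image against its own memory entry
      pos = (emb * memory[idx]).sum(dim=1, keepdim=True)             # (B, 1)
      # negative scores: entries sampled uniformly at random from the bank
      neg_idx = torch.randint(0, memory.shape[0], (emb.shape[0], num_negatives))
      neg = torch.bmm(memory[neg_idx], emb.unsqueeze(2)).squeeze(2)  # (B, K)
      # temperature-scaled similarities; cross-entropy on these logits with
      # target class 0 (the positive) gives the InfoNCE form of the loss
      return torch.cat([pos, neg], dim=1) / temperature

  emb = F.normalize(torch.randn(8, 128), dim=1)        # stand-in batch
  memory = F.normalize(torch.randn(1000, 128), dim=1)  # stand-in memory bank
  logits = npid_logits(emb, torch.arange(8), memory, num_negatives=64)
  loss = F.cross_entropy(logits, torch.zeros(8, dtype=torch.long))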

How to train NPID++ model

To train the NPID++ baselines with a ResNet-50 on ImageNet with 32,000 negatives, 800 epochs, and 4 machines (32 gpus), as in the PIRL paper:

python tools/run_distributed_engines.py config=pretrain/npid/npid++_4nodes_resnet

Vary the training loss settings

Users can adjust several settings from the command line to train the model with different hyperparams. For example, to use a different momentum value (say, 0.99) for the memory, a different temperature of 0.05 for the logits, and 16,000 negatives, the NPID training command would look like:

python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
    config.LOSS.nce_loss_with_memory.temperature=0.05 \
    config.LOSS.nce_loss_with_memory.memory_params.momentum=0.99 \
    config.LOSS.nce_loss_with_memory.negative_sampling_params.num_negatives=16000
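
Why the temperature matters: it divides the similarity scores before the softmax, so a lower value sharpens the distribution over the positive and the negatives. A quick self-contained illustration (the similarity values below are made up):

  import torch

  sims = torch.tensor([0.9, 0.5, 0.3])  # made-up cosine similarities
  for t in (0.07, 0.05):
      # lower temperature -> peakier softmax over positive vs. negatives
      print(t, torch.softmax(sims / t, dim=0))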

The full set of loss params that VISSL allows modifying:

  LOSS:
    nce_loss_with_memory:
      # setting below to "cross_entropy" yields the InfoNCE loss
      loss_type: "nce"
      norm_embedding: True
      temperature: 0.07
      # if the NCE loss is computed between multiple pairs, we can set a loss weight per term
      # can be used to weight different pair contributions differently.
      loss_weights: [1.0]
      norm_constant: -1
      update_mem_with_emb_index: -100
      negative_sampling_params:
        num_negatives: 16000
        type: "random"
      memory_params:
        memory_size: -1
        embedding_dim: 128
        momentum: 0.5
        norm_init: True
        update_mem_on_forward: True
      # following parameters are auto-filled before the loss is created.
      num_train_samples: -1    # @auto-filled
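
For intuition about the memory_params: the memory bank keeps one lagged embedding per training image and, on each forward pass, a sample's entry is blended with its fresh embedding using the momentum and then re-normalized. A minimal sketch of that update under hypothetical names (memory, emb, idx), not VISSL's actual code:

  import torch
  import torch.nn.functional as F

  def update_memory(memory, emb, idx, momentum=0.5):
      # blend the stored (lagged) entry with the fresh embedding, then
      # re-normalize so the bank entries stay unit-norm
      blended = momentum * memory[idx] + (1.0 - momentum) * emb
      memory[idx] = F.normalize(blended, dim=1)

  # norm_init: True -> the bank starts from unit-norm random vectors
  # (stand-in size here; the real bank holds one 128-d row per train image)
  memory = F.normalize(torch.randn(1000, 128), dim=1)
  emb = F.normalize(torch.randn(8, 128), dim=1)
  update_memory(memory, emb, idx=torch.arange(8), momentum=0.5)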

Training different model architectures

VISSL supports many backbone architectures including AlexNet and ResNe(X)ts. For example:

  • Train ResNet-101:

python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
    config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101

Vary the number of gpus

VISSL makes it extremely easy to vary the number of gpus used in training. For example, to train the NPID model on 4 machines (32 gpus) or on 1 gpu, the changes required are:

  • Training on 1-gpu:

python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
    config.DISTRIBUTED.NUM_PROC_PER_NODE=1 config.DISTRIBUTED.NUM_NODES=1

  • Training on 4 machines, i.e. 32 gpus:

python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
    config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4


Please adjust the learning rate following the linear scaling rule from ImageNet in 1-Hour if you change the number of gpus.
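
Concretely, the linear scaling rule multiplies the learning rate by the same factor as the total batch size. The base values below are illustrative placeholders, not NPID's actual defaults:

  # Linear scaling rule from "ImageNet in 1 Hour" -- illustrative numbers only.
  base_lr = 0.03         # hypothetical lr tuned for the 8-gpu base setup
  base_batch = 8 * 32    # 8 gpus x 32 images per gpu

  for num_gpus in (1, 8, 32):
      batch = num_gpus * 32
      scaled_lr = base_lr * batch / base_batch
      print(f"{num_gpus:2d} gpus -> batch {batch:4d}, lr {scaled_lr:.5f}")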

Pre-trained models

See the VISSL Model Zoo for the PyTorch pre-trained models using the NPID and NPID++ approaches, and for the benchmarks.


Citations

  • NPID

    @misc{wu2018unsupervised,
        title={Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination},
        author={Zhirong Wu and Yuanjun Xiong and Stella Yu and Dahua Lin},
        year={2018},
        eprint={1805.01978},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }

  • NPID++

    @misc{misra2019selfsupervised,
        title={Self-Supervised Learning of Pretext-Invariant Representations},
        author={Ishan Misra and Laurens van der Maaten},
        year={2019},
        eprint={1912.01991},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }