VISSL documentation¶
VISSL is a computer vision library for state-of-the-art Self-Supervised Learning research with PyTorch. VISSL aims to accelerate the research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations.
What is VISSL?¶

VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research with PyTorch. VISSL aims to accelerate the research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations. Key features include:
Reproducible implementation of SOTA in Self-Supervision: All existing SOTA methods in self-supervision are implemented - SwAV, SimCLR, MoCo(v2), PIRL, NPID, NPID++, DeepClusterV2, ClusterFit, RotNet, Jigsaw. Supervised training is also supported.
Benchmark suite: A variety of benchmark tasks including linear image classification (places205, imagenet1k, voc07, inaturalist), full finetuning, semi-supervised benchmark, nearest neighbor benchmark, and object detection (Pascal VOC and COCO).
Ease of Usability: Easy to use via a yaml configuration system based on Hydra.
Modular: Easy to design new tasks and reuse the existing components from other tasks (objective functions, model trunk and heads, data transforms, etc.). The modular components are simple drop-in replacements in yaml config files.
Scalability: Easy to train models on 1 gpu, multi-gpu and multi-node. Several components for large scale trainings are provided as simple config file plugs: activation checkpointing, ZeRO, FP16, LARC, stateful data sampler, a data class to handle invalid images, large model backbones like RegNets, etc.
Model Zoo: Over 60 pre-trained self-supervised model weights.
We hope that VISSL will democratize self-supervised learning and accelerate advancements in the field, and also enable research in important directions such as the generalizability of models.
Hope you enjoy using VISSL!
Installation¶
Installation is simple: we provide pre-built binaries (pip, conda) as well as instructions for building from source (pip, conda).
Requirements¶
At a high level, the project requires the following system dependencies.
Linux
Python>=3.6.2 and <3.9
PyTorch>=1.4
torchvision (matching PyTorch install)
CUDA (must be a version supported by the pytorch version)
OpenCV (Optional)
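If you are unsure whether your environment satisfies these requirements, a short check like the following (assuming PyTorch and torchvision are already installed) prints the relevant versions:
import sys
import torch
import torchvision

print("python     :", sys.version.split()[0])   # needs >=3.6.2 and <3.9
print("torch      :", torch.__version__)        # needs >=1.4
print("torchvision:", torchvision.__version__)  # must match the PyTorch install
print("cuda       :", torch.version.cuda)       # CUDA version the wheel was built with
print("cuda works :", torch.cuda.is_available())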
Installing VISSL from pre-built binaries¶
Install VISSL conda package¶
This assumes you have CUDA 10.2.
conda create -n vissl python=3.8
conda activate vissl
conda install -c pytorch pytorch=1.7.1 torchvision cudatoolkit=10.2
conda install -c vissl -c iopath -c conda-forge -c pytorch -c defaults apex vissl
For other versions of PyTorch, Python or CUDA, please modify the above instructions with the desired versions. VISSL provides Apex packages for all combinations of PyTorch, Python and compatible CUDA.
Install VISSL pip package¶
This example uses PyTorch 1.5.1 and CUDA 10.1. Please modify the PyTorch, CUDA and the corresponding Apex versions below for your desired setting.
Step 1: Create Virtual environment (pip)
python3 -m venv ~/venv
. ~/venv/bin/activate
Step 2: Install PyTorch, OpenCV and APEX (pip)
We use PyTorch 1.5.1 with CUDA 10.1 in the following instructions (users can choose their desired versions).
There are several ways to install OpenCV; one possibility is shown below.
For Apex, we provide pre-built binaries compiled with optimized C++/CUDA extensions.
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install opencv-python
pip install apex -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/py38_cu101_pyt151/download.html
Note that, for the APEX install, you need to get the versions of CUDA, PyTorch, and Python correct in the URL. We provide APEX versions with all possible combinations of Python, PyTorch, CUDA. Select the right APEX Wheels if you desire a different combination.
On Google Colab, everything until this point is already set up. You install APEX there as follows.
import sys
import torch
version_str = "".join([
    f"py3{sys.version_info.minor}_cu",
    torch.version.cuda.replace(".", ""),
    f"_pyt{torch.__version__[0:5:2]}",
])
!pip install apex -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/{version_str}/download.html
Step 3: Install VISSL
pip install vissl
# verify installation
python -c 'import vissl'
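If you also installed the optional dependencies, a slightly more verbose check can be used instead (a sketch; apex and cv2 are only importable if you installed them):
import torch
import vissl

print("vissl:", getattr(vissl, "__version__", "unknown"))
print("torch:", torch.__version__)
try:
    import apex  # noqa: F401
    import cv2   # noqa: F401
    print("apex and opencv imported successfully")
except ImportError as e:
    print("optional dependency missing:", e)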
Installing VISSL from source¶
The following instructions assume that you have the desired CUDA version installed and working.
Install from source in PIP environment¶
Step 1: Create Virtual environment (pip)
python3 -m venv ~/venv
. ~/venv/bin/activate
Step 2: Install PyTorch (pip)
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html
Step 3: Install APEX (pip)
pip install apex -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/py37_cu101_pyt171/download.html
Step 4: Install VISSL
# clone vissl repository
cd $HOME && git clone --recursive https://github.com/facebookresearch/vissl.git && cd $HOME/vissl/
# install vissl dependencies
pip install --progress-bar off -r requirements.txt
pip install opencv-python
# update classy vision install to current master
pip uninstall -y classy_vision
pip install classy-vision@https://github.com/facebookresearch/ClassyVision/tarball/master
# install vissl dev mode (e stands for editable)
pip install -e .[dev]
# verify installation
python -c 'import vissl, apex, cv2'
Install from source in Conda environment¶
Step 1: Create Conda environment
If you don’t have anaconda, run this bash script to install conda.
conda create -n vissl_env python=3.7
source activate vissl_env
Step 2: Install PyTorch (conda)
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
Step 3: Install APEX (conda)
conda install -c vissl apex
Step 4: Install VISSL
Follow the Step 4 instructions from the pip installation above.
That’s it! You are now ready to use this code.
Optional: Install Apex from source (common for both pip and conda)
Installing Apex from source requires a recent nvcc so that the C++ extensions can be compiled with a recent gcc (>=7.4). Check the Apex website for more instructions.
# see https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-feature-list
# to select cuda architecture you want to build
CUDA_VER=10.1 TORCH_CUDA_ARCH_LIST="5.0;5.2;5.3;6.0;6.1;6.2;7.0;7.5" ./docker/common/install_apex.sh
Getting Started with VISSL¶
This document provides a brief introduction to the built-in command line tools provided by VISSL.
Quick Start with VISSL¶
We provide a quick overview of training a SimCLR self-supervised model on 1 gpu with VISSL.
Install VISSL¶
For installation, please follow our installation instructions.
Setup dataset¶
We will use the ImageNet-1K dataset and assume the downloaded data looks like:
imagenet_full_size
|_ train
| |_ <n0......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-N-name>.JPEG
| |_ ...
| |_ <n1......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-M-name>.JPEG
| | |_...
| | |_...
|_ val
| |_ <n0......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-N-name>.JPEG
| |_ ...
| |_ <n1......>
| | |_<im-1-name>.JPEG
| | |_...
| | |_<im-M-name>.JPEG
| | |_...
| | |_...
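Before training, it can be worth confirming that your local copy matches this layout. A small sketch (the root path below is illustrative) that counts class folders and images per split:
from pathlib import Path

root = Path("/path/to/imagenet_full_size")  # adjust to your download location
for split in ("train", "val"):
    classes = [d for d in (root / split).iterdir() if d.is_dir()]
    n_images = sum(len(list(d.glob("*.JPEG"))) for d in classes)
    print(f"{split}: {len(classes)} classes, {n_images} images")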
Running SimCLR Pre-training on 1-gpu¶
If VISSL is built from source¶
We provide a config to train a model with the SimCLR pretext task on a ResNet-50 trunk. Change the DATA.TRAIN.DATA_PATHS path to your ImageNet train dataset folder.
python3 run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config=test/integration_test/quick_simclr \
config.CHECKPOINT.DIR="./checkpoints" \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
If using pre-built conda/pip VISSL packages¶
Users need to register the dataset and obtain the built-in tool for training. Follow these steps:
Step1: Setup ImageNet1K dataset
If you installed pre-built VISSL packages, set up the ImageNet-1K dataset following our data documentation and tutorial. NOTE that the dataset needs to be registered with VISSL.
In your Python interpreter:
>>> json_data = {
        "imagenet1k_folder": {
            "train": ["<img_path>", "<lbl_path>"],
            "val": ["<img_path>", "<lbl_path>"]
        }
    }
>>> from vissl.utils.io import save_file
>>> save_file(json_data, "/tmp/configs/config/dataset_catalog.json")
>>> from vissl.data.dataset_catalog import VisslDatasetCatalog
>>> print(VisslDatasetCatalog.list())
['imagenet1k_folder']
>>> print(VisslDatasetCatalog.get("imagenet1k_folder"))
{'train': ['<img_path>', '<lbl_path>'], 'val': ['<img_path>', '<lbl_path>']}
Step2: Get the builtin tool and yaml config file
We will use the pre-built VISSL tool for training run_distributed_engines.py and the config file. Run
cd /tmp/ && mkdir -p /tmp/configs/config
wget -q -O configs/__init__.py https://dl.fbaipublicfiles.com/vissl/tutorials/configs/__init__.py
wget -q -O configs/config/quick_1gpu_resnet50_simclr.yaml https://dl.fbaipublicfiles.com/vissl/tutorials/configs/quick_1gpu_resnet50_simclr.yaml
wget -q https://dl.fbaipublicfiles.com/vissl/tutorials/run_distributed_engines.py
Step3: Train
cd /tmp/
python3 run_distributed_engines.py \
hydra.verbose=true \
config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config=quick_1gpu_resnet50_simclr \
config.CHECKPOINT.DIR="./checkpoints" \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true
YAML Configuration system¶
VISSL uses Hydra for configuration management. The configuration files are simple YAML files. Hydra provides a flexible yet powerful configuration system.
Users can create configs for only a specific component of their training (for example: using different datasets) and overwrite a master configuration setting for that specific component. This way, Hydra allows reusability of configs.
Hydra also allows modifying configuration values from the command line and offers an intuitive way to add new keys to a configuration.
The usage looks like:
python <binary-name>.py config=<yaml_config path>/<yaml_config_file_name>
Detecting new configuration directories in Hydra¶
VISSL provides configuration files here and uses the Hydra plugin VisslPlugin to automatically search for the configs folder in VISSL. If users want to create their own configuration directories and not use the configs directory provided by VISSL, then users must add their own plugin following the VisslPlugin.
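For reference, a Hydra search-path plugin is only a few lines. The sketch below is illustrative (the class, provider and package names are placeholders, not the actual VisslPlugin); Hydra discovers such plugins when they live inside the hydra_plugins namespace package:
from hydra.core.config_search_path import ConfigSearchPath
from hydra.plugins.search_path_plugin import SearchPathPlugin


class MyConfigsPlugin(SearchPathPlugin):
    def manipulate_search_path(self, search_path: ConfigSearchPath) -> None:
        # makes configs shipped inside my_package/configs discoverable by Hydra
        search_path.append(provider="my_configs", path="pkg://my_package.configs")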
Note
For any new folder containing configuration files, Hydra requires an empty __init__.py file. Hence, if users create a new configuration directory, they must also create an empty __init__.py file in it.
How to use VISSL provided config files¶
For example, to train SwAV model on 8-nodes (32-gpu) with VISSL:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet
where swav_8node_resnet.yaml is the master configuration file for SwAV training and exists at vissl/configs/config/pretrain/swav/swav_8node_resnet.yaml.
How to add configuration files for new SSL approaches¶
Let’s say you have a new self-supervision approach that you implemented in VISSL and want to create config files for training. You can simply create a new folder and config file for your approach.
For example:
python tools/run_distributed_engines.py \
config=pretrain/my_new_approach/my_approach_config_file.yaml
In the above case, we simply create the my_new_approach folder under the pretrain/ path and add a file my_approach_config_file.yaml, giving the path pretrain/my_new_approach/my_approach_config_file.yaml.
How to override a training component with config files¶
To replace one training component with another, for example the training dataset, simply create a new yaml file for that component and use it during training.
For example:
python tools/run_distributed_engines.py \
config=pretrain/swav/swav_8node_resnet \
+config/pretrain/swav/optimization=my_new_optimization \
+config/pretrain/swav/my_new_dataset=my_new_dataset_file_name
In the above case, we are overriding the optimization and data settings for the SwAV training. To do so, we simply create the my_new_dataset sub-folder under the pretrain/swav path and add a file my_new_dataset_file_name.yaml, giving the path pretrain/swav/my_new_dataset/my_new_dataset_file_name.yaml.
How to override single values in config files¶
If you want to override a single value of an existing key in the config, you can do so with: my_key=my_new_value
For example:
python tools/run_distributed_engines.py \
config=pretrain/swav/swav_8node_resnet \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights_path.torch>
How to add new keys to the dictionary in config files¶
If you want to add a single key to a dictionary in the config, you can do so with +my_new_key_name=my_value. Note the use of +.
For example:
python tools/run_distributed_engines.py \
config=pretrain/swav/swav_8node_resnet \
+config.MY_NEW_KEY=MY_VALUE \
+config.LOSS.simclr_info_nce_loss.MY_NEW_KEY=MY_VALUE
Using Tensorboard in VISSL¶
VISSL integrates Tensorboard to facilitate self-supervised training and experimentation. VISSL logs many useful scalars and non-scalars to Tensorboard that provide insights into an ongoing training:
Scalars:
Training Loss
Learning Rate
Average Training iteration time
Batch size per gpu
Number of images per sec per gpu
Training ETA
GPU memory used
Peak GPU memory allocated
Non-scalars:
Model parameters (at the start of every epoch and/or after N iterations)
Model parameter gradients (at the start of every epoch and/or after N iterations)
How to use Tensorboard in VISSL¶
Using Tensorboard in VISSL is easy and only requires setting a few configuration options. Users need to set TENSORBOARD_SETUP.USE_TENSORBOARD=true and adjust the values of the other config parameters as desired. The full set of Tensorboard parameters exposed by VISSL:
HOOKS:
  TENSORBOARD_SETUP:
    # whether to use tensorboard for the visualization
    USE_TENSORBOARD: False
    # log directory for tensorboard events
    LOG_DIR: "."
    EXPERIMENT_LOG_DIR: "tensorboard"
    # flush logs every n minutes
    FLUSH_EVERY_N_MIN: 5
    # whether to log the model parameters to tensorboard
    LOG_PARAMS: True
    # whether to log the model parameter gradients to tensorboard
    LOG_PARAMS_GRADIENTS: True
    # if we want to log the model parameters every few iterations, set the iteration
    # frequency. -1 means the params will be logged only at the end of epochs.
    LOG_PARAMS_EVERY_N_ITERS: 310
Note
Please install tensorboard manually: in a pip environment, run pip install tensorboard; if you use conda and prefer a conda install of tensorboard, run conda install -c conda-forge tensorboard.
Example usage¶
For example, to use Tensorboard during SwAV training, the command would look like:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.TENSORBOARD_SETUP.USE_TENSORBOARD=true \
config.TENSORBOARD_SETUP.LOG_PARAMS=true \
config.TENSORBOARD_SETUP.LOG_PARAMS_GRADIENTS=true \
config.TENSORBOARD_SETUP.LOG_DIR=/tmp/swav_tensorboard_events/
Compatibility with Other Libraries¶
VISSL provides several helpful scripts to convert VISSL models into models compatible with other libraries like Detectron2 and ClassyVision.
VISSL also provides scripts to convert models from other sources, like the Caffe2 models in the paper, into VISSL compatible models.
TorchVision models are directly compatible with VISSL and don’t require any conversion.
Converting Models VISSL -> {Detectron2, ClassyVision, TorchVision}¶
We provide scripts to convert VISSL models to Detectron2 and ClassyVision compatible models.
Converting to Detectron2¶
All the ResNe(X)t models in VISSL can be converted to Detectron2 weights using the following command:
python extra_scripts/convert_vissl_to_detectron2.py \
--input_model_file <input_model>.pth \
--output_model <d2_model>.torch \
--weights_type torch \
--state_dict_key_name classy_state_dict
Converting to ClassyVision¶
All the ResNe(X)t models in VISSL can be converted to ClassyVision weights using the following command:
python extra_scripts/convert_vissl_to_classy_vision.py \
--input_model_file <input_model>.pth \
--output_model <d2_model>.torch \
--state_dict_key_name classy_state_dict
Converting to TorchVision¶
All the ResNe(X)t models in VISSL can be converted to TorchVision weights using the following command:
python extra_scripts/convert_vissl_to_torchvision.py \
--model_url_or_file <input_model>.pth \
--output_dir /path/to/output/dir/ \
--output_name <my_converted_model>.torch
Converting Caffe2 models -> VISSL¶
We provide conversions for all the Caffe2 models in the paper.
ResNet-50 models to VISSL¶
Jigsaw model:
python extra_scripts/convert_caffe2_to_torchvision_resnet.py \
--c2_model <model>.pkl \
--output_model <pth_model>.torch \
--jigsaw True --bgr2rgb True
Colorization model:
python extra_scripts/convert_caffe2_to_torchvision_resnet.py \
--c2_model <model>.pkl \
--output_model <pth_model>.torch \
--bgr2rgb False
Supervised model:
python extra_scripts/convert_caffe2_to_pytorch_rn50.py \
--c2_model <model>.pkl \
--output_model <pth_model>.torch \
--bgr2rgb True
AlexNet models to VISSL¶
AlexNet Jigsaw models:
python extra_scripts/convert_caffe2_to_vissl_alexnet.py \
--weights_type caffe2 \
--model_name jigsaw \
--bgr2rgb True \
--input_model_weights <model.pkl> \
--output_model <pth_model>.torch
AlexNet Colorization models:
python extra_scripts/convert_caffe2_to_vissl_alexnet.py \
--weights_type caffe2 \
--model_name colorization \
--input_model_weights <model.pkl> \
--output_model <pth_model>.torch
AlexNet Supervised models:
python extra_scripts/convert_caffe2_to_vissl_alexnet.py \
--weights_type caffe2 \
--model_name supervised \
--bgr2rgb True \
--input_model_weights <model.pkl> \
--output_model <pth_model>.torch
Converting Models ClassyVision -> VISSL¶
We provide scripts to convert ClassyVision models to VISSL compatible models.
python extra_scripts/convert_classy_vision_to_vissl_resnet.py \
--input_model_file <input_model>.pth \
--output_model <d2_model>.torch \
--depth 50
Converting Official RotNet and DeepCluster models -> VISSL¶
AlexNet RotNet model:
python extra_scripts/convert_caffe2_to_vissl_alexnet.py \
--weights_type torch \
--model_name rotnet \
--input_model_weights <model> \
--output_model <pth_model>.torch
AlexNet DeepCluster model:
python extra_scripts/convert_alexnet_models.py \
--weights_type torch \
--model_name deepcluster \
--input_model_weights <model> \
--output_model <pth_model>.torch
Contributing to VISSL¶
We want to make contributing to this project as easy and transparent as possible.
Our Development Process¶
Minor changes and improvements will be released on an ongoing basis. Larger changes (e.g., changesets implementing a new SSL approach, benchmark, new scaling feature etc) will be released on a more periodic basis.
Issues¶
We use GitHub issues to track public bugs and questions. Please make sure to follow one of the issue templates when reporting any issues.
Facebook has a bounty program for the safe disclosure of security bugs. In those cases, please go through the process outlined on that page and do not file a public issue.
Pull Requests¶
We actively welcome your pull requests.
However, if you’re adding any significant features (e.g. > 50 lines), please make sure to have a corresponding issue to discuss your motivation and proposals, before sending a PR. We do not always accept new features, and we take the following factors into consideration:
Whether the same feature can be achieved without modifying VISSL. VISSL is designed to be extensible so that it’s easy to extend any modular component and train custom models. If some part is not as extensible, you can also bring up the issue to make it more extensible.
Whether the feature is potentially useful to a large audience, or only to a small portion of users.
Whether the proposed solution has a good design / interface.
Whether the proposed solution adds extra mental/practical overhead to users who don’t need such feature.
Whether the proposed solution breaks existing APIs.
When sending a PR, please do:
Fork the repo and create your branch from master.
If a PR contains multiple orthogonal changes, split it into several PRs.
If you’ve added code that should be tested, add tests.
If you’ve changed APIs, update the documentation.
Ensure the test suite passes. Follow cpu test instructions and integration tests.
Make sure your code follows our coding practices (see next section).
If you haven’t already, complete the Contributor License Agreement (“CLA”).
Coding Style¶
Please follow our coding practices and choose either option to properly format your code before submitting PRs.
Contact¶
If you want to contact the team about something other than code, like an idea for collaboration, drop us an email at vissl@fb.com.
Training: Step-by-step execution¶
We demonstrate step-by-step execution of how training works in VISSL in the following flowchart.

Feature Extraction: Step-by-step execution¶
We demonstrate step-by-step how feature extraction happens in VISSL in the following flowchart.

Benchmark on VOC07: Step-by-step execution¶
We demonstrate step-by-step execution of SVM training benchmark in VISSL in the following flowchart.

Nearest Neighbor Benchmark: Step-by-step execution¶
We demonstrate step-by-step execution of Nearest Neighbor benchmark in VISSL in the following flowchart.

Train models on CPU¶
VISSL supports training any model on CPUs. Typically, this involves correctly setting the MACHINE.DEVICE=cpu
and adjusting the distributed settings accordingly. For example, the config settings will look like:
MACHINE:
  DEVICE: cpu
DISTRIBUTED:
  BACKEND: gloo           # set to "gloo" for cpu only training
  NUM_NODES: 1            # no change needed
  NUM_PROC_PER_NODE: 2    # user sets this to the number of training processes to run
  INIT_METHOD: tcp        # set to "file" if desired
  RUN_ID: auto            # set to a file path if using the file init method. No change is
                          # needed for tcp; a free port on the machine is detected automatically.
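Since CPU-only training relies on the gloo backend, you can quickly verify (assuming PyTorch is installed) that your PyTorch build ships with it:
import torch.distributed as dist

# gloo ships with most PyTorch builds; cpu-only training will not work without it
print("gloo available:", dist.is_gloo_available())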
Train anything on 1-gpu¶
If you have a configuration file (any VISSL compatible file) for any training that you want to run on 1 gpu only (for example: train SimCLR on 1 gpu), you don’t need to modify the config file. VISSL provides a helper script that takes care of all the adjustments. This can facilitate debugging by allowing users to insert pdb in their code. VISSL also takes care of auto-scaling the learning rate for various schedules (cosine, multistep, step etc.) if you have enabled auto-scaling (see config.OPTIMIZER.param_schedulers.lr.auto_lr_scaling). You can achieve all of this simply by using the low_resource_1gpu_train_wrapper.sh script. An example usage:
cd $HOME/vissl
./dev/low_resource_1gpu_train_wrapper.sh config=test/integration_test/quick_swav
Train on SLURM cluster¶
VISSL supports SLURM by default for training models. The VISSL code automatically detects if the training environment is SLURM based on SLURM environment variables like SLURM_NODEID, SLURMD_NODENAME and SLURM_STEP_NODELIST.
VISSL also provides a helper bash script dev/launch_slurm.sh that allows launching a given training on SLURM. Users can modify this script to meet their needs.
The bash script takes the following inputs:
# number of machines to distribute training on
NODES=${NODES-1}
# number of gpus per machine to use for training
NUM_GPU=${NUM_GPU-8}
# gpus type: P100 | V100 | V100_32G etc. User should set this based on their machine
GPU_TYPE=${GPU_TYPE-V100}
# name of the training. for example: simclr_2node_resnet50_in1k. This is helpful to clearly recognize the training
EXPT_NAME=${EXPT_NAME}
# how much CPU memory to use
MEM=${MEM-250g}
# number of CPUs used for each trainer (i.e. each gpu)
CPU=${CPU-8}
# directory where all the training artifacts like checkpoints etc will be written
OUTPUT_DIR=${OUTPUT_DIR}
# partition of the cluster on which training should run. User should determine this parameter for their cluster
PARTITION=${PARTITION-learnfair}
# any helpful comment that slurm dashboard can display
COMMENT=${COMMENT-vissl_training}
GITHUB_REPO=${GITHUB_REPO-vissl}
# what branch of VISSL should be used. specify your custom branch
BRANCH=${BRANCH-master}
# automatically determined and used for distributed training.
# each training run must have a unique id and vissl defaults to date
RUN_ID=$(date +'%Y%m%d')
# number of dataloader workers to use per gpu
NUM_DATA_WORKERS=${NUM_DATA_WORKERS-8}
# multi-processing method to use in PyTorch. Options: forkserver | fork | spawn
MULTI_PROCESSING_METHOD=${MULTI_PROCESSING_METHOD-forkserver}
# specify the training configuration to run. For example: to train swav for 100epochs
# config=pretrain/swav/swav_8node_resnet config.OPTIMIZER.num_epochs=100
CFG=( "$@" )
To run the script for training SwAV on 8 machines with 8 gpus each for 100 epochs, the script can be run as:
cd $HOME/vissl && NODES=8 \
NUM_GPU=8 \
GPU_TYPE=V100 \
MEM=200g \
CPU=8 \
EXPT_NAME=swav_100ep_rn50_in1k \
OUTPUT_DIR=/tmp/swav/ \
PARTITION=learnfair \
BRANCH=master \
NUM_DATA_WORKERS=4 \
MULTI_PROCESSING_METHOD=forkserver \
./dev/launch_slurm.sh \
config=pretrain/swav/swav_8node_resnet config.OPTIMIZER.num_epochs=100
Train RotNet model¶
VISSL reproduces the self-supervised approach Unsupervised Representation Learning by Predicting Image Rotations proposed by Spyros Gidaris, Praveer Singh, Nikos Komodakis in https://arxiv.org/abs/1803.07728.
How to train RotNet model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including data augmentations, collators etc. required for this approach.
To train a ResNet-50 model on 8 gpus on the ImageNet-1K dataset using 4 rotation angles:
python tools/run_distributed_engines.py config=pretrain/rotnet/rotnet_8gpu_resnet
Training different model architecture¶
VISSL supports many backbone architectures including AlexNet, ResNe(X)ts. Some examples below:
Train AlexNet model
python tools/run_distributed_engines.py config=pretrain/rotnet/rotnet_8gpu_resnet \
config.MODEL.TRUNK.NAME=alexnet_rotnet
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/rotnet/rotnet_8gpu_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example, to train the RotNet model on 4 machines (32 gpus) or 1 gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/rotnet/rotnet_8gpu_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1
Training on 4 machines i.e. 32-gpu:
python tools/run_distributed_engines.py config=pretrain/rotnet/rotnet_8gpu_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL using RotNet approach and the benchmarks.
Citation¶
@misc{gidaris2018unsupervised,
title={Unsupervised Representation Learning by Predicting Image Rotations},
author={Spyros Gidaris and Praveer Singh and Nikos Komodakis},
year={2018},
eprint={1803.07728},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Train Jigsaw model¶
VISSL reproduces the self-supervised approach Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles proposed by Mehdi Noroozi and Paolo Favaro in https://arxiv.org/abs/1603.09246.
How to train Jigsaw model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including data augmentations, collators etc. required for this approach.
To train a ResNet-50 model on 8 gpus on the ImageNet-1K dataset using 2000 permutations:
python tools/run_distributed_engines.py config=pretrain/jigsaw/jigsaw_8gpu_resnet
Training with different permutations¶
In order to adjust the permutations and retrain, you can do so from the command line. For example, to train with 10K permutations instead, VISSL provides a configuration file with the necessary changes. Run:
python tools/run_distributed_engines.py config=pretrain/jigsaw/jigsaw_8gpu_resnet \
+config/pretrain/jigsaw/permutations=perm10K
Similarly, you can train for 100 permutations, or create new config files for different permutation settings following the above configs as examples.
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example, to train the Jigsaw model on 4 machines (32 gpus) or 1 gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/jigsaw/jigsaw_8gpu_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1
Training on 4 machines i.e. 32-gpu:
python tools/run_distributed_engines.py config=pretrain/jigsaw/jigsaw_8gpu_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL using Jigsaw approach and the benchmarks.
Permutations¶
Following Goyal et al., we use the exact permutation files for Jigsaw training available here, and refer users to use the files directly from the above source.
Citation¶
@misc{noroozi2017unsupervised,
title={Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles},
author={Mehdi Noroozi and Paolo Favaro},
year={2017},
eprint={1603.09246},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Train NPID (and NPID++) model¶
VISSL reproduces the self-supervised approach Unsupervised Feature Learning via Non-Parametric Instance Discrimination proposed by Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin in this paper. The NPID baselines were improved further by Misra et al. in Self-Supervised Learning of Pretext-Invariant Representations proposed in this paper.
How to train NPID model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including loss, data augmentations, collators etc. required for this approach.
To train a ResNet-50 model on 8 gpus on the ImageNet-1K dataset with the NPID approach using 4,096 negatives selected randomly and a feature projection dimension of 128:
python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet
How to Train NPID++ model¶
To train the NPID++ baselines with a ResNet-50 on ImageNet with 32000 negatives, 800 epochs and 4 machines (32-gpus) as in the PIRL paper:
python tools/run_distributed_engines.py config=pretrain/npid/npid++_4nodes_resnet
Vary the training loss settings¶
Users can adjust several settings from the command line to train the model with different hyperparameters. For example, to use a different momentum value (say 0.99) for the memory and a different temperature of 0.05 for the logits, with 16000 negatives, the NPID training command would look like:
python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
config.LOSS.nce_loss_with_memory.temperature=0.05 \
config.LOSS.nce_loss_with_memory.memory_params.momentum=0.99 \
config.LOSS.nce_loss_with_memory.negative_sampling_params.num_negatives=16000
The full set of loss params that VISSL allows modifying:
nce_loss_with_memory:
  # setting below to "cross_entropy" yields the InfoNCE loss
  loss_type: "nce"
  norm_embedding: True
  temperature: 0.07
  # if the NCE loss is computed between multiple pairs, we can set a loss weight per term
  # can be used to weight different pair contributions differently.
  loss_weights: [1.0]
  norm_constant: -1
  update_mem_with_emb_index: -100
  negative_sampling_params:
    num_negatives: 16000
    type: "random"
  memory_params:
    memory_size: -1
    embedding_dim: 128
    momentum: 0.5
    norm_init: True
    update_mem_on_forward: True
  # following parameters are auto-filled before the loss is created.
  num_train_samples: -1 # @auto-filled
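For intuition on memory_params.momentum, an NPID-style memory bank is updated with an exponential moving average of the current embeddings and then re-normalized. A standalone sketch of that update (not VISSL's implementation; tensor shapes are illustrative):
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_memory(memory, embeddings, indices, momentum=0.5):
    # memory: (num_train_samples, embedding_dim); embeddings: (batch, embedding_dim)
    new = momentum * memory[indices] + (1.0 - momentum) * embeddings
    memory[indices] = F.normalize(new, dim=1)  # keep memory entries unit-norm
    return memory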
Training different model architecture¶
VISSL supports many backbone architectures including AlexNet, ResNe(X)ts. Some examples below:
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example, to train the NPID model on 4 machines (32 gpus) or 1 gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1
Training on 4 machines i.e. 32-gpu:
python tools/run_distributed_engines.py config=pretrain/npid/npid_8gpu_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL using NPID and NPID++ approach and the benchmarks.
Citations¶
NPID
@misc{wu2018unsupervised,
title={Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination},
author={Zhirong Wu and Yuanjun Xiong and Stella Yu and Dahua Lin},
year={2018},
eprint={1805.01978},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
NPID++
@misc{misra2019selfsupervised,
title={Self-Supervised Learning of Pretext-Invariant Representations},
author={Ishan Misra and Laurens van der Maaten},
year={2019},
eprint={1912.01991},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Train ClusterFit model¶
VISSL reproduces the self-supervised approach ClusterFit: Improving Generalization of Visual Representations proposed by Xueting Yan, Ishan Misra, Abhinav Gupta, Deepti Ghadiyaram, Dhruv Mahajan in this paper.
How to train ClusterFit model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including data augmentations, collators etc. required for this approach.
The ClusterFit approach involves 2 steps:
Step 1: Using a pre-trained model (which could be trained any way), features are extracted on the training dataset (like ImageNet). The extracted features are clustered via k-means into N clusters (for example: 16000 clusters). For faster clustering, libraries like FAISS can be used (supported in VISSL; see the sketch after the hyperparameters below). The cluster assignments are treated as the labels for the images and used for training in the next step.
Step 2: The model is trained from scratch (scratch initialization) using the labels generated in Step 1.
To train a ResNet-50 model on 8 gpus on the ImageNet-1K dataset, using a RotNet model to extract features:
# Step1: Extract features
python tools/run_distributed_engines.py config=pretrain/clusterfit/cluster_features_resnet_8gpu_rotation_in1k \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<vissl_compatible_weights.torch>
# Step2: Train clusterFit model
python tools/run_distributed_engines.py config=pretrain/clusterfit/clusterfit_resnet_8gpu_imagenet \
config.DATA.TRAIN.LABEL_PATHS=[<labels_file_from_step1.npy>]
The full set of hyperparams supported by VISSL for ClusterFit Step-1 include:
CLUSTERFIT:
  NUM_CLUSTERS: 16000
  # currently we only support faiss backend for clustering.
  CLUSTER_BACKEND: faiss
  # how many iterations to use for faiss
  N_ITER: 50
  FEATURES:
    DATA_PARTITION: TRAIN
    DATASET_NAME: imagenet1k
    LAYER_NAME: res5
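To make Step 1 concrete, clustering the extracted features with FAISS k-means boils down to something like the following standalone sketch (not VISSL's implementation; the feature and label file names are illustrative):
import numpy as np
import faiss

features = np.load("extracted_features.npy").astype(np.float32)   # (num_images, dim)
kmeans = faiss.Kmeans(d=features.shape[1], k=16000, niter=50, verbose=True)
kmeans.train(features)
# the nearest centroid of every image becomes its pseudo-label for Step 2
_, labels = kmeans.index.search(features, 1)
np.save("clusterfit_labels.npy", labels.ravel())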
How to use other pre-trained models in VISSL¶
VISSL supports Torchvision models out of the box. Generally, for loading any non-VISSL model, one needs to correctly set the following configuration options:
WEIGHTS_INIT:
  # path to the .torch weights files
  PARAMS_FILE: ""
  # name of the state dict. checkpoint = {"classy_state_dict": {layername:value}}. Options:
  #   1. classy_state_dict - if model is trained and checkpointed with VISSL.
  #      checkpoint = {"classy_state_dict": {layername:value}}
  #   2. "" - if the model_file is not a nested dictionary for model weights i.e.
  #      checkpoint = {layername:value}
  #   3. key name that your model checkpoint uses for state_dict key name.
  #      checkpoint = {"your_key_name": {layername:value}}
  STATE_DICT_KEY_NAME: "classy_state_dict"
  # specify what layer should not be loaded. Layer names with this key are not copied
  # By default, set to BatchNorm stats "num_batches_tracked" to be skipped.
  SKIP_LAYERS: ["num_batches_tracked"]
  ####### If loading a non-VISSL trained model, set the following two args carefully #########
  # to make the checkpoint compatible with VISSL, if you need to remove some names
  # from the checkpoint keys, specify the name
  REMOVE_PREFIX: ""
  # In order to load the model (if not trained with VISSL) with VISSL, there are 2 scenarios:
  #   1. If you are interested in evaluating the model features and freeze the trunk.
  #      Set APPEND_PREFIX="trunk.base_model." This assumes that your model is compatible
  #      with the VISSL trunks. The VISSL trunks start with "_feature_blocks." prefix. If
  #      your model doesn't have this prefix you can append it. For example:
  #      For TorchVision ResNet trunk, set APPEND_PREFIX="trunk.base_model._feature_blocks."
  #   2. where you want to load the model simply and finetune the full model.
  #      Set APPEND_PREFIX="trunk."
  #      This assumes that your model is compatible with the VISSL trunks. The VISSL
  #      trunks start with "_feature_blocks." prefix. If your model doesn't have this
  #      prefix you can append it.
  #      For TorchVision ResNet trunk, set APPEND_PREFIX="trunk._feature_blocks."
  # NOTE: the prefix is appended to all the layers in the model
  APPEND_PREFIX: ""
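When in doubt about which STATE_DICT_KEY_NAME, REMOVE_PREFIX or APPEND_PREFIX a checkpoint needs, inspecting it directly usually answers the question. A small sketch (the checkpoint path is illustrative):
import torch

ckpt = torch.load("my_checkpoint.torch", map_location="cpu")
# a VISSL checkpoint nests the weights under "classy_state_dict";
# a plain state dict maps layer names directly to tensors
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:5])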
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example, to train the model on 4 machines (32 gpus) or 1 gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/rotnet/rotnet_8gpu_resnet config.DISTRIBUTED.NUM_PROC_PER_NODE=1
Training on 4 machines i.e. 32-gpu:
python tools/run_distributed_engines.py config=pretrain/rotnet/rotnet_8gpu_resnet config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL using the ClusterFit approach and the benchmarks.
Citation¶
@misc{yan2019clusterfit,
title={ClusterFit: Improving Generalization of Visual Representations},
author={Xueting Yan and Ishan Misra and Abhinav Gupta and Deepti Ghadiyaram and Dhruv Mahajan},
year={2019},
eprint={1912.03330},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Train PIRL model¶
Author: imisra@fb.com
VISSL reproduces the self-supervised approach Self-Supervised Learning of Pretext-Invariant Representations proposed by Ishan Misra and Laurens van der Maaten in this paper.
How to train PIRL model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including loss, data augmentations, collators etc. required for this approach.
To train a ResNet-50 model on 4 machines (32 gpus) on the ImageNet-1K dataset with the PIRL approach using 32,000 negatives selected randomly and a feature projection dimension of 128:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50
Vary the training loss settings¶
Users can adjust several settings from the command line to train the model with different hyperparameters. For example, to use a different momentum value (say 0.99) for the memory and a different temperature of 0.05 for the logits, with 16000 negatives, the PIRL training command would look like:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
config.LOSS.nce_loss_with_memory.temperature=0.05 \
config.LOSS.nce_loss_with_memory.memory_params.momentum=0.99 \
config.LOSS.nce_loss_with_memory.negative_sampling_params.num_negatives=16000
The full set of loss params that VISSL allows modifying:
nce_loss_with_memory:
  # setting below to "cross_entropy" yields the InfoNCE loss
  loss_type: "nce"
  norm_embedding: True
  temperature: 0.07
  # if the NCE loss is computed between multiple pairs, we can set a loss weight per term
  # can be used to weight different pair contributions differently.
  loss_weights: [1.0]
  norm_constant: -1
  update_mem_with_emb_index: -100
  negative_sampling_params:
    num_negatives: 16000
    type: "random"
  memory_params:
    memory_size: -1
    embedding_dim: 128
    momentum: 0.5
    norm_init: True
    update_mem_on_forward: True
  # following parameters are auto-filled before the loss is created.
  num_train_samples: -1 # @auto-filled
Training different model architecture¶
VISSL supports many backbone architectures including ResNe(X)ts, wider ResNets. Some examples below:
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101
Train ResNet-50-w2 (2x wider):
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=50 \
config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.WIDTH_MULTIPLIER=2
Training with Gaussian Blur augmentation¶
Gaussian Blur augmentation has been a crucial transformation for better performance in approaches like SimCLR, SwAV, etc. The original PIRL method didn’t use Gaussian Blur augmentation; however, the PIRL author (imisra@fb.com) provides a configuration for how to use Gaussian Blur when training PIRL models. The command to run:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
+config/pretrain/pirl/transforms=photo_gblur
Please consult the photo_gblur.yaml config for the transformation composition.
Training with MLP head¶
Recent self-supervised approaches like SimCLR, MoCo, SwAV have benefitted significantly from using an MLP head. The original PIRL work didn’t use an MLP head, but the PIRL author (imisra@fb.com) provides a configuration for using an MLP head in PIRL and also open sources the models. The command to run:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
+config/pretrain/pirl/models=resnet50_mlphead
Similarly, to train a ResNet-50-w2 (i.e. 2x wider ResNet-50) with PIRL using an MLP head:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
+config/pretrain/pirl/models=resnet50_w2_mlphead
Similarly, to train a ResNet-50-w4 (i.e. 4x wider ResNet-50) with PIRL using an MLP head:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
+config/pretrain/pirl/models=resnet50_w4_mlphead
Vary the number of epochs¶
In order to vary the number of epochs used for training PIRL models, one can do so simply from the command line. For example, to train the PIRL model for 100 epochs instead, pass the num_epochs parameter from the command line:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
config.OPTIMIZER.num_epochs=100
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example, to train the PIRL model on 8 gpus or 1 gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 config.DISTRIBUTED.NUM_NODES=1
Training on 8-gpus:
python tools/run_distributed_engines.py config=pretrain/pirl/pirl_jigsaw_4node_resnet50 \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=1
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL for PIRL and the benchmarks.
Citations¶
@misc{misra2019selfsupervised,
title={Self-Supervised Learning of Pretext-Invariant Representations},
author={Ishan Misra and Laurens van der Maaten},
year={2019},
eprint={1912.01991},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Train SimCLR model¶
VISSL reproduces the self-supervised approach A Simple Framework for Contrastive Learning of Visual Representations proposed by Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton in this paper.
How to train SimCLR model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including loss, data augmentations, collators etc. required for this approach.
To train a ResNet-50 model on 8 nodes (64 gpus) on the ImageNet-1K dataset with the SimCLR approach using an MLP head, a loss temperature of 0.1 and a feature projection dimension of 128:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet
Using Synchronized BatchNorm for training¶
For training SimCLR models, we convert all the BatchNorm layers to Global BatchNorm. For this, VISSL supports the PyTorch SyncBatchNorm module and NVIDIA’s Apex SyncBatchNorm layers. Set the config param MODEL.SYNC_BN_CONFIG.SYNC_BN_TYPE to apex or pytorch.
If you want to use Apex, VISSL provides anaconda and pip packages of Apex (compiled with optimized C++ extensions/CUDA kernels). The Apex packages are provided for all versions of CUDA (9.2, 10.0, 10.1, 10.2, 11.0), PyTorch >= 1.4 and Python >=3.6 and <=3.9.
To use SyncBN during training, one needs to set the following parameters in configuration file:
MODEL:
  SYNC_BN_CONFIG:
    CONVERT_BN_TO_SYNC_BN: True
    SYNC_BN_TYPE: apex
    # 1) if group_size=-1 -> use the VISSL default setting. We synchronize within a
    #    machine and hence will set group_size=num_gpus per node. This gives the best
    #    speedup.
    # 2) if group_size>0 -> will set group_size=value set by user.
    # 3) if group_size=0 -> no groups are created and process_group=None. This means
    #    global sync is done.
    GROUP_SIZE: 8
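Outside of VISSL's config system, the pytorch flavor of this conversion corresponds to torch.nn.SyncBatchNorm.convert_sync_batchnorm. A minimal sketch (the model here is just a stand-in):
import torch
import torchvision

model = torchvision.models.resnet50()
# returns a copy of the model with every BatchNorm*D layer replaced by SyncBatchNorm;
# process_group=None synchronizes across all processes (global sync)
sync_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model, process_group=None)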
Using LARC for training¶
SimCLR training uses LARC from NVIDIA’s Apex. To use LARC, users need to set the config option OPTIMIZER.use_larc=True. VISSL exposes LARC parameters that users can tune. The full list of LARC parameters exposed by VISSL:
OPTIMIZER:
  name: "sgd"
  use_larc: False # supported for SGD only for now
  larc_config:
    clip: False
    eps: 1e-08
    trust_coefficient: 0.001
Note
LARC is currently supported for SGD optimizer only.
Vary the training loss settings¶
Users can adjust several settings from the command line to train the model with different hyperparameters. For example, to use a different temperature of 0.2 for the logits and a different output projection dimension of 256:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.LOSS.simclr_info_nce_loss.temperature=0.2 \
config.LOSS.simclr_info_nce_loss.buffer_params.embedding_dim=256
The full set of loss params that VISSL allows modifying:
simclr_info_nce_loss:
  temperature: 0.1
  buffer_params:
    embedding_dim: 128
    world_size: 64                # automatically inferred
    effective_batch_size: 4096    # automatically inferred
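To see what the temperature does, here is a compact standalone NT-Xent sketch (the standard SimCLR loss formulation, not VISSL's simclr_info_nce_loss implementation) for a batch of N image pairs:
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    # z1, z2: (N, dim) embeddings of two augmented views of the same N images
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, dim)
    sim = z @ z.t() / temperature                         # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # mask self-similarity
    # the positive of sample i is i+N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# example: nt_xent(torch.randn(8, 128), torch.randn(8, 128), temperature=0.1)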
Training different model architecture¶
VISSL supports many backbone architectures including ResNe(X)ts, wider ResNets. Some examples below:
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101
Train ResNet-50-w2 (2x wider):
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=50 \
config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.WIDTH_MULTIPLIER=2
Training with Multi-Crop data augmentation¶
The original SimCLR approach was proposed for 2 positives per image. We extend the SimCLR approach to work with more positives following the multi-crop augmentation proposed in the SwAV paper. See the SwAV paper https://arxiv.org/abs/2006.09882 for the multi-crop augmentation details.
Multi-crop augmentation allows using more positives, and positives of different resolutions, for SimCLR. VISSL provides a version of the SimCLR loss for multi-crop training, multicrop_simclr_info_nce_loss. In order to train SimCLR with multi-crop augmentation, say crops 2x160 + 4x96 (i.e. 2 crops of resolution 160 and 4 crops of resolution 96), the training command looks like:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
+config/pretrain/simclr/transforms=multicrop_2x160_4x96
The multicrop_2x160_4x96.yaml configuration file changes 2 things:
Transforms: Simply replace the ImgReplicatePil transform (which creates 2 copies of an image) with ImgPilToMultiCrop, which creates multi-crops of multiple resolutions.
Loss: Use the loss multicrop_simclr_info_nce_loss instead, which inherits from simclr_info_nce_loss and modifies the loss to work with multi-crop input.
Varying the multi-crop augmentation settings¶
VISSL allows modifying the crops to use. Full settings exposed:
TRANSFORMS:
  - name: ImgPilToMultiCrop
    total_num_crops: 6                      # Total number of crops to extract
    num_crops: [2, 4]                       # Specifies the number of crops of each type
    size_crops: [160, 96]                   # Specifies the height (height = width) of each patch
    crop_scales: [[0.08, 1], [0.05, 0.14]]  # Scale of the crop
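The same recipe can be reproduced outside VISSL with torchvision's RandomResizedCrop; the following standalone sketch (not the ImgPilToMultiCrop implementation) mirrors the 2x160 + 4x96 setting above:
from torchvision import transforms

def make_multicrop(img, num_crops=(2, 4), size_crops=(160, 96),
                   crop_scales=((0.08, 1.0), (0.05, 0.14))):
    crops = []
    for n, size, scale in zip(num_crops, size_crops, crop_scales):
        rrc = transforms.RandomResizedCrop(size, scale=scale)
        crops.extend(rrc(img) for _ in range(n))
    return crops  # 2 crops of 160x160 followed by 4 crops of 96x96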
Varying the multi-crop loss settings¶
The full set of loss params that VISSL allows modifying:
multicrop_simclr_info_nce_loss:
  temperature: 0.1
  num_crops: 2                    # automatically inferred from data transforms
  buffer_params:
    world_size: 64                # automatically inferred
    embedding_dim: 128
    effective_batch_size: 4096    # automatically inferred
Training with different MLP head¶
The original SimCLR approach used a 2-layer MLP head. VISSL allows attaching any desired head. In order to modify the MLP head (more layers, different dimensions etc.), see the following examples:
3-layer MLP head: Use the following head (example for ResNet model)
MODEL:
  HEAD:
    PARAMS: [
      ["mlp", {"dims": [2048, 2048], "use_relu": True}],
      ["mlp", {"dims": [2048, 2048], "use_relu": True}],
      ["mlp", {"dims": [2048, 128]}],
    ]
Use 2-layer MLP with hidden dimension 4096: Use the following head (example for ResNet model)
MODEL:
  HEAD:
    PARAMS: [
      ["mlp", {"dims": [2048, 4096], "use_relu": True}],
      ["mlp", {"dims": [4096, 128]}],
    ]
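In plain PyTorch terms, the 2-layer head with a 4096-dim hidden layer above corresponds roughly to the following (a sketch, not VISSL's mlp head implementation):
import torch.nn as nn

projection_head = nn.Sequential(
    nn.Linear(2048, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 128),
)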
Vary the number of epochs¶
In order to vary the number of epochs used for training SimCLR models, one can do so simply from the command line. For example, to train the SimCLR model for 100 epochs instead, pass the num_epochs parameter from the command line:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.OPTIMIZER.num_epochs=100
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example, to train the SimCLR model on 8 gpus or 1 gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 config.DISTRIBUTED.NUM_NODES=1
Training on 8-gpus:
python tools/run_distributed_engines.py config=pretrain/simclr/simclr_8node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=1
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL for SimCLR and the benchmarks.
Citations¶
@misc{chen2020simple,
title={A Simple Framework for Contrastive Learning of Visual Representations},
author={Ting Chen and Simon Kornblith and Mohammad Norouzi and Geoffrey Hinton},
year={2020},
eprint={2002.05709},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Train MoCo model¶
Author: lefaudeux@fb.com
VISSL reproduces the self-supervised approach MoCo Momentum Contrast for Unsupervised Visual Representation Learning proposed by Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick in this paper. The MoCo baselines were improved further by Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He in “Improved Baselines with Momentum Contrastive Learning” proposed in this paper.
VISSL closely follows the implementation provided by MoCo authors themselves https://github.com/facebookresearch/moco.
How to train MoCo (and MoCo v2 model) model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including loss, data augmentations, collators etc. required for this approach.
To train a ResNet-50 model on 8 gpus on the ImageNet-1K dataset with the MoCo-v2 approach using a feature projection dimension of 128:
python tools/run_distributed_engines.py config=pretrain/moco/moco_1node_resnet
By default, VISSL provides the configuration file for the MoCo-v2 model as it has better baseline numbers. To train the MoCo baseline instead, users should make 2 changes to the moco configuration file:
Change config.DATA.TRAIN.TRANSFORMS by removing the ImgPilGaussianBlur transform.
Change config.MODEL.HEAD.PARAMS=[["mlp", {"dims": [2048, 128]}]] i.e. replace the MLP-head with the fc-head.
Vary the training loss settings¶
Users can adjust several settings from the command line to train the model with different hyperparameters. For example, to use a different momentum value (say 0.99) for the memory and a different temperature of 0.5 for the logits, the MoCo training command would look like:
python tools/run_distributed_engines.py config=pretrain/moco/moco_1node_resnet \
config.LOSS.moco_loss.temperature=0.5 \
config.LOSS.moco_loss.momentum=0.99
The full set of loss params that VISSL allows modifying:
moco_loss:
  embedding_dim: 128
  queue_size: 65536
  momentum: 0.999
  temperature: 0.2
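The momentum value above controls the exponential moving average with which MoCo's key encoder tracks the query encoder. A standalone sketch of that update (not VISSL's implementation):
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, momentum=0.999):
    # the key encoder is never updated by gradients, only by this moving average
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.data.mul_(momentum).add_(q_param.data, alpha=1.0 - momentum)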
Training different model architecture¶
VISSL supports many backbone architectures including ResNe(X)ts, wider ResNets. Some examples below:
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/moco/moco_1node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101
Train ResNet-50-w2 (2x wider ResNet-50):
python tools/run_distributed_engines.py config=pretrain/moco/moco_1node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=50 \
config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.WIDTH_MULTIPLIER=2
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example, to train the MoCo model on 4 machines (32 gpus) or 1 gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/moco/moco_1node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1
Training on 4 machines i.e. 32-gpu:
python tools/run_distributed_engines.py config=pretrain/moco/moco_1node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus. However, as per the authors of the paper, MoCo does not work very well with this rule.
Note
If you change the number of gpus for MoCo training, MoCo models require longer training in order to reproduce results. Hence, we recommend users consult the MoCo paper.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL using MoCo-v2 approach and the benchmarks.
Citations¶
MoCo
@misc{he2020momentum,
title={Momentum Contrast for Unsupervised Visual Representation Learning},
author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
year={2020},
eprint={1911.05722},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
MoCo-v2
@misc{chen2020improved,
title={Improved Baselines with Momentum Contrastive Learning},
author={Xinlei Chen and Haoqi Fan and Ross Girshick and Kaiming He},
year={2020},
eprint={2003.04297},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Train DeepCluster V2 model¶
Author: mathilde@fb.com
VISSL reproduces the self-supervised approach called DeepClusterV2, which is an improved version of the original DeepCluster approach. The DeepClusterV2 approach was proposed in the work Unsupervised learning of visual features by contrasting cluster assignments by Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin in this paper. DeepClusterV2 combines the benefits of the DeepCluster and NPID approaches.
How to train DeepClusterV2 model¶
VISSL provides a yaml configuration file containing the exact hyperparameter settings to reproduce the model. VISSL implements all the components including loss, data augmentations, collators etc. required for this approach.
To train a ResNet-50 model on 8 gpus on the ImageNet-1K dataset using a feature projection dimension of 128 for the memory:
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet
Using Synchronized BatchNorm for training¶
For training DeepClusterV2 models, we convert all the BatchNorm layers to Global BatchNorm. For this, VISSL supports the PyTorch SyncBatchNorm module and NVIDIA’s Apex SyncBatchNorm layers. Set the config param MODEL.SYNC_BN_CONFIG.SYNC_BN_TYPE to apex or pytorch.
If you want to use Apex, VISSL provides anaconda and pip packages of Apex (compiled with optimized C++ extensions/CUDA kernels). The Apex packages are provided for all versions of CUDA (9.2, 10.0, 10.1, 10.2, 11.0), PyTorch >= 1.4 and Python >=3.6 and <=3.9.
To use SyncBN during training, one needs to set the following parameters in configuration file:
MODEL:
SYNC_BN_CONFIG:
CONVERT_BN_TO_SYNC_BN: True
SYNC_BN_TYPE: apex
# 1) if group_size=-1 -> use the VISSL default setting. We synchronize within a
# machine and hence will set group_size=num_gpus per node. This gives the best
# speedup.
# 2) if group_size>0 -> will set group_size=value set by user.
# 3) if group_size=0 -> no groups are created and process_group=None. This means
# global sync is done.
GROUP_SIZE: 8
Using Mixed Precision for training¶
DeepClusterV2 approach leverages mixed precision training by default for better training speed and reducing the model memory requirement. For this, we use NVIDIA Apex Library with Apex AMP level O1.
To use Mixed precision training, one needs to set the following parameters in configuration file:
MODEL:
AMP_PARAMS:
USE_AMP: True
# Use O1 as it is more robust and stable than O3. If you want to use O3, we recommend
# the following setting:
# {"opt_level": "O3", "keep_batchnorm_fp32": True, "master_weights": True, "loss_scale": "dynamic"}
AMP_ARGS: {"opt_level": "O1"}
Using LARC for training¶
DeepClusterV2 training uses LARC from NVIDIA's Apex LARC. To use LARC, users need to set the config option OPTIMIZER.use_larc=True. VISSL exposes the LARC parameters that users can tune. Full list of LARC parameters exposed by VISSL:
OPTIMIZER:
name: "sgd"
use_larc: False # supported for SGD only for now
larc_config:
clip: False
eps: 1e-08
trust_coefficient: 0.001
Note
LARC is currently supported for SGD optimizer only.
Vary the training loss settings¶
Users can adjust several settings from command line to train the model with different hyperparams. For example: to use a different temperature 0.2 for logits, projection dimension 256 for the embedding, the training command would look like:
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet \
config.LOSS.deepclusterv2_loss.temperature=0.2 \
config.LOSS.deepclusterv2_loss.memory_params.embedding_dim=256
The full set of loss params that VISSL allows modifying:
deepclusterv2_loss:
DROP_LAST: True # automatically inferred from DATA.TRAIN.DROP_LAST
BATCHSIZE_PER_REPLICA: 256 # automatically inferred from DATA.TRAIN.BATCHSIZE_PER_REPLICA
num_crops: 2 # automatically inferred from DATA.TRAIN.TRANSFORMS
temperature: 0.1
num_clusters: [3000, 3000, 3000]
kmeans_iters: 10
memory_params:
crops_for_mb: [0]
embedding_dim: 128
# following parameters are auto-filled before the loss is created.
num_train_samples: -1 # automatically inferred
Training different model architecture¶
VISSL supports many backbone architectures including ResNe(X)ts, wider ResNets. Some examples below:
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101
Train ResNet-50-w2 (2x wider):
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=50 \
config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.WIDTH_MULTIPLIER=2
Training with Multi-Crop data augmentation¶
DeepClusterV2 can be trained with more positives following the multi-crop augmentation proposed in the SwAV paper. See the SwAV paper https://arxiv.org/abs/2006.09882 for the multi-crop augmentation details.
Multi-crop augmentation allows using more positives and also positives of different resolutions. In order to train DeepClusterV2 with multi-crop augmentation, say crops 2x160 + 4x96, i.e. 2 crops of resolution 160 and 4 crops of resolution 96, the training command looks like:
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet \
+config/pretrain/deepcluster_v2/transforms=multicrop_2x160_4x96
The multicrop_2x160_4x96.yaml
configuration file changes the number of crop settings to 6 crops.
Varying the multi-crop augmentation settings¶
VISSL allows modifying the crops to use. Full settings exposed:
TRANSFORMS:
- name: ImgPilToMultiCrop
total_num_crops: 6 # Total number of crops to extract
num_crops: [2, 4] # Specifies the number of type of crops.
size_crops: [160, 96] # Specifies the height (height = width) of each patch
crop_scales: [[0.08, 1], [0.05, 0.14]] # Scale of the crop
Training with different MLP head¶
By default, the original DeepClusterV2 approach used a 2-layer MLP head similar to the SimCLR approach. VISSL allows attaching any desired head. In order to modify the MLP head (more layers, different dimensions etc.), see the following examples:
3-layer MLP head: Use the following head (example for ResNet model)
MODEL:
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 128]}],
]
Use 2-layer MLP with hidden dimension 4096: Use the following head (example for ResNet model)
MODEL:
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 4096], "use_relu": True}],
["mlp", {"dims": [4096, 128]}],
]
Vary the number of epochs¶
The number of epochs used for training DeepClusterV2 models can be varied directly from the command line. For example, to train the DeepClusterV2 model for 100 epochs instead, pass the num_epochs parameter from the command line:
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet \
config.OPTIMIZER.num_epochs=100
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example: to train the DeepClusterV2 model on 4 machines (32gpus) or 1gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 config.DISTRIBUTED.NUM_NODES=1
Training on 4 machines i.e. 32-gpu:
python tools/run_distributed_engines.py config=pretrain/deepcluster_v2/deepclusterv2_2crops_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL using DeepClusterV2 approach and the benchmarks.
Citations¶
DeepClusterV2
@misc{caron2020unsupervised,
title={Unsupervised Learning of Visual Features by Contrasting Cluster Assignments},
author={Mathilde Caron and Ishan Misra and Julien Mairal and Priya Goyal and Piotr Bojanowski and Armand Joulin},
year={2020},
eprint={2006.09882},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Train SwAV model¶
Author: mathilde@fb.com
VISSL reproduces the self-supervised approach SwAV, proposed in the work Unsupervised Learning of Visual Features by Contrasting Cluster Assignments by Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin in this paper. SwAV clusters the features while enforcing consistency between cluster assignments produced for different augmentations (or "views") of the same image, instead of comparing features directly as in contrastive learning.
How to train SwAV model¶
VISSL provides yaml configuration file containing the exact hyperparam settings to reproduce the model. VISSL implements all the components including loss, data augmentations, collators etc required for this approach.
To train ResNet-50 model on 8-gpus on ImageNet-1K dataset using feature projection dimension 128 for memory:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet
Using Synchronized BatchNorm for training¶
For training SwAV models, we convert all the BatchNorm layers to Global BatchNorm. For this, VISSL supports the PyTorch SyncBatchNorm module and NVIDIA's Apex SyncBatchNorm layers. Set the config param MODEL.SYNC_BN_CONFIG.SYNC_BN_TYPE to apex or pytorch.
If you want to use Apex, VISSL provides anaconda and pip packages of Apex (compiled with optimized C++ extensions/CUDA kernels). The Apex packages are provided for all versions of CUDA (9.2, 10.0, 10.1, 10.2, 11.0), PyTorch >= 1.4 and Python >=3.6 and <=3.9.
To use SyncBN during training, one needs to set the following parameters in configuration file:
MODEL:
SYNC_BN_CONFIG:
CONVERT_BN_TO_SYNC_BN: True
SYNC_BN_TYPE: apex
# 1) if group_size=-1 -> use the VISSL default setting. We synchronize within a
# machine and hence will set group_size=num_gpus per node. This gives the best
# speedup.
# 2) if group_size>0 -> will set group_size=value set by user.
# 3) if group_size=0 -> no groups are created and process_group=None. This means
# global sync is done.
GROUP_SIZE: 8
Using Mixed Precision for training¶
SwAV approach leverages mixed precision training by default for better training speed and reducing the model memory requirement. For this, we use NVIDIA Apex Library with Apex AMP level O1.
To use Mixed precision training, one needs to set the following parameters in configuration file:
MODEL:
AMP_PARAMS:
USE_AMP: True
# Use O1 as it is more robust and stable than O3. If you want to use O3, we recommend
# the following setting:
# {"opt_level": "O3", "keep_batchnorm_fp32": True, "master_weights": True, "loss_scale": "dynamic"}
AMP_ARGS: {"opt_level": "O1"}
Using LARC for training¶
SwAV training uses LARC from NVIDIA's Apex LARC. To use LARC, users need to set the config option OPTIMIZER.use_larc=True. VISSL exposes the LARC parameters that users can tune. Full list of LARC parameters exposed by VISSL:
OPTIMIZER:
name: "sgd"
use_larc: False # supported for SGD only for now
larc_config:
clip: False
eps: 1e-08
trust_coefficient: 0.001
Note
LARC is currently supported for SGD optimizer only.
Vary the training loss settings¶
Users can adjust several settings from command line to train the model with different hyperparams. For example: to use a different temperature 0.2 for logits, epsilon of 0.04, the training command would look like:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.LOSS.swav_loss.temperature=0.2 \
config.LOSS.swav_loss.epsilon=0.04
The full set of loss params that VISSL allows modifying:
swav_loss:
temperature: 0.1
use_double_precision: False
normalize_last_layer: True
num_iters: 3
epsilon: 0.05
temp_hard_assignment_iters: 0
crops_for_assign: [0, 1]
embedding_dim: 128 # automatically inferred from HEAD params
num_crops: 2 # automatically inferred from data transforms
num_prototypes: [3000] # automatically inferred from model HEAD settings
# for dumping the debugging info in case loss becomes NaN
output_dir: "" # automatically inferred and set to checkpoint dir
queue:
start_iter: 0
queue_length: 0 # automatically adjusted to ensure queue_length % global batch size = 0
local_queue_length: 0 # automatically inferred to queue_length // world_size
Training different model architecture¶
VISSL supports many backbone architectures including ResNe(X)ts, wider ResNets. Some examples below:
Train ResNet-101:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=101
Train ResNet-50-w2 (2x wider):
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.MODEL.TRUNK.NAME=resnet config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.DEPTH=50 \
config.MODEL.TRUNK.TRUNK_PARAMS.RESNETS.WIDTH_MULTIPLIER=2
Train RegNetY-400MF:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.MODEL.TRUNK.NAME=regnet config.MODEL.TRUNK.TRUNK_PARAMS.REGNET.name=regnet_y_400mf
Train RegNetY-256GF:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.MODEL.TRUNK.NAME=regnet \
config.MODEL.TRUNK.TRUNK_PARAMS.REGNET.depth=27 \
config.MODEL.TRUNK.TRUNK_PARAMS.REGNET.w_0=640 \
config.MODEL.TRUNK.TRUNK_PARAMS.REGNET.w_a=230.83 \
config.MODEL.TRUNK.TRUNK_PARAMS.REGNET.w_m=2.53 \
config.MODEL.TRUNK.TRUNK_PARAMS.REGNET.group_width=373 \
config.MODEL.HEAD.PARAMS=[["swav_head", {"dims": [10444, 10444, 128], "use_bn": False, "num_clusters": [3000]}]]
Training with Multi-Crop data augmentation¶
SwAV is trained using the multi-crop augmentation proposed in the SwAV paper.
Multi-crop augmentation allows using more positives and also positives of different resolutions. In order to train SwAV with multi-crop augmentation, say crops 2x224 + 4x96, i.e. 2 crops of resolution 224 and 4 crops of resolution 96, the training command looks like:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
+config/pretrain/swav/transforms=multicrop_2x224_4x96
The multicrop_2x224_4x96.yaml
configuration file changes the number of crop settings to 6 crops and the right resolution.
Varying the multi-crop augmentation settings¶
VISSL allows modifying the crops to use. Full settings exposed:
TRANSFORMS:
- name: ImgPilToMultiCrop
total_num_crops: 6 # Total number of crops to extract
num_crops: [2, 4] # Specifies the number of type of crops.
size_crops: [160, 96] # Specifies the height (height = width) of each patch
crop_scales: [[0.08, 1], [0.05, 0.14]] # Scale of the crop
Training with different MLP head¶
By default, the original SwAV approach used a 2-layer MLP head similar to the SimCLR approach. VISSL allows attaching any desired head. In order to modify the MLP head (more layers, different dimensions etc.), see the following examples:
3-layer MLP head: Use the following head (example for ResNet model)
MODEL:
HEAD:
PARAMS: [
["swav_head", {"dims": [2048, 2048, 2048, 128], "use_bn": True, "num_clusters": [3000]}],
]
Use 2-layer MLP with hidden dimension 4096: Use the following head (example for ResNet model)
MODEL:
HEAD:
PARAMS: [
["swav_head", {"dims": [2048, 4096, 128], "use_bn": True, "num_clusters": [3000]}],
]
Vary the number of epochs¶
The number of epochs used for training SwAV models can be varied directly from the command line. For example, to train the SwAV model for 100 epochs instead, pass the num_epochs parameter from the command line:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.OPTIMIZER.num_epochs=100
Vary the number of gpus¶
VISSL makes it extremely easy to vary the number of gpus to be used in training. For example: to train the SwAV model on 4 machines (32gpus) or 1gpu, the changes required are:
Training on 1-gpu:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 config.DISTRIBUTED.NUM_NODES=1
Training on 4 machines i.e. 32-gpu:
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 config.DISTRIBUTED.NUM_NODES=4
Note
Please adjust the learning rate following ImageNet in 1-Hour if you change the number of gpus.
Pre-trained models¶
See VISSL Model Zoo for the PyTorch pre-trained models with VISSL using the SwAV approach and the benchmarks.
Citations¶
SwAV
@misc{caron2020unsupervised,
title={Unsupervised Learning of Visual Features by Contrasting Cluster Assignments},
author={Mathilde Caron and Ishan Misra and Julien Mairal and Priya Goyal and Piotr Bojanowski and Armand Joulin},
year={2020},
eprint={2006.09882},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Benchmark: Linear Image Classification¶
VISSL provides a standardized benchmark suite to evaluate the feature representation quality of self-supervised pretrained models. A popular evaluation protocol is to freeze the model trunk and train linear classifiers on several layers of the model on some target datasets (like ImageNet-1k, Places205, VOC07, iNaturalist2018). In VISSL, we support all the linear evals on all the datasets. We also provide a standard set of hyperparams for various approaches in order to reproduce the model performance in the SSL literature. For reproducibility, see VISSL Model Zoo.
Note
To run the benchmark, we recommend using the standard set of hyperparams provided by VISSL as these hyperparams reproduce results of large number of self-supervised approaches. Users are however free to modify the hyperparams to suit their evaluation criterion.
Eval Config settings using MLP head¶
Set the following in the config file to enable the feature evaluation properly.
Attaching MLP head to trunk output¶
To attach a linear classifier (FC) on the trunk output, example for a ResNet-50 model:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
SHOULD_FLATTEN_FEATS: False
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 1000]}],
]
Attaching MLP head to many layers of trunk¶
To attach linear classifiers at multiple layers of the model following the Zhang et al. style which has BN -> FC as the head, use the eval_mlp head. For example, for a ResNet-50 model:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
SHOULD_FLATTEN_FEATS: False
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["conv1", ["AvgPool2d", [[10, 10], 10, 4]]],
["res2", ["AvgPool2d", [[16, 16], 8, 0]]],
["res3", ["AvgPool2d", [[13, 13], 5, 0]]],
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
PARAMS: [
["eval_mlp", {"in_channels": 64, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 256, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 512, "dims": [8192, 1000]}],
["eval_mlp", {"in_channels": 1024, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 2048, "dims": [8192, 1000]}],
["eval_mlp", {"in_channels": 2048, "dims": [2048, 1000]}],
]
Eval Config settings using SVM training¶
For SVM trainings, we only care about extracting the features from the model. We dump the features on disk and train SVMs. To extract the features:
Features from several layers of the trunk¶
For example, for a ResNet-50 model, to train SVMs on features from many layers of the model, the example config:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True # only extract the features and we will train SVM on these
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Features from the trunk output¶
For example, for a ResNet-50 model, to train SVMs on features from the model trunk output, the example config:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Below, we provide instructions on how to run each benchmark.
Benchmark: ImageNet-1k¶
The configuration setting for this benchmark is provided here .
python tools/run_distributed_engines.py \
config=benchmark/linear_image_classification/imagenet1k/eval_resnet_8gpu_transfer_in1k_linear \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Benchmark: Places205¶
The configuration setting for this benchmark is provided here .
python tools/run_distributed_engines.py \
config=benchmark/linear_image_classification/places205/eval_resnet_8gpu_transfer_places205_linear \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Benchmark: iNaturalist2018¶
The configuration setting for this benchmark is provided here .
python tools/run_distributed_engines.py \
config=benchmark/linear_image_classification/inaturalist18/eval_resnet_8gpu_transfer_inaturalist18_linear \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Benchmark: Linear SVM on VOC07¶
VISSL provides the train_svm.py tool that first extracts features and then trains/tests SVMs on these features.
The configuration setting for this benchmark is provided here .
python tools/train_svm.py \
config=benchmark/linear_image_classification/voc07/eval_resnet_8gpu_transfer_voc07_svm \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Note
Please see VISSL documentation on how to run a given training on 1-gpu, multi-gpu or multi-machine.
Note
Please see VISSL documentation on how to use the builtin datasets.
Note
Please see VISSL documentation on how to use the YAML configuration system in VISSL to override specific components like the model of a config file. For example, in the above file, the user can easily replace the ResNet-50 model with a different architecture like RegNetY-256 etc.
Benchmark task: Full-finetuning¶
Using a self-supervised model to initialize a network and further tune the weights on the target task is a very common evaluation protocol. This benchmark requires only initializing the model and no other settings in MODEL.FEATURE_EVAL_SETTINGS
are needed unlike other benchmark tasks.
Benchmark: ImageNet-1k¶
VISSL provides the YAML configuration setting for this benchmark here which can be run as below.
python tools/run_distributed_engines.py \
config=benchmark/imagenet1k_fulltune/eval_resnet_8gpu_transfer_in1k_fulltune \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Benchmark: Places205¶
VISSL provides the YAML configuration setting for this benchmark here which can be run as below.
python tools/run_distributed_engines.py \
config=benchmark/places205_fulltune/eval_resnet_8gpu_transfer_places205_fulltune \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Note
Please see VISSL documentation on how to run a given training on 1-gpu, multi-gpu or multi-machine.
Note
Please see VISSL documentation on how to use the builtin datasets if you want to run this benchmark on a different target task.
Note
Please see VISSL documentation on how to use the YAML configuration system in VISSL to override specific components like the model of a config file. For example, in the above file, the user can easily replace the ResNet-50 model with a different architecture like RegNetY-256 etc.
Benchmark: Nearest Neighbor (kNN)¶
VISSL supports the Nearest Neighbor evaluation task using kNN. We closely follow the benchmark setup from Zhirong Wu et al. https://github.com/zhirongw/lemniscate.pytorch#nearest-neighbor. For the Nearest Neighbor evaluation, the process involves 2 steps:
Step1: Extract the relevant features from the model for both the training and validation set.
Step2: Perform kNN evaluation on these features
VISSL provides a dedicated tool tools/nearest_neighbor_test.py
that performs both Step-1 and Step-2 above.
Note
To run the benchmark, we recommend using the standard set of hyperparams provided by VISSL as these hyperparams reproduce results of large number of self-supervised approaches. Users are however free to modify the hyperparams to suit their evaluation criterion.
Eval Config settings using MLP head¶
Set the following in the config file to enable the feature evaluation properly.
kNN on many layers of the trunk¶
For the Step1, if we want to extract features from many layers of the trunk, the config setting should look like below. For example for a ResNet-50 model:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True # only freeze the trunk
EXTRACT_TRUNK_FEATURES_ONLY: True # we extract features from the trunk only
SHOULD_FLATTEN_FEATS: False # don't flatten the features and return as is
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
kNN on the trunk output¶
If we want to perform kNN only on the trunk output, the configuration setting should look like below. For example, for a ResNet-50 model:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True # only freeze the trunk
EXTRACT_TRUNK_FEATURES_ONLY: True # we extract features from the trunk only
SHOULD_FLATTEN_FEATS: False # don't flatten the features and return as is
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
kNN on the model head output (self-supervised head)¶
For a given self-supervised approach, we may want to perform kNN on the output of the model head. This is a very common use case where the model head is a projection head and projects the trunk features into a low-dimensional space.
The config settings should look like below. The example below is for SimCLR head + ResNet-50. Users can replace the MODEL.HEAD.PARAMS
with the head settings used in the respective
self-supervised model training.
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_AND_HEAD: True # both head and trunk will be frozen (including BN in eval mode)
EVAL_TRUNK_AND_HEAD: True # initialized the model head as well from weights
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
# SimCLR 2-layer model head structure
PARAMS: [
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 128]}],
]
Benchmark: ImageNet-1k¶
VISSL provides configuration settings for the benchmark here.
To run the benchmark:
python tools/nearest_neighbor_test.py config=benchmark/nearest_neighbor/eval_resnet_8gpu_in1k_kNN \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Benchmark: Places205¶
VISSL provides configuration settings for the benchmark here.
To run the benchmark:
python tools/nearest_neighbor_test.py config=benchmark/nearest_neighbor/eval_resnet_8gpu_places205_kNN \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Note
Please see VISSL documentation on how to run a given training on 1-gpu, multi-gpu or multi-machine.
Note
Please see VISSL documentation on how to use the builtin datasets.
Note
Please see VISSL documentation on how to use the YAML configuration system in VISSL to override specific components like the model of a config file. For example, in the above file, the user can easily replace the ResNet-50 model with a different architecture like RegNetY-256 etc.
Benchmark task: Full finetuning on Imagenet 1% , 10% subsets¶
Evaluating a self-supervised pre-trained model on a target dataset which represents 1% or 10% of the ImageNet dataset has become a very common evaluation criterion. VISSL provides the benchmark settings for this benchmark here.
1% and 10% Data subsets¶
VISSL uses the 1% and 10% datasets from the SimCLR work. Users can download the datasets from here.
Users can use DATA.TRAIN.DATA_SOURCES=[disk_filelist] to load the images in these files. Users should replace each line with the valid full image path for that image. Users should also extract the labels from these datasets using the image_id of each image.
Once the user has the valid image and label files (.npy), they should set the dataset paths in the VISSL dataset_catalog.json for the datasets google-imagenet1k-per01 and google-imagenet1k-per10.
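As an illustration, the corresponding dataset_catalog.json entries could look like the following sketch (the paths are placeholders and the [images, labels] layout follows the catalog format used for VISSL's builtin datasets; adjust to your setup):
{
  "google-imagenet1k-per01": {
    "train": ["/path/to/train_images_per01.npy", "/path/to/train_labels_per01.npy"],
    "val": ["/path/to/val_images.npy", "/path/to/val_labels.npy"]
  },
  "google-imagenet1k-per10": {
    "train": ["/path/to/train_images_per10.npy", "/path/to/train_labels_per10.npy"],
    "val": ["/path/to/val_images.npy", "/path/to/val_labels.npy"]
  }
}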
Benchmark: 1% ImageNet¶
Users can run the benchmark on 1% ImageNet subsets from SimCLR with the following command:
python tools/run_distributed_engines.py \
config=benchmark/semi_supervised/imagenet1k/eval_resnet_8gpu_transfer_in1k_semi_sup_fulltune \
+config/benchmark/semi_supervised/imagenet1k/dataset=simclr_in1k_per01
Benchmark: 10% ImageNet¶
Users can run the benchmark on 10% ImageNet subsets from SimCLR with the following command:
python tools/run_distributed_engines.py \
config=benchmark/semi_supervised/imagenet1k/eval_resnet_8gpu_transfer_in1k_semi_sup_fulltune \
+config/benchmark/semi_supervised/imagenet1k/dataset=simclr_in1k_per10
Note
Please see VISSL documentation on how to run a given training on 1-gpu, multi-gpu or multi-machine.
Note
Please see VISSL documentation on how to use the builtin datasets if you want to run this benchmark on a different target task.
Note
Please see VISSL documentation on how to use the YAML configuration system in VISSL to override specific components like the model of a config file. For example, in the above file, the user can easily replace the ResNet-50 model with a different architecture like RegNetY-256 etc.
Benchmark task: Object Detection¶
Object Detection is a very common benchmark for evaluating feature representation quality. In VISSL, we use Detectron2 for the object detection benchmark.
This benchmark involves 2 steps:
Step1: Converting the self-supervised model weights so they are compatible with Detectron2.
Step2: Using the converted weights from Step1, run the benchmark.
Converting weights to Detectron2¶
VISSL provides a script to convert the weights of VISSL-compatible models to Detectron2. We recommend that users adapt this script to suit their needs (different model architectures etc.).
To run the script, follow the command:
python extra_scripts/convert_vissl_to_detectron2.py \
--input_model_file <input_model_path>.torch \
--output_model <converted_d2_model_path>.torch \
--weights_type torch \
--state_dict_key_name classy_state_dict
The script above converts ResNe(X)ts models in VISSL to the models compatible with ResNe(X)ts in Detectron2.
Benchmark: Faster R-CNN on VOC07¶
VISSL provides the YAML configuration files for Detectron2
for the benchmark task of Object detection using Faster R-CNN
on VOC07.
The configuration files are available here.
To run the benchmark, VISSL provides a python script that closely follows MoCo object detection.
Please make sure to install Detectron2 following the Detectron2 Installation instructions.
To run the benchmark:
python tools/object_detection_benchmark.py \
--config-file ../configs/config/benchmark/object_detection/voc07/rn50_transfer_voc07_detectron2_e2e.yaml \
--num-gpus 8 MODEL.WEIGHTS <converted_d2_model_path>.torch
Note
We recommend users to consult Detectron2 documentation for how to use the configuration files and how to run the trainings.
PIRL object detection¶
To reproduce the PIRL object detection benchmark, a different LR and different warmup iterations are used. Use the following command:
python tools/object_detection_benchmark.py \
--config-file ../configs/config/benchmark/object_detection/voc07/pirl_npid/rn50_transfer_voc07_pirl_npid.yaml \
--num-gpus 8 MODEL.WEIGHTS <converted_d2_model_path>.torch
Benchmark: Faster R-CNN on VOC07+12¶
VISSL provides the YAML configuration files for Detectron2
for the benchmark task of Object detection using Faster R-CNN
on VOC07+12.
The configuration files are available here.
To run the benchmark, VISSL provides a python script that closely follows MoCo object detection.
Please make sure to install Detectron2 following the Detectron2 Installation instructions.
For the VOC07+12 benchmark, most self-supervised approaches use their own set of hyperparams. VISSL provides the settings used in the following works:
Scaling and Benchmarking Self-Supervised Visual Representation Learning¶
python tools/object_detection_benchmark.py \
--config-file ../configs/config/benchmark/object_detection/voc0712/iccv19/rn50_transfer_voc0712_detectron2_e2e.yaml \
--num-gpus 8 MODEL.WEIGHTS <converted_d2_model_path>.torch
MoCoV2¶
python tools/object_detection_benchmark.py \
--config-file ../configs/config/benchmark/object_detection/voc0712/mocoV2/rn50_transfer_voc0712_detectron2_e2e.yaml \
--num-gpus 8 MODEL.WEIGHTS <converted_d2_model_path>.torch
Benchmark: Mask R-CNN on COCO¶
Benchmarking on COCO introduces many variants (model architecture, FPN or not, C4). We provide config files for all the variants here and encourage users to pick the settings most suitable for their needs.
Benchmarking on COCO is not as widely adopted (compared to VOC07 and VOC07+12 evaluation) in the self-supervision literature. This benchmark has been demonstrated extensively in the MoCo-v2 paper and we encourage users to refer to the paper.
An example run:
python tools/object_detection_benchmark.py \
--config-file ../configs/config/benchmark/object_detection/COCOInstance/sbnExtraNorm_precBN_r50_c4_coco.yaml \
--num-gpus 8 MODEL.WEIGHTS <converted_d2_model_path>.torch
How to Extract Features¶
Given a pre-trained model, VISSL makes it easy to extract features for the model on several datasets. VISSL seamlessly supports TorchVision models and, in general, to load non-VISSL models, please follow our documentation for loading models.
To extract the features for a model that VISSL can load, users need 2 things:
config file: the configuration file should clearly specify which layers of the model the features should be extracted from.
set the correct engine_name: in VISSL, we have two types of engine - a) training, b) feature extraction. Users must set engine_name=extract_features in the yaml config file.
Note
The SVM training and Nearest Neighbor benchmark workflows don't require setting the engine_name because the provided tools train_svm and nearest_neighbor_test explicitly add the feature extraction step.
Config File for Feature Extraction¶
Using the following examples, set the config options for your desired use case of feature extraction. The following examples are for ResNet-50 but users can use their own model.
Extract features from several layers of the trunk¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True
SHOULD_FLATTEN_FEATS: False
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["conv1", ["AvgPool2d", [[10, 10], 10, 4]]],
["res2", ["AvgPool2d", [[16, 16], 8, 0]]],
["res3", ["AvgPool2d", [[13, 13], 5, 0]]],
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Extract features of the trunk output¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True
SHOULD_FLATTEN_FEATS: False
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Extract features of the model head output (self-supervised head)¶
For a given self-supervised approach, to extract the features of the model head (a very common use case where the model head is a projection head that projects the trunk features into a low-dimensional space), the config settings should look like below. The example below is for a SimCLR head + ResNet-50. Users can replace the MODEL.HEAD.PARAMS with the head settings used in the respective self-supervised model training.
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_AND_HEAD: True
EVAL_TRUNK_AND_HEAD: True
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 128]}],
]
How to extract features¶
Once users have the desired config file, they can extract features using the following command. VISSL also provides the config files here that users can modify/adapt to their needs.
python tools/run_distributed_engines.py \
config=feature_extraction/extract_resnet_in1k_8gpu \
+config/feature_extraction/trunk_only=rn50_layers \
config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch>
Summary: Feature Eval Config Settings¶
This doc summarizes how to set the MODEL.FEATURE_EVAL_SETTINGS parameter for different evaluations.
In order to evaluate the model, you need to set MODEL.FEATURE_EVAL_SETTINGS in the yaml config file. Various options determine how the model is evaluated, what part of the model is initialized from weights and what part of the model is frozen.
Below we provide instructions for setting MODEL.FEATURE_EVAL_SETTINGS for evaluating a pre-trained model on several benchmark tasks. These are only some example scenarios but they hopefully give an idea for any different use case one might have in mind.
Linear Image Classification with MLP heads¶
Attach MLP heads to several layers of the trunk¶
If you want the Zhang et al. style which has BN -> FC as the head, use the eval_mlp head. Example:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
SHOULD_FLATTEN_FEATS: False
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["conv1", ["AvgPool2d", [[10, 10], 10, 4]]],
["res2", ["AvgPool2d", [[16, 16], 8, 0]]],
["res3", ["AvgPool2d", [[13, 13], 5, 0]]],
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
PARAMS: [
["eval_mlp", {"in_channels": 64, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 256, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 512, "dims": [8192, 1000]}],
["eval_mlp", {"in_channels": 1024, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 2048, "dims": [8192, 1000]}],
["eval_mlp", {"in_channels": 2048, "dims": [2048, 1000]}],
]
WEIGHTS_INIT:
PARAMS_FILE: ""
STATE_DICT_KEY_NAME: classy_state_dict
If you want only an FC layer in the head, use the mlp head. Example:
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
SHOULD_FLATTEN_FEATS: False
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 1000]}],
]
WEIGHTS_INIT:
PARAMS_FILE: ""
STATE_DICT_KEY_NAME: classy_state_dict
Attach MLP head to the trunk output¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
SHOULD_FLATTEN_FEATS: False
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
PARAMS: [
["eval_mlp", {"in_channels": 2048, "dims": [2048, 1000]}],
]
WEIGHTS_INIT:
PARAMS_FILE: ""
STATE_DICT_KEY_NAME: classy_state_dict
Linear Image Classification with SVM trainings¶
Train SVM on several layers of the trunk¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True # only extract the features and we will train SVM on these
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Train SVM on the trunk output¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Nearest Neighbor¶
knn test on trunk output¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True # only freeze the trunk
EXTRACT_TRUNK_FEATURES_ONLY: True # we extract features from the trunk only
SHOULD_FLATTEN_FEATS: False # don't flatten the features and return as is
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
WEIGHTS_INIT:
PARAMS_FILE: ""
STATE_DICT_KEY_NAME: classy_state_dict
knn test on model head output (self-supervised head)¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_AND_HEAD: True # both head and trunk will be frozen (including BN in eval mode)
EVAL_TRUNK_AND_HEAD: True # initialized the model head as well from weights
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
# SimCLR model head structure
PARAMS: [
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 128]}],
]
WEIGHTS_INIT:
PARAMS_FILE: ""
STATE_DICT_KEY_NAME: classy_state_dict
knn test on several layers of the trunk¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True # only freeze the trunk
EXTRACT_TRUNK_FEATURES_ONLY: True # we extract features from the trunk only
SHOULD_FLATTEN_FEATS: False # don't flatten the features and return as is
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
WEIGHTS_INIT:
PARAMS_FILE: ""
STATE_DICT_KEY_NAME: classy_state_dict
Feature Extraction¶
You need to set engine_name: extract_features
in the config file or pass the engine_name=extract_features
as an additional input from the command line.
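For example, reusing the feature extraction config shown in the earlier section and passing the engine name explicitly on the command line (the config name and weights path follow the earlier example):
python tools/run_distributed_engines.py \
  config=feature_extraction/extract_resnet_in1k_8gpu \
  config.MODEL.WEIGHTS_INIT.PARAMS_FILE=<my_weights.torch> \
  engine_name=extract_features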
Extract features from several layers of the trunk¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True
SHOULD_FLATTEN_FEATS: False
LINEAR_EVAL_FEAT_POOL_OPS_MAP: [
["conv1", ["AvgPool2d", [[10, 10], 10, 4]]],
["res2", ["AvgPool2d", [[16, 16], 8, 0]]],
["res3", ["AvgPool2d", [[13, 13], 5, 0]]],
["res4", ["AvgPool2d", [[8, 8], 3, 0]]],
["res5", ["AvgPool2d", [[6, 6], 1, 0]]],
["res5avg", ["Identity", []]],
]
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Extract features of the trunk output¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_ONLY: True
EXTRACT_TRUNK_FEATURES_ONLY: True
SHOULD_FLATTEN_FEATS: False
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
Extract features of the model head output (self-supervised head)¶
MODEL:
FEATURE_EVAL_SETTINGS:
EVAL_MODE_ON: True
FREEZE_TRUNK_AND_HEAD: True
EVAL_TRUNK_AND_HEAD: True
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 2048], "use_relu": True}],
["mlp", {"dims": [2048, 128]}],
]
Full finetuning¶
Since this only requires initializing the model from the pre-trained model weights, there's no need for the FEATURE_EVAL_SETTINGS params. Simply set the MODEL.WEIGHTS_INIT params.
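A minimal sketch of the relevant settings (the weights path is a placeholder):
MODEL:
  WEIGHTS_INIT:
    PARAMS_FILE: "<my_weights.torch>"
    STATE_DICT_KEY_NAME: classy_state_dict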
How to Load Pretrained Models¶
VISSL supports Torchvision models out of the box. Generally, for loading any non-VISSL model, one needs to correctly set the following configuration options:
WEIGHTS_INIT:
# path to the .torch weights files
PARAMS_FILE: ""
# name of the state dict. checkpoint = {"classy_state_dict": {layername:value}}. Options:
# 1. classy_state_dict - if model is trained and checkpointed with VISSL.
# checkpoint = {"classy_state_dict": {layername:value}}
# 2. "" - if the model_file is not a nested dictionary for model weights i.e.
# checkpoint = {layername:value}
# 3. key name that your model checkpoint uses for state_dict key name.
# checkpoint = {"your_key_name": {layername:value}}
STATE_DICT_KEY_NAME: "classy_state_dict"
# specify what layer should not be loaded. Layer names with this key are not copied
# By default, set to BatchNorm stats "num_batches_tracked" to be skipped.
SKIP_LAYERS: ["num_batches_tracked"]
####### If loading a non-VISSL trained model, set the following two args carefully #########
# to make the checkpoint compatible with VISSL, if you need to remove some names
# from the checkpoint keys, specify the name
REMOVE_PREFIX: ""
# In order to load the model (if not trained with VISSL) with VISSL, there are 2 scenarios:
# 1. If you are interested in evaluating the model features and freeze the trunk.
# Set APPEND_PREFIX="trunk.base_model." This assumes that your model is compatible
# with the VISSL trunks. The VISSL trunks start with "_feature_blocks." prefix. If
# your model doesn't have these prefix you can append them. For example:
# For TorchVision ResNet trunk, set APPEND_PREFIX="trunk.base_model._feature_blocks."
# 2. where you want to load the model simply and finetune the full model.
# Set APPEND_PREFIX="trunk."
# This assumes that your model is compatible with the VISSL trunks. The VISSL
# trunks start with "_feature_blocks." prefix. If your model doesn't have these
# prefix you can append them.
# For TorchVision ResNet trunk, set APPEND_PREFIX="trunk._feature_blocks."
# NOTE: the prefix is appended to all the layers in the model
APPEND_PREFIX: ""
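For instance, to evaluate a TorchVision ResNet-50 checkpoint with a frozen trunk, the options above could be combined as in this sketch (the path is a placeholder; the prefix follows the comments above):
MODEL:
  WEIGHTS_INIT:
    PARAMS_FILE: "<torchvision_resnet50_checkpoint>.torch"
    STATE_DICT_KEY_NAME: ""   # torchvision checkpoints are flat {layername: value} dicts
    APPEND_PREFIX: "trunk.base_model._feature_blocks."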
Training¶
The training in VISSL is composed of the following components: Trainer, train task and train step.
Trainer¶
The main entry point for any training or feature extraction workflow in VISSL is the trainer. It performs the following:
The trainer constructs a train_task which prepares all the components of the training (optimizer, loss, meters, model etc.) using the settings specified by the user in the yaml config file. Read below for details about the train task.
Sets up the distributed training. VISSL supports both GPU and CPU only training.
Initializes torch.distributed.init_process_group if the distributed process group is not already initialized. The init_method and backend are specified by the user in the yaml config file. See the VISSL defaults.yaml file for a description of how to set init_method and backend.
We also set the global cuda device index using torch.cuda.set_device, or the cpu device.
Executes the training or feature extraction workflow depending on the engine_name set by the user.
Training workflow¶
The training workflow executes the following steps. We get the training loop to use (the VISSL default is standard_train_step but the user can create their own training loop and specify its name via TRAINER.TRAIN_STEP_NAME). The training happens as follows:
Execute any hooks at the start of training (mostly resets variables like iteration number, phase number etc.)
For each epoch (train or test), run the hooks at the start of an epoch. Mostly involves setting things like timer, setting dataloader epoch etc
Execute the training loop (1 training iteration) involving forward, loss, backward, optimizer update, metrics collection etc.
At the end of epoch, sync meters and execute hooks at the end of phase. Involves things like checkpointing model, logging timers, logging to tensorboard etc
Feature extraction workflow¶
Set engine_name=extract_features
in the config file to enable feature extraction.
The extraction workflow supports multi-gpu feature extraction. Since we are only extracting features, only the model is built (and initialized from a model weights file if specified by the user). The model is fully set to eval mode. The features are extracted for whichever data splits (train, val, test) the user wants.
Train Task¶
A task prepares and holds all the components of a training like optimizer, datasets, dataloaders, losses, meters etc. The task also contains variables like the training iteration, epoch number etc. that are updated during the training.
We prepare every single component according to the parameter settings user wants and specified in the yaml config file.
The task also supports 2 additional things:
converts the model BatchNorm layers to synchronized BatchNorm: set MODEL.SYNC_BN_CONFIG.CONVERT_BN_TO_SYNC_BN=true
sets up mixed precision (apex and pytorch both supported): set MODEL.AMP_PARAMS.USE_AMP=true and select the desired AMP settings.
Train Loop¶
VISSL implements a default training loop (single iteration step) that is used for self-supervised training of all VISSL reference approaches, for feature extraction and for supervised workflows. Users can implement their own training loop.
The training loop performs: data read, forward, loss computation, backward, optimizer step, parameter updates.
Various intermediate steps are also performed:
logging the training loss, training eta, LR, etc to loggers
logging to tensorboard,
performing any self-supervised method specific operations (e.g. in the MoCo approach the momentum encoder is updated, in SwAV the scores are computed)
checkpointing model if user wants to checkpoint in the middle of an epoch
To select the training loop:
TRAINER:
# default training loop. User can define their own loop and use that instead.
TRAIN_STEP_NAME: "standard_train_step"
Building Models¶
The model in VISSL is split into trunk
that computes features and head
that computes outputs (projections, classifications etc).
VISSL supports several types of Heads and several types of trunks. Overall, the following use cases are supported by VISSL models:
Model producing single output as in standard supervised ImageNet training
Model producing multiple outputs (Multi-task)
Model producing multiple outputs from different features (layers) from the trunk (useful in linear evaluation of features from several model layers)
Model that accepts multiple inputs (e.g. image and patches as in the PIRL approach).
Model where the trunk is frozen and head is trained
Model that supports multiple resolutions inputs as in SwAV
Model that is completely frozen and features are extracted.
Trunks¶
VISSL supports many trunks including AlexNet (variants for approaches like Jigsaw, Colorization, RotNet, DeepCluster etc), ResNets, ResNeXt, RegNets, EfficientNet.
To set the trunk, user needs to specify the trunk name in MODEL.TRUNK.NAME
.
Examples of trunks:
Using ResNe(X)ts trunk:
MODEL:
TRUNK:
NAME: resnet
TRUNK_PARAMS:
RESNETS:
DEPTH: 50
WIDTH_MULTIPLIER: 1
NORM: BatchNorm # BatchNorm | LayerNorm
GROUPS: 1
ZERO_INIT_RESIDUAL: False
WIDTH_PER_GROUP: 64
# Colorization model uses stride=1 for last layer to retain higher spatial resolution
# for the pixel-wise task. Torchvision default is stride=2 and all other models
# use this so we set the default as 2.
LAYER4_STRIDE: 2
Using RegNets trunk: We follow RegNets defined in ClassyVision directly and users can either use a pre-defined ClassyVision RegNet config or define their own.
For example, to create a new RegNet config for the RegNet-256Gf model (1.3B params):
MODEL:
  TRUNK:
    NAME: regnet
    TRUNK_PARAMS:
      REGNET:
        depth: 27
        w_0: 640
        w_a: 230.83
        w_m: 2.53
        group_width: 373
To use a pre-defined RegNet config from ClassyVision, for example RegNetY-16gf:
MODEL:
  TRUNK:
    NAME: regnet_y_16gf
Heads¶
The heads needed by the model are specified by setting MODEL.HEAD.PARAMS in the configuration file.
MODEL.HEAD.PARAMS is a list of pairs containing the parameters for the (multiple) heads.
Pair[0] = name of the head.
Pair[1] = kwargs passed to the head constructor.
Example of a ["name", kwargs] entry: MODEL.HEAD.PARAMS=[["mlp", {"dims": [2048, 128]}]]
Types of Heads one can specify¶
Case1: Simple Head containing single module - Single Input, Single output
MODEL:
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 128]}]
]
Case2: Complex Head containing chain of head modules - Single Input, Single output
MODEL:
HEAD:
PARAMS: [
["mlp", {"dims": [2048, 1000], "use_bn": False, "use_relu": False}],
["siamese_concat_view", {"num_towers": 9}],
["mlp", {"dims": [9000, 128]}]
]
Case3: Multiple Heads (example 2 heads) - Single input, multiple output: can be used for multi-task learning
MODEL:
HEAD:
PARAMS: [
# head 0
[
["mlp", {"dims": [2048, 128]}]
],
# head 1
[
["mlp", {"dims": [2048, 1000], "use_bn": False, "use_relu": False}],
["siamese_concat_view", {"num_towers": 9}],
["mlp", {"dims": [9000, 128]}],
]
]
Case4: Multiple Heads (example 5 simple heads) - Single input, multiple output: for example, used in linear evaluation of models
MODEL:
HEAD:
PARAMS: [
["eval_mlp", {"in_channels": 64, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 256, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 512, "dims": [8192, 1000]}],
["eval_mlp", {"in_channels": 1024, "dims": [9216, 1000]}],
["eval_mlp", {"in_channels": 2048, "dims": [8192, 1000]}],
]
Applying heads on multiple trunk features¶
By default, the head operates on the trunk output (single or multiple output). However, one can explicitly specify the input
to heads mapping in the list MODEL.MULTI_INPUT_HEAD_MAPPING
. This is used in PIRL training.
Assumptions:
This assumes that the same trunk is used to extract features for the different types of inputs.
One head only operates on one kind of input. Every individual head can contain several layers as in Case2 above.
MODEL.MULTI_INPUT_HEAD_MAPPING
specifies Input -> Trunk Features mapping. Like in the single input case, the heads can operate on features from different layers. In this case, we specify MODEL.MULTI_INPUT_HEAD_MAPPING
to be a list like:
MODEL:
MULTI_INPUT_HEAD_MAPPING: [
["input_key", [list of features heads is applied on]]
]
For example: for a model that applies two heads on images and one head on patches:
MODEL:
  MULTI_INPUT_HEAD_MAPPING: [
    ["images", ["res5", "res4"]],
    ["patches", ["res3"]],
  ]
Using Optimizers¶
VISSL supports all PyTorch optimizers (SGD, Adam etc.) and ClassyVision optimizers.
Creating Optimizers¶
The optimizers can be easily created from the configuration files. The user needs to set the optimizer name in OPTIMIZER.name. Users can configure other settings like the number of epochs, weight decay, momentum etc. as follows:
OPTIMIZER:
name: sgd
weight_decay: 0.0001
momentum: 0.9
nesterov: False
# for how many epochs to do training. only counts training epochs.
num_epochs: 90
# whether to regularize batch norm. if set to False, weight decay of batch norm params is 0.
regularize_bn: False
# whether to regularize bias parameter. if set to False, weight decay of bias params is 0.
regularize_bias: True
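As with any other config option, these can also be overridden from the command line. For example (the config name and values below are illustrative):
python tools/run_distributed_engines.py config=pretrain/swav/swav_8node_resnet \
  config.OPTIMIZER.num_epochs=200 \
  config.OPTIMIZER.weight_decay=0.00001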
Using different LR for Head and trunk¶
VISSL supports using a different LR and weight decay for the head and the trunk. Users need to set the config options OPTIMIZER.head_optimizer_params.use_different_lr=True and/or OPTIMIZER.head_optimizer_params.use_different_wd=True in order to enable this functionality.
OPTIMIZER:
head_optimizer_params:
# if the head should use a different LR than the trunk. If yes, then specify the
# param_schedulers.lr_head settings. Otherwise if set to False, the
# param_scheduelrs.lr will be used automatically.
use_different_lr: False
# if the head should use a different weight decay value than the trunk.
use_different_wd: False
# if using different weight decay value for the head, set here. otherwise, the
# same value as trunk will be automatically used.
weight_decay: 0.0001
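For example, to give the head its own LR schedule, one might pair this with the param_schedulers.lr_head settings referenced in the comments above (a sketch; the values are illustrative):
OPTIMIZER:
  head_optimizer_params:
    use_different_lr: True
    use_different_wd: True
    weight_decay: 0.000001
  param_schedulers:
    lr_head:
      name: cosine
      start_value: 0.3
      end_value: 0.0003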
Using LARC¶
VISSL supports the LARC implementation from NVIDIA's Apex LARC. To use LARC, users need to set the config option OPTIMIZER.use_larc=True. VISSL exposes the LARC parameters that users can tune. Full list of LARC parameters exposed by VISSL:
OPTIMIZER:
name: "sgd"
use_larc: False # supported for SGD only for now
larc_config:
clip: False
eps: 1e-08
trust_coefficient: 0.001
Note
LARC is currently supported for SGD optimizer only.
Creating LR Schedulers¶
Users can use different types of learning rate schedules for the training of their models. We closely follow the LR schedulers supported by ClassyVision and also provide custom learning rate schedules in VISSL.
How to set learning rate¶
Below we provide some examples of how to set up various types of learning rate schedules. Note that these are merely some examples and you should set your desired parameter values.
Cosine
OPTIMIZER:
param_schedulers:
lr:
name: cosine
start_value: 0.15 # LR for batch size 256
end_value: 0.0000
Multi-Step
OPTIMIZER:
param_schedulers:
lr:
name: multistep
values: [0.01, 0.001]
milestones: [1]
update_interval: epoch # update LR after every epoch
Linear Warmup + Cosine
OPTIMIZER:
param_schedulers:
lr:
name: composite
schedulers:
- name: linear
start_value: 0.6
end_value: 4.8
- name: cosine
start_value: 4.8
end_value: 0.0048
interval_scaling: [rescaled, fixed]
update_interval: step
lengths: [0.1, 0.9] # 100ep
Cosine with restarts
OPTIMIZER:
param_schedulers:
lr:
name: cosine_warm_restart
start_value: 0.15 # LR for batch size 256
end_value: 0.00015
restart_interval_length: 0.5
wave_type: half # full | half
Linear warmup + cosine with restarts
OPTIMIZER:
param_schedulers:
lr:
name: composite
schedulers:
- name: linear
start_value: 0.6
end_value: 4.8
- name: cosine_warm_restart
start_value: 4.8
end_value: 0.0048
# wave_type: half
# restart_interval_length: 0.5
wave_type: full
restart_interval_length: 0.334
interval_scaling: [rescaled, rescaled]
update_interval: step
lengths: [0.1, 0.9] # 100ep
Multiple linear warmups and cosine
OPTIMIZER:
param_schedulers:
lr:
schedulers:
- name: linear
start_value: 0.6
end_value: 4.8
- name: cosine
start_value: 4.8
end_value: 0.0048
- name: linear
start_value: 0.0048
end_value: 2.114
- name: cosine
start_value: 2.114
end_value: 0.0048
update_interval: step
interval_scaling: [rescaled, rescaled, rescaled, rescaled]
lengths: [0.0256, 0.48722, 0.0256, 0.46166] # 1ep IG-500M
Auto-scaling of Learning Rate¶
VISSL supports automatically scaling the LR as per https://arxiv.org/abs/1706.02677. To turn this automatic scaling on, set config.OPTIMIZER.param_schedulers.lr.auto_lr_scaling.auto_scale=true.
scaled_lr is calculated as follows: for a given base_lr_batch_size (the batch size for which the base learning rate is specified) and base_value (the base learning rate value that will be scaled), the current batch size is used to determine how to scale the base learning rate value:
scaled_lr = ((batchsize_per_gpu * world_size) * base_value) / base_lr_batch_size
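For example, if base_value=0.3 is specified for base_lr_batch_size=256 and the training uses batchsize_per_gpu=32 on 32 gpus (global batch size 1024), then scaled_lr = (1024 * 0.3) / 256 = 1.2.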
For different types of learning rate schedules, the LR scaling is handled as below:
1. cosine:
start_value = scaled_lr
end_value = scaled_lr * (end_value / start_value)
2. multistep:
gamma = values[1] / values[0]
values = [scaled_lr * pow(gamma, idx) for idx in range(len(values))]
3. step_with_fixed_gamma
base_value = scaled_lr
4. linear:
end_value = scaled_lr
5. inverse_sqrt:
start_value = scaled_lr
6. constant:
value = scaled_lr
7. composite:
recursively call to scale each composition. If the composition consists of a linear
schedule, we assume that a linear warmup is applied. If the linear warmup is
applied, it's possible the warmup is not necessary if the global batch_size is smaller
than the base_lr_batch_size and in that case, we remove the linear warmup from the
schedule.
Using PyTorch and VISSL Losses¶
VISSL supports all PyTorch loss functions and also implements several loss functions that are specific to self-supervised approaches like MoCo, PIRL, SwAV, SimCLR etc. Using any loss is very easy in VISSL and involves simply editing the configuration files to specify the loss name and the parameters of that loss. See all the VISSL custom losses here.
To use a certain loss, users need to simply set LOSS.name=<my_loss_name>
and set the parameter values that loss requires.
Examples:
Using Cross entropy loss for training and testing
LOSS:
name: "CrossEntropyLoss"
# ----------------------------------------------------------------------------------- #
# Standard PyTorch Cross-Entropy Loss. Use the loss name exactly as in PyTorch.
# pass any variables that the loss takes.
# ----------------------------------------------------------------------------------- #
CrossEntropyLoss:
ignore_index: -1
Using SwAV loss for training
LOSS:
name: swav_loss
swav_loss:
temperature: 0.1
use_double_precision: False
normalize_last_layer: True
num_iters: 3
epsilon: 0.05
crops_for_assign: [0, 1]
temp_hard_assignment_iters: 0
num_crops: 2 # automatically inferred from data transforms
num_prototypes: [3000] # automatically inferred from model HEAD settings
embedding_dim: 128 # automatically inferred from HEAD params
# for dumping the debugging info in case loss becomes NaN
output_dir: "" # automatically inferred and set to checkpoint dir
queue:
local_queue_length: 0 # automatically inferred to queue_length // world_size
queue_length: 0 # automatically adjusted to ensure queue_length % global batch size = 0
start_iter: 0
Using Meters¶
VISSL supports PyTorch meters and also implements custom meters like the Mean Average Precision meter. Meters in VISSL support a single target with multiple outputs. This is especially useful and relevant when evaluating self-supervised models where we want to calculate the feature quality of several layers of the model. See all the VISSL custom meters here.
To use a certain meter, users need to simply set METERS.name=<my_meter_name>
and set the parameter values that meter requires.
Examples:
Using Accuracy meter to compute Top-k accuracy for training and testing
METERS:
name: accuracy_list_meter
accuracy_list_meter:
num_meters: 1 # number of outputs model has. also auto inferred
topk_values: [1, 5] # for each meter, what topk are computed.
Using Mean AP meter:
METERS:
name: mean_ap_list_meter
mean_ap_list_meter:
num_classes: 9605 # openimages v6 dataset classes
num_meters: 1
Hooks¶
Hooks are helper functions that can be executed at several points of the training process, as described below:
on_start: executed before the training starts.
on_phase_start: executed at the beginning of every epoch (including test and train epochs).
on_forward: executed after every forward pass.
on_loss_and_meter: executed after the loss and meters are calculated.
on_backward: executed after every backward pass of the model.
on_update: executed after model parameters are updated by the optimizer.
on_step: executed after one single training (or test) iteration finishes.
on_phase_end: executed after the epoch (train or test) finishes.
on_end: executed at the very end of training.
Hooks are executed by inserting task.run_hooks(SSLClassyHookFunctions.<type>.name)
at several steps of the training.
How to enable certain hooks in VISSL¶
VISSL supports many hooks. Users can configure which hooks to use from simple configuration files. The hooks in VISSL can be categorized into the following buckets:
Tensorboard hook: to enable this hook, set TENSORBOARD_SETUP.USE_TENSORBOARD=true and configure the tensorboard settings.
Model Complexity hook: this hook performs a single forward pass of the model on a synthetic input and computes the #FLOPs, #params and #activations of the model. To enable this hook, set MODEL.MODEL_COMPLEXITY.COMPUTE_COMPLEXITY=true and configure it.
Self-supervised Loss hooks: VISSL has hooks specific to self-supervised approaches like MoCo, SwAV etc. These hooks are handy for performing intermediate operations required in self-supervision. For example, MoCoHook is called after every forward pass of the model and updates the momentum encoder network. Users don't need to do anything special to use these hooks: if the configuration file has the loss function for an approach, VISSL automatically enables the hooks for that approach.
Logging, checkpoint, training variable update hooks: these hooks are used by default in VISSL and perform operations like logging the training progress (loss, LR, eta etc.) to stdout, saving checkpoints, etc.
Using Data¶
To use a dataset in VISSL, the only requirements are:
the dataset name should be registered with VisslDatasetCatalog in VISSL. Only the name is important; the paths are not. The paths can be specified in the configuration file. Users can either edit the dataset_catalog.json or specify the paths in the configuration file.
the dataset should be from a supported data source.
Reading data from several sources¶
VISSL allows reading data from multiple sources (disk, etc) and in multiple formats (a folder path, a .npy
file).
The GenericSSLDataset class is defined to support reading data from multiple data sources. For example: data = [dataset1, dataset2]
and the minibatches generated will have the corresponding data from each dataset.
For this reason, we also support labels from multiple sources. For example targets = [dataset1 targets, dataset2 targets]
.
Source of the data (disk_filelist | disk_folder):
disk_folder: this is simply the root folder path to the downloaded data.
disk_filelist: these are numpy (or .pkl) files: (1) a file containing image information and (2) a file containing the corresponding labels for the images. We provide scripts that can be used to prepare these two files for a dataset of choice (see the sketch below for an illustration).
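For illustration, a minimal numpy sketch of preparing the two disk_filelist files (the file names and paths below are hypothetical):

import numpy as np

# Hypothetical paths, purely for illustration: one .npy file with image paths and one
# with the corresponding labels, aligned index by index.
image_paths = [
    "/path/to/my_dataset/train/img_00001.jpg",
    "/path/to/my_dataset/train/img_00002.jpg",
]
labels = [0, 3]  # one label per image, same order as image_paths

np.save("/path/to/my_dataset/train_images.npy", np.array(image_paths))
np.save("/path/to/my_dataset/train_labels.npy", np.array(labels))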
To use a dataset, VISSL takes the following inputs in the configuration file for each dataset split (train, test):
DATASET_NAMES: names of the datasets that are registered with VisslDatasetCatalog. Registering the dataset name is important. Example: DATASET_NAMES=[imagenet1k_folder, my_new_dataset_filelist]
DATA_SOURCES: the sources of the dataset. Options: disk_folder | disk_filelist. This specifies where the data lives. Users can extend it for their purposes. Example: DATA_SOURCES=[disk_folder, disk_filelist]
DATA_PATHS: the paths to the dataset. The paths could be a folder path (for example the Imagenet1k folder) or .npy filepaths. For folder paths, VISSL uses ImageFolder from PyTorch. Example: DATA_PATHS=[<imagenet1k_folder_path>, <numpy_file_path_for_new_dataset>]
LABEL_SOURCES: just like images, the targets can also come from several sources. Example: LABEL_SOURCES=[disk_folder] for Imagenet1k. Example: LABEL_SOURCES=[disk_folder, disk_filelist]
LABEL_PATHS: similar to DATA_PATHS but for labels. Example: LABEL_PATHS=[<imagenet1k_folder_path>, <numpy_file_path_for_new_dataset_labels>]
LABEL_TYPE: choose from standard | sample_index. sample_index is a common practice in self-supervised learning, where the label is the id of the sample in the data. The standard label type is used for supervised learning, where the user specifies the annotated labels to use.
Using dataset_catalog.json¶
In order to use a dataset with VISSL, the dataset name must be registered with VisslDatasetCatalog. VISSL maintains a dataset_catalog.json which is parsed by VisslDatasetCatalog, and the datasets are registered with VISSL, ready to use.
Users can edit the template dataset_catalog.json file to specify their dataset paths. The json file is fully controlled by the user and can contain any number of supported datasets (one or more). Users can give datasets any string names of their choice.
Template for a dataset entry in dataset_catalog.json¶
"data_name": {
"train": [
"<images_path_or_folder>", "<labels_path_or_folder>"
],
"val": [
"<images_path_or_folder>", "<labels_path_or_folder>"
],
}
The images_path_or_folder and labels_path_or_folder can be directories or filepaths (numpy, pickle).
Users can mix and match the sources of images and labels, i.e. labels can be a filelist while images come from a folder path. The yaml configuration files require specifying LABEL_SOURCES and DATA_SOURCES, which allows the code to figure out how to ingest the various sources.
Note
Filling the dataset_catalog.json is a one-time process; afterwards any dataset can be accessed simply by its name in the configuration files for all subsequent trainings.
Using Builtin datasets¶
VISSL supports several Builtin datasets as indicated in the dataset_catalog.json
file. Users can specify paths to those datasets.
Expected dataset structure for ImageNet, Places205, Places365¶
{imagenet, places205, places365}
train/
<n0......>/
<im-1-name>.JPEG
...
<im-N-name>.JPEG
...
<n1......>/
<im-1-name>.JPEG
...
<im-M-name>.JPEG
...
...
val/
<n0......>/
<im-1-name>.JPEG
...
<im-N-name>.JPEG
...
<n1......>/
<im-1-name>.JPEG
...
<im-M-name>.JPEG
...
...
Expected dataset structure for Pascal VOC [2007, 2012]¶
VOC20{07,12}/
Annotations/
ImageSets/
Main/
trainval.txt
test.txt
JPEGImages/
Expected dataset structure for COCO2014¶
coco/
annotations/
instances_train2014.json
instances_val2014.json
train2014/
# image files that are mentioned in the corresponding json
val2014/
# image files that are mentioned in the corresponding json
Dataloader¶
VISSL uses PyTorch torch.utils.data.DataLoader and allows setting all the dataloader options as below. The dataloader is wrapped with DataloaderAsyncGPUWrapper or DataloaderSyncGPUWrapper depending on whether the user wants to copy data to the gpu asynchronously or not.
The settings for the Dataloader in VISSL are listed below (a purely illustrative mapping onto a plain PyTorch DataLoader follows the list):
dataset (GenericSSLDataset): the dataset object for which dataloader is constructed
dataset_config (dict): configuration of the dataset. it should be DATA.TRAIN or DATA.TEST settings
num_dataloader_workers (int): number of workers per gpu (or cpu) training
pin_memory (bool): whether to pin memory or not
multi_processing_method (str): method to use. options: forkserver | fork | spawn
device (torch.device): training on cuda or cpu
get_sampler (get_sampler): function that is used to get the sampler
worker_init_fn (None default): any function that should be executed during initialization of dataloader workers
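Purely for illustration (this is not VISSL's internal dataloader construction code), these options map roughly onto a plain PyTorch DataLoader as in the sketch below; the dataset here is a placeholder:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset purely for illustration.
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.zeros(64, dtype=torch.long))

loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=4,        # corresponds to num_dataloader_workers
    pin_memory=True,      # corresponds to pin_memory
    sampler=None,         # corresponds to the sampler returned by get_sampler
    worker_init_fn=None,  # corresponds to worker_init_fn
)

for images, labels in loader:
    pass  # a training / evaluation step would consume the batch here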
Using Data Collators¶
VISSL supports PyTorch default collator torch.utils.data.dataloader.default_collate
and also many custom data collators used in self-supervision. To use any collator, the user simply has to set DATA.TRAIN.COLLATE_FUNCTION to the name of the collator to use. See all custom VISSL collators implemented here.
An example for specifying collator for SwAV training:
DATA:
TRAIN:
COLLATE_FUNCTION: multicrop_collator
Using Data Transforms¶
VISSL supports all PyTorch TorchVision
transforms as well as many transforms required by Self-supervised approaches including MoCo, SwAV, PIRL, SimCLR, BYOL, etc. Using Transforms is very intuitive and easy in VISSL. Users specify the list of transforms they want to apply on the data in the order of application.
This involves using the transform name and key:value pairs to specify the parameter values for the transform. See the full list of transforms implemented by VISSL here.
An example of transform for SwAV:
DATA:
TRAIN:
TRANSFORMS:
- name: ImgPilToMultiCrop
total_num_crops: 6
size_crops: [224, 96]
num_crops: [2, 4]
crop_scales: [[0.14, 1], [0.05, 0.14]]
- name: RandomHorizontalFlip
p: 0.5
- name: ImgPilColorDistortion
strength: 1.0
- name: ImgPilGaussianBlur
p: 0.5
radius_min: 0.1
radius_max: 2.0
- name: ToTensor
- name: Normalize
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
Using Data Sampler¶
VISSL supports 2 types of samplers:
PyTorch default
torch.utils.data.distributed.DistributedSampler
VISSL sampler StatefulDistributedSampler, which is written specifically for large scale dataset trainings. See the documentation for the sampler.
By default, the PyTorch default sampler is used unless user specifies DATA.TRAIN.USE_STATEFUL_DISTRIBUTED_SAMPLER=true
in which case StatefulDistributedSampler
will be used.
Add custom Train loop¶
VISSL implements a default training loop (single iteration step) that is used for self-supervised training of all VISSL reference approaches, for feature extraction and for supervised workflows. Users can implement their own training loop.
The training loop performs: data read, forward, loss computation, backward, optimizer step, parameter updates.
Various intermediate steps are also performed:
logging the training loss, training eta, LR, etc to loggers
logging to tensorboard,
performing any self-supervised method specific operations (like in MoCo approach, the momentum encoder is updated), computing the scores in swav
checkpointing model if user wants to checkpoint in the middle of an epoch
Users can implement their custom training loop by following the steps:
Step1: Create your my_new_training_loop module under vissl/trainer/train_steps/my_new_training_loop.py following the template:
from vissl.trainer.train_steps import register_train_step
@register_train_step("my_new_training_loop")
def my_new_training_loop(task):
"""
add documentation on what this training loop does and how it varies from
standard training loop in vissl.
"""
# implement the training loop. It should take care of running the dataloader
# iterator to get the input sample
...
...
return task
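Purely for orientation, the body of such a loop typically performs the sequence below. This is a hedged sketch: the task attributes used here (data_iterator, model, loss, optimizer) are assumptions for the illustration and may not match VISSL's actual task object exactly.

def example_train_iteration(task):
    """Illustrative body for one iteration: read data, forward, loss, backward, update."""
    sample = next(task.data_iterator)           # 1. read a minibatch (assumed attribute)
    output = task.model(sample["data"])         # 2. forward pass
    loss = task.loss(output, sample["label"])   # 3. compute the loss
    task.optimizer.zero_grad()                  # 4. clear old gradients
    loss.backward()                             # 5. backward pass
    task.optimizer.step()                       # 6. parameter update
    # hooks (see the Hooks section) would normally fire after each of these stages
    return task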
Step2: New train loop is ready to use. Set the
TRAINER.TRAIN_STEP_NAME=my_new_training_loop
Add new Hooks¶
Hooks are helper functions that can be executed at several points of the training process, as described below:
on_start: executed before the training starts.
on_phase_start: executed at the beginning of every epoch (including test and train epochs).
on_forward: executed after every forward pass.
on_loss_and_meter: executed after the loss and meters are calculated.
on_backward: executed after every backward pass of the model.
on_update: executed after model parameters are updated by the optimizer.
on_step: executed after one single training (or test) iteration finishes.
on_phase_end: executed after the epoch (train or test) finishes.
on_end: executed at the very end of training.
Hooks are executed by inserting task.run_hooks(SSLClassyHookFunctions.<type>.name)
at several steps of the training.
Users can add new hooks by following the steps below:
Step1: Create your new hook in
vissl/hooks/my_hook.py
following the template.
from classy_vision.hooks.classy_hook import ClassyHook
class MyAwesomeNewHook(ClassyHook):
"""
Logs the number of parameters, forward pass FLOPs and activations of the model.
Adapted from: https://github.com/facebookresearch/ClassyVision/blob/master/classy_vision/hooks/model_complexity_hook.py#L20 # NOQA
"""
# define all the functions that your hook should execute. If the hook
# executes nothing for a particular function, mark it as a noop.
# Example: if the hook only does something for `on_start', then set:
# on_phase_start = ClassyHook._noop
# on_forward = ClassyHook._noop
# on_loss_and_meter = ClassyHook._noop
# on_backward = ClassyHook._noop
# on_update = ClassyHook._noop
# on_step = ClassyHook._noop
# on_phase_end = ClassyHook._noop
# on_end = ClassyHook._noop
def on_start(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute at the beginning of training
...
def on_phase_start(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute at the beginning of each epoch
# (training or test epoch). Use `task.train' boolean to detect if the current
# epoch is train or test
...
def on_forward(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute after the model forward pass is done
# should handle the train or test phase.
# Use `task.train' boolean to detect if the current epoch is train or test
...
def on_loss_and_meter(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute after the loss and meters are
# calculated
...
def on_backward(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute after backward pass is done. Note
# that the model parameters are not yet updated
...
def on_update(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute after the model parameters are updated
# by the optimizer following LR and weight decay
...
def on_step(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute after a training / test iteration
# is done
...
def on_phase_end(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute after a phase (train or test)
# is done
...
def on_end(self, task: "tasks.ClassyTask") -> None:
# implement what your hook should execute at the end of training
# (or testing, feature extraction)
...
Step2: Inform VISSL on how/when to use the hook in the default_hook_generator method in vissl/hooks/__init__.py. We recommend adding some configuration params like MONITOR_PERF_STATS in vissl/config/defaults.yaml so that you can control the usage of the hook easily from the config file.
Step3: Test that your hook is working by simply running a config file and setting the config parameters you added in Step2 above.
Add new Optimizers¶
VISSL makes it easy to add new optimizers. VISSL depends on ClassyVision .
Follow the steps below to add a new optimizer to VISSL.
Step1: Create your new optimizer my_new_optimizer under vissl/optimizers/my_new_optimizer.py following the template:
from typing import Any, Dict
from classy_vision.optim import ClassyOptimizer, register_optimizer
@register_optimizer("my_new_optimizer")
class MyNewOptimizer(ClassyOptimizer):
"""
Add documentation on how the optimizer optimizes and also
link to any papers or technical reports that propose/use the optimizer (if applicable)
"""
def __init__(self, param1, param2, ...):
super().__init__()
# implement what the optimizer init should do and what variable it should store
...
def prepare(self, param_groups):
"""
Prepares the optimizer for training.
Deriving classes should initialize the underlying PyTorch
:class:`torch.optim.Optimizer` in this call. The param_groups argument
follows the same format supported by PyTorch (list of parameters, or
list of param group dictionaries).
Warning:
This should be called only after the model has been moved to the correct
device (gpu or cpu).
"""
@classmethod
def from_config(cls, config: Dict[str, Any]) -> "MyNewOptimizer":
"""
Instantiates a MyNewOptimizer from a configuration.
Args:
config: A configuration for a MyNewOptimizer.
See :func:`__init__` for parameters expected in the config.
Returns:
A MyNewOptimizer instance.
"""
Step2: Enable the automatic import of all the modules. Add the following lines of code to vissl/optimizers/__init__.py. Skip this step if the file already exists.
from pathlib import Path
from classy_vision.generic.registry_utils import import_all_modules
FILE_ROOT = Path(__file__).parent
# automatically import any Python files in the optimizers/ directory
import_all_modules(FILE_ROOT, "vissl.optimizers")
Step3: Enable the registration of the new optimizers in VISSL. Add the following line to vissl/trainer/__init__.py. Skip this step if the line already exists.
import vissl.optimizers # NOQA
Step4: The optimizer is now ready to use. Set the configuration param
OPTIMIZER.name=my_new_optimizer
and set the values of the params this optimizer takes.
Add new LR schedulers¶
VISSL allows adding new Learning rate schedulers easily. Follow the steps below:
Step1: Create a class for your my_new_lr_scheduler under vissl/optimizers/param_scheduler/my_new_lr_scheduler.py following the template:
from typing import Any, Dict
from classy_vision.optim.param_scheduler import (
ClassyParamScheduler,
UpdateInterval,
register_param_scheduler,
)
@register_param_scheduler("my_new_lr_scheduler")
class MyNewLRScheduler(ClassyParamScheduler):
"""
Add documentation on how the LR scheduler works and also
link to any papers or technical reports that propose/use the scheduler (if applicable)
Args:
document all the inputs that the scheduler takes
Example:
show one example of how to use the lr scheduler
"""
def __init__(
self, param1, param2, ... , update_interval: UpdateInterval = UpdateInterval.STEP
):
super().__init__(update_interval=update_interval)
# implement what the init of LR scheduler should do, any variables
# to initialize etc.
...
...
@classmethod
def from_config(cls, config: Dict[str, Any]) -> "MyNewLRScheduler":
"""
Instantiates a MyNewLRScheduler from a configuration.
Args:
config: A configuration for a MyNewLRScheduler.
See :func:`__init__` for parameters expected in the config.
Returns:
A MyNewLRScheduler instance.
"""
return cls(param1=config.param1, param2=config.param2, ...)
def __call__(self, where: float):
# implement what the LR value should be given the `where' which indicates
# how far the training is. `where' values are [0, 1)
...
...
return lr_value
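For intuition, here is a hedged sketch of what __call__ could return for a simple linear schedule; this is illustrative only and is not one of VISSL's built-in schedulers:

# Illustrative only: linear interpolation between a start and an end value based on
# `where` in [0, 1), which is the contract that __call__ must satisfy.
def linear_lr(where: float, start_value: float = 4.8, end_value: float = 0.0048) -> float:
    return start_value + (end_value - start_value) * where

print(linear_lr(0.0))  # 4.8    -> LR at the very start of training
print(linear_lr(0.5))  # ~2.40  -> LR halfway through training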
Step2: The new LR scheduler is ready to use. Give it a try by setting configuration param
OPTIMIZER.param_schedulers.lr.name=my_new_lr_scheduler
.
Add new Losses to VISSL¶
VISSL allows adding new losses easily. Follow the steps below to add a new loss:
Step1: Create a new loss my_new_loss in vissl/losses/my_new_loss.py following the template:
import pprint
from classy_vision.losses import ClassyLoss, register_loss
@register_loss("my_new_loss")
class MyNewLoss(ClassyLoss):
"""
Add documentation for what the loss does
Config params:
document what parameters should be expected for the loss in the defaults.yaml
and how to set those params
"""
def __init__(self, loss_config: AttrDict, device: str = "gpu"):
super(MyNewLoss, self).__init__()
self.loss_config = loss_config
# implement what the init function should do
...
@classmethod
def from_config(cls, loss_config: AttrDict):
"""
Instantiates MyNewLoss from configuration.
Args:
loss_config: configuration for the loss
Returns:
MyNewLoss instance.
"""
return cls(loss_config)
def __repr__(self):
# implement what information about loss params should be
# printed by print(loss). This is helpful for debugging
repr_dict = {"name": self._get_name(), ....}
return pprint.pformat(repr_dict, indent=2)
def forward(self, output, target):
# implement how the loss should be calculated. The output should be
# torch.Tensor or List[torch.Tensor] and target should be torch.Tensor
...
...
return loss
Step2: Register the loss and loss params with VISSL Configuration. Add the params that the loss takes in VISSL defaults.yaml as follows:
LOSS:
my_new_loss:
param1: value1
param2: value2
...
Step3: Loss is ready to use. Simply set the configuration param
LOSS.name=my_new_loss
Add new Meters¶
VISSL allows adding new meters easily. Follow the steps below to add a new meter:
Step1: Create a new meter my_new_meter in vissl/meters/my_new_meter.py following the template:
import torch
from classy_vision.meters import ClassyMeter, register_meter
@register_meter("my_new_meter")
class MyNewMeter(ClassyMeter):
"""
Add documentation on what this meter does
Args:
add documentation about each meter parameter
"""
def __init__(self, meters_config: AttrDict):
# implement what the init method should do like
# setting variable to update etc.
self.reset()
@classmethod
def from_config(cls, meters_config: AttrDict):
"""
Get the MyNewMeter instance from the user defined config
"""
return cls(meters_config)
@property
def name(self):
"""
Name of the meter
"""
return "my_new_meter"
@property
def value(self):
"""
Value of the meter which has been globally synced. This is the value printed and
recorded by user.
"""
# implement how the value should be calculated/finalized/returned to user
....
return {"my_metric_name": value, ....}
def sync_state(self):
"""
Globally syncing the state of each meter across all the trainers.
Should perform distributed communications like all_gather etc
to correctly gather the global values to compute the metric
"""
# implement what Communications should be done to globally sync the state
...
# update the meter variables to store these global gathered values
...
def reset(self):
"""
Reset the meter. Should reset all the meter variables, values.
"""
self._scores = torch.zeros(0, self.num_classes, dtype=torch.float32)
self._targets = torch.zeros(0, self.num_classes, dtype=torch.int8)
self._total_sample_count = torch.zeros(1)
self._curr_sample_count = torch.zeros(1)
def __repr__(self):
# implement what information about meter params should be
# printed by print(meter). This is helpful for debugging
return repr({"name": self.name, "value": self.value})
def set_classy_state(self, state):
"""
Set the state of meter. This is the state loaded from a checkpoint when the model
is resumed
"""
# implement how to set the state of the meter
....
def get_classy_state(self):
"""
Returns the states of meter that will be checkpointed. This should include
the variables that are global, updated and affect meter value.
"""
return {
"name": self.name,
...
}
def update(self, model_output, target):
"""
Update the meter every time meter is calculated
"""
# implement how to update the meter values
...
def validate(self, model_output, target):
"""
Validate that the input to meter is valid
"""
# implement how to enforce the validity of the meter inputs
....
Step2: Register the meter and meter params with VISSL Configuration. Add the params that the meter takes in VISSL defaults.yaml as follows:
METERS:
my_new_meter:
param1: value1
param2: value2
...
Step3: Meter is ready to use. Simply set the configuration param
METERS.name=my_new_meter
Add new Models¶
VISSL allows adding new models (heads and trunks) easily and combining different trunks and heads to train a new model. Follow the steps below to add new heads or trunks.
Adding New Heads¶
To add a new model head, follow the steps:
Step1: Add the new head my_new_head under vissl/models/heads/my_new_head.py following the template:
from typing import List
import torch
import torch.nn as nn
from vissl.models.heads import register_model_head
@register_model_head("my_new_head")
class MyNewHead(nn.Module):
"""
Add documentation on what this head does and also link any papers where the head is used
"""
def __init__(self, model_config: AttrDict, param1: val, ....):
"""
Args:
add documentation on what are the parameters to the head
"""
super().__init__()
# implement what the init of head should do. Example, it can construct the layers in the head
# like FC etc., initialize the parameters or anything else
....
# the input to the model should be a torch Tensor or list of torch tensors.
def forward(self, batch: torch.Tensor or List[torch.Tensor]):
"""
add documentation on what the head input structure should be, shapes expected
and what the output should be
"""
# implement the forward pass of the head
Step2: The new head is ready to use. Test it by setting the new head in the configuration file.
MODEL:
HEAD:
PARAMS: [
...
["my_new_head", {"param1": val, ...}]
...
]
Adding New Trunks¶
To add a new trunk (a new architecture like vision transformers, etc.), follow the steps:
Step1: Add your new trunk my_new_trunk under vissl/models/trunks/my_new_trunk.py following the template:
from typing import List
import torch
import torch.nn as nn
from vissl.models.trunks import register_model_trunk
@register_model_trunk("my_new_trunk")
class MyNewTrunk(nn.Module):
"""
documentation on what the trunk does and links to technical reports
using this trunk (if applicable)
"""
def __init__(self, model_config: AttrDict, model_name: str):
super(MyNewTrunk, self).__init__()
self.model_config = model_config
# get the params trunk takes from the config
trunk_config = self.model_config.TRUNK.TRUNK_PARAMS.MyNewTrunk
# implement the model trunk and construct all the layers that the trunk uses
model_layer1 = ??
model_layer2 = ??
...
...
# give a name to the layers of your trunk so that these features
# can be used for other purposes: like feature extraction etc.
# the name is fully up to user discretion. Users may choose to
# only name one layer which is the last layer of the model.
self._feature_blocks = nn.ModuleDict(
[
("my_layer1_name", model_layer1),
("my_layer1_name", model_layer2),
...
]
)
def forward(
self, x: torch.Tensor, out_feat_keys: List[str] = None
) -> List[torch.Tensor]:
# implement the forward pass of the model. See the forward pass of resnext.py
# for reference.
# The output would be a list. The list can have one tensor (the trunk output)
# or multiple tensors (corresponding to several features of the trunk)
...
...
return output
Step2: Inform VISSL about the parameters of the trunk. Register the params with VISSL Configuration by adding the params in VISSL defaults.yaml as follows:
MODEL:
TRUNK:
MyNewTrunk:
param1: value1
param2: value2
...
Step3: The trunk is ready to use. Set the trunk name and params in your config file
MODEL.TRUNK.NAME=my_new_trunk
Using Custom Datasets¶
VISSL allows adding custom datasets easily. Using a new custom dataset has 2 requirements:
Requirement1: The dataset name must be registered with VisslDatasetCatalog.
Requirement2: Users should ensure that the data source is supported by VISSL. By default, VISSL supports reading data from disk. If user data is loaded from a different data source, please add the new data source following the documentation.
Follow the steps below to register and use the new dataset:
Step1: Register the dataset with VISSL. Given user dataset with dataset name
my_new_dataset_name
and path to the dataset train and test splits, users can register the dataset following:
from vissl.data.dataset_catalog import VisslDatasetCatalog
VisslDatasetCatalog.register_data(name="my_new_dataset_name", data_dict={"train": ... , "test": ...})
Note
VISSL also supports registering datasets via a custom json file, or registering a dict with a bunch of datasets.
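For instance, a small sketch of registering several datasets at once with register_dict (the dataset names and paths below are hypothetical):

from vissl.data.dataset_catalog import VisslDatasetCatalog

# Hypothetical dataset names and paths, purely for illustration.
VisslDatasetCatalog.register_dict(
    {
        "my_new_dataset_name": {
            "train": ["/path/to/train_images", "/path/to/train_labels"],
            "val": ["/path/to/val_images", "/path/to/val_labels"],
        },
        "another_dataset_filelist": {
            "train": ["/path/to/train_images.npy", "/path/to/train_labels.npy"],
        },
    }
)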
Step2 (Optional): If the dataset requires a new data source other than disk or the supported disk formats (disk_folder or disk_filelist), please add the new data source to VISSL, following our documentation on adding a new data source.
Step3: Test your dataset:
DATA:
TRAIN:
DATA_SOURCES: [my_data_source]
DATASET_NAMES: [my_new_dataset_name]
Add new Data Source¶
VISSL supports data loading from disk as the default data source. If the user's dataset lives in a custom data storage solution my_data_source instead of disk, then users can extend VISSL to work with their data storage. Follow the steps below:
Step1: Implement your custom data source under
vissl/data/my_data_source.py
following the template:
from vissl.data.data_helper import get_mean_image
from torch.utils.data import Dataset
class MyNewSourceDataset(Dataset):
"""
add documentation on how this dataset works
Args:
add docstrings for the parameters
"""
def __init__(self, cfg, data_source, path, split, dataset_name):
super(MyNewSourceDataset, self).__init__()
assert data_source in [
"disk_filelist",
"disk_folder",
"my_data_source"
], "data_source must be either disk_filelist or disk_folder or my_data_source"
self.cfg = cfg
self.split = split
self.dataset_name = dataset_name
self.data_source = data_source
self._path = path
# implement anything that data source init should do
....
....
self._num_samples = ?? # set the length of the dataset
def num_samples(self):
"""
Size of the dataset
"""
return self._num_samples
def __len__(self):
"""
Size of the dataset
"""
return self.num_samples()
def __getitem__(self, idx: int):
"""
implement how to load the data corresponding to idx element in the dataset
from your data source
"""
....
....
# is_success should be True or False indicating whether loading data was successful or failed
# loaded data should be Image.Image if image data
return loaded_data, is_success
Step2: Register the new data source with VISSL. Extend the
DATASET_SOURCE_MAP
dict invissl/data/__init__.py
.
DATASET_SOURCE_MAP = {
"disk_filelist": DiskImageDataset,
"disk_folder": DiskImageDataset,
"synthetic": SyntheticImageDataset,
"my_data_source": MyNewSourceDataset,
}
Step3: Register the name of the datasets you plan to load using the new data source. There are 2 ways to do this:
See our documentation on "Using dataset_catalog.json" to update the configs/dataset_catalog.json file.
Insert a python call as follows:
# insert the following call in your python code
from vissl.data.dataset_catalog import VisslDatasetCatalog
VisslDatasetCatalog.register_data(name="my_dataset_name", data_dict={"train": ... , "test": ...})
Step4: Test using your dataset
DATA:
TRAIN:
DATA_SOURCES: [my_data_source]
DATASET_NAMES: [my_dataset_name]
Add new Dataloader¶
VISSL currently supports the PyTorch torch.utils.data.DataLoader. If users would like to add a custom dataloader of their own, we recommend the following steps.
Step1: Create your custom dataloader class MyNewDataLoader in vissl/data/my_loader.py. The Dataloader should implement all the variables and members that the PyTorch Dataloader uses.
Step2: Import your new MyNewDataLoader in vissl/data/__init__.py and extend the function get_loader(...) to use your MyNewDataLoader. To control this from the configuration file, we recommend adding some config file options in the vissl/defaults.yaml file under DATA.TRAIN.dataloader_name.
We welcome PRs following our Contributing guidelines.
Add new Data Transforms¶
Adding new transforms and using them is quite easy in VISSL. Follow the steps below:
Step1: Create your transform under
vissl/data/ssl_transforms/my_new_transform.py
. The transform should follow the template:
@register_transform("MyNewTransform")
class MyNewTransform(ClassyTransform):
"""
add documentation for what your transform does
"""
def __init__(self, param1, param2, ...):
"""
Args:
param1: add docstring
param2: add docstring
...
"""
self.param1 = param1
self.param2 = param2
# implement anything that the transform init should do
...
# the input image should either be Image.Image PIL instance or torch.Tensor
def __call__(self, image: {Image.Image or torch.Tensor}):
# implement the transformation logic code.
return img
@classmethod
def from_config(cls, config: Dict[str, Any]) -> "MyNewTransform":
"""
Instantiates MyNewTransform from configuration.
Args:
config (Dict): arguments for the transform
Returns:
MyNewTransform instance.
"""
param1 = config.param1
param2 = config.param2
....
return cls(param1=param1, param2=param2, ...)
Step2: Use your transform in the config file by editing the
DATA.TRAIN.TRANSFORMS
value:
DATA:
TRANSFORMS:
...
...
- name: MyNewTransform
param1: value1
param2: value2
....
....
Add new Data Collators¶
VISSL allows implementing new data collators easily. Follow the steps below:
Step1: Create the new data collator my_new_collator.py under vissl/data/collators/my_new_collator.py following the template.
import torch
from vissl.data.collators import register_collator
@register_collator("my_new_collator")
def my_new_collator(batch, param1 (Optional), ...):
"""
add documentation on what new collator does
Input:
add documentation on what input type should the collator expect. i.e
what should the `batch' look like.
Output:
add documentation on what the collator returns i.e. what does the
collated data `output_batch' look like.
"""
# implement the collator
...
...
output_batch = {
"data": ... ,
"label": ... ,
"data_valid": ... ,
"data_idx": ... ,
}
return output_batch
Step2: Use your new collator via the configuration files
DATA:
TRAIN:
COLLATE_FUNCTION: my_new_collator
COLLATE_FUNCTION_PARAMS: {...} # optional, specify params if collator requires any
Activation checkpointing to reduce model memory¶
Authors: m1n@fb.com, lefaudeux@fb.com
Activation checkpointing is a very powerful technique to reduce the memory requirement of a model. This is especially useful when training very large models with billions of parameters.
How it works?¶
Activation checkpointing trades compute for memory. It discards intermediate activations during the forward pass, and recomputes them during the backward pass. In our experiments, using activation checkpointing, we observe negligible compute overhead in memory-bound settings while getting big memory savings.
In summary, this technique offers 2 benefits:
saves gpu memory that can be used to fit large models
allows increasing training batch size for a given model
We recommend that users read the documentation available here for further details on activation checkpointing.
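To give a feel for the mechanism (this is plain PyTorch, not VISSL's integration), torch.utils.checkpoint re-runs the wrapped segments during the backward pass instead of storing their intermediate activations:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A toy model split into 4 checkpointed segments: activations inside each segment are
# discarded on the forward pass and recomputed during the backward pass.
model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(8)])
x = torch.randn(32, 1024, requires_grad=True)

out = checkpoint_sequential(model, 4, x)
out.sum().backward()  # segments are re-run here to rebuild the needed activations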
How to use activation checkpointing in VISSL?¶
VISSL integrates activation checkpointing implementation directly from PyTorch available here. Using activation checkpointing in VISSL is extremely easy and doable with simple settings in the configuration file. The settings required are as below:
MODEL:
ACTIVATION_CHECKPOINTING:
# whether to use activation checkpointing or not
USE_ACTIVATION_CHECKPOINTING: True
# how many times the model should be checkpointed. User should tune this parameter
# and find the number that offers best memory saving and compute tradeoff.
NUM_ACTIVATION_CHECKPOINTING_SPLITS: 8
DISTRIBUTED:
# if True, does the gradient reduction in DDP manually. This is useful during the
# activation checkpointing and sometimes saving the memory from the pytorch gradient
# buckets.
MANUAL_GRADIENT_REDUCTION: True
LARC for Large batch size training¶
What is LARC¶
LARC (Large Batch Training of Convolutional Networks) is a technique proposed by Yang You, Igor Gitman, Boris Ginsburg in https://arxiv.org/abs/1708.03888 for improving the convergence of large batch size trainings. LARC uses the ratio between the gradient and parameter magnitudes to calculate an adaptive local learning rate for each individual parameter.
See the LARC paper for the calculation of the learning rate. In practice, it modifies the gradients of the parameters as a proxy for modifying their learning rate.
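As a rough illustration of the rule (simplified; see the paper and the Apex implementation for the exact handling of clipping and weight decay):

import torch

def larc_local_lr(param, grad, trust_coefficient=0.001, eps=1e-8):
    """Simplified LARC/LARS-style local LR: trust_coefficient * ||w|| / ||g||."""
    param_norm = param.norm()
    grad_norm = grad.norm()
    if param_norm == 0 or grad_norm == 0:
        return 1.0  # fall back to the global LR in degenerate cases
    return (trust_coefficient * param_norm / (grad_norm + eps)).item()

# A layer whose gradients are large relative to its weights gets a small local LR,
# which is what stabilizes large batch size training.
w = torch.randn(512, 512)
g = 10.0 * torch.randn(512, 512)
print(larc_local_lr(w, g))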
How to enable LARC¶
VISSL supports the LARC implementation from NVIDIA’s Apex LARC. To use LARC, users need to set the config option OPTIMIZER.use_larc=True. VISSL exposes LARC parameters that users can tune. The full list of LARC parameters exposed by VISSL:
OPTIMIZER:
name: "sgd"
use_larc: False # supported for SGD only for now
larc_config:
clip: False
eps: 1e-08
trust_coefficient: 0.001
Note
LARC is currently supported for SGD optimizer only.
Using Apex¶
In order to use Apex, VISSL provides anaconda and pip packages of Apex (compiled with optimized C++ extensions/CUDA kernels). The Apex packages are provided for all versions of CUDA (9.2, 10.0, 10.1, 10.2, 11.0), PyTorch >= 1.4 and Python >= 3.6 and <= 3.9.
Follow VISSL’s instructions to install apex in pip and instructions to install apex in conda.
Handling invalid images in dataloader¶
How VISSL solves it¶
Self-supervised approaches like SimCLR, SwAV etc. that perform some form of contrastive learning contrast the features or clusters of one image with another. During dataloading, or in the training dataset itself, it is possible that there are invalid images. By default, in VISSL, when the dataloader encounters an invalid image, a gray image is returned instead. Using gray images for contrastive learning can lead to inferior model accuracy, especially if there are a lot of invalid images.
To solve this issue, VISSL provides a custom base dataset class called QueueDataset that maintains 2 queues in CPU memory. One queue is used to enqueue valid seen images from previous minibatches and the other queue is used to dequeue. QueueDataset is implemented such that the same minibatch will never have duplicate images. If we can't dequeue a valid image, we return None from the dequeue.
In short, QueueDataset enables using previously seen valid images from the training in the current minibatch in place of invalid images.
Enabling QueueDataset¶
VISSL makes it convenient for users to use the QueueDataset with simple configuration settings. To use the QueueDataset, users need to set DATA.TRAIN.ENABLE_QUEUE_DATASET=true and DATA.TEST.ENABLE_QUEUE_DATASET=true.
Tuning the queue size of QueueDataset¶
VISSL exposes the queue settings in the configuration file so that users can tune them. The configuration settings are:
DATA:
TRAIN:
ENABLE_QUEUE_DATASET: True
TEST:
ENABLE_QUEUE_DATASET: True
Note
If users encounter a CPU out-of-memory issue, they might want to reduce the queue size.
Resume training from iteration: Stateful data sampler¶
Issue with PyTorch DataSampler for large data training¶
torch.utils.data.distributed.DistributedSampler is PyTorch's default sampler used for many trainings. However, it becomes limiting to use this sampler for large-scale data trainings for 2 reasons:
Using the PyTorch DataSampler, each trainer shuffles the full data (assuming shuffling is used) and then each trainer gets a view of this shuffled data. If the dataset is large (100 million, 1 billion images or more), generating a very large permutation on each trainer can lead to large CPU memory consumption per machine. Hence, it becomes difficult to use the PyTorch default DataSampler when the user wants to train on large data for several epochs (for example: 10 epochs of 100M images).
When using the PyTorch DataSampler and the training is resumed, the sampler will serve the full dataset. However, in case of large data trainings (like 1 billion images or more), one mostly trains for 1 epoch only. In such cases, when the training resumes from the middle of the epoch, the sampler will serve the full 1 billion images, which is not what we want.
To solve both of the above issues, VISSL provides a custom sampler StatefulDistributedSampler which inherits from the PyTorch DistributedSampler and fixes the above issues in the following manner (a simplified sketch of the idea follows the list):
The sampler creates the view of the data per trainer and then shuffles only the data that trainer is supposed to view. This keeps the CPU memory requirement as expected.
The sampler adds a member start_iter which tracks what iteration number of the given epoch the model is at. When the training is resumed, start_iter will be properly set to the last iteration number and the sampler will serve only the remainder of the data.
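Conceptually (this is a simplified sketch, not VISSL's actual StatefulDistributedSampler), such a sampler can be built on top of PyTorch's DistributedSampler by skipping the samples already consumed before the checkpoint:

from torch.utils.data.distributed import DistributedSampler

class SimpleStatefulSampler(DistributedSampler):
    """Simplified sketch: resume mid-epoch by skipping already-consumed samples."""

    def __init__(self, dataset, batch_size, **kwargs):
        super().__init__(dataset, **kwargs)
        self.batch_size = batch_size
        self.start_iter = 0  # set from the checkpoint when training resumes

    def set_start_iter(self, start_iter):
        self.start_iter = start_iter

    def __iter__(self):
        indices = list(super().__iter__())
        # skip the samples this rank already served before the checkpoint
        return iter(indices[self.start_iter * self.batch_size:])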
How to use VISSL custom DataSampler¶
Using the VISSL provided custom sampler StatefulDistributedSampler
is extremely easy and involves simply setting the correct configuration options as below:
DATA:
TRAIN:
USE_STATEFUL_DISTRIBUTED_SAMPLER: True
TEST:
USE_STATEFUL_DISTRIBUTED_SAMPLER: True
Note
Users can use StatefulDistributedSampler
for only training dataset and use PyTorch default DataSampler
if desired i.e. it is not mandatory to use the same sampler type for all data splits.
Mixed precision training (fp16)¶
Many self-supervised approaches leverage mixed precision training by default for better training speed and a reduced model memory requirement. For this, we use the NVIDIA Apex library with AMP.
Users can tune the AMP level to the levels supported by NVIDIA. See this for details on Apex AMP levels.
To use mixed precision training, one needs to set the following parameters in the configuration file:
MODEL:
AMP_PARAMS:
USE_AMP: True
# Use O1 as it is more robust and stable than O3. If you want to use O3, we recommend
# the following setting:
# {"opt_level": "O3", "keep_batchnorm_fp32": True, "master_weights": True, "loss_scale": "dynamic"}
AMP_ARGS: {"opt_level": "O1"}
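For reference, the underlying Apex AMP usage looks roughly like the sketch below (requires Apex and a GPU; the model and optimizer here are placeholders):

import torch
import torch.nn as nn
from apex import amp

# Placeholder model and optimizer, purely for illustration.
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# O1 patches common ops to run in fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(32, 128).cuda()).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # the loss is scaled to avoid fp16 gradient underflow
optimizer.step()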
Using Apex¶
In order to use Apex, VISSL provides anaconda and pip packages of Apex (compiled with optimized C++ extensions/CUDA kernels). The Apex packages are provided for all versions of CUDA (9.2, 10.0, 10.1, 10.2, 11.0), PyTorch >= 1.4 and Python >= 3.6 and <= 3.9.
Follow VISSL’s instructions to install apex in pip and instructions to install apex in conda.
Train on multiple-gpus¶
VISSL supports training any model on 1 gpu or more. Typically, a single machine can have 2, 4 or 8 gpus. If users want to train on more than 1 gpu within a single machine, it's very easy. Typically for single machine training, this involves correctly setting the number of gpus to use via DISTRIBUTED.NUM_PROC_PER_NODE.
The config will look like:
DISTRIBUTED:
BACKEND: nccl # set to "gloo" if desired
NUM_NODES: 1 # no change needed
NUM_PROC_PER_NODE: 2 # user sets this to number of gpus to use
INIT_METHOD: tcp # set to "file" if desired
RUN_ID: auto # Set to file_path if using file method. No change needed for tcp and a free port on machine is automatically detected.
The list of all the options exposed by VISSL:
DISTRIBUTED:
# backend for communication across gpus. Use nccl by default. For cpu training, set
# "gloo" as the backend.
BACKEND: "nccl"
# whether to output the NCCL info during training. This allows to debug how
# nccl communication is configured.
NCCL_DEBUG: False
# tuning parameter to speed up all reduce by specifying number of nccl threads to use.
# by default, we use whatever the default is set by nccl or user system.
NCCL_SOCKET_NTHREADS: ""
# whether model buffers (like BN buffers) are broadcast in every forward pass
BROADCAST_BUFFERS: True
# number of machines to use in training. Each machine can have many gpus. NODES count
# number of unique hosts.
NUM_NODES: 1
# set this to the number of gpus per machine. This ensures that each gpu of the
# node has a process attached to it.
NUM_PROC_PER_NODE: 8
# this could be: tcp | env | file or any other pytorch supported methods
INIT_METHOD: "tcp"
# every training run should have a unique id. Following are the options:
# 1. If using INIT_METHOD=env, RUN_ID="" is fine.
# 2. If using INIT_METHOD=tcp,
# - if you use > 1 machine, set port yourself. RUN_ID="localhost:{port}".
# - If using 1 machine, set RUN_ID=auto and a free port will be automatically selected
# 3. IF using INIT_METHOD=file, RUN_ID={file_path}
RUN_ID: "auto"
Train on multiple machines¶
VISSL allows scaling a training beyond 1 machine in order to speed up training. VISSL makes it extremely easy to scale up training. Typically, for multi-machine training, this involves correctly setting the following options:
Number of gpus to use
Number of nodes
INIT_METHOD for PyTorch distributed training which determines how gpus will communicate for all reduce operations.
Putting together the above, if the user wants to train on 2 machines where each machine has 8 gpus, the config will look like:
DISTRIBUTED:
BACKEND: nccl
NUM_NODES: 2 # user sets this to number of machines to use
NUM_PROC_PER_NODE: 8 # user sets this to number of gpus to use per machine
INIT_METHOD: tcp # recommended if feasible; otherwise use file or env
RUN_ID: localhost:{port} # select the free port
The list of all the options exposed by VISSL:
DISTRIBUTED:
# backend for communication across gpus. Use nccl by default. For cpu training, set
# "gloo" as the backend.
BACKEND: "nccl"
# whether to output the NCCL info during training. This allows to debug how
# nccl communication is configured.
NCCL_DEBUG: False
# tuning parameter to speed up all reduce by specifying number of nccl threads to use.
# by default, we use whatever the default is set by nccl or user system.
NCCL_SOCKET_NTHREADS: ""
# whether model buffers (like BN buffers) are broadcast in every forward pass
BROADCAST_BUFFERS: True
# number of machines to use in training. Each machine can have many gpus. NODES count
# number of unique hosts.
NUM_NODES: 1
# set this to the number of gpus per machine. This ensures that each gpu of the
# node has a process attached to it.
NUM_PROC_PER_NODE: 8
# this could be: tcp | env | file or any other pytorch supported methods
INIT_METHOD: "tcp"
# every training run should have a unique id. Following are the options:
# 1. If using INIT_METHOD=env, RUN_ID="" is fine.
# 2. If using INIT_METHOD=tcp,
# - if you use > 1 machine, set port yourself. RUN_ID="localhost:{port}".
# - If using 1 machine, set RUN_ID=auto and a free port will be automatically selected
# 3. IF using INIT_METHOD=file, RUN_ID={file_path}
RUN_ID: "auto"
Using SLURM¶
VISSL supports SLURM by default for training models. VISSL code automatically detects if the training environment is SLURM based on SLURM environment variables like SLURM_NODEID
, SLURMD_NODENAME
, SLURM_STEP_NODELIST
.
VISSL also provides a helper bash script dev/launch_slurm.sh that allows launching a given training on SLURM. Users can modify this script to meet their needs.
The bash script takes the following inputs:
# number of machines to distribute training on
NODES=${NODES-1}
# number of gpus per machine to use for training
NUM_GPU=${NUM_GPU-8}
# gpus type: P100 | V100 | V100_32G etc. User should set this based on their machine
GPU_TYPE=${GPU_TYPE-V100}
# name of the training. for example: simclr_2node_resnet50_in1k. This is helpful to clearly recognize the training
EXPT_NAME=${EXPT_NAME}
# how much CPU memory to use
MEM=${MEM-250g}
# number of CPUs used for each trainer (i.e. each gpu)
CPU=${CPU-8}
# directory where all the training artifacts like checkpoints etc will be written
OUTPUT_DIR=${OUTPUT_DIR}
# partition of the cluster on which training should run. User should determine this parameter for their cluster
PARTITION=${PARTITION-learnfair}
# any helpful comment that slurm dashboard can display
COMMENT=${COMMENT-vissl_training}
GITHUB_REPO=${GITHUB_REPO-vissl}
# what branch of VISSL should be used. specify your custom branch
BRANCH=${BRANCH-master}
# automatically determined and used for distributed training.
# each training run must have a unique id and vissl defaults to date
RUN_ID=$(date +'%Y%m%d')
# number of dataloader workers to use per gpu
NUM_DATA_WORKERS=${NUM_DATA_WORKERS-8}
# multi-processing method to use in PyTorch. Options: forkserver | fork | spawn
MULTI_PROCESSING_METHOD=${MULTI_PROCESSING_METHOD-forkserver}
# specify the training configuration to run. For example: to train swav for 100epochs
# config=pretrain/swav/swav_8node_resnet config.OPTIMIZER.num_epochs=100
CFG=( "$@" )
To run the script to train SwAV on 8 machines where each machine has 8 gpus for 100 epochs, the script can be run as:
cd $HOME/vissl && NODES=8 \
NUM_GPU=8 \
GPU_TYPE=V100 \
MEM=200g \
CPU=8 \
EXPT_NAME=swav_100ep_rn50_in1k \
OUTPUT_DIR=/tmp/swav/ \
PARTITION=learnfair \
BRANCH=master \
NUM_DATA_WORKERS=4 \
MULTI_PROCESSING_METHOD=forkserver \
./dev/launch_slurm.sh \
config=pretrain/swav/swav_8node_resnet config.OPTIMIZER.num_epochs=100
ZeRO: Optimizer state and gradient sharding¶
Author: lefaudeux@fb.com
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models is a technique developed by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He in this paper. When training models with billions of parameters, GPU memory becomes a bottleneck. ZeRO can offer 4x to 8x reductions in memory, thus allowing larger models to fit in memory.
How ZeRO works?¶
Memory requirement of a model can be broken down roughly into:
activations memory
model parameters
parameter momentum buffers (optimizer state)
parameter gradients
ZeRO shards the optimizer state and the parameter gradients onto different devices and reduces the memory needed per device.
How to use ZeRO in VISSL?¶
VISSL uses the FAIRScale library which implements ZeRO in PyTorch. Using ZeRO in VISSL involves no code changes and can simply be done by setting some configuration options in the yaml files.
In order to use ZeRO, user needs to set OPTIMIZER.name=zero
and nest the desired optimizer (for example SGD) settings in OPTIMIZER.base_optimizer
.
An example for using ZeRO with LARC and SGD optimization:
OPTIMIZER:
name: zero
base_optimizer:
name: sgd
use_larc: False
larc_config:
clip: False
trust_coefficient: 0.001
eps: 0.00000001
weight_decay: 0.000001
momentum: 0.9
nesterov: False
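Under the hood this relies on FAIRScale's sharded optimizer; a rough standalone sketch is shown below (illustrative only, and it must run inside an initialized torch.distributed process group since the optimizer state is sharded across ranks):

import torch
import torch.nn as nn
from fairscale.optim.oss import OSS

# Placeholder model, purely for illustration.
model = nn.Linear(1024, 1024)

# The base optimizer (SGD here) is passed as a class; its state is sharded across the
# participating ranks instead of being replicated on every GPU.
optimizer = OSS(
    params=model.parameters(),
    optim=torch.optim.SGD,
    lr=0.1,
    momentum=0.9,
    weight_decay=1e-6,
)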
Note
ZeRO works seamlessly with LARC and mixed precision training. Using ZeRO with activation checkpointing is not yet enabled, primarily due to the manual gradient reduction needed for activation checkpointing.
API Documentation¶
vissl.data package¶
class vissl.data.GenericSSLDataset(cfg, split, dataset_source_map)[source]¶
Bases: torch.utils.data.dataset.Dataset
Base Self Supervised Learning Dataset Class.
The GenericSSLDataset class is defined to support reading data from multiple data sources. For example: data = [dataset1, dataset2] and the minibatches generated will have the corresponding data from each dataset.
For this reason, we also support labels from multiple sources. For example targets = [dataset1 targets, dataset2 targets].
In order to support multiple data sources, the dataset configuration always has list inputs.
DATA_SOURCES, LABEL_SOURCES, DATASET_NAMES, DATA_PATHS, LABEL_PATHS
For several data sources, we also support specifying on what dataset the transforms should be applied. By default, apply the transforms on data from all datasets.
- Parameters
cfg (AttrDict) – configuration defined by user
split (str) – the dataset split for which we are constructing the Dataset object
dataset_source_map (Dict[str, Callable]) –
The dictionary that maps what data sources are supported and what object to use to read data from those sources. For example: DATASET_SOURCE_MAP = {
”disk_filelist”: DiskImageDataset, “disk_folder”: DiskImageDataset, “synthetic”: SyntheticImageDataset,
}
load_single_label_file(path)[source]¶
Load the single data file. We only support the user specifying numpy label files if the user is specifying a filelist source of labels.
To save memory, if mmap_mode is set to True for loading, we try to load the files in mmap_mode. If it fails, we simply load the labels without mmap.
__getitem__(idx)[source]¶
Get the input sample for the minibatch for a specified data index. For each data object (if we are loading several datasets in a minibatch), we get the sample consisting of {
image data,
label (if applicable) otherwise idx
data_valid: 0 or 1 indicating if the data is valid image
data_idx : index of the data in the dataset for book-keeping and debugging
}
Once the sample data is available, we apply the data transform on the sample.
The final transformed sample is returned to be added into the minibatch.
get_image_paths()[source]¶
Get the image paths for all the data sources.
- Returns
image_paths (List[List[str]]) – a list containing the image paths list for each data source.
get_available_splits(dataset_config)[source]¶
Get the available splits in the dataset config. Not specific to the split for which the SSLDataset is being constructed.
NOTE: this is a deprecated method.
vissl.data.get_data_files(split, dataset_config)[source]¶
Get the path to the dataset (images and labels).
If the user has explicitly specified the data_sources, we simply use those and don’t do lookup in the datasets registered with VISSL from the dataset catalog.
If the user hasn’t specified the path, look for the dataset in the datasets catalog registered with VISSL. For a given list of datasets and a given partition (train/test), we first verify that we have the dataset and the correct source as specified by the user. Then for each dataset in the list, we get the data path (make sure it exists, sources match). For the label file, the file is optional.
Once we have the dataset original paths, we replace the path with the local paths if the data was copied to local disk.
vissl.data.register_datasets(json_catalog_path)[source]¶
If the json dataset_catalog file is found, we register the datasets specified in the catalog with VISSL. If the catalog also specifies VOC or COCO datasets, we register them.
- Parameters
json_catalog_path (str) – the path to the json dataset catalog
class vissl.data.VisslDatasetCatalog[source]¶
Bases: object
A catalog that stores information about the datasets and how to obtain them. It contains a mapping from strings (which are names that identify a dataset, e.g. “imagenet1k”) to a dict which contains:
mapping of various data splits (train, test, val) to the data source (path on the disk whether a folder path or a filelist)
source of the data (disk_filelist | disk_folder)
The purpose of having this catalog is to make it easy to choose different datasets, by just using the strings in the config.
static register_json(json_catalog_path)[source]¶
- Parameters
filepath – a .json filepath that contains the data to be registered
static register_dict(dict_catalog)[source]¶
- Parameters
dict – a dict with a bunch of datasets to be registered
static register_data(name, data_dict)[source]¶
- Parameters
name (str) – the name that identifies a dataset, e.g. “imagenet1k_folder”.
func (callable) – a callable which takes no arguments and returns a list of dicts. It must return the same results if called multiple times.
vissl.data.collators module¶
vissl.data.collators.register_collator(name)[source]¶
Registers Self-Supervision data collators.
This decorator allows VISSL to add custom data collators, even if the collator itself is not part of VISSL. To use it, apply this decorator to a collator function, like this:
@register_collator('my_collator_name')
def my_collator_name():
    ...
To get a collator from a configuration file, see get_collator().
vissl.data.collators.mixup_collator module¶
vissl.data.collators.mixup_collator.multicrop_mixup_collator(batch)[source]¶
This collator is used to mix up 2 images at a time: 2*N input images become N images. This collator can handle multi-crop input. For each crop, it mixes up the corresponding crop of the next image.
- Input:
- batch: Example
- batch = [
{“data” : [img1_0, …, img1_k], ..}, {“data” : [img2_0, …, img2_k], …}, … {“data” : [img2N_0, …, img2N_k], …},
]
- Returns: Example output:
- output = [
- {
- “data”: [
torch.tensor([img1_2_0, …, img1_2_k]), torch.tensor([img3_4_0, …, img3_4_k]) …
]
},
]
vissl.data.collators.moco_collator module¶
-
vissl.data.collators.moco_collator.
moco_collator
(batch: List[Dict[str, Any]]) → Dict[str, List[torch.Tensor]][source]¶ This collator is specific to the MoCo approach http://arxiv.org/abs/1911.05722
The collator collates the batch for the following input (assuming k copies of each image):
- Input:
- batch: Example
- batch = [
{“data” : [img1_0, …, img1_k], ..}, {“data” : [img2_0, …, img2_k], …}, …
]
- Returns: Example output:
- output = [
- {
“data”: torch.tensor([img1_0, …, img1_k], [img2_0, …, img2_k]) ..
},
]
Dimensions become [num_positives x Batch x C x H x W]
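To make the resulting layout concrete, here is a shape-only sketch in plain PyTorch (not the VISSL collator itself) of collating k positives per image into the [num_positives x Batch x C x H x W] layout mentioned above.

import torch

batch_size, num_positives, C, H, W = 4, 2, 3, 32, 32
batch = [{"data": [torch.randn(C, H, W) for _ in range(num_positives)]} for _ in range(batch_size)]

# Stack each copy index across the batch, then stack the copies: copies lead the shape.
collated = torch.stack([
    torch.stack([sample["data"][k] for sample in batch]) for k in range(num_positives)
])
print(collated.shape)  # torch.Size([2, 4, 3, 32, 32]) == [num_positives, B, C, H, W]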
vissl.data.collators.multicrop_collator module¶
-
vissl.data.collators.multicrop_collator.
multicrop_collator
(batch)[source]¶ This collator is used in SwAV approach.
The collator collates the batch for the following input (assuming k copies of each image):
- Input:
- batch: Example
- batch = [
{“data” : [img1_0, …, img1_k], ..}, {“data” : [img2_0, …, img2_k], …}, …
]
- Returns: Example output:
- output = [
- {
“data”: torch.tensor([img1_0, …, imgN_0], [img1_k, …, imgN_k]) ..
},
]
vissl.data.collators.patch_and_image_collator module¶
-
vissl.data.collators.patch_and_image_collator.
patch_and_image_collator
(batch)[source]¶ This collator is used in PIRL approach.
- batch contains two keys “data” and “label”.
data is a list of N+1 elements. The 1st element is the “image” and the remaining N are patches.
label is an integer (image index in the dataset)
- We collate this to
image: batch_size tensor containing images
patches: N * batch_size tensor containing patches
vissl.data.collators.siamese_collator module¶
-
vissl.data.collators.siamese_collator.
siamese_collator
(batch)[source]¶ This collator is used in Jigsaw approach.
- Input:
- batch: Example
- batch = [
{“data”: [img1,], “label”: [lbl1, ]}, #img1 {“data”: [img2,], “label”: [lbl2, ]}, #img2 . . {“data”: [imgN,], “label”: [lblN, ]}, #imgN
]
- where:
img{x} is a tensor of size: num_towers x C x H x W lbl{x} is an integer
- Returns: Example output:
- output = [
- {
“data”: torch.tensor([img1_0, …, imgN_0]) ..
},
] where the output is of dimension: (N * num_towers) x C x H x W
vissl.data.collators.simclr_collator module¶
-
vissl.data.collators.simclr_collator.
simclr_collator
(batch)[source]¶ This collator is used in SimCLR approach.
- The collator collates the batch for the following input (each image has k copies):
input: [[img1_0, …, img1_k], [img2_0, …, img2_k], …, [imgN_0, …, imgN_k]] output: [img1_0, img2_0, ….., img1_1, img2_1,…]
- Input:
- batch: Example
- batch = [
{“data”: [img1_0, …, img1_k], “label”: [lbl1, ]}, #img1 {“data”: [img2_0, …, img2_k], “label”: [lbl2, ]}, #img2 . . {“data”: [imgN_0, …, imgN_k], “label”: [lblN, ]}, #imgN
]
- where:
img{x} is a tensor of size: C x H x W lbl{x} is an integer
- Returns: Example output:
- output = [
- {
“data”: torch.tensor([img1_0, img2_0, ….., img1_1, img2_1,…]) ..
},
]
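The interleaved ordering can be illustrated with a small plain-Python sketch (strings stand in for image tensors; this is not the VISSL implementation).

k = 2  # copies per image
batch = [["img1_0", "img1_1"], ["img2_0", "img2_1"], ["img3_0", "img3_1"]]

# Copy 0 of every image first, then copy 1 of every image, and so on.
output = [batch[i][c] for c in range(k) for i in range(len(batch))]
print(output)  # ['img1_0', 'img2_0', 'img3_0', 'img1_1', 'img2_1', 'img3_1']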
vissl.data.collators.targets_one_hot_default_collator module¶
-
vissl.data.collators.targets_one_hot_default_collator.
convert_to_one_hot
(pos_lbl, neg_lbl, num_classes: int) → torch.Tensor[source]¶ This function converts target class indices to one-hot vectors, given the number of classes.
1 for positive labels, 0 for negative labels and -1 for ignore labels.
-
vissl.data.collators.targets_one_hot_default_collator.
targets_one_hot_default_collator
(batch, num_classes: int)[source]¶ The collator collates the batch for the following input:
- Input:
input : [[img0, …, imgk]] label: [
[[1, 3, 6], [4, 9]] [[1, 5], [6, 8, 10, 11]] …..
]
- Output:
output: [img0, img0, …..,] label: [[0, 1, 0, 1, …, -1, 0, 0, 1], [0, 1, 0, 0, 0, 1, 0], ….]
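The positive/negative/ignore convention used by convert_to_one_hot can be sketched as follows (illustrative only, not the VISSL implementation).

import torch

def to_one_hot(pos_lbl, neg_lbl, num_classes):
    # 1 for positive classes, 0 for negative classes, -1 (ignore) everywhere else.
    target = -torch.ones(num_classes, dtype=torch.long)
    target[neg_lbl] = 0
    target[pos_lbl] = 1
    return target

print(to_one_hot([1, 3], [0, 2], num_classes=5))  # tensor([0, 1, 0, 1, -1])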
vissl.data.ssl_transforms module¶
-
class
vissl.data.ssl_transforms.
SSLTransformsWrapper
(indices, **args)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
VISSL wraps around transforms so that they work with multi-modal input. VISSL supports batches that come from several datasets and sources, hence the input batch (images, labels) is always a list.
To apply the user-defined transforms, VISSL takes “indices” as input, which define which dataset/source data in the sample the transform should be applied to. For example:
- Assuming input sample is {
“data”: [dataset1_imgX, dataset2_imgY], “label”: [dataset1_lblX, dataset2_lblY]
} and the transform is:
- TRANSFORMS:
name: RandomGrayscale p: 0.2 indices: 0
then the transform is applied only on dataset1_imgX. If, however, the indices are either not specified or set to [0, 1], then the transform is applied on both dataset1_imgX and dataset2_imgY.
Since this structure of data is introduced by VISSL, the SSLTransformsWrapper takes care of dealing with the multi-modal input by wrapping the original transforms (PyTorch transforms or custom user-defined transforms) and calling each transform on each index.
VISSL also supports _TRANSFORMS_WITH_LABELS transforms that modify the label or are used to generate the labels used in self-supervised learning tasks like Jigsaw. When a transform in _TRANSFORMS_WITH_LABELS is called, the new label is also returned besides the transformed image.
VISSL also supports _TRANSFORMS_WITH_COPIES, which are transforms that generate several copies of an image. Common examples of self-supervised training methods that do this are SimCLR, SwAV and MoCo. When a transform from _TRANSFORMS_WITH_COPIES is used, the SSLTransformsWrapper flattens the transform output. For example, for the input [img1], if we apply ImgReplicatePil to replicate the image 2 times:
- SSLTransformsWrapper(
ImgReplicatePil(num_times=2), [img1]
) will output [img1_1, img1_2] instead of nested list [[img1_1, img1_2]].
The benefit of this is that the next set of user-specified transforms can now operate on img1_1 and img1_2, since the input has become multi-modal in nature.
VISSL also supports _TRANSFORMS_WITH_GROUPING, which means that a single transform should be applied on the full multi-modal input together instead of separately. This is a common transform used in BYOL. For example:
- SSLTransformsWrapper(
- ImgPilMultiCropRandomApply(
RandomApply, prob=[0.0, 0.2]
), [img1_1, img1_2]
) this will apply RandomApply on img1_1 with prob=0.0 and on img1_2 with prob=0.2
-
__init__
(indices, **args)[source]¶ - Parameters
indices (List[int]) (Optional) – the indices list on which the transform should be applied; the input is always a list. Example: a minibatch of size=2 looks like [[img1], [img2]]. If indices is not specified, the transform is applied to all of the multi-modal input.
args (dict) – the arguments that the transform takes
-
__call__
(sample)[source]¶ Apply each transform on the specified indices of each entry in the input sample.
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.SSLTransformsWrapper[source]¶
-
vissl.data.ssl_transforms.
get_transform
(input_transforms_list)[source]¶ Given the list of user-specified transforms, return the torchvision.transforms.Compose() version of the transforms. Each transform in the composition is an SSLTransformsWrapper which wraps the original transform to handle the multi-modal nature of the input.
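As a hedged usage sketch (the transform names are ones mentioned on this page, but the exact dict keys accepted for each transform are an assumption):

from vissl.data.ssl_transforms import get_transform

# Hypothetical transform list mirroring the yaml TRANSFORMS structure.
input_transforms_list = [
    {"name": "ImgReplicatePil", "num_times": 2},
    {"name": "RandomGrayscale", "p": 0.2},
]
transform = get_transform(input_transforms_list)
# "transform" is a torchvision.transforms.Compose of SSLTransformsWrapper objects.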
vissl.data.ssl_transforms.img_patches_tensor module¶
-
class
vissl.data.ssl_transforms.img_patches_tensor.
ImgPatchesFromTensor
(num_patches=9, patch_jitter=21)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Create image patches from a torch Tensor or numpy array. This transform was proposed in Jigsaw - https://arxiv.org/abs/1603.09246
- Parameters
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_patches_tensor.ImgPatchesFromTensor[source]¶ Instantiates ImgPatchesFromTensor from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPatchesFromTensor instance.
vissl.data.ssl_transforms.img_pil_color_distortion module¶
-
class
vissl.data.ssl_transforms.img_pil_color_distortion.
ImgPilColorDistortion
(strength)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Apply Random color distortions to the input image. There are multiple different ways of applying these distortions. This implementation follows SimCLR - https://arxiv.org/abs/2002.05709 It randomly distorts the hue, saturation, brightness of an image and can randomly convert the image to grayscale.
-
__init__
(strength)[source]¶ - Parameters
strength (float) – A number used to quantify the strength of the color distortion.
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_color_distortion.ImgPilColorDistortion[source]¶ Instantiates ImgPilColorDistortion from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilColorDistortion instance.
-
vissl.data.ssl_transforms.img_pil_gaussian_blur module¶
-
class
vissl.data.ssl_transforms.img_pil_gaussian_blur.
ImgPilGaussianBlur
(p, radius_min, radius_max)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Apply Gaussian blur to the PIL image. Takes the radius and the probability of application as parameters.
This transform was used in SimCLR - https://arxiv.org/abs/2002.05709
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_gaussian_blur.ImgPilGaussianBlur[source]¶ Instantiates ImgPilGaussianBlur from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilGaussianBlur instance.
-
vissl.data.ssl_transforms.img_pil_multicrop_random_apply module¶
-
class
vissl.data.ssl_transforms.img_pil_multicrop_random_apply.
ImgPilMultiCropRandomApply
(transforms: List[Dict[str, Any]], prob: float)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Apply a list of transforms on multi-crop input. The transforms are randomly applied to each crop using the specified probability. This is used in BYOL https://arxiv.org/pdf/2006.07733.pdf
Multi-crops are several crops of a given image. This is most commonly used in contrastive learning. For example SimCLR, SwAV approaches use multi-crop input.
-
__init__
(transforms: List[Dict[str, Any]], prob: float)[source]¶ - Parameters
transforms (List[transform]) – list of transforms that should be applied to each crop.
prob (List(float)) –
Probability of RandomApply for the transforms composition on each crop. Example: for 2 crops in BYOL, for solarization:
prob = [0.0, 0.2]
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_multicrop_random_apply.ImgPilMultiCropRandomApply[source]¶ Instantiates ImgPilMultiCropRandomApply from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilMultiCropRandomApply instance.
-
vissl.data.ssl_transforms.img_pil_random_color_jitter module¶
-
class
vissl.data.ssl_transforms.img_pil_random_color_jitter.
ImgPilRandomColorJitter
(strength, prob)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Apply Random color jitter to the input image. It randomly distorts the hue, saturation, brightness of an image.
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_random_color_jitter.ImgPilRandomColorJitter[source]¶ Instantiates ImgPilRandomColorJitter from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilRandomColorJitter instance.
-
vissl.data.ssl_transforms.img_pil_random_photometric module¶
-
class
vissl.data.ssl_transforms.img_pil_random_photometric.
ImgPilRandomPhotometric
(p)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Randomly apply some photometric transforms to an image. This was used in PIRL - https://arxiv.org/abs/1912.01991
- The photometric transforms applied includes:
AutoContrast, RandomPosterize, RandomSharpness, RandomSolarize
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_random_photometric.ImgPilRandomPhotometric[source]¶ Instantiates ImgPilRandomPhotometric from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilRandomPhotometric instance.
vissl.data.ssl_transforms.img_pil_random_solarize module¶
-
class
vissl.data.ssl_transforms.img_pil_random_solarize.
ImgPilRandomSolarize
(prob: float)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Randomly apply solarization transform to an image. This was used in BYOL - https://arxiv.org/abs/2006.07733
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_random_solarize.ImgPilRandomSolarize[source]¶ Instantiates ImgPilRandomSolarize from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilRandomSolarize instance.
-
vissl.data.ssl_transforms.img_pil_to_lab_tensor module¶
-
class
vissl.data.ssl_transforms.img_pil_to_lab_tensor.
ImgPil2LabTensor
(indices)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Convert a PIL image to a LAB tensor of shape C x H x W. This transform was proposed in Colorization - https://arxiv.org/abs/1603.08511
The input image is a PIL Image. We first convert it to an HWC tensor with channel order RGB. We then convert RGB to BGR and use OpenCV to convert the image to LAB. The LAB image is an 8-bit image with ranges L [0, 255], A [0, 255], B [0, 255]. We rescale it to: L [0, 100], A [-128, 127], B [-128, 127].
The output is a torch tensor of the image.
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_lab_tensor.ImgPil2LabTensor[source]¶ Instantiates ImgPil2LabTensor from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPil2LabTensor instance.
-
vissl.data.ssl_transforms.img_pil_to_multicrop module¶
-
class
vissl.data.ssl_transforms.img_pil_to_multicrop.
ImgPilToMultiCrop
(total_num_crops, num_crops, size_crops, crop_scales)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Convert a PIL image to Multi-resolution Crops. The input is a PIL image and output is the list of image crops.
This transform was proposed in SwAV - https://arxiv.org/abs/2006.09882
-
__init__
(total_num_crops, num_crops, size_crops, crop_scales)[source]¶ Returns total_num_crops square crops of an image. Each crop is a random crop extracted according to the parameters specified in size_crops and crop_scales. For ease of use, one can specify num_crops which removes the need to repeat parameters.
- Parameters
Example usage:
- (total_num_crops=2, num_crops=[1, 1], size_crops=[224, 96], crop_scales=[(0.14, 1.), (0.05, 0.14)]) extracts 2 crops total: one of size 224x224 and one of size 96x96
- (total_num_crops=3, num_crops=[1, 2], size_crops=[224, 96], crop_scales=[(0.14, 1.), (0.05, 0.14)]) extracts 3 crops total: one of size 224x224 and two of size 96x96
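A hedged instantiation sketch following the second example above (values are illustrative):

from vissl.data.ssl_transforms.img_pil_to_multicrop import ImgPilToMultiCrop

multicrop = ImgPilToMultiCrop(
    total_num_crops=3,
    num_crops=[1, 2],
    size_crops=[224, 96],
    crop_scales=[(0.14, 1.0), (0.05, 0.14)],
)
# Calling it on a PIL image is expected to return a list of 3 crops:
# one 224x224 crop and two 96x96 crops.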
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_multicrop.ImgPilToMultiCrop[source]¶ Instantiates ImgPilToMultiCrop from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilToMultiCrop instance.
-
vissl.data.ssl_transforms.img_pil_to_patches_and_image module¶
-
class
vissl.data.ssl_transforms.img_pil_to_patches_and_image.
ImgPilToPatchesAndImage
(crop_scale_image=(0.08, 1.0), crop_size_image=224, crop_scale_patches=(0.6, 1.0), crop_size_patches=255, permute_patches=True, num_patches=9)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Convert an input PIL image to Patches and Image This transform was proposed in PIRL - https://arxiv.org/abs/1912.01991.
- Input:
PIL Image
- Returns
- list containing N+1 elements
zeroth element: a RandomResizedCrop of the image
remainder: N patches extracted uniformly from a RandomResizedCrop
-
__init__
(crop_scale_image=(0.08, 1.0), crop_size_image=224, crop_scale_patches=(0.6, 1.0), crop_size_patches=255, permute_patches=True, num_patches=9)[source]¶ - Parameters
crop_scale_image (tuple of floats) – scale for RandomResizedCrop of image
crop_size_image (int) – size for RandomResizedCrop of image
crop_scale_patches (tuple of floats) – scale for RandomResizedCrop of patches
crop_size_patches (int) – size for RandomResizedCrop of patches
permute_patches (bool) – whether to permute the patches in a random order
num_patches (int) – number of patches to create. Should be a perfect square integer (e.g. 9).
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_patches_and_image.ImgPilToPatchesAndImage[source]¶ Instantiates ImgPilToPatchesAndImage from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilToPatchesAndImage instance.
vissl.data.ssl_transforms.img_pil_to_raw_tensor module¶
-
class
vissl.data.ssl_transforms.img_pil_to_raw_tensor.
ImgPilToRawTensor
[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Convert a PIL image to a raw tensor when we don’t want to apply the default division by 255 performed by torchvision.transforms.ToTensor().
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_raw_tensor.ImgPilToRawTensor[source]¶ Instantiates ImgPilToRawTensor from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgPilToRawTensor instance.
-
vissl.data.ssl_transforms.img_pil_to_tensor module¶
-
class
vissl.data.ssl_transforms.img_pil_to_tensor.
ImgToTensor
[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
A transform that overrides the default PyTorch transform to provide better transformation speed.
# credits: mannatsingh@fb.com
vissl.data.ssl_transforms.img_replicate_pil module¶
-
class
vissl.data.ssl_transforms.img_replicate_pil.
ImgReplicatePil
(num_times: int = 2)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Adds the same image to the batch K times so that the batch size is now N*K. Use the simclr_collator to convert into batches.
This transform is useful when generating multiple copies of the same image, for example, when training contrastive methods.
-
__init__
(num_times: int = 2)[source]¶ - Parameters
num_times (int) – how many times should the image be replicated.
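A minimal sketch, assuming the transform is called directly on a PIL image:

from PIL import Image
from vissl.data.ssl_transforms.img_replicate_pil import ImgReplicatePil

img = Image.new("RGB", (224, 224))
replicate = ImgReplicatePil(num_times=2)
copies = replicate(img)  # expected: a list with 2 copies of the image
print(len(copies))  # 2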
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_replicate_pil.ImgReplicatePil[source]¶ Instantiates ImgReplicatePil from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgReplicatePil instance.
-
vissl.data.ssl_transforms.img_rotate_pil module¶
-
class
vissl.data.ssl_transforms.img_rotate_pil.
ImgRotatePil
(num_angles=4, num_rotations_per_img=1)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
Apply rotation to a PIL Image. Samples rotation angle from a set of predefined rotation angles.
Predefined rotation angles are sampled at equal intervals in the [0, 360) angle space where the number of angles is specified by num_angles.
This transform was used in RotNet - https://arxiv.org/abs/1803.07728
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_rotate_pil.ImgRotatePil[source]¶ Instantiates ImgRotatePil from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ImgRotatePil instance.
-
vissl.data.ssl_transforms.pil_photometric_transforms_lib module¶
-
class
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
TransformObject
[source]¶ Bases:
object
Helper object that prints information about the transformation; other transforms can inherit from this.
-
class
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
RandomValueApplier
(min_v, max_v, root_transform, vtype='float', closed_interval=False)[source]¶ Bases:
vissl.data.ssl_transforms.pil_photometric_transforms_lib.TransformObject
-
__init__
(min_v, max_v, root_transform, vtype='float', closed_interval=False)[source]¶ Applies a transform by sampling a random value between [min_v, max_v]
- Parameters
root_transform (transform object) – transform that will be applied. must accept a value as input.
vtype (string) – value type - either “float” or “int”
closed_interval (bool) – sample from [min_v, max_v] (when True) or [min_v, max_v) when False
-
-
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
Sharpness
(img, v)[source]¶ Applies PIL.ImageEnhance.Sharpness to the image
-
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
Solarize
(img, v)[source]¶ Applies PIL.ImageOps.solarize to the image
-
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
Posterize
(img, v)[source]¶ Applies PIL.ImageOps.posterize to the image
-
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
AutoContrast
(img, _)[source]¶ Applies PIL.ImageOps.autocontrast to the image
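For reference, the underlying PIL operations these helpers apply, shown directly with PIL (the enhancement factor and threshold values below are arbitrary):

from PIL import Image, ImageEnhance, ImageOps

img = Image.new("RGB", (64, 64), color=(128, 64, 32))

sharpened = ImageEnhance.Sharpness(img).enhance(1.5)  # Sharpness(img, v)
solarized = ImageOps.solarize(img, threshold=128)     # Solarize(img, v)
posterized = ImageOps.posterize(img, bits=4)          # Posterize(img, v)
autocontrasted = ImageOps.autocontrast(img)           # AutoContrast(img, _)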
-
class
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
RandomSharpnessTransform
(min_v=0.1, max_v=1.9, root_transform=<function Sharpness>, vtype='float')[source]¶ Bases:
vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomValueApplier
Randomly apply the Sharpness transformation with the random value selected from an interval.
-
class
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
RandomPosterizeTransform
(min_v=4, max_v=8, root_transform=<function Posterize>, vtype='int')[source]¶ Bases:
vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomValueApplier
-
class
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
RandomSolarizeTransform
(min_v=0, max_v=256, root_transform=<function Solarize>, vtype='int')[source]¶ Bases:
vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomValueApplier
-
class
vissl.data.ssl_transforms.pil_photometric_transforms_lib.
AutoContrastTransform
[source]¶ Bases:
vissl.data.ssl_transforms.pil_photometric_transforms_lib.TransformObject
Wraps the AutoContrast method
vissl.data.ssl_transforms.shuffle_img_patches module¶
-
class
vissl.data.ssl_transforms.shuffle_img_patches.
ShuffleImgPatches
(perm_file: str)[source]¶ Bases:
classy_vision.dataset.transforms.classy_transform.ClassyTransform
This transform is used to shuffle the list of tensors (usually image patches of shape C x H x W) according to a randomly selected permutation from a pre-defined set of permutations.
This is a common operation used in Jigsaw approach https://arxiv.org/abs/1603.09246
-
__init__
(perm_file: str)[source]¶ - Parameters
perm_file (string) – path to the file containing pre-defined permutations.
-
__call__
(input_patches)[source]¶ The interface __call__ is used to transform the input data. It should contain the actual implementation of data transform.
- Parameters
input_patches (List[torch.tensor]) – list of torch tensors
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.data.ssl_transforms.shuffle_img_patches.ShuffleImgPatches[source]¶ Instantiates ShuffleImgPatches from configuration.
- Parameters
config (Dict) – arguments for the transform
- Returns
ShuffleImgPatches instance.
-
vissl.data.data_helper module¶
-
vissl.data.data_helper.
get_mean_image
(crop_size)[source]¶ Helper function that returns a gray PIL image of the size specified by user.
- Parameters
crop_size (int) – used to generate (crop_size x crop_size x 3) image.
- Returns
img – PIL Image
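Minimal usage example:

from vissl.data.data_helper import get_mean_image

img = get_mean_image(224)  # gray PIL image of size 224 x 224 x 3
print(img.size)  # (224, 224)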
-
class
vissl.data.data_helper.
StatefulDistributedSampler
(dataset, batch_size=None)[source]¶ Bases:
torch.utils.data.distributed.DistributedSampler
A more fine-grained, stateful DataSampler that uses both the training iteration and the epoch for shuffling data. The PyTorch DistributedSampler only uses the epoch for shuffling and starts sampling from the beginning. When training on very large data, we train for only one epoch, and when we resume training we want the data sampler to resume from the current training iteration.
-
__init__
(dataset, batch_size=None)[source]¶ Initializes the instance of StatefulDistributedSampler. The random seed is set for the epoch and the data is shuffled. To start sampling, use start_iter (set to 0 or restored when resuming from a checkpoint) so that sampling continues from the remaining images.
- Parameters
dataset (Dataset) – Pytorch dataset that sampler will shuffle
batch_size (int) – batch size we want the sampler to sample
-
-
class
vissl.data.data_helper.
QueueDataset
(queue_size)[source]¶ Bases:
torch.utils.data.dataset.Dataset
This class helps deal with invalid images in the dataset by using two queues. One queue is used to enqueue seen and valid images from previous batches; the other queue is used to dequeue. The class is implemented such that the same batch will never contain duplicate images. If we can’t dequeue a valid image, we return None for that instance.
- Parameters
queue_size – size of the queue (ideally set it to batch_size). Both queues will be of the same size.
vissl.data.dataloader_sync_gpu_wrapper module¶
-
class
vissl.data.dataloader_sync_gpu_wrapper.
DataloaderSyncGPUWrapper
(dataloader: Iterable)[source]¶ Bases:
classy_vision.dataset.dataloader_wrapper.DataloaderWrapper
Dataloader which wraps another dataloader and moves the data to GPU asynchronously, so as to overlap the cost of copying data from CPU to GPU with the previous model iteration.
vissl.data.ssl_dataset module¶
-
class
vissl.data.ssl_dataset.
GenericSSLDataset
(cfg, split, dataset_source_map)[source]¶ Bases:
torch.utils.data.dataset.Dataset
Base Self Supervised Learning Dataset Class.
The GenericSSLDataset class is defined to support reading data from multiple data sources. For example: data = [dataset1, dataset2] and the minibatches generated will have the corresponding data from each dataset.
For this reason, we also support labels from multiple sources. For example targets = [dataset1 targets, dataset2 targets].
In order to support multiple data sources, the dataset configuration always has list inputs.
DATA_SOURCES, LABEL_SOURCES, DATASET_NAMES, DATA_PATHS, LABEL_PATHS
For several data sources, we also support specifying on what dataset the transforms should be applied. By default, apply the transforms on data from all datasets.
- Parameters
cfg (AttrDict) – configuration defined by user
split (str) – the dataset split for which we are constructing the Dataset object
dataset_source_map (Dict[str, Callable]) –
The dictionary that maps what data sources are supported and what object to use to read data from those sources. For example: DATASET_SOURCE_MAP = {
”disk_filelist”: DiskImageDataset, “disk_folder”: DiskImageDataset, “synthetic”: SyntheticImageDataset,
}
-
load_single_label_file
(path)[source]¶ Load the single label file. We only support the user specifying numpy label files when using a data_filelist source of labels.
To save memory, if mmap_mode is set to True for loading, we try to load the labels in mmap_mode. If that fails, we simply load the labels without mmap.
-
__getitem__
(idx)[source]¶ Get the input sample for the minibatch for a specified data index. For each data object (if we are loading several datasets in a minibatch), we get the sample consisting of: {
image data,
label (if applicable) otherwise idx
data_valid: 0 or 1 indicating if the data is a valid image
data_idx : index of the data in the dataset for book-keeping and debugging
}
Once the sample data is available, we apply the data transform on the sample.
The final transformed sample is returned to be added into the minibatch.
-
get_image_paths
()[source]¶ Get the image paths for all the data sources.
- Returns
image_paths (List[List[str]]) –
- list containing image paths list for each
data source.
-
get_available_splits
(dataset_config)[source]¶ Get the available splits in the dataset config. Not specific to the split for which the SSLDataset is being constructed.
NOTE: this is a deprecated method.
vissl.data.disk_dataset module¶
-
class
vissl.data.disk_dataset.
DiskImageDataset
(cfg, data_source, path, split, dataset_name)[source]¶ Bases:
vissl.data.data_helper.QueueDataset
Base Dataset class for loading images from Disk. Can load a predefined list of images or all images inside a folder.
Inherits from QueueDataset class in VISSL to provide better handling of the invalid images by replacing them with the valid and seen images.
- Parameters
cfg (AttrDict) – configuration defined by user
data_source (string) – data source either of “disk_filelist” or “disk_folder”
path (string) –
can be either of the following:
1. A .npy file containing a list of filepaths. In this case data_source = “disk_filelist”
2. A folder such that folder/split contains images. In this case data_source = “disk_folder”
split (string) – specify split for the dataset. Usually train/val/test. Used to read images if reading from a folder path and retrieve settings for that split from the config path.
dataset_name (string) – name of dataset. For information only.
NOTE: This dataset class only returns images (not labels or other metadata). To load labels you must specify them in LABEL_SOURCES (see ssl_dataset.py). LABEL_SOURCES follows a similar convention as the dataset and can either be a filelist or a torchvision ImageFolder compatible folder:
1. Store labels in a numpy file.
2. Store images in a nested directory structure so that the torchvision ImageFolder dataset can infer the labels.
-
__getitem__
(idx)[source]¶ We do delayed loading of data to reduce the memory size due to pickling of dataset across dataloader workers.
Loads the data if not already loaded.
Sets and initializes the queue if not already initialized
Depending on the data source (folder or filelist), get the image. If using the QueueDataset and image is valid, save the image in queue if not full. Otherwise return a valid seen image from the queue if queue is not empty.
vissl.data.synthetic_dataset module¶
-
class
vissl.data.synthetic_dataset.
SyntheticImageDataset
(cfg, path, split, dataset_name, data_source='synthetic')[source]¶ Bases:
torch.utils.data.dataset.Dataset
Synthetic dataset class. The mean image is always returned. This dataset is used/recommended for testing purposes only.
- Parameters
path (string) – can be “” [not used]
split (string) – specify split for the dataset. Usually train/val/test. Used to read images if reading from a folder `path’ and retrieve settings for that split from the config path [not used]
dataset_name (string) – name of dataset. For information only. [not used]
data_source (string, Optional) – data source (“synthetic”) [not used]
vissl.data.dataset_catalog module¶
Data and labels file for various datasets.
-
class
vissl.data.dataset_catalog.
VisslDatasetCatalog
[source]¶ Bases:
object
A catalog that stores information about the datasets and how to obtain them. It contains a mapping from strings (which are names that identify a dataset, e.g. “imagenet1k”) to a dict which contains:
mapping of the data splits (train, test, val) to the data source (a path on disk, either a folder path or a filelist)
source of the data (disk_filelist | disk_folder)
The purpose of having this catalog is to make it easy to choose different datasets, by just using the strings in the config.
-
static
register_json
(json_catalog_path)[source]¶ - Parameters
filepath – a .json filepath that contains the data to be registered
-
static
register_dict
(dict_catalog)[source]¶ - Parameters
dict – a dict with a bunch of datasets to be registered
-
static
register_data
(name, data_dict)[source]¶ - Parameters
name (str) – the name that identifies a dataset, e.g. “imagenet1k_folder”.
data_dict (dict) – the dataset information to register under this name, e.g. the mapping of splits to data paths and the data source.
-
vissl.data.dataset_catalog.
get_local_path
(input_file, dest_dir)[source]¶ If the user specified copying data to a local directory, get the local path to which the data files were copied.
If input_file is just a file, we return dest_dir/filename.
If input_file is a directory, we check whether the environment is SLURM and use slurm_dir (otherwise dest_dir) to look up whether the copy_complete file is available. If it is, we return that directory.
If both of the above fail, we return the input_file as is.
-
vissl.data.dataset_catalog.
get_local_output_filepaths
(input_files, dest_dir)[source]¶ If we have copied the files to local disk as specified in the config, we return those local paths. Otherwise return the original paths.
-
vissl.data.dataset_catalog.
check_data_exists
(data_files)[source]¶ Check that the input data files exist. If the data_files is a list, we iteratively check for each file in the list.
-
vissl.data.dataset_catalog.
register_pascal_voc
()[source]¶ Register the PASCAL VOC 2007 and 2012 datasets to the data catalog. We first look up the paths of these datasets in the dataset catalog; if the paths exist, we register them, otherwise we remove the voc_data from the catalog registry.
-
vissl.data.dataset_catalog.
register_coco
()[source]¶ Register the COCO 2014 dataset to the data catalog. We first look up the dataset path in the dataset catalog; if the path exists, we register it, otherwise we remove the coco2014_folder from the catalog registry.
-
vissl.data.dataset_catalog.
register_datasets
(json_catalog_path)[source]¶ If the json dataset_catalog file is found, we register the datasets specified in the catalog with VISSL. If the catalog also specifies VOC or COCO datasets, we register them as well.
- Parameters
json_catalog_path (str) – the path to the json dataset catalog
-
vissl.data.dataset_catalog.
get_data_files
(split, dataset_config)[source]¶ - Get the path to the dataset (images and labels).
If the user has explicitly specified the data_sources, we simply use those and don’t do lookup in the datasets registered with VISSL from the dataset catalog.
If the user hasn’t specified the path, look for the dataset in the datasets catalog registered with VISSL. For a given list of datasets and a given partition (train/test), we first verify that we have the dataset and the correct source as specified by the user. Then for each dataset in the list, we get the data path (make sure it exists, sources match). For the label file, the file is optional.
Once we have the dataset original paths, we replace the path with the local paths if the data was copied to local disk.
vissl.engines package¶
vissl.engines.train module¶
-
vissl.engines.train.
train_main
(cfg: vissl.utils.hydra_config.AttrDict, dist_run_id: str, checkpoint_path: str, checkpoint_folder: str, local_rank: int = 0, node_id: int = 0, hook_generator: Callable[[Any], List[classy_vision.hooks.classy_hook.ClassyHook]] = <function default_hook_generator>)[source]¶ Sets up and executes training workflow per machine.
- Parameters
cfg (AttrDict) – user specified input config that has optimizer, loss, meters etc settings relevant to the training
dist_run_id (str) –
For multi-gpu training with PyTorch, we have to specify how the gpus are going to rendezvous. This requires specifying the communication method: file, tcp and the unique rendezvous run_id that is specific to 1 run. We recommend:
1) for 1 node: use init_method=tcp and run_id=auto
2) for multi-node: use init_method=tcp and specify run_id={master_node}:{port}
checkpoint_path (str) – if the training is being resumed from a checkpoint, path to the checkpoint. The tools/run_distributed_engines.py automatically looks for the checkpoint in the checkpoint directory.
checkpoint_folder (str) – what directory to use for checkpointing. The tools/run_distributed_engines.py creates the directory based on user input in the yaml config file.
local_rank (int) – id of the current device on the machine. If using gpus, local_rank = gpu number on the current machine
node_id (int) – id of the current machine. starts from 0. valid for multi-gpu
hook_generator (Callable) – The utility function that prepares all the hooks that will be used in training based on user selection. Some basic hooks are used by default.
vissl.engines.extract_features module¶
-
vissl.engines.extract_features.
extract_main
(cfg: vissl.utils.hydra_config.AttrDict, dist_run_id: str, local_rank: int = 0, node_id: int = 0)[source]¶ Sets up and executes feature extraction workflow per machine.
- Parameters
cfg (AttrDict) – user specified input config that has optimizer, loss, meters etc settings relevant to the training
dist_run_id (str) –
For multi-gpu training with PyTorch, we have to specify how the gpus are going to rendezvous. This requires specifying the communication method: file, tcp and the unique rendezvous run_id that is specific to 1 run. We recommend:
1) for 1 node: use init_method=tcp and run_id=auto
2) for multi-node: use init_method=tcp and specify run_id={master_node}:{port}
local_rank (int) – id of the current device on the machine. If using gpus, local_rank = gpu number on the current machine
node_id (int) – id of the current machine. starts from 0. valid for multi-gpu
vissl.meters package¶
vissl.meters.accuracy_list_meter¶
-
class
vissl.meters.accuracy_list_meter.
AccuracyListMeter
(num_meters: int, topk_values: List[int], meter_names: List[str])[source]¶ Bases:
classy_vision.meters.classy_meter.ClassyMeter
Meter to calculate top-k accuracy for single label image classification task.
Supports a single target and multiple outputs. A list of accuracy meters is constructed and each output has a meter associated with it.
- Parameters
num_meters – number of meters and hence we have same number of outputs
topk_values – list of int k values. Example: [1, 5]
meter_names – list of str indicating the name of meter. Usually corresponds to the output layer name.
-
classmethod
from_config
(meters_config: vissl.utils.hydra_config.AttrDict)[source]¶ Get the AccuracyListMeter instance from the user defined config
-
property
name
¶ Name of the meter
-
property
value
¶ Value of the meter globally synced. For each output, all the top-k values are returned. If there are several meters attached to the same layer name, a list of top-k values will be returned for that layer name meter.
-
update
(model_output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]¶ Updates the value of the meter for the given model output list and targets.
- Parameters
model_output – list of tensors of shape (B, C) where each value is either logit or class probability.
target – tensor of shape (B).
NOTE: For binary classification, C=2.
vissl.meters.mean_ap_meter¶
-
class
vissl.meters.mean_ap_meter.
MeanAPMeter
(meters_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.meters.classy_meter.ClassyMeter
Meter to calculate mean AP metric for multi-label image classification task.
- Parameters
meters_config (AttrDict) – config containing the meter settings
meters_config should specify the num_classes
-
classmethod
from_config
(meters_config: vissl.utils.hydra_config.AttrDict)[source]¶ Get the MeanAPMeter instance from the user defined config
-
property
name
¶ Name of the meter
-
property
value
¶ Value of the meter globally synced. mean AP and AP for each class is returned
-
gather_scores
(scores: torch.Tensor)[source]¶ Do a gather over all scores, so we can compute the metric globally. Final shape is like: (batch_size * num_gpus) x embedding_dim
-
gather_targets
(targets: torch.Tensor)[source]¶ Do a gather over all targets, so we can compute the metric globally. Final shape is like: (batch_size * num_gpus) x embedding_dim
vissl.meters.mean_ap_list_meter¶
-
class
vissl.meters.mean_ap_list_meter.
MeanAPListMeter
(meters_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.meters.classy_meter.ClassyMeter
Meter to calculate mean AP metric for multi-label image classification task on multiple output single target.
Supports a single target and multiple outputs. A list of mean AP meters is constructed and each output has a meter associated with it.
- Parameters
meters_config (AttrDict) – config containing the meter settings
meters_config should specify the num_meters and meter_names
-
classmethod
from_config
(meters_config: vissl.utils.hydra_config.AttrDict)[source]¶ Get the MeanAPListMeter instance from the user defined config
-
property
name
¶ Name of the meter
-
property
value
¶ Value of the meter globally synced. For each output, mean AP and AP for each class is returned.
-
update
(model_output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]¶ Updates the value of the meter for the given model output list and targets.
- Parameters
model_output – list of tensors of shape (B, C) where each value is either logit or class probability.
target – tensor of shape (B).
NOTE: For binary classification, C=2.
vissl.models package¶
-
class
vissl.models.
BaseSSLMultiInputOutputModel
(*args, **kwargs)[source]¶ Bases:
classy_vision.models.classy_model.ClassyModel
Class to implement a Self-Supervised model. The model is split into `trunk’ that computes features and `head’ that computes outputs (projections, classifications etc)
This class supports many use cases:
1. Model producing a single output as in standard supervised ImageNet training
2. Model producing multiple outputs (multi-task)
3. Model producing multiple outputs from different features (layers) from the trunk (useful in linear evaluation of features from several model layers)
4. Model that accepts multiple inputs (e.g. image and patches as in the PIRL approach)
5. Model where the trunk is frozen
6. Model that supports multiple resolution inputs as in SwAV
- How to specify heads?
For information on heads see the _get_heads() function
-
multi_input_with_head_mapping_forward
(batch)[source]¶ Perform forward pass (trunk + heads) separately on each input and return the model output on all inputs as a list.
-
multi_res_input_forward
(batch, feature_names)[source]¶ Perform the forward pass separately on each resolution input. The inputs corresponding to a single resolution are grouped together and a single forward pass is run on the same-resolution inputs, hence we do as many forward passes as there are different resolutions. We then concatenate all the output features and run the head forward on the concatenated features.
-
single_input_forward
(batch, feature_names, heads)[source]¶ Simply run the trunk and heads forward on the input tensor. We run the trunk first and then the heads on the trunk output. If the model is trunk feature extraction only, then we simply return the output of the trunk.
-
heads_forward
(feats, heads)[source]¶ Run the forward of the head on the trunk output features. We have 2 cases:
#heads = #feats -> example training linear classifiers on various layers. We run one head on the corresponding feature.
#feats = 1 and #heads > 1 -> head consists of many layers to be run sequentially. #outputs = 1
-
forward
(batch)[source]¶ Main forward of the model. Depending on the model type the calls are patched to the suitable function.
-
freeze_trunk
()[source]¶ Freeze the model trunk by setting requires_grad=False for all the parameters
-
freeze_head_and_trunk
()[source]¶ Freeze the full model including the heads and the trunk. In 99% of cases, we do not use the pretext head as it is specific to the self-supervised pretext task. But in the case of some models like NPID, SimCLR, SwAV, the head is essentially a low-dimensional feature projection which we want to use. Hence, we provide a utility to freeze the full model.
-
is_fully_frozen_model
()[source]¶ Look at all the parameters of the model (trunk + heads) and check if there is any trainable parameter. if not, the model is completely frozen.
-
get_features
(batch)[source]¶ Run the trunk forward on the input batch. This give us the features from the trunk at several layers of the model.
In the case of feature extraction, we only run the trunk and not the heads. The trunk will already have the feature extractor pooling layers and flattened features attached; feature extractor heads are already part of the trunk.
-
get_classy_state
(deep_copy=False)[source]¶ Return the model state (trunk + heads) to checkpoint.
We call this on the state.base_model which is not wrapped with DDP. get the model state_dict to checkpoint
-
set_classy_state
(state)[source]¶ Initialize the model trunk and head from the state dictionary.
We call this on the state.base_model which is not wrapped with DDP. load the model from checkpoint.
-
property
num_classes
¶ Not implemented and not required
-
property
input_shape
¶ Not implemented and not required
-
property
output_shape
¶ Not implemented and not required
-
vissl.models.
convert_sync_bn
(config, model)[source]¶ Convert the BatchNorm layers in the model to the SyncBatchNorm layers.
For SyncBatchNorm, we support two sources: Apex and PyTorch. The optimized SyncBN kernels provided by apex run faster.
- Parameters
config (AttrDict) – configuration file
model – Pytorch model whose BatchNorm layers should be converted to SyncBN layers.
- NOTE: Since the SyncBatchNorm layer synchronizes the BN stats across machines, using syncBN can be slow. In order to speed up training while using syncBN, we recommend using process_groups, which are very well supported for Apex. To set the process groups, set SYNC_BN_CONFIG.GROUP_SIZE as follows:
1) if group_size=-1 -> use the VISSL default setting. We synchronize within a machine and hence set group_size=num_gpus per node. This gives the best speedup.
2) if group_size>0 -> group_size is set to the value specified by the user.
3) if group_size=0 -> no groups are created and process_group=None. This means global sync is done.
-
vissl.models.
is_feature_extractor_model
(model_config)[source]¶ - If the model is a feature extractor model:
evaluation model is on
trunk is frozen
number of features specified for feature extraction > 0
-
vissl.models.
build_model
(model_config, optimizer_config)[source]¶ Given the model config and the optimizer config, construct the model. The returned model is not copied to GPU yet (if using GPU) and not wrapped with DDP yet. This is done later in train_task.py prepare().
vissl.models.model_helpers module¶
-
vissl.models.model_helpers.
transform_model_input_data_type
(model_input, model_config)[source]¶ The default model input follows the RGB format. Based on the model input type specified, change the type. Supported types: RGB, BGR, LAB.
-
vissl.models.model_helpers.
is_feature_extractor_model
(model_config)[source]¶ - If the model is a feature extractor model:
evaluation model is on
trunk is frozen
number of features specified for feature extraction > 0
-
vissl.models.model_helpers.
get_trunk_output_feature_names
(model_config)[source]¶ Get the feature names which we will use to associate the features with. If feature eval mode is set, we get the feature names from config.FEATURE_EVAL_SETTINGS.LINEAR_EVAL_FEAT_POOL_OPS_MAP.
-
class
vissl.models.model_helpers.
Wrap
(function)[source]¶ Bases:
torch.nn.modules.module.Module
Wrap a free function into a nn.Module. Can be useful to build a model block, and include activations or light tensor alterations
-
class
vissl.models.model_helpers.
SyncBNTypes
(value)[source]¶ -
Supported SyncBN types
-
apex
= 'apex'¶
-
pytorch
= 'pytorch'¶
-
-
vissl.models.model_helpers.
convert_sync_bn
(config, model)[source]¶ Convert the BatchNorm layers in the model to the SyncBatchNorm layers.
For SyncBatchNorm, we support two sources: Apex and PyTorch. The optimized SyncBN kernels provided by apex run faster.
- Parameters
config (AttrDict) – configuration file
model – Pytorch model whose BatchNorm layers should be converted to SyncBN layers.
- NOTE: Since the SyncBatchNorm layer synchronizes the BN stats across machines, using syncBN can be slow. In order to speed up training while using syncBN, we recommend using process_groups, which are very well supported for Apex. To set the process groups, set SYNC_BN_CONFIG.GROUP_SIZE as follows:
1) if group_size=-1 -> use the VISSL default setting. We synchronize within a machine and hence set group_size=num_gpus per node. This gives the best speedup.
2) if group_size>0 -> group_size is set to the value specified by the user.
3) if group_size=0 -> no groups are created and process_group=None. This means global sync is done.
-
class
vissl.models.model_helpers.
Flatten
(dim=- 1)[source]¶ Bases:
torch.nn.modules.module.Module
Flatten module attached in the model. It basically flattens the input tensor.
-
class
vissl.models.model_helpers.
Identity
(args=None)[source]¶ Bases:
torch.nn.modules.module.Module
A helper module that outputs the input as is
-
class
vissl.models.model_helpers.
LayerNorm2d
(num_channels, eps=1e-05, affine=True)[source]¶ Bases:
torch.nn.modules.normalization.GroupNorm
Use GroupNorm to construct LayerNorm as pytorch LayerNorm2d requires specifying input_shape explicitly which is inconvenient. Set num_groups=1 to convert GroupNorm to LayerNorm.
-
class
vissl.models.model_helpers.
RESNET_NORM_LAYER
(value)[source]¶ -
Types of Norms supported in ResNe(X)t trainings. can be easily set and modified from the config file.
-
BatchNorm
= 'BatchNorm'¶
-
LayerNorm
= 'LayerNorm'¶
-
-
vissl.models.model_helpers.
parse_out_keys_arg
(out_feat_keys: List[str], all_feat_names: List[str]) → Tuple[List[str], int][source]¶ Checks if all out_feat_keys are mapped to a layer in the model. Returns the last layer to forward pass through, for efficiency. Duplicate features are also allowed to be evaluated. Adapted from (https://github.com/gidariss/FeatureLearningRotNet).
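An illustrative call (the feature names below are hypothetical):

from vissl.models.model_helpers import parse_out_keys_arg

all_feat_names = ["conv1", "res2", "res3", "res4", "res5", "avgpool"]
out_feat_keys, max_out_feat = parse_out_keys_arg(["res4", "avgpool"], all_feat_names)
# out_feat_keys echoes the requested names; max_out_feat indicates the deepest
# requested block, so the forward pass can stop there.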
-
vissl.models.model_helpers.
get_trunk_forward_outputs_module_list
(feat: torch.Tensor, out_feat_keys: List[str], feature_blocks: torch.nn.modules.container.ModuleList, all_feat_names: List[str] = None) → List[torch.Tensor][source]¶ - Parameters
feat – model input.
out_feat_keys – a list/tuple with the feature names of the features that the function should return. By default the last feature of the network is returned.
feature_blocks – list of feature blocks in the model
feature_mapping – name of the layers in the model
- Returns
out_feats – a list with the asked output features placed in the same order as in out_feat_keys.
-
vissl.models.model_helpers.
get_trunk_forward_outputs
(feat: torch.Tensor, out_feat_keys: List[str], feature_blocks: torch.nn.modules.container.ModuleDict, feature_mapping: Dict[str, str] = None, use_checkpointing: bool = True, checkpointing_splits: int = 2) → List[torch.Tensor][source]¶ - Parameters
feat – model input.
out_feat_keys – a list/tuple with the feature names of the features that the function should return. By default the last feature of the network is returned.
feature_blocks – ModuleDict containing feature blocks in the model
feature_mapping – an optional correspondence table in between the requested feature names and the model’s.
- Returns
out_feats – a list with the asked output features placed in the same order as in out_feat_keys.
vissl.models.heads module¶
-
vissl.models.heads.
get_model_head
(name: str)[source]¶ Given the model head name, construct the head if it’s registered with VISSL.
-
class
vissl.models.heads.
LinearEvalMLP
(model_config: vissl.utils.hydra_config.AttrDict, in_channels: int, dims: List[int], use_bn: bool = False, use_relu: bool = False)[source]¶ Bases:
torch.nn.modules.module.Module
A standard Linear classification module that can be attached to several layers of the model to evaluate the representation quality of features.
- The layers attached are:
BatchNorm2d -> Linear (1 or more)
Accepts a 4D input tensor. If you want to use 2D input tensor instead, use the “mlp” head directly.
-
__init__
(model_config: vissl.utils.hydra_config.AttrDict, in_channels: int, dims: List[int], use_bn: bool = False, use_relu: bool = False)[source]¶ - Parameters
model_config (AttrDict) – dictionary config.MODEL in the config file
in_channels (int) – number of channels the input has. This information is used to attach the BatchNorm2D layer.
dims (int) – dimensions of the linear layer. Example [8192, 1000] which means attaches nn.Linear(8192, 1000, bias=True)
-
forward
(batch: torch.Tensor)[source]¶ - Parameters
batch (torch.Tensor) – 4D torch tensor. This layer is meant to be attached at several parts of the model to evaluate feature representation quality. For 2D input tensor, the tensor is unsqueezed to NxDx1x1 and then eval_mlp is applied
- Returns
out (torch.Tensor) – 2D output torch tensor
-
class
vissl.models.heads.
MLP
(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool = False, use_relu: bool = False, use_dropout: bool = False, use_bias: bool = True)[source]¶ Bases:
torch.nn.modules.module.Module
This module can be used to attach combination of {Linear, BatchNorm, Relu, Dropout} layers and they are fully configurable from the config file. The module also supports stacking multiple MLPs.
Examples
Linear
Linear -> BN
Linear -> ReLU
Linear -> Dropout
Linear -> BN -> ReLU -> Dropout
Linear -> ReLU -> Dropout
Linear -> ReLU -> Linear -> ReLU -> …
Linear -> Linear -> …
…
Accepts a 2D input tensor. Also accepts 4D input tensor of shape N x C x 1 x 1.
-
__init__
(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool = False, use_relu: bool = False, use_dropout: bool = False, use_bias: bool = True)[source]¶ - Parameters
model_config (AttrDict) – dictionary config.MODEL in the config file
use_bn (bool) – whether to attach BatchNorm after Linear layer
use_relu (bool) – whether to attach ReLU after (Linear (-> BN optional))
use_dropout (bool) – whether to attach Dropout after (Linear (-> BN -> relu optional))
use_bias (bool) – whether the Linear layer should have bias or not
dims (int) – dimensions of the linear layer. Example [8192, 1000] which attaches nn.Linear(8192, 1000, bias=True)
-
forward
(batch: torch.Tensor)[source]¶ - Parameters
batch (torch.Tensor) – 2D torch tensor or 4D tensor of shape N x C x 1 x 1
- Returns
out (torch.Tensor) – 2D output torch tensor
-
-
class
vissl.models.heads.
SiameseConcatView
(model_config: vissl.utils.hydra_config.AttrDict, num_towers: int)[source]¶ Bases:
torch.nn.modules.module.Module
This head is useful for dealing with Siamese models which have multiple towers. For an input of type (N * num_towers) x C, this head can convert the output to N x (num_towers * C).
This head is used in case of PIRL https://arxiv.org/abs/1912.01991 and Jigsaw https://arxiv.org/abs/1603.09246 approaches.
-
__init__
(model_config: vissl.utils.hydra_config.AttrDict, num_towers: int)[source]¶
-
forward
(batch: torch.Tensor)[source]¶ - Parameters
batch (torch.Tensor) – 2D torch tensor (N * num_towers) x C or 4D tensor of shape (N * num_towers) x C x 1 x 1
- Returns
out (torch.Tensor) – 2D output torch tensor N x (C * num_towers)
-
-
class
vissl.models.heads.
SwAVPrototypesHead
(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool, num_clusters: int, use_bias: bool = True, return_embeddings: bool = True, skip_last_bn: bool = True, normalize_feats: bool = True)[source]¶ Bases:
torch.nn.modules.module.Module
SwAV head used in https://arxiv.org/pdf/2006.09882.pdf paper.
- The head is composed of 2 parts
projection of features to lower dimension like 128
feature classification into clusters (also called prototypes)
The projected features are L2 normalized before clustering step.
Input: 4D torch.tensor of shape (N x C x H x W)
Output: List(2D torch.tensor of shape N x num_clusters)
-
__init__
(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool, num_clusters: int, use_bias: bool = True, return_embeddings: bool = True, skip_last_bn: bool = True, normalize_feats: bool = True)[source]¶ - Parameters
model_config (AttrDict) – dictionary config.MODEL in the config file
dims (int) –
dimensions of the linear layer. Must have length at least 2. Example: [2048, 2048, 128] attaches linear layer
Linear(2048, 2048) -> BN -> Relu -> Linear(2048, 128)
use_bn (bool) – whether to attach BatchNorm after Linear layer
num_clusters (List(int)) –
number of prototypes or clusters. Typically 3000. Example: num_clusters=[3000] will attach 1 prototype head;
num_clusters=[3000, 3000] will attach 2 prototype heads.
use_bias (bool) – whether the Linear layer should have bias or not
return_embeddings (bool) – whether return the projected embeddings or not
skip_last_bn (bool) –
whether to attach BN + ReLU at the end of the projection head. Example:
[2048, 2048, 128] with skip_last_bn=True attaches Linear(2048, 2048) -> BN -> ReLU -> Linear(2048, 128)
[2048, 2048, 128] with skip_last_bn=False attaches Linear(2048, 2048) -> BN -> ReLU -> Linear(2048, 128) -> BN -> ReLU
This could be particularly useful when performing full finetuning on hidden layers.
-
forward
(batch: torch.Tensor)[source]¶ - Parameters
batch (4D torch.tensor) – shape (N x C x H x W)
- Returns
List(2D torch.tensor of shape N x num_clusters)
vissl.models.trunks module¶
-
vissl.models.trunks.
register_model_trunk
(name: str)[source]¶ Registers Self-Supervision Model Trunks.
This decorator allows VISSL to add custom model trunk, even if the model trunk itself is not part of VISSL. To use it, apply this decorator to a model trunk class, like this:
@register_model_trunk('my_model_trunk_name')
def my_model_trunk():
    ...
To get a model trunk from a configuration file, see
get_model_trunk()
.
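A slightly expanded, hedged sketch of registering a custom trunk. The trunk name and layers below are made up, and the constructor/forward signatures follow the common pattern of VISSL trunks (model config and model name in, list of feature tensors out); they should be checked against the trunks shipped with the library:

import torch.nn as nn
from vissl.models.trunks import register_model_trunk

@register_model_trunk("my_model_trunk_name")
class MyModelTrunk(nn.Module):
    def __init__(self, model_config, model_name):
        super().__init__()
        # toy trunk: a single conv block standing in for a real backbone
        self.blocks = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1))

    def forward(self, x, out_feat_keys=None):
        feat = self.blocks(x).flatten(1)
        return [feat]   # trunks conventionally return a list of feature tensors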
vissl.losses package¶
vissl.losses.simclr_info_nce_loss¶
-
class
vissl.losses.simclr_info_nce_loss.
SimclrInfoNCELoss
(loss_config: vissl.utils.hydra_config.AttrDict, device: str = 'gpu')[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
This is the loss which was proposed in SimCLR https://arxiv.org/abs/2002.05709 paper. See the paper for the details on the loss.
- Config params:
temperature (float): the temperature to be applied on the logits
buffer_params:
  world_size (int): total number of trainers in training
  embedding_dim (int): output dimension of the projected features
  effective_batch_size (int): total batch size used (includes positives)
-
classmethod
from_config
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Instantiates SimclrInfoNCELoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
SimclrInfoNCELoss instance.
-
class
vissl.losses.simclr_info_nce_loss.
SimclrInfoNCECriterion
(buffer_params, temperature: float)[source]¶ Bases:
torch.nn.modules.module.Module
The criterion corresponding to the SimCLR loss as defined in the paper https://arxiv.org/abs/2002.05709.
- Parameters
temperature (float) – the temperature to be applied on the logits
buffer_params –
  world_size (int): total number of trainers in training
  embedding_dim (int): output dimension of the projected features
  effective_batch_size (int): total batch size used (includes positives)
-
precompute_pos_neg_mask
()[source]¶ We precompute the positive and negative masks to speed up the loss calculation
-
forward
(embedding: torch.Tensor)[source]¶ Calculate the loss. Operates on embeddings tensor.
-
static
gather_embeddings
(embedding: torch.Tensor)[source]¶ Do a gather over all embeddings, so we can compute the loss. Final shape is like: (batch_size * num_gpus) x embedding_dim
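For reference, a minimal single-GPU sketch of an InfoNCE-style criterion over L2-normalized embeddings; this is illustrative only and omits the distributed gather and the precomputed positive/negative masks used by the class above:

import torch
import torch.nn.functional as F

def info_nce(embeddings: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # embeddings: 2N x D, rows i and i + N are two views of the same image
    z = F.normalize(embeddings, dim=1)
    n = z.shape[0] // 2
    logits = z @ z.t() / temperature          # 2N x 2N similarity logits
    logits.fill_diagonal_(float("-inf"))      # mask self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(logits, targets)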
vissl.losses.multicrop_simclr_info_nce_loss¶
-
class
vissl.losses.multicrop_simclr_info_nce_loss.
MultiCropSimclrInfoNCELoss
(loss_config: vissl.utils.hydra_config.AttrDict, device: str = 'gpu')[source]¶ Bases:
vissl.losses.simclr_info_nce_loss.SimclrInfoNCELoss
Expanded version of the SimCLR loss. The SimCLR loss works only on 2 positives. We expand the loss to work for more positives following the multi-crop augmentation proposed in SwAV paper. See SwAV paper https://arxiv.org/abs/2006.09882 for the multi-crop augmentation details.
- Config params:
temperature (float): the temperature to be applied on the logits
num_crops (int): number of positives used
buffer_params:
  world_size (int): total number of trainers in training
  embedding_dim (int): output dimension of the projected features
  effective_batch_size (int): total batch size used (includes positives)
-
class
vissl.losses.multicrop_simclr_info_nce_loss.
MultiCropSimclrInfoNCECriterion
(buffer_params, temperature: float, num_crops: int)[source]¶ Bases:
vissl.losses.simclr_info_nce_loss.SimclrInfoNCECriterion
The criterion corresponding to the expanded SimCLR loss (as defined in the paper https://arxiv.org/abs/2002.05709) using the multi-crop augmentation proposed in the SwAV paper. The multi-crop augmentation allows using more positives per image.
- Parameters
temperature (float) – the temperature to be applied on the logits
num_crops (int) – number of positives
buffer_params –
  world_size (int): total number of trainers in training
  embedding_dim (int): output dimension of the projected features
  effective_batch_size (int): total batch size used (includes positives)
-
precompute_pos_neg_mask
()[source]¶ We precompute the positive and negative masks to speed up the loss calculation
-
forward
(embedding: torch.Tensor)[source]¶ Calculate the loss. Operates on embeddings tensor.
vissl.losses.swav_loss¶
-
class
vissl.losses.swav_loss.
SwAVLoss
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
This loss is proposed by the SwAV paper https://arxiv.org/abs/2006.09882 by Caron et al. See the paper for more details about the loss.
- Config params:
embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
use_double_precision (bool): whether to use double precision for the loss. This can help avoid NaNs.
normalize_last_layer (bool): whether to normalize the last layer
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
temp_hard_assignment_iters (int): number of initial iterations for which hard assignment is used
output_dir (str): where to dump the debugging info in case the loss becomes NaN
queue:
  queue_length (int): number of features to store and use in the scores
  start_iter (int): when to start using the queue for the scores
  local_queue_length (int): length of the queue per gpu
-
classmethod
from_config
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Instantiates SwAVLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
SwAVLoss instance.
-
forward
(output: torch.Tensor, target: torch.Tensor)[source]¶
-
class
vissl.losses.swav_loss.
SwAVCriterion
(temperature: float, crops_for_assign: List[int], num_crops: int, num_iters: int, epsilon: float, use_double_prec: bool, num_prototypes: List[int], local_queue_length: int, embedding_dim: int, temp_hard_assignment_iters: int, output_dir: str)[source]¶ Bases:
torch.nn.modules.module.Module
This criterion is used by the SwAV paper https://arxiv.org/abs/2006.09882 by Caron et al. See the paper for more details about the loss.
- Config params:
embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
temp_hard_assignment_iters (int): number of initial iterations for which hard assignment is used
output_dir (str): where to dump the debugging info in case the loss becomes NaN
local_queue_length (int): length of the queue per gpu
-
distributed_sinkhornknopp
(Q: torch.Tensor)[source]¶ Apply the distributed sinknorn optimization on the scores matrix to find the assignments
-
forward
(scores: torch.Tensor, head_id: int)[source]¶
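A compact single-process sketch of the Sinkhorn-Knopp iteration used to turn prototype scores into balanced assignments (illustrative; the method above is the distributed variant and additionally handles the queue and the double precision option):

import torch

def sinkhorn_knopp(scores: torch.Tensor, epsilon: float = 0.05, num_iters: int = 3) -> torch.Tensor:
    # scores: B x K prototype scores; returns a B x K soft assignment matrix
    Q = torch.exp(scores / epsilon).t()       # K x B
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(num_iters):
        Q /= Q.sum(dim=1, keepdim=True)       # rows: equal total mass per prototype
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)       # columns: each sample sums to 1/B
        Q /= B
    return (Q * B).t()                        # B x K, each row sums to 1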
vissl.losses.bce_logits_multiple_output_single_target¶
-
class
vissl.losses.bce_logits_multiple_output_single_target.
BCELogitsMultipleOutputSingleTargetLoss
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
-
__init__
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Initializer for the sum binary cross-entropy (with logits) loss. For a single tensor, this is equivalent to the BCE-with-logits loss. For a list of tensors, this computes the sum of the losses for each tensor in the list against the target.
- Config params:
reduction: specifies the reduction to apply to the output, optional
normalize_output: whether to L2 normalize the outputs
world_size: total number of gpus in training; automatically inferred by vissl
-
classmethod
from_config
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Instantiates BCELogitsMultipleOutputSingleTargetLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
BCELogitsMultipleOutputSingleTargetLoss instance.
-
forward
(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]¶ For each output and single target, loss is calculated. The returned loss value is the sum loss across all outputs.
-
vissl.losses.swav_momentum_loss¶
-
class
vissl.losses.swav_momentum_loss.
SwAVMomentumLoss
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
This loss extends the SwAV loss proposed in paper https://arxiv.org/abs/2006.09882 by Caron et al. The loss combines the benefits of using the SwAV approach with the momentum encoder as used in MoCo.
- Config params:
momentum (float): for the momentum encoder
momentum_eval_mode_iter_start (int): from what iteration should the momentum encoder network be in eval mode
embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
use_double_precision (bool): whether to use double precision for the loss. This can help avoid NaNs.
normalize_last_layer (bool): whether to normalize the last layer
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
queue:
  queue_length (int): number of features to store and use in the scores
  start_iter (int): when to start using the queue for the scores
  local_queue_length (int): length of the queue per gpu
-
classmethod
from_config
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Instantiates SwAVMomentumLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
SwAVMomentumLoss instance.
-
load_state_dict
(state_dict, *args, **kwargs)[source]¶ Restore the loss state given a checkpoint
- Parameters
state_dict (serialized via torch.save) –
-
forward
(output: torch.Tensor, *args, **kwargs)[source]¶
-
distributed_sinkhornknopp
(Q: torch.Tensor)[source]¶ Apply the distributed sinknorn optimization on the scores matrix to find the assignments
vissl.losses.moco_loss¶
-
class
vissl.losses.moco_loss.
MoCoLossConfig
(embedding_dim, queue_size, momentum, temperature)[source]¶ Bases:
vissl.losses.moco_loss._MoCoLossConfig
Settings for the MoCo loss
-
static
defaults
() → vissl.losses.moco_loss.MoCoLossConfig[source]¶
-
-
class
vissl.losses.moco_loss.
MoCoLoss
(config: vissl.losses.moco_loss.MoCoLossConfig)[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
This is the loss which was proposed in the “Momentum Contrast for Unsupervised Visual Representation Learning” paper, from Kaiming He et al. See http://arxiv.org/abs/1911.05722 for details and https://github.com/facebookresearch/moco for a reference implementation, reused here
- Config params:
embedding_dim (int): head output dimension
queue_size (int): number of elements in the queue
momentum (float): encoder momentum value for the update
temperature (float): temperature to use on the logits
-
classmethod
from_config
(config: vissl.losses.moco_loss.MoCoLossConfig)[source]¶ Instantiates MoCoLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
MoCoLoss instance.
-
forward
(query: torch.Tensor, *args, **kwargs) → torch.Tensor[source]¶ Given the encoder queries, the key and the queue of the previous queries, compute the cross entropy loss for this batch
- Parameters
query – output of the encoder given the current batch
- Returns
loss
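For intuition, a minimal sketch of how MoCo-style logits are typically assembled from the query, its positive key and the queue of past keys before the cross entropy (the function and argument names are illustrative, not VISSL API):

import torch
import torch.nn.functional as F

def moco_loss(query: torch.Tensor, key: torch.Tensor, queue: torch.Tensor, temperature: float = 0.07):
    # query, key: N x D (L2-normalized); queue: D x K buffer of previous keys
    l_pos = torch.einsum("nd,nd->n", query, key).unsqueeze(-1)   # N x 1 positive logits
    l_neg = torch.einsum("nd,dk->nk", query, queue)              # N x K negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(logits.shape[0], dtype=torch.long)      # positive is always index 0
    return F.cross_entropy(logits, labels)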
vissl.losses.nce_loss¶
-
class
vissl.losses.nce_loss.
NCELossWithMemory
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
Distributed version of the NCE loss. It performs an "all_gather" to gather the allocated buffers like memory onto a single gpu. For this, the PyTorch distributed backend is used. If using NCCL, one must ensure that all the buffers are on GPU. This class supports training using both NCE and CrossEntropy (InfoNCE).
This loss is used by NPID (https://arxiv.org/pdf/1805.01978.pdf), NPID++ and PIRL (https://arxiv.org/abs/1912.01991) approaches.
Written by: Ishan Misra (imisra@fb.com)
- Config params:
norm_embedding (bool): whether to normalize embeddings
temperature (float): the temperature to apply to logits
norm_constant (int): Z parameter in the NCEAverage
update_mem_with_emb_index (int): in case we have multiple embeddings used in the nce loss, specify which embedding to use to update the memory
loss_type (str): options are "nce" | "cross_entropy". Using cross_entropy turns the loss into the InfoNCE loss.
loss_weights (List[float]): if the NCE loss is computed between multiple pairs, a loss weight per term can be used to weight different pair contributions differently
negative_sampling_params:
  num_negatives (int): how many negatives to contrast with
  type (str): how to select the negatives. options: "random"
memory_params:
  memory_size (int): number of training samples, as all the samples are stored in memory
  embedding_dim (int): the projection head output dimension
  momentum (int): momentum to use to update the memory
  norm_init (bool): whether to L2 normalize the initialized memory bank
  update_mem_on_forward (bool): whether to update memory on the forward pass
num_train_samples (int): number of unique samples in the training dataset
-
classmethod
from_config
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Instantiates NCELossWithMemory from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
NCELossWithMemory instance.
-
forward
(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]¶ For each output and single target, loss is calculated.
-
sync_memory
()[source]¶ Sync memory across all processes before first forward pass. Only needed in the distributed case. After the first forward pass, the update_memory function in NCEAverage does a gather over all embeddings, so memory stays in sync. Doing a gather over embeddings is O(batch size). Syncing memory is O(num items in memory). Generally, batch size << num items in memory. So, we prefer doing the syncs in update_memory.
-
class
vissl.losses.nce_loss.
NCEAverage
(memory_params, negative_sampling_params, T=0.07, Z=- 1, loss_type='nce')[source]¶ Bases:
torch.nn.modules.module.Module
Computes the scores of the model embeddings against the "positive" and "negative" samples from the Memory Bank. This class does NOT compute the actual loss, just the scores, i.e., inner products followed by normalizations/exponentiation etc.
vissl.losses.deepclusterv2_loss¶
-
class
vissl.losses.deepclusterv2_loss.
DeepClusterV2Loss
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
Loss used for DeepClusterV2 approach as provided in SwAV paper https://arxiv.org/abs/2006.09882
- Config params:
DROP_LAST (bool): automatically inferred from DATA.TRAIN.DROP_LAST
BATCHSIZE_PER_REPLICA (int): 256 # automatically inferred from DATA.TRAIN.BATCHSIZE_PER_REPLICA
num_crops (int): 2 # automatically inferred from DATA.TRAIN.TRANSFORMS
temperature (float): 0.1
num_clusters (List[int]): [3000, 3000, 3000]
kmeans_iters (int): 10
crops_for_mb: [0]
embedding_dim: 128
num_train_samples (int): -1 # @auto-filled
-
classmethod
from_config
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Instantiates DeepClusterV2Loss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
DeepClusterV2Loss instance.
-
forward
(output: torch.Tensor, idx: int)[source]¶
vissl.losses.cross_entropy_multiple_output_single_target¶
-
class
vissl.losses.cross_entropy_multiple_output_single_target.
CrossEntropyMultipleOutputSingleTargetLoss
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.losses.classy_loss.ClassyLoss
Initializer for the sum cross-entropy loss. For a single tensor, this is equivalent to the cross-entropy loss. For a list of tensors, this computes the sum of the cross-entropy losses for each tensor in the list against the target.
- Config params:
weight: weight of sample, optional
ignore_index: sample should be ignored for loss, optional
reduction: specifies the reduction to apply to the output, optional
temperature: temperature for softmax. Default 1.0
-
classmethod
from_config
(loss_config: vissl.utils.hydra_config.AttrDict)[source]¶ Instantiates CrossEntropyMultipleOutputSingleTargetLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
CrossEntropyMultipleOutputSingleTargetLoss instance.
-
forward
(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]¶ For each output and single target, loss is calculated. The returned loss value is the sum loss across all outputs.
vissl.hooks package¶
-
class
vissl.hooks.
SSLClassyHookFunctions
(value)[source]¶ Bases:
enum.Enum
Enumeration of all the hook functions in the ClassyHook class.
-
on_start
= 1¶
-
on_phase_start
= 2¶
-
on_forward
= 3¶
-
on_loss_and_meter
= 4¶
-
on_backward
= 5¶
-
on_update
= 6¶
-
on_step
= 7¶
-
on_phase_end
= 8¶
-
on_end
= 9¶
-
-
vissl.hooks.
default_hook_generator
(cfg: vissl.utils.hydra_config.AttrDict) → List[classy_vision.hooks.classy_hook.ClassyHook][source]¶ The utility function that prepares all the hooks that will be used in training based on user selection. Some basic hooks are used by default.
- Optional hooks:
Tensorboard hook,
loss specific hooks (swav loss, deepcluster loss, moco loss) used only when the loss is being used
model complexity hook (if the user wants to compute model flops, activations, params); enable the hook via MODEL.MODEL_COMPLEXITY.COMPUTE_COMPLEXITY = True
- Returns
hooks (List(functions)) – list containing the hook functions that will be used
vissl.hooks.deepclusterv2_hooks module¶
-
class
vissl.hooks.deepclusterv2_hooks.
InitMemoryHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Initialize the memory banks. Valid only for DeepClusterV2 training
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.deepclusterv2_hooks.
ClusterMemoryHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Cluster the memory banks with distributed k-means. Valid only for DeepClusterV2 trainings.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
vissl.hooks.log_hooks module¶
All the hooks involved in human-readable logging
-
class
vissl.hooks.log_hooks.
LogGpuStatsHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Hook executed at the start of training and after every training iteration is done. Logs Gpu nvidia-smi stats to logger streams: at the start of training and after 50 training iterations.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.log_hooks.
LogLossLrEtaHook
(btime_freq: Optional[int] = None)[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Hook executed after every parameters update step. Logs training stats like: LR, iteration, ETA, batch time etc to logger streams.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
__init__
(btime_freq: Optional[int] = None) → None[source]¶ - Parameters
btime_freq – if specified, logs average batch time of rolling_freq batches also.
-
on_update
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Executed after every parameter update. If the current phase is training and it's a logging iteration, we compute and log several helpful training stats to keep track of ongoing training.
For monitoring the batch time (average training iteration time), we optionally allow monitoring the stats every N iterations to get a better idea about the batch time and training eta.
Set the btime_freq input using cfg.PERF_STAT_FREQUENCY=N ensuring that cfg.MONITOR_PERF_STATS = True.
-
-
class
vissl.hooks.log_hooks.
LogLossMetricsCheckpointHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Hook called after every forward pass (to check that training doesn't give NaN), after every step and at the end of an epoch (to check if the model should be checkpointed), and prints the meter values at the end of every phase.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Called each time a model forward is done to make sure that the model forward output is not NaN. If we encounter NaN as the model output, we checkpoint the model to enable debugging, and also checkpoint the model input sample and model output.
-
-
class
vissl.hooks.log_hooks.
LogPerfTimeMetricsHook
(log_freq: Optional[int] = None)[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Computes and prints performance metrics. Logs at the end of a phase or every log_freq if specified by user.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
__init__
(log_freq: Optional[int] = None) → None[source]¶ - Parameters
log_freq – if specified, logs every log_freq batches also.
-
on_phase_start
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Initialize start time and reset perf stats
-
vissl.hooks.moco_hooks module¶
-
class
vissl.hooks.moco_hooks.
MoCoHook
(momentum: float, shuffle_batch: bool = True)[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
This hook corresponds to the loss proposed in the “Momentum Contrast for Unsupervised Visual Representation Learning” paper, from Kaiming He et al. See http://arxiv.org/abs/1911.05722 for details and https://github.com/facebookresearch/moco for a reference implementation, reused here.
Called after every forward pass to update the momentum encoder. At the beginning of training, i.e. after the 1st forward call, the encoder is constructed and updated.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Update the momentum encoder.
Compute the key reusing the updated moco-encoder. If batch shuffling is used, we perform global shuffling of the batch and then run the moco encoder to compute the features. We unshuffle the computed features and use them as the "key" in computing the moco loss.
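The momentum update itself is an exponential moving average of the main encoder's parameters; a standalone sketch (not the hook's actual code):

import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, momentum: float = 0.999):
    # encoder_k slowly tracks encoder_q: theta_k = m * theta_k + (1 - m) * theta_q
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(momentum).add_(p_q.data, alpha=1.0 - momentum)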
-
vissl.hooks.state_update_hooks module¶
-
class
vissl.hooks.state_update_hooks.
SSLModelComplexityHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Logs the number of parameters, forward pass FLOPs and activations of the model. Adapted from: https://github.com/facebookresearch/ClassyVision/blob/master/classy_vision/hooks/model_complexity_hook.py#L20 # NOQA
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.state_update_hooks.
SetDataSamplerEpochHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
We use DistributedDataSampler for sampling the data. At the beginning of each training epoch/phase, we need to set the epoch for the sampler.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Called at the start of each epoch or phase to set the data sampler epoch. This is important to ensure the data is shuffled and the shuffling can be reproduced deterministically if the training is resumed from a checkpoint.
-
-
class
vissl.hooks.state_update_hooks.
UpdateBatchesSeenHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Book-keeping only hook. Tracks how many forward passes have been done, i.e. how many batches have been seen by the trainer irrespective of the train or test phase. Updates task.batches.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.state_update_hooks.
UpdateTrainIterationNumHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Book-keeping hook: updates the training iteration number (only updated if it’s a training phase). task.iteration is updated.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.state_update_hooks.
UpdateTrainBatchTimeHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
After every parameter update step (training phase), we update the batch time, i.e. the training time for the current iteration.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.state_update_hooks.
UpdateTestBatchTimeHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Includes the batch time for the test phase as well; called every time the loss has been computed. Only updates task.batch_time if it's a test phase, since the train phase is already updated by the UpdateTrainBatchTimeHook hook.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.state_update_hooks.
CheckNanLossHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
After every loss computation, verify that the loss is not NaN/infinite. Called for both training/test phases.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.state_update_hooks.
FreezeParametersHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Hook that helps to freeze some specified model parameters for a certain number of training iterations. The parameters are specified in a dictionary containing {param_name: frozen_iterations}. Used in SwAV training.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
vissl.hooks.swav_hooks module¶
-
class
vissl.hooks.swav_hooks.
SwAVUpdateQueueScoresHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
Update queue scores, useful with small batches and helps getting meaningful gradients.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.swav_hooks.
NormalizePrototypesHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
L2 Normalize the prototypes in swav training. Optional.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
vissl.hooks.swav_momentum_hooks module¶
-
class
vissl.hooks.swav_momentum_hooks.
SwAVMomentumHook
(momentum: float, momentum_eval_mode_iter_start: int, crops_for_assign: List[int])[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
This hook is for the extension of the SwAV loss proposed in paper https://arxiv.org/abs/2006.09882 by Caron et al. The loss combines the benefits of using the SwAV approach with the momentum encoder as used in MoCo.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_update
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
-
class
vissl.hooks.swav_momentum_hooks.
SwAVMomentumNormalizePrototypesHook
[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
L2 Normalize the prototypes in swav training. Optional. We additionally normalize the momentum_encoder output prototypes as well.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_forward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_phase_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
vissl.hooks.tensorboard_hooks module¶
-
class
vissl.hooks.tensorboard_hook.
SSLTensorboardHook
(tb_writer: torch.utils.tensorboard.writer.SummaryWriter, log_params: bool = False, log_params_every_n_iterations: int = - 1, log_params_gradients: bool = False)[source]¶ Bases:
classy_vision.hooks.classy_hook.ClassyHook
SSL Specific variant of the Classy Vision tensorboard hook
-
on_loss_and_meter
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_backward
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_start
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_end
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
on_step
(*args, **kwargs) → None¶ Derived classes can set their hook functions to this.
This is useful if they want those hook functions to not do anything.
-
__init__
(tb_writer: torch.utils.tensorboard.writer.SummaryWriter, log_params: bool = False, log_params_every_n_iterations: int = - 1, log_params_gradients: bool = False) → None[source]¶ The constructor method of SSLTensorboardHook.
- Parameters
tb_writer – Tensorboard SummaryWriter instance
log_params (bool) – whether to log model params to tensorboard
log_params_every_n_iterations (int) – frequency at which parameters should be logged to tensorboard
log_params_gradients (bool) – whether to log params gradients as well to tensorboard.
-
on_forward
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Called after every forward if tensorboard hook is enabled. Logs the model parameters if the training iteration matches the logging frequency.
-
on_phase_start
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Called at the start of every epoch if the tensorboard hook is enabled. Logs the model parameters once at the beginning of training only.
-
on_phase_end
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Called at the end of every epoch if the tensorboard hook is enabled. Logs model parameters and/or parameter gradients as set by the user in the tensorboard configuration. Also resets the CUDA memory counter.
-
on_update
(task: classy_vision.tasks.classy_task.ClassyTask) → None[source]¶ Called after every parameter update if the tensorboard hook is enabled. Logs the parameter gradients if they are set to be logged, and logs scalars like training loss, learning rate, average training iteration time, batch size per gpu, img/sec/gpu, ETA, gpu memory used and peak gpu memory used.
-
vissl.optimizers package¶
-
vissl.optimizers.
get_optimizer_param_groups
(model, model_config, optimizer_config, optimizer_schedulers)[source]¶ Go through all the layers and sort out which parameters should be regularized or unregularized, as well as the optimization settings for the head/trunk. Only the trainable params are kept and added to the param_groups.
- Returns
param_groups (List[Dict]) –
[
  {"params": trunk_regularized_params, "lr": lr_value, "weight_decay": wd_value},
  {"params": trunk_unregularized_params, "lr": lr_value, "weight_decay": 0.0},
  {"params": head_regularized_params, "lr": head_lr_value, "weight_decay": head_weight_decay},
  {"params": head_unregularized_params, "lr": head_lr_value, "weight_decay": 0.0},
  {"params": remaining_regularized_params, "lr": lr_value},
]
vissl.optimizers.optimizer_helper module¶
-
vissl.optimizers.optimizer_helper.
get_optimizer_param_groups
(model, model_config, optimizer_config, optimizer_schedulers)[source]¶ Go through all the layers and sort out which parameters should be regularized or unregularized, as well as the optimization settings for the head/trunk. Only the trainable params are kept and added to the param_groups.
- Returns
param_groups (List[Dict]) –
[
  {"params": trunk_regularized_params, "lr": lr_value, "weight_decay": wd_value},
  {"params": trunk_unregularized_params, "lr": lr_value, "weight_decay": 0.0},
  {"params": head_regularized_params, "lr": head_lr_value, "weight_decay": head_weight_decay},
  {"params": head_unregularized_params, "lr": head_lr_value, "weight_decay": 0.0},
  {"params": remaining_regularized_params, "lr": lr_value},
]
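For intuition, the returned structure follows the standard PyTorch per-parameter-group pattern; the snippet below is a generic illustration with hand-built groups rather than a call into VISSL (in VISSL the lr entries are driven by the configured param schedulers):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
regularized = [p for n, p in model.named_parameters() if not n.endswith("bias")]
unregularized = [p for n, p in model.named_parameters() if n.endswith("bias")]
param_groups = [
    {"params": regularized, "lr": 0.1, "weight_decay": 1e-4},
    {"params": unregularized, "lr": 0.1, "weight_decay": 0.0},
]
optimizer = torch.optim.SGD(param_groups, lr=0.1, momentum=0.9)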
vissl.optimizers.param_scheduler.cosine_warm_restart_scheduler module¶
-
class
vissl.optimizers.param_scheduler.cosine_warm_restart_scheduler.
CosineWaveTypes
(value)[source]¶ -
An enumeration.
-
half
= 'half'¶
-
full
= 'full'¶
-
-
class
vissl.optimizers.param_scheduler.cosine_warm_restart_scheduler.
CosineWarmRestartScheduler
(start_value: float, end_value: float, restart_interval_length: float, wave_type: str, lr_multiplier: float, is_adaptive: bool, update_interval: classy_vision.optim.param_scheduler.classy_vision_param_scheduler.UpdateInterval = <UpdateInterval.STEP: 'step'>)[source]¶ Bases:
classy_vision.optim.param_scheduler.classy_vision_param_scheduler.ClassyParamScheduler
Changes the param value following a cosine schedule; by default the schedule is updated after every train step.
Can be used for cosine learning rate with warm restarts. For restarts, we calculate what will be the maximum learning rate after every restart. There are 3 options:
Option 1: LR after every restart is same as original max LR
Option 2: LR after every restart decays with a fixed LR multiplier
Option 3: LR after every restart is adaptively calculated such that the resulting max LR matches the original cosine wave LR.
- Parameters
wave_type – half | full
lr_multiplier – float value -> LR after every restart decays with a fixed LR multiplier
is_adaptive – True -> if after every restart, maximum LR is adaptively calculated such that the resulting max LR matches the original cosine wave LR.
update_interval – step | epoch -> whether the LR should be updated after every training iteration or after every training epoch
Example
start_value: 0.1
end_value: 0.0001
restart_interval_length: 0.5  # for 1 restart
wave_type: half
lr_multiplier: 1.0  # for using a decayed max LR value at every restart
use_adaptive_decay: False
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.optimizers.param_scheduler.cosine_warm_restart_scheduler.CosineWarmRestartScheduler[source]¶ Instantiates a CosineWarmRestartScheduler from a configuration.
- Parameters
config – A configuration for a CosineWarmRestartScheduler. See
__init__()
for parameters expected in the config.- Returns
A CosineWarmRestartScheduler instance.
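A hedged yaml sketch of wiring this scheduler into the optimizer config, assuming it is registered under the name cosine_warm_restart (the values mirror the illustrative example above and the key names should be checked against __init__):

OPTIMIZER:
  param_schedulers:
    lr:
      name: cosine_warm_restart
      start_value: 0.1
      end_value: 0.0001
      restart_interval_length: 0.5
      wave_type: half
      lr_multiplier: 1.0
      is_adaptive: False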
vissl.optimizers.param_scheduler.inverse_sqrt_decay module¶
-
class
vissl.optimizers.param_scheduler.inverse_sqrt_decay.
InverseSqrtScheduler
(start_value: float, warmup_interval_length: float, update_interval: classy_vision.optim.param_scheduler.classy_vision_param_scheduler.UpdateInterval = <UpdateInterval.STEP: 'step'>)[source]¶ Bases:
classy_vision.optim.param_scheduler.classy_vision_param_scheduler.ClassyParamScheduler
Decay the LR based on the inverse square root of the update number.
Example
start_value: 4.8
warmup_interval_length: 0.1
Corresponds to an inverse sqrt decay schedule with values in [4.8, 0]
-
classmethod
from_config
(config: Dict[str, Any]) → vissl.optimizers.param_scheduler.inverse_sqrt_decay.InverseSqrtScheduler[source]¶ Instantiates a InverseSqrtScheduler from a configuration.
- Parameters
config – A configuration for a InverseSqrtScheduler. See
__init__()
for parameters expected in the config.- Returns
A InverseSqrtScheduler instance.
-
vissl.trainer package¶
-
vissl.trainer.trainer_main.
build_task
(config)[source]¶ Builds a ClassyTask from a config.
This assumes a ‘name’ key in the config which is used to determine what task class to instantiate. For instance, a config {“name”: “my_task”, “foo”: “bar”} will find a class that was registered as “my_task” (see
register_task()
) and call .from_config on it.
-
class
vissl.trainer.trainer_main.
SelfSupervisionTrainer
(cfg: vissl.utils.hydra_config.AttrDict, dist_run_id: str, checkpoint_path: str = None, checkpoint_folder: str = None, hooks: List[classy_vision.hooks.classy_hook.ClassyHook] = None)[source]¶ Bases:
object
The main entry point for any training or feature extraction workflows in VISSL.
The trainer constructs a train_task which prepares all the components of the training (optimizer, loss, meters, model etc) using the settings specified by user in the yaml config file. See the vissl/trainer/train_task.py for more details.
- Parameters
cfg (AttrDict) – user specified input config that has optimizer, loss, meters etc settings relevant to the training
dist_run_id (str) –
For multi-gpu training with PyTorch, we have to specify how the gpus are going to rendezvous. This requires specifying the communication method: file, tcp and the unique rendezvous run_id that is specific to 1 run. We recommend:
1) for 1 node: use init_method=tcp and run_id=auto
2) for multi-node: use init_method=tcp and specify run_id={master_node}:{port}
checkpoint_path (str) – if the training is being resumed from a checkpoint, path to the checkpoint. The tools/run_distributed_engines.py automatically looks for the checkpoint in the checkpoint directory.
checkpoint_folder (str) – what directory to use for checkpointing. The tools/run_distributed_engines.py creates the directory based on user input in the yaml config file.
hooks (List[ClassyHooks]) – the list of hooks to use during the training. The hooks vissl/engines/{train, extract_features}.py determine the hooks.
-
setup_distributed
(use_gpu: bool)[source]¶ Set up the distributed training. VISSL supports both GPU and CPU-only training.
Initialize the torch.distributed.init_process_group if the distributed is not already initialized. The init_method, backend are specified by user in the yaml config file. See vissl/defaults.yaml file for description on how to set init_method, backend.
We also set the global CUDA device index using torch.cuda.set_device, or fall back to the CPU device.
-
train
()[source]¶ The train workflow. We get the training loop to use (the vissl default is standard_train_step), but the user can create their own training loop and specify its name via TRAINER.TRAIN_STEP_NAME.
The training happens as follows:
1. Execute any hooks at the start of training (mostly resets variables like iteration num, phase_num etc).
2. For each epoch (train or test), run the hooks at the start of an epoch. Mostly involves setting things like the timer, setting the dataloader epoch etc.
3. Execute the training loop (1 training iteration) involving forward, loss, backward, optimizer update, metrics collection etc.
4. At the end of the epoch, sync meters and execute hooks at the end of the phase. Involves things like checkpointing the model, logging timers, logging to tensorboard etc.
-
extract
()[source]¶ The extract workflow supports multi-gpu feature extraction. Since we are only extracting features, only the model is built (and initialized from a model weights file if specified by the user). The model is fully set to eval mode.
The features are extracted for whatever data splits (train, val, test etc.) the user wants.
vissl.trainer.train_task module¶
-
class
vissl.trainer.train_task.
SelfSupervisionTask
(config: vissl.utils.hydra_config.AttrDict)[source]¶ Bases:
classy_vision.tasks.classification_task.ClassificationTask
A task prepares and holds all the components of a training like optimizer, datasets, dataloaders, losses, meters etc. Task also contains the variable like training iteration, epoch number etc. that are updated during the training.
We prepare every single component according to the parameter settings user wants and specified in the yaml config file.
Task also supports 2 additional things: 1) converts the model BatchNorm layers to synchronized batchnorm, and 2) sets up mixed precision (both apex and pytorch are supported).
-
set_device
()[source]¶ Set the training device: whether gpu or cpu. We use self.device in the rest of the workflow to determine if we should do cpu-only training or use the gpu. Set MACHINE.DEVICE = "gpu" or "cpu".
-
set_ddp_bucket_cap_mb
()[source]¶ PyTorch DDP supports setting the bucket_cap_mb for all reduce. Tuning this parameter can help with the speed of the model. We use the default pytorch value of 25MB.
-
set_available_splits
()[source]¶ Given the data settings, we determine if we are using both train and test datasets. If TEST_MODEL=true, we add the test split to available_splits. If TEST_ONLY=false, we add the train split as well.
-
set_amp_args
()[source]¶ Two automatic mixed precision implementations are available: Apex’s and PyTorch’s.
If Apex’s AMP is enabled, amp_args is a dictionary containing arguments
to be passed to amp.initialize. Set to None to disable amp. To enable mixed precision training, pass amp_args={“opt_level”: “O1”} here. See https://nvidia.github.io/apex/amp.html for more info.
If Pytorch’s AMP is enabled, no arguments are needed.
-
set_iteration
(iteration)[source]¶ Set the iteration number. We maintain and store the iteration in the task state itself. It counts the total number of iterations done in training phases. Updated after every forward pass of a training step in UpdateTrainIterationNumHook. Starts from 1.
-
get_config
()[source]¶ Utility function to store and use the config that was used for the given training.
-
build_datasets
()[source]¶ Get the datasets for the data splits we will use in the training. The set_available_splits variable determines the splits used in the training.
-
build_dataloaders
(pin_memory: bool) → torch.utils.data.dataloader.DataLoader[source]¶ Build PyTorch dataloaders for all the available_splits. We construct the standard PyTorch Dataloader and allow setting all dataloader options.
-
get_global_batchsize
()[source]¶ Return global batchsize used in the training across all the trainers. We check what phase we are in (train or test) and get the dataset used in that phase. We call get_global_batchsize() of the dataset.
-
recreate_data_iterator
(phase_type, epoch, compute_start_iter)[source]¶ Recreate data iterator (including multiprocessing workers) and destroy the previous iterators.
This is called when we load a new checkpoint or when phase changes during the training (one epoch to the next). DataSampler may need to be informed on those events to update the epoch and start_iteration so that the data is deterministically shuffled, so we call them here.
-
run_hooks
(hook_function_name)[source]¶ Override the ClassyTask run_hook function and run the hooks whenever called
-
prepare_optimizer
()[source]¶ Constructs the optimizer using the user defined settings in the yaml config. The model must be on the correct device (cuda or cpu) by this point.
-
prepare
(pin_memory: bool = False)[source]¶ Prepares the task: - dataloaders - model - copy model to correct device - meters - loss - optimizer - LR schedulers - AMP state - resume from a checkpoint if available
-
prepare_extraction
(pin_memory: bool = False)[source]¶ Prepares a light-weight task for feature extraction on multi-gpu. The model runs in eval mode only.
-
property
enable_manual_gradient_reduction
¶ Lazily initialize the enable flag once the model is not None.
-
vissl.trainer.train_steps module¶
Here we create all the custom train steps required for SSL model trainings.
-
vissl.trainer.train_steps.
register_train_step
(name)[source]¶ Registers Self-Supervision Train step.
This decorator allows VISSL to add custom train steps, even if the train step itself is not part of VISSL. To use it, apply this decorator to a train step function, like this:
@register_train_step('my_step_name')
def my_step_name():
    ...
To get a train step from a configuration file, see
get_train_step()
.
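A small illustrative sketch of registering a custom train step (the step name is hypothetical and the body is a placeholder, not a working training loop):
from vissl.trainer.train_steps import register_train_step

@register_train_step("my_custom_train_step")
def my_custom_train_step(task):
    # A real step would read a batch, run forward/loss/backward and step the
    # optimizer (see standard_train_step below), then return the task.
    return task

# The step is then selected in the yaml config via TRAINER.TRAIN_STEP_NAME=my_custom_train_step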
vissl.trainer.train_steps.standard_train_step module¶
This is the train step that is most commonly used for model trainings.
-
vissl.trainer.train_steps.standard_train_step.
construct_sample_for_model
(batch_data, task)[source]¶ Given the input batch from the dataloader, verify the input is as expected: the input data and target data are present in the batch. In case of multi-input trainings like PIRL, make sure the data is in the right format, i.e. the multiple inputs should be nested under a common key "input".
-
vissl.trainer.train_steps.standard_train_step.
standard_train_step
(task)[source]¶ Single training iteration loop of the model.
Performs: data read, forward, loss computation, backward, optimizer step, parameter updates.
Various intermediate steps are also performed:
- logging the training loss, training eta, LR, etc. to loggers
- logging to tensorboard
- performing any self-supervised method specific operations (like updating the momentum encoder in the MoCo approach, or computing the scores in SwAV)
- checkpointing the model if the user wants to checkpoint in the middle of an epoch
vissl.utils package¶
vissl.utils.instance_retrieval_utils.data_util module¶
-
vissl.utils.instance_retrieval_utils.data_util.
is_revisited_dataset
(dataset_name: str)[source]¶ Computes whether the specified dataset name is a revisited version of the Oxford and Paris datasets. Simply looks for the patterns "roxford5k" and "rparis6k" in the specified dataset_name.
-
vissl.utils.instance_retrieval_utils.data_util.
is_instre_dataset
(dataset_name: str)[source]¶ Returns True if the dataset name is “instre”. Helper function used in code at several places.
-
vissl.utils.instance_retrieval_utils.data_util.
is_whiten_dataset
(dataset_name: str)[source]¶ Returns True if the specified dataset has the name "whitening". The user can use any dataset they want for whitening.
-
vissl.utils.instance_retrieval_utils.data_util.
add_bias_channel
(x, dim: int = 1)[source]¶ Adds a bias channel useful during pooling + whitening operation.
-
vissl.utils.instance_retrieval_utils.data_util.
flatten
(x: torch.Tensor, keepdims: bool = False)[source]¶ Flattens B C H W input to B C*H*W output, optionally retains trailing dimensions.
-
vissl.utils.instance_retrieval_utils.data_util.
gem
(x: torch.Tensor, p: int = 3, eps: float = 1e-06, clamp: bool = True, add_bias: bool = False, keepdims: bool = False)[source]¶ Gem pooling on the given tensor.
- Parameters
x (torch.Tensor) – tensor on which the pooling should be done
p (int) – pooling number. If p=inf, simply perform max_pool2d. If p=1 and the x tensor has grad, simply perform avg_pool2d. Otherwise, perform GeM pooling for the specified p.
eps (float) – if clamping the x tensor, use the eps for clamping
clamp (bool) – whether to clamp the tensor
add_bias (bool) – whether to add the bias channel
keepdims (bool) – whether to flatten or keep the dimensions as is
- Returns
x (torch.Tensor) – Gem pooled tensor
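For reference, a minimal standalone sketch of generalized-mean (GeM) pooling as described above (this is an illustration only, not the VISSL helper itself):
import torch
import torch.nn.functional as F

def gem_pool(x: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    # x: B x C x H x W -> B x C x 1 x 1
    x = x.clamp(min=eps).pow(p)
    x = F.avg_pool2d(x, kernel_size=(x.size(-2), x.size(-1)))
    return x.pow(1.0 / p)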
-
vissl.utils.instance_retrieval_utils.data_util.
l2n
(x: torch.Tensor, eps: float = 1e-06, dim: int = 1)[source]¶ L2 normalize the input tensor along the specified dimension
- Parameters
x (torch.Tensor) – the tensor to normalize
eps (float) – epsilon used in the normalization to avoid division by zero / inf output
dim (int) – along which dimension to L2 normalize
- Returns
x (torch.Tensor) – L2 normalized tensor
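The same effect can be obtained with plain PyTorch, e.g. (a sketch, not the VISSL helper itself):
import torch

x = torch.randn(8, 128)
x_normalized = x / (x.norm(p=2, dim=1, keepdim=True) + 1e-6)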
-
class
vissl.utils.instance_retrieval_utils.data_util.
MultigrainResize
(size: int, largest: bool = False, **kwargs)[source]¶ Bases:
torchvision.transforms.transforms.Resize
Resize with a largest=False argument allowing to resize to a common largest side without cropping. This approach is used in the Multigrain paper https://arxiv.org/pdf/1902.05509.pdf
-
class
vissl.utils.instance_retrieval_utils.data_util.
WhiteningTrainingImageDataset
(base_dir: str, image_list_file: str, num_samples: int = 0)[source]¶ Bases:
object
A set of training images for whitening
-
class
vissl.utils.instance_retrieval_utils.data_util.
InstreDataset
(dataset_path: str, num_samples: int = 0)[source]¶ Bases:
object
A dataset class that reads and parses the Instre Dataset so it’s ready to be used in the code for retrieval evaluations
-
class
vissl.utils.instance_retrieval_utils.data_util.
RevisitedInstanceRetrievalDataset
(dataset: str, dir_main: str)[source]¶ Bases:
object
A dataset class used for the Revisited Instance retrieval datasets: Revisited Oxford and Revisited Paris. The object reads and parses the datasets so it’s ready to be used in the code for retrieval evaluations.
-
class
vissl.utils.instance_retrieval_utils.data_util.
InstanceRetrievalImageLoader
(S, transforms)[source]¶ Bases:
object
The custom loader for the Paris and Oxford Instance Retrieval datasets.
-
load_and_prepare_whitening_image
(fname)[source]¶ From the filename, load the whitening image and prepare it to be used by applying the data transforms.
-
load_and_prepare_instre_image
(fname)[source]¶ From the filename, load the db or query image and prepare it to be used by applying the data transforms.
-
-
class
vissl.utils.instance_retrieval_utils.data_util.
InstanceRetrievalDataset
(path, eval_binary_path, num_samples=None)[source]¶ Bases:
object
A dataset class used for the Instance retrieval datasets: Oxford and Paris. The object reads and parses the datasets so it’s ready to be used in the code for retrieval evaluations.
Credits: https://github.com/facebookresearch/deepcluster/blob/master/eval_retrieval.py # NOQA Adapted by: Priya Goyal (prigoyal@fb.com)
vissl.utils.instance_retrieval_utils.evaluate module¶
-
vissl.utils.instance_retrieval_utils.evaluate.
score_ap_from_ranks_1
(ranks, nres)[source]¶ Compute the average precision of one search.
- Parameters
ranks – ordered list of ranks of true positives
nres – total number of positives in dataset
- Returns
ap (float) – the average precision following the Holidays and the INSTRE package
-
vissl.utils.instance_retrieval_utils.evaluate.
compute_ap
(ranks, nres)[source]¶ Computes average precision for given ranked indexes.
- Parameters
ranks – zero-based ranks of positive images
nres – number of positive images
- Returns
ap (float) – average precision
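A sketch of how such a trapezoidal AP computation over zero-based ranks typically looks (illustrative only; it mirrors the revisitop-style reference computation credited below, not necessarily the exact VISSL code):
def compute_ap_sketch(ranks, nres):
    # ranks: zero-based ranks of positive images, nres: number of positives
    ap = 0.0
    recall_step = 1.0 / nres
    for j, rank in enumerate(ranks):
        precision_0 = 1.0 if rank == 0 else float(j) / rank
        precision_1 = float(j + 1) / (rank + 1)
        ap += (precision_0 + precision_1) * recall_step / 2.0
    return ap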
-
vissl.utils.instance_retrieval_utils.evaluate.
compute_map
(ranks, gnd, kappas)[source]¶ Computes the mAP for a given set of returned results.
- Credits:
https://github.com/filipradenovic/revisitop/blob/master/python/evaluate.py
- Usage:
map = compute_map(ranks, gnd)
-> computes mean average precision (map) only
map, aps, pr, prs = compute_map(ranks, gnd, kappas)
-> computes mean average precision (map) and average precision (aps) for each query
-> computes mean precision at kappas (pr) and precision at kappas (prs) for each query
Notes:
1) ranks starts from 0, ranks.shape = db_size X #queries
2) The junk results (e.g., the query itself) should be declared in the gnd struct array
3) If there are no positive images for some query, that query is excluded from the evaluation
vissl.utils.instance_retrieval_utils.pca module¶
vissl.utils.instance_retrieval_utils.rmac module¶
-
vissl.utils.instance_retrieval_utils.rmac.
normalize_L2
(a, dim)[source]¶ L2 normalize the input tensor along the specified dimension
- Parameters
a (torch.Tensor) – the tensor to normalize
dim (int) – along which dimension to L2 normalize
- Returns
a (torch.Tensor) – L2 normalized tensor
-
vissl.utils.instance_retrieval_utils.rmac.
get_rmac_region_coordinates
(H, W, L)[source]¶ Almost verbatim from the Tolias et al. Matlab implementation. Could be heavily pythonized, but really not worth it. Computes the RMAC region coordinates given the desired overlap of neighboring regions.
-
vissl.utils.instance_retrieval_utils.rmac.
get_rmac_descriptors
(features, rmac_levels, pca=None)[source]¶ RMAC descriptors. Coordinates are retrieved following Tolias et al. L2 normalize the descriptors and optionally apply PCA on the descriptors if specified by the user. After PCA, aggregate the descriptors (sum) and normalize the aggregated descriptor and return.
vissl.utils.svm_utils.evaluate module¶
-
vissl.utils.svm_utils.evaluate.
calculate_ap
(rec, prec)[source]¶ Computes the AP under the precision recall curve.
-
vissl.utils.svm_utils.evaluate.
get_precision_recall
(targets, scores, weights=None)[source]¶ [P, R, score, ap] = get_precision_recall(targets, scores, weights)
- Parameters
targets – number of occurrences of this class in the ith image
scores – score for this image
weights – 0 or 1, where 0 means we should ignore the sample
- Returns
P, R – precision and recall
score – the score which corresponds to the particular precision and recall
ap – average precision
vissl.utils.svm_utils.svm_trainer module¶
-
class
vissl.utils.svm_utils.svm_trainer.
SVMTrainer
(config, layer, output_dir)[source]¶ Bases:
object
SVM trainer that takes care of training (using k-fold cross validation), and evaluating the SVMs
-
load_input_data
(data_file, targets_file)[source]¶ Given the input data (features) and targets (labels) files, load the features of shape N x D and labels of shape (N,)
-
get_best_cost_value
()[source]¶ During the SVM training, we write the cross validation AP value for training at each class and cost value combination. We load the AP values and, for each class, determine the cost value that gives the maximum AP. We return the chosen cost values for each class as a numpy matrix.
-
train_cls
(features, targets, cls_num)[source]¶ Train SVM on the input features and targets for a given class. The SVMs are trained for all costs values for the given class. We also save the cross-validation AP at each cost value for the given class.
-
train
(features, targets)[source]¶ Train SVMs on the given features and targets for all classes and all the costs values.
-
test
(features, targets)[source]¶ Test the trained SVM models on the test features and targets values. We use the cost per class that gives the maximum cross validation AP on the training data and load the corresponding trained SVM model for that cost value and class.
Log the test ap to stdout and also save the AP in a file.
-
vissl.utils.svm_utils.svm_low_shot_trainer module¶
-
class
vissl.utils.svm_utils.svm_low_shot_trainer.
SVMLowShotTrainer
(config, layer, output_dir)[source]¶ Bases:
vissl.utils.svm_utils.svm_trainer.SVMTrainer
Train the SVM for the low-shot image classification tasks. Currently, datasets like VOC07 and Places205 are supported.
The trainer inherits from the SVMTrainer class and takes care of training the SVMs, evaluating, and aggregating the metrics.
-
train
(features, targets, sample_num, low_shot_kvalue)[source]¶ Train SVM on the input features and targets for a given low-shot k-value and the independent low-shot sample number.
- We save the trained SVM model for each combination:
cost value, class number, sample number, k-value
-
test
(features, targets, sample_num, low_shot_kvalue)[source]¶ - Test the SVM for the input test features and targets for the given:
low-shot k-value, sample number
We compute the meanAP across all classes for a given cost value. We get the output matrix of shape (1, #costs) for the given sample_num and k-value and save the matrix. We use this information to aggregate later.
-
aggregate_stats
(k_values, sample_inds)[source]¶ Aggregate the test AP across all k-values and independent samples.
For each low-shot k-value, we obtain the mean, max, min, std AP value. Steps:
1. For each k-value, get the min/max/mean/std value across all the independent samples. This results in matrices [#k-values x #classes].
2. Then we aggregate stats across the classes. For the mean stats in step 1, for each k-value, we get the class which has the maximum mean.
-
vissl.utils.activation_checkpointing module¶
This module centralizes all activation checkpointing related code. It is a work-in-progress as we evolve the APIs and eventually put this in fairscale so that multiple projects can potentially share it.
-
vissl.utils.activation_checkpointing.
manual_gradient_reduction
(model: torch.nn.modules.module.Module, config_flag: bool) → bool[source]¶ Return whether we should use manual gradient reduction or not.
We should use manual DDP gradient reduction if the config says so and the model is wrapped by DDP.
-
vissl.utils.activation_checkpointing.
manual_sync_params
(model: torch.nn.parallel.distributed.DistributedDataParallel) → None[source]¶ Manually sync params and buffers for DDP.
-
vissl.utils.activation_checkpointing.
manual_gradient_all_reduce
(model: torch.nn.parallel.distributed.DistributedDataParallel) → None[source]¶ Gradient reduction function used after backward is done.
vissl.utils.checkpoint module¶
-
vissl.utils.checkpoint.
is_training_finished
(cfg: vissl.utils.hydra_config.AttrDict, checkpoint_folder: str)[source]¶ Given the checkpoint folder, we check that there's not already a final checkpoint. If the final checkpoint exists but the user wants to override it, then we mark the training as not finished.
-
vissl.utils.checkpoint.
get_checkpoint_folder
(config: vissl.utils.hydra_config.AttrDict)[source]¶ Check, create and return the checkpoint folder. User can specify their own checkpoint directory otherwise the default “.” is used.
Optionally, for training that involves more than 1 machine, we allow appending the distributed run id, which helps to uniquely identify the training. This is completely optional and the user can set APPEND_DISTR_RUN_ID=true for this.
-
vissl.utils.checkpoint.
is_checkpoint_phase
(mode_num: int, mode_frequency: int, train_phase_idx: int, num_epochs: int, mode: str)[source]¶ Determines if a checkpoint should be saved on current epoch. If epoch=1, then we check whether to save at current iteration or not.
- Parameters
mode (str) – the mode in which we are checkpointing the model - every few iterations or at the end of every phase/epoch. The mode is encoded in the checkpoint filename.
mode_num (int) – what is the current iteration or epoch number that we are trying to checkpoint at.
mode_frequency (int) – checkpoint frequency - every N iterations or every N epochs/phase
train_phase_idx (int) – the current training phase we are in. Starts from 0
num_epochs (int) – total number of epochs in training
- Returns
checkpointing_phase (bool) – whether the model should be checkpointed or not
-
vissl.utils.checkpoint.
has_checkpoint
(checkpoint_folder: str, skip_final: bool = False)[source]¶ Check whether there are any checkpoints at all in the checkpoint folder.
-
vissl.utils.checkpoint.
has_final_checkpoint
(checkpoint_folder: str, final_checkpoint_pattern: str = 'model_final')[source]¶ Check whether the final checkpoint exists in the checkpoint folder. The final checkpoint is recognized by the prefix “model_final_” in VISSL.
-
vissl.utils.checkpoint.
get_checkpoint_resume_files
(checkpoint_folder: str, config: vissl.utils.hydra_config.AttrDict, skip_final: bool = False, latest_checkpoint_resume_num: int = 1)[source]¶ Get the checkpoint file from which the model should be resumed. We look at all the checkpoints in the checkpoint_folder and if the final model checkpoint exists (starts with model_final_) and not overriding it, then return the final checkpoint. Otherwise find the latest checkpoint.
- Parameters
checkpoint_folder (str) – path to the checkpoint folder.
config (AttrDict) – root config
skip_final (bool) – whether the final model checkpoint should be skipped or not
latest_checkpoint_resume_num (int) – which Nth latest checkpoint to resume from. Sometimes the latest checkpoints could be corrupt, so this option helps to instead resume from a checkpoint a few steps before the last one.
-
vissl.utils.checkpoint.
get_resume_checkpoint
(cfg: vissl.utils.hydra_config.AttrDict, checkpoint_folder: str)[source]¶ Return the checkpoint from which to resume training. If no checkpoint is found, return None. Resuming training is optional and the user can set AUTO_RESUME=false to not resume the training.
If we want to overwrite the existing final checkpoint, we ignore the final checkpoint and return the previous checkpoints if they exist.
-
vissl.utils.checkpoint.
print_state_dict_shapes
(state_dict: Dict[str, Any])[source]¶ For the given model state dictionary, print the name and shape of each parameter tensor in the model state. Helps debugging.
- Parameters
state_dict (Dict[str, Any]) – model state dictionary
-
vissl.utils.checkpoint.
print_loaded_dict_info
(model_state_dict: Dict[str, Any], state_dict: Dict[str, Any], skip_layers: List[str], model_config: vissl.utils.hydra_config.AttrDict)[source]¶ Print what layers were loaded, what layers were ignored/skipped/not found when initializing a model from a specified model params file.
-
vissl.utils.checkpoint.
replace_module_prefix
(state_dict: Dict[str, Any], prefix: str, replace_with: str = '')[source]¶ Remove prefixes in a state_dict needed when loading models that are not VISSL trained models.
Specify the prefix in the keys that should be removed.
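A minimal sketch of what such a prefix replacement amounts to (illustrative only; the real helper may differ in details):
def replace_prefix_sketch(state_dict, prefix, replace_with=""):
    # rename every key that starts with `prefix`, leave others untouched
    return {
        ((replace_with + key[len(prefix):]) if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# e.g. strip the "module." prefix that DistributedDataParallel adds to layer names:
# cleaned = replace_prefix_sketch(checkpoint["model"], prefix="module.")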
-
vissl.utils.checkpoint.
append_module_prefix
(state_dict: Dict[str, Any], prefix: str)[source]¶ Append prefixes in a state_dict needed when loading models that are not VISSL trained models.
- In order to load a model (not trained with VISSL) with VISSL, there are 2 scenarios:
1. You are interested in evaluating the model features with a frozen trunk. Set APPEND_PREFIX="trunk.base_model." This assumes that your model is compatible with the VISSL trunks. The VISSL trunks start with the "_feature_blocks." prefix; if your model doesn't have this prefix you can append it. For example, for a TorchVision ResNet trunk, set APPEND_PREFIX="trunk.base_model._feature_blocks."
2. You want to simply load the model and finetune the full model. Set APPEND_PREFIX="trunk." This assumes that your model is compatible with the VISSL trunks. The VISSL trunks start with the "_feature_blocks." prefix; if your model doesn't have this prefix you can append it. For a TorchVision ResNet trunk, set APPEND_PREFIX="trunk._feature_blocks."
NOTE: the prefix is appended to all the layers in the model.
-
vissl.utils.checkpoint.
check_model_compatibilty
(config: vissl.utils.hydra_config.AttrDict, state_dict: Dict[str, Any])[source]¶ Given a VISSL model and state_dict, check if the state_dict can be loaded into the VISSL model (trunk + head) based on the trunk and head prefixes that are expected. If not compatible, we raise an exception.
Prefix checked for head: heads. Prefix checked for trunk: trunk._feature_blocks. or trunk.base_model._feature_blocks.
depending on the workflow type (training | evaluation).
-
vissl.utils.checkpoint.
get_checkpoint_model_state_dict
(config: vissl.utils.hydra_config.AttrDict, state_dict: Dict[str, Any])[source]¶ Given a specified pre-trained VISSL model (composed of head and trunk), we get the state_dict that can be loaded by appending prefixes to model and trunk.
- Parameters
config (AttrDict) – full config file
state_dict (Dict) – raw state_dict loaded from the checkpoint or weights file
- Returns
state_dict (Dict) – vissl state_dict with layer names compatible with the vissl model. Hence this state_dict can be loaded directly.
-
vissl.utils.checkpoint.
init_model_from_weights
(config: vissl.utils.hydra_config.AttrDict, model, state_dict: Dict[str, Any], state_dict_key_name: str, skip_layers: List[str], replace_prefix=None, append_prefix=None)[source]¶ Initialize the model from any given params file. This is particularly useful during the feature evaluation process or when we want to evaluate a model on a range of tasks.
- Parameters
config (AttrDict) – config file
model (object) – instance of base_ssl_model
state_dict (Dict) – torch.load() of user provided params file path.
state_dict_key_name (string) – key name containing the model state dict
skip_layers (List(string)) – layer names with this key are not copied
replace_prefix (string) – remove these prefixes from the layer names (executed first)
append_prefix (string) – append the prefix to the layer names (executed after replace_prefix)
- Returns
model (object) – the model initialized from the weights file
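A hedged usage sketch of this helper (the weights path, the "classy_state_dict" key and the skipped layer name below are assumptions; adjust state_dict_key_name and skip_layers to match your params file):
import torch
from vissl.utils.checkpoint import init_model_from_weights

def load_pretrained_weights(cfg, model, weights_path):
    # cfg: full AttrDict config, model: instance of base_ssl_model
    weights = torch.load(weights_path, map_location="cpu")
    return init_model_from_weights(
        config=cfg,
        model=model,
        state_dict=weights,
        state_dict_key_name="classy_state_dict",  # assumption: key holding the model state dict
        skip_layers=["heads"],                    # e.g. skip head layers when evaluating the trunk
    )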
vissl.utils.collect_env module¶
vissl.utils.env module¶
-
vissl.utils.env.
set_env_vars
(local_rank: int, node_id: int, cfg: vissl.utils.hydra_config.AttrDict)[source]¶ Set some environment variables like total number of gpus used in training, distributed rank and local rank of the current gpu, whether to print the nccl debugging info and tuning nccl settings.
vissl.utils.hydra_config module¶
-
class
vissl.utils.hydra_config.
AttrDict
(dictionary)[source]¶ Bases:
dict
Dictionary subclass whose entries can be accessed like attributes (as well as normally). Credits: https://aiida.readthedocs.io/projects/aiida-core/en/latest/_modules/aiida/common/extendeddicts.html#AttributeDict # noqa
-
__init__
(dictionary)[source]¶ Recursively turn the dict and all its nested dictionaries into AttrDict instance.
-
__getattr__
(key)[source]¶ Read a key as an attribute.
- Raises
AttributeError – if the attribute does not correspond to an existing key.
-
__delattr__
(key)[source]¶ Delete a key as an attribute.
- Raises
AttributeError – if the attribute does not correspond to an existing key.
-
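A small usage sketch of AttrDict:
from vissl.utils.hydra_config import AttrDict

cfg = AttrDict({"OPTIMIZER": {"name": "sgd", "num_epochs": 100}})
print(cfg.OPTIMIZER.name)        # "sgd"  (attribute access; nested dicts are converted recursively)
print(cfg["OPTIMIZER"]["name"])  # "sgd"  (normal dict access still works)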
-
vissl.utils.hydra_config.
convert_to_attrdict
(cfg: omegaconf.dictconfig.DictConfig, cmdline_args: List[Any] = None)[source]¶ Given the user input Hydra Config, and some command line input options to override the config file:
1. merge and override the command line options in the config
2. Convert the Hydra OmegaConf to an AttrDict structure to make it easy to access the keys in the config file
Also check the config version used is compatible and supported in vissl. In future, we would want to support upgrading the old config versions if we make changes to the VISSL default config structure (deleting, renaming keys)
We infer values of some parameters in the config file using the other parameter values.
-
vissl.utils.hydra_config.
is_hydra_available
()[source]¶ Check if Hydra is available. Simply tries a Python import to test.
-
vissl.utils.hydra_config.
print_cfg
(cfg)[source]¶ Supports printing both Hydra DictConfig and also the AttrDict config
-
vissl.utils.hydra_config.
resolve_linear_schedule
(cfg, param_schedulers)[source]¶ For the given composite schedulers, for each linear schedule: if the training uses 1 node only, check whether the linear warmup rule from https://arxiv.org/abs/1706.02677 is applicable and necessary.
We set the end_value = scaled_lr (assuming it’s a linear warmup). In case only 1 machine is used in training, the start_lr = scaled_lr and then the linear warmup is not needed.
-
vissl.utils.hydra_config.
get_scaled_lr_scheduler
(cfg, param_schedulers, scaled_lr)[source]¶ Scale learning rate value for different Learning rate types. See assert_learning_rate() for how the scaled LR is calculated.
Values changed for learning rate schedules:
- cosine:
  end_value = scaled_lr * (end_value / start_value)
  start_value = scaled_lr
- multistep:
  gamma = values[1] / values[0]
  values = [scaled_lr * pow(gamma, idx) for idx in range(len(values))]
- step_with_fixed_gamma:
  base_value = scaled_lr
- linear:
  end_value = scaled_lr
- inverse_sqrt:
  start_value = scaled_lr
- constant:
  value = scaled_lr
- composite:
  recursively call to scale each composition. If the composition consists of a linear schedule, we assume that a linear warmup is applied. If the linear warmup is applied, it's possible the warmup is not necessary if the global batch_size is smaller than the base_lr_batch_size, and in that case we remove the linear warmup from the schedule.
-
vissl.utils.hydra_config.
assert_learning_rate
(cfg)[source]¶ 1) Assert the learning rate here. The LR is scaled as per https://arxiv.org/abs/1706.02677. To turn this automatic scaling off, set config.OPTIMIZER.param_schedulers.lr.auto_lr_scaling.auto_scale=false
scaled_lr is calculated as follows: given base_lr_batch_size = the batch size for which the base learning rate is specified, and base_value = the base learning rate value that will be scaled, the current batch size determines the scaling:
scaled_lr = ((batchsize_per_gpu * world_size) * base_value) / base_lr_batch_size
2) We perform this auto-scaling for the head learning rate as well if the user wants to use a different learning rate for the head.
3) Infer the model head params weight decay: if the head should use a different weight decay value than the trunk, set it here; otherwise, the same value as the trunk will be automatically used.
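A small worked example of the scaling rule above (the numbers are illustrative):
batchsize_per_gpu = 32
world_size = 8            # 8 gpus in total
base_value = 0.3          # base LR specified for base_lr_batch_size
base_lr_batch_size = 256

scaled_lr = (batchsize_per_gpu * world_size) * base_value / base_lr_batch_size
# (32 * 8) * 0.3 / 256 = 0.3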
-
vissl.utils.hydra_config.
assert_losses
(cfg)[source]¶ Infer settings for various self-supervised losses. Takes care of setting various loss parameters correctly like world size, batch size per gpu, effective global batch size, collator etc. Each loss has additional set of parameters that can be inferred to ensure smooth training in case user forgets to adjust all the parameters.
-
vissl.utils.hydra_config.
assert_hydra_conf
(cfg)[source]¶ Infer values of a few parameters in the config file using the values of other config parameters:
1. Inferring losses.
2. Auto scale the learning rate if the user has specified auto scaling to be True.
3. Infer meter names (model layer name being evaluated) since we support list meters that have multiple outputs and the same target. This is very common in self-supervised learning where we want to evaluate a metric for several layers of the model. VISSL supports running evaluation for multiple model layers in a single training run.
4. Support the multi-gpu DDP eval model by attaching a dummy parameter. This is particularly helpful for multi-gpu feature extraction, especially when the dataset for which features are being extracted is large.
5. Infer what kind of labels are being used. If the user has specified a labels source, we set LABEL_TYPE to "standard" (also the vissl default); otherwise, if no label is specified, we set LABEL_TYPE to "sample_index".
vissl.utils.io module¶
-
vissl.utils.io.
cache_url
(url: str, cache_dir: str) → str[source]¶ This implementation downloads the remote resource and caches it locally. The resource will only be downloaded if not previously requested.
-
vissl.utils.io.
create_file_symlink
(file1, file2)[source]¶ Simply create a symlink from a given file1 to file2. Useful during model checkpointing to create symlinks to the latest successful checkpoint.
-
vissl.utils.io.
save_file
(data, filename)[source]¶ Common i/o utility to handle saving data to various file formats. Supported:
.pkl, .pickle, .npy, .json
-
vissl.utils.io.
load_file
(filename, mmap_mode=None)[source]¶ Common i/o utility to handle loading data from various file formats. Supported:
.pkl, .pickle, .npy, .json
For the npy files, we support reading the files in mmap_mode. If the mmap_mode of reading is not successful, we load data without the mmap_mode.
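A small sketch of the save/load round trip (the file format is picked from the extension; the paths here are placeholders):
from vissl.utils.io import save_file, load_file

save_file({"accuracy": 0.73}, "/tmp/metrics.json")
metrics = load_file("/tmp/metrics.json")

# .npy files can additionally be memory-mapped:
# features = load_file("/tmp/features.npy", mmap_mode="r")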
-
vissl.utils.io.
is_url
(input_url)[source]¶ Check if an input string is a url. Looks for the http(s):// prefix, ignoring case.
-
vissl.utils.io.
cleanup_dir
(dir)[source]¶ Utility for deleting a directory. Useful for cleaning the storage space that contains various training artifacts like checkpoints, data etc.
-
vissl.utils.io.
copy_file
(input_file, destination_dir, tmp_destination_dir)[source]¶ Copy a given input_file from source to the destination directory.
Steps:
1. We use PathManager to extract the data to a local path.
2. We simply move the files from the PathManager cached local directory to the user specified destination directory. We use rsync.
How the destination dir is chosen:
- If the user is using slurm, we set destination_dir = slurm_dir (see get_slurm_dir)
- If the local path used by PathManager is the same as the input_file path, and the destination directory is not specified, we set destination_dir = tmp_destination_dir
- Returns
output_file (str) – the new path of the file destination_dir (str): the destination dir that was actually used
-
vissl.utils.io.
copy_dir
(input_dir, destination_dir, num_threads)[source]¶ Copy contents of one directory to the specified destination directory using the number of threads to speed up the copy. When the data is copied successfully, we create a copy_complete file in the destination_dir folder to mark the completion. If the destination_dir folder already exists and has the copy_complete file, we don’t copy the file.
Useful for copying datasets like ImageNet to speed up the dataloader. Using 20 threads, ImageNet takes about 20 minutes to copy.
- Returns
destination_dir (str) – directory where the contents were copied
-
vissl.utils.io.
copy_data
(input_file, destination_dir, num_threads, tmp_destination_dir)[source]¶ Copy data from one source to the other using num_threads. The data to copy can be a single file or a directory. We check what type of data and call the relevant functions.
- Returns
output_file (str) – the new path of the data (could be file or dir) destination_dir (str): the destination dir that was actually used
-
vissl.utils.io.
copy_data_to_local
(input_files, destination_dir, num_threads=40, tmp_destination_dir=None)[source]¶ Iteratively copy the list of data to a destination directory. Each data to copy could be a single file or a directory.
- Returns
output_file (str) – the new path of the file. If there were no files to copy, simply return the input_files
destination_dir (str) – the destination dir that was actually used
vissl.utils.logger module¶
-
vissl.utils.logger.
setup_logging
(name, output_dir=None, rank=0)[source]¶ Setup various logging streams: stdout and file handlers.
For file handlers, we only set up logging for the master gpu.
-
vissl.utils.logger.
shutdown_logging
()[source]¶ After training is done, we ensure to shut down all the logger streams.
vissl.utils.misc module¶
-
vissl.utils.misc.
is_apex_available
()[source]¶ Check if apex is available with simple python imports.
-
vissl.utils.misc.
is_faiss_available
()[source]¶ Check if faiss is available with simple python imports. To install faiss, simply do:
If using a pip env: pip install faiss-gpu
If using a conda env: conda install faiss-gpu -c pytorch
-
vissl.utils.misc.
is_opencv_available
()[source]¶ Check if opencv is available with simple python imports. To install opencv, simply do: pip install opencv-python regardless of whether using conda or pip environment.
-
vissl.utils.misc.
find_free_tcp_port
()[source]¶ Find the free port that can be used for Rendezvous on the local machine. We use this for 1 machine training where the port is automatically detected.
-
vissl.utils.misc.
get_dist_run_id
(cfg, num_nodes)[source]¶ For multi-gpu training with PyTorch, we have to specify how the gpus are going to rendezvous. This requires specifying the communication method: file, tcp and the unique rendezvous run_id that is specific to 1 run.
- We recommend:
for 1-node: use init_method=tcp and run_id=auto
for multi-node, use init_method=tcp and specify run_id={master_node}:{port}
-
vissl.utils.misc.
setup_multiprocessing_method
(method_name: str)[source]¶ PyTorch supports several multiprocessing options: forkserver | spawn | fork
We recommend and use forkserver as the default method in VISSL.
-
vissl.utils.misc.
set_seeds
(cfg, node_id=0)[source]¶ Set the python random, numpy and torch seeds for each gpu. Also set the CUDA seeds if CUDA is available. This helps make the training deterministic.
-
vissl.utils.misc.
get_indices_sparse
(data)[source]¶ This is faster than np.argwhere. Used in loss functions like the SwAV loss, etc.
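One common way to implement this kind of grouped index lookup is via a scipy sparse matrix; a sketch under the assumption that data holds small non-negative integer labels (e.g. cluster assignments), not necessarily the exact VISSL code:
import numpy as np
from scipy.sparse import csr_matrix

def get_indices_sparse_sketch(data):
    cols = np.arange(data.size)
    M = csr_matrix((cols, (data.ravel(), cols)), shape=(int(data.max()) + 1, data.size))
    # one entry per label value: the indices (in data's shape) where that value occurs
    return [np.unravel_index(row.data, data.shape) for row in M]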
-
vissl.utils.misc.
merge_features
(output_dir, split, layer, cfg)[source]¶ For multi-gpu feature extraction, each gpu saves features corresponding to its share of the data. We can merge the features across all gpus to get the features for the full data.
The features are saved along with the data indexes and label. The data indexes can be used to sort the data and ensure the uniqueness.
We organize the features, targets corresponding to the data index of each feature, ensure the uniqueness and return.
- Parameters
- Returns
output (Dict) – contains features, targets, inds as the keys
vissl.utils.perf_stats module¶
-
class
vissl.utils.perf_stats.
PerfTimer
(timer_name: str, perf_stats: Optional[PerfStats])[source]¶ Bases:
object
Very simple timing wrapper, with context manager wrapping. Typical usage:
with PerfTimer('forward_pass', perf_stats):
    model.forward(data)
# ...
with PerfTimer('backward_pass', perf_stats):
    model.backward(loss)
# ...
print(perf_stats.report_str())
Note that timer stats accumulate by name, so you can effectively resume them by re-using the name.
You can also use it without context manager, i.e. via start() / stop() directly.
If the supplied PerfStats is constructed with use_cuda_events=True (which is the default), then Cuda events will be added to correctly track the time of async execution of Cuda kernels:
with PerfTimer('foobar', perf_stats):
    some_cpu_work()
    schedule_some_cuda_work()
In the example above, the "Host" column will capture elapsed time from the perspective of the Python process, and the "CudaEvent" column will capture elapsed time between scheduling of the Cuda work (within the PerfTimer scope) and completion of this work, some of which might happen outside the PerfTimer scope.
If perf_stats is None, using PerfTimer does nothing.
-
start
()[source]¶ Start the recording if the PerfTimer should not be skipped and the recording is not already in progress. If using cuda, we record the time of cuda events as well.
-
class
vissl.utils.perf_stats.
PerfMetric
[source]¶ Bases:
object
Encapsulates numerical tracking of a single metric, with a .update(value) API. Under-the-hood this can additionally keep track of sums, (exp.) moving averages, sum of squares (e.g. for stdev), filtered values, etc.
-
EMA_FACTOR
= 0.1¶
-
-
class
vissl.utils.perf_stats.
PerfStats
(use_cuda_events=True)[source]¶ Bases:
object
Accumulate stats (from timers) over many iterations
-
MAX_PENDING_TIMERS
= 1000¶
-
update_with_timer
(timer: vissl.utils.perf_stats.PerfTimer)[source]¶
-
vissl.utils.slurm module¶
vissl.utils.tensorboard module¶
This script contains some helpful functions to handle tensorboard setup.
-
vissl.utils.tensorboard.
is_tensorboard_available
()[source]¶ Check whether tensorboard is available or not.
- Returns
tb_available (bool) – based on the tensorboard imports, returns whether tensorboard is available or not
-
vissl.utils.tensorboard.
get_tensorboard_dir
(cfg)[source]¶ Get the output directory where the tensorboard events will be written.
- Parameters
cfg (AttrDict) – User specified config file containing the settings for the tensorboard as well like log directory, logging frequency etc
- Returns
tensorboard_dir (str) – output directory path
-
vissl.utils.tensorboard.
get_tensorboard_hook
(cfg)[source]¶ Construct the Tensorboard hook for visualization from the specified config
- Parameters
cfg (AttrDict) – User specified config file containing the settings for the tensorboard as well like log directory, logging frequency etc
- Returns
SSLTensorboardHook (function) – the tensorboard hook constructed