vissl.models package

class vissl.models.BaseSSLMultiInputOutputModel(*args, **kwargs)[source]

Bases: classy_vision.models.classy_model.ClassyModel

Class to implement a Self-Supervised model. The model is split into a `trunk` that computes features and a `head` that computes outputs (projections, classifications, etc.).

This class supports many use cases:

  1. Model producing a single output, as in standard supervised ImageNet training.

  2. Model producing multiple outputs (multi-task).

  3. Model producing multiple outputs from different features (layers) from the trunk (useful in linear evaluation of features from several model layers).

  4. Model that accepts multiple inputs (e.g. image and patches, as in the PIRL approach).

  5. Model where the trunk is frozen.

  6. Model that supports multiple-resolution inputs, as in SwAV.

  • How to specify heads?

    For information on heads, see the _get_heads() function.

  • What inputs do heads operate on?

    One can specify the input-to-heads mapping in the list MULTI_INPUT_HEAD_MAPPING. See the _setup_multi_input_head_mapping() function for details.

multi_input_with_head_mapping_forward(batch)[source]

Perform forward pass (trunk + heads) separately on each input and return the model output on all inputs as a list.

multi_res_input_forward(batch, feature_names)[source]

Perform the forward pass separately on each resolution input. The inputs corresponding to a single resolution are grouped together and a single trunk forward is run per resolution group, so the number of forward passes equals the number of different resolutions used. We then concatenate all the output features and run the head forward on the concatenated features.
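
A minimal sketch of this multi-resolution pattern (trunk, head and the crop list are placeholders; this is not the actual VISSL implementation):

# Group same-resolution crops, run one trunk forward per group,
# concatenate the features and run the head once.
import torch

def multi_res_forward_sketch(crops, trunk, head):
    # `crops` is a list of 4D tensors; consecutive crops share a resolution.
    sizes = torch.tensor([crop.shape[-1] for crop in crops])
    _, counts = torch.unique_consecutive(sizes, return_counts=True)
    feats, start = [], 0
    for count in counts.tolist():
        # Club all crops of the same resolution into a single batch.
        batch = torch.cat(crops[start : start + count], dim=0)
        feats.append(trunk(batch))
        start += count
    return head(torch.cat(feats, dim=0))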

single_input_forward(batch, feature_names, heads)[source]

Simply run the trunk and heads forward on the input tensor. We run the trunk first and then the heads on the trunk output. If the model is trunk feature extraction only, then we simply return the output of the trunk.

heads_forward(feats, heads)[source]

Run the head forward on the trunk output features. There are two cases (see the sketch after this list):

  1. #heads == #feats, e.g. training linear classifiers on various layers: we run each head on its corresponding feature.

  2. #feats == 1 and #heads > 1: the head consists of many layers to be run sequentially, and #outputs == 1.
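
A minimal sketch of these two cases (names are illustrative, not the exact VISSL code):

def heads_forward_sketch(feats, heads):
    if len(heads) == len(feats):
        # Case 1: one head per feature, e.g. linear classifiers on several layers.
        return [head(feat) for feat, head in zip(feats, heads)]
    elif len(feats) == 1:
        # Case 2: a single feature passed sequentially through all heads.
        output = feats[0]
        for head in heads:
            output = head(output)
        return [output]
    raise ValueError("Mismatch between the number of features and heads")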

forward(batch)[source]

Main forward of the model. Depending on the model type, the call is dispatched to the suitable forward function.

freeze_head()[source]

Freeze the model head by setting requires_grad=False for all the parameters

freeze_trunk()[source]

Freeze the model trunk by setting requires_grad=False for all the parameters

freeze_head_and_trunk()[source]

Freeze the full model including the heads and the trunk. In 99% of cases, we do not use the pretext head as it is specific to the self-supervised pretext task. But for some models like NPID, SimCLR and SwAV, the head is essentially a low-dimensional feature projection which we want to use. Hence, we provide a utility to freeze the full model.

is_fully_frozen_model()[source]

Look at all the parameters of the model (trunk + heads) and check if there is any trainable parameter. If not, the model is completely frozen.
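
A minimal sketch of the freezing utilities above, assuming a generic nn.Module with trunk and head submodules (illustrative only):

def freeze_module(module):
    # Freezing means turning off gradients for every parameter.
    for param in module.parameters():
        param.requires_grad = False

def is_fully_frozen(model):
    # The model is fully frozen when no parameter is trainable.
    return not any(p.requires_grad for p in model.parameters())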

get_features(batch)[source]

Run the trunk forward on the input batch. This gives us the features from the trunk at several layers of the model.

In the feature-extraction case, we run only the trunk and not the heads. The trunk will already have the feature-extractor pooling layers and flattening attached; the feature-extractor heads are part of the trunk already.

get_classy_state(deep_copy=False)[source]

Return the model state (trunk + heads) to checkpoint.

We call this on state.base_model, which is not wrapped with DDP. It returns the model state_dict to checkpoint.

set_classy_state(state)[source]

Initialize the model trunk and head from the state dictionary.

We call this on state.base_model, which is not wrapped with DDP. It loads the model from the checkpoint.

property num_classes

Not implemented and not required

property input_shape

Not implemented and not required

property output_shape

Not implemented and not required

validate(dataset_output_shape)[source]

Not implemented and not required

vissl.models.convert_sync_bn(config, model)[source]

Convert the BatchNorm layers in the model to the SyncBatchNorm layers.

For SyncBatchNorm, we support two sources: Apex and PyTorch. The optimized SyncBN kernels provided by apex run faster.

Parameters
  • config (AttrDict) – configuration file

  • model – Pytorch model whose BatchNorm layers should be converted to SyncBN layers.

NOTE: Since SyncBatchNorm layers synchronize the BN stats across machines, using syncBN layers can be slow. In order to speed up training while using syncBN, we recommend using process_groups, which are very well supported for Apex. To set the process groups, set SYNC_BN_CONFIG.GROUP_SIZE as follows (a sketch of the PyTorch path follows the list):

  1. if group_size=-1 -> use the VISSL default setting. We synchronize within a machine and hence set group_size = num_gpus per node. This gives the best speedup.

  2. if group_size>0 -> group_size is set to the value specified by the user.

  3. if group_size=0 -> no groups are created and process_group=None. This means global sync is done.
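
A hedged sketch of the PyTorch SyncBatchNorm path with process groups (VISSL's convert_sync_bn also supports Apex; this sketch assumes torch.distributed is already initialized and that group_size=-1 has been resolved to num_gpus per node beforehand):

import torch
import torch.distributed as dist

def convert_with_group_size(model, group_size, world_size):
    if group_size == 0:
        # Global sync across all workers.
        process_group = None
    else:
        assert world_size % group_size == 0
        ranks = list(range(world_size))
        # Every rank must create every group; each rank then picks its own.
        groups = [
            dist.new_group(ranks[i : i + group_size])
            for i in range(0, world_size, group_size)
        ]
        process_group = groups[dist.get_rank() // group_size]
    return torch.nn.SyncBatchNorm.convert_sync_batchnorm(model, process_group)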

vissl.models.is_feature_extractor_model(model_config)[source]
The model is a feature extractor model if:
  • evaluation mode is on

  • the trunk is frozen

  • the number of features specified for feature extraction > 0

vissl.models.build_model(model_config, optimizer_config)[source]

Given the model config and the optimizer config, construct the model. The returned model is not yet copied to the GPU (if using GPUs) nor wrapped with DDP; this is done later in train_task.py's prepare().

vissl.models.model_helpers module

vissl.models.model_helpers.transform_model_input_data_type(model_input, model_config)[source]

The default model input follows the RGB format. Based on the input type specified in the model config, change the data type. Supported types: RGB, BGR, LAB.
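
An illustrative sketch of the RGB -> BGR branch only (the real helper also handles LAB and reads the target type from the model config):

import torch

def rgb_to_bgr(model_input: torch.Tensor) -> torch.Tensor:
    # Input is N x C x H x W in RGB order; reversing the channel dim gives BGR.
    return model_input.flip(1)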

vissl.models.model_helpers.is_feature_extractor_model(model_config)[source]
The model is a feature extractor model if:
  • evaluation mode is on

  • the trunk is frozen

  • the number of features specified for feature extraction > 0

vissl.models.model_helpers.get_trunk_output_feature_names(model_config)[source]

Get the feature names which we will use to associate the features with. If feature eval mode is set, we get the feature names from config.FEATURE_EVAL_SETTINGS.LINEAR_EVAL_FEAT_POOL_OPS_MAP.

class vissl.models.model_helpers.Wrap(function)[source]

Bases: torch.nn.modules.module.Module

Wrap a free function into an nn.Module. Can be useful to build a model block and include activations or light tensor alterations.

forward(x)[source]
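
Example usage, assuming Wrap simply stores the function and applies it in forward:

import torch
import torch.nn as nn
from vissl.models.model_helpers import Wrap

block = nn.Sequential(
    nn.Linear(128, 128),
    Wrap(torch.nn.functional.gelu),  # free function used as an activation module
)
out = block(torch.randn(4, 128))
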
class vissl.models.model_helpers.SyncBNTypes(value)[source]

Bases: str, enum.Enum

Supported SyncBN types

apex = 'apex'
pytorch = 'pytorch'
vissl.models.model_helpers.convert_sync_bn(config, model)[source]

Convert the BatchNorm layers in the model to the SyncBatchNorm layers.

For SyncBatchNorm, we support two sources: Apex and PyTorch. The optimized SyncBN kernels provided by apex run faster.

Parameters
  • config (AttrDict) – configuration file

  • model – Pytorch model whose BatchNorm layers should be converted to SyncBN layers.

NOTE: Since SyncBatchNorm layers synchronize the BN stats across machines, using syncBN layers can be slow. In order to speed up training while using syncBN, we recommend using process_groups, which are very well supported for Apex. To set the process groups, set SYNC_BN_CONFIG.GROUP_SIZE as follows:

  1. if group_size=-1 -> use the VISSL default setting. We synchronize within a machine and hence set group_size = num_gpus per node. This gives the best speedup.

  2. if group_size>0 -> group_size is set to the value specified by the user.

  3. if group_size=0 -> no groups are created and process_group=None. This means global sync is done.

class vissl.models.model_helpers.Flatten(dim=-1)[source]

Bases: torch.nn.modules.module.Module

Flatten module attached in the model. It basically flattens the input tensor.

forward(feat)[source]

flatten the input feat

flops(x)[source]

Number of floating point operations performed: 0 for this module.

class vissl.models.model_helpers.Identity(args=None)[source]

Bases: torch.nn.modules.module.Module

A helper module that outputs the input as is

forward(x)[source]

Return the input as the output

class vissl.models.model_helpers.LayerNorm2d(num_channels, eps=1e-05, affine=True)[source]

Bases: torch.nn.modules.normalization.GroupNorm

Use GroupNorm to construct LayerNorm, as PyTorch LayerNorm requires specifying the input shape explicitly, which is inconvenient. Setting num_groups=1 converts GroupNorm into LayerNorm over the (C, H, W) dimensions.
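
A sketch of the construction described above (GroupNorm with a single group normalizes over the (C, H, W) dimensions); the class name is illustrative:

import torch.nn as nn

class LayerNorm2dSketch(nn.GroupNorm):
    def __init__(self, num_channels, eps=1e-05, affine=True):
        # num_groups=1 turns GroupNorm into a LayerNorm over (C, H, W).
        super().__init__(num_groups=1, num_channels=num_channels, eps=eps, affine=affine)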

class vissl.models.model_helpers.RESNET_NORM_LAYER(value)[source]

Bases: str, enum.Enum

Types of norms supported in ResNe(X)t trainings. These can be easily set and modified from the config file.

BatchNorm = 'BatchNorm'
LayerNorm = 'LayerNorm'
vissl.models.model_helpers.parse_out_keys_arg(out_feat_keys: List[str], all_feat_names: List[str]) → Tuple[List[str], int][source]

Checks if all out_feature_keys are mapped to a layer in the model. Returns the last layer to forward pass through for efficiency. Allow duplicate features also to be evaluated. Adapted from (https://github.com/gidariss/FeatureLearningRotNet).

vissl.models.model_helpers.get_trunk_forward_outputs_module_list(feat: torch.Tensor, out_feat_keys: List[str], feature_blocks: torch.nn.modules.container.ModuleList, all_feat_names: List[str] = None) → List[torch.Tensor][source]
Parameters
  • feat – model input.

  • out_feat_keys – a list/tuple with the feature names of the features that the function should return. By default the last feature of the network is returned.

  • feature_blocks – list of feature blocks in the model

  • all_feat_names – names of the layers in the model

Returns

out_feats – a list with the asked output features placed in the same order as in out_feat_keys.

vissl.models.model_helpers.get_trunk_forward_outputs(feat: torch.Tensor, out_feat_keys: List[str], feature_blocks: torch.nn.modules.container.ModuleDict, feature_mapping: Dict[str, str] = None, use_checkpointing: bool = True, checkpointing_splits: int = 2) → List[torch.Tensor][source]
Parameters
  • feat – model input.

  • out_feat_keys – a list/tuple with the feature names of the features that the function should return. By default the last feature of the network is returned.

  • feature_blocks – ModuleDict containing feature blocks in the model

  • feature_mapping – an optional correspondence table in between the requested feature names and the model’s.

Returns

out_feats – a list with the asked output features placed in the same order as in out_feat_keys.
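
A minimal sketch of the forward-and-collect logic (the checkpointing and feature_mapping handling are omitted; the function name is illustrative and out_feat_keys is assumed non-empty):

import torch
import torch.nn as nn
from typing import Dict, List

def trunk_forward_outputs_sketch(
    feat: torch.Tensor,
    out_feat_keys: List[str],
    feature_blocks: nn.ModuleDict,
) -> List[torch.Tensor]:
    out_feats: Dict[str, torch.Tensor] = {}
    for name, block in feature_blocks.items():
        feat = block(feat)
        if name in out_feat_keys:
            out_feats[name] = feat
        # Stop early once every requested feature has been computed.
        if len(out_feats) == len(out_feat_keys):
            break
    # Return the features in the order they were asked for.
    return [out_feats[key] for key in out_feat_keys]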

vissl.models.heads module

vissl.models.heads.get_model_head(name: str)[source]

Given the model head name, construct the head if it’s registered with VISSL.

class vissl.models.heads.LinearEvalMLP(model_config: vissl.utils.hydra_config.AttrDict, in_channels: int, dims: List[int], use_bn: bool = False, use_relu: bool = False)[source]

Bases: torch.nn.modules.module.Module

A standard Linear classification module that can be attached to several layers of the model to evaluate the representation quality of features.

The layers attached are:

BatchNorm2d -> Linear (1 or more)

Accepts a 4D input tensor. If you want to use 2D input tensor instead, use the “mlp” head directly.
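
A rough sketch of the layer stack described above (BatchNorm2d over the 4D input, flatten, then a single linear classifier); the real head is configurable and may stack more layers:

import torch
import torch.nn as nn

class LinearEvalMLPSketch(nn.Module):
    def __init__(self, in_channels: int, dims):
        super().__init__()
        self.channel_bn = nn.BatchNorm2d(in_channels)
        self.clf = nn.Linear(dims[0], dims[1])

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        out = self.channel_bn(batch)            # N x C x H x W
        out = torch.flatten(out, start_dim=1)   # N x (C * H * W)
        return self.clf(out)                    # N x num_classes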

__init__(model_config: vissl.utils.hydra_config.AttrDict, in_channels: int, dims: List[int], use_bn: bool = False, use_relu: bool = False)[source]
Parameters
  • model_config (AttrDict) – dictionary config.MODEL in the config file

  • in_channels (int) – number of channels the input has. This information is used to attach the BatchNorm2d layer.

  • dims (List[int]) – dimensions of the linear layer. Example: [8192, 1000] attaches nn.Linear(8192, 1000, bias=True).

forward(batch: torch.Tensor)[source]
Parameters

batch (torch.Tensor) – 4D torch tensor. This layer is meant to be attached at several parts of the model to evaluate feature representation quality. For 2D input tensor, the tensor is unsqueezed to NxDx1x1 and then eval_mlp is applied

Returns

out (torch.Tensor) – 2D output torch tensor

class vissl.models.heads.MLP(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool = False, use_relu: bool = False, use_dropout: bool = False, use_bias: bool = True)[source]

Bases: torch.nn.modules.module.Module

This module can be used to attach a combination of {Linear, BatchNorm, ReLU, Dropout} layers, and they are fully configurable from the config file. The module also supports stacking multiple MLPs.

Examples

Linear
Linear -> BN
Linear -> ReLU
Linear -> Dropout
Linear -> BN -> ReLU -> Dropout
Linear -> ReLU -> Dropout
Linear -> ReLU -> Linear -> ReLU -> …
Linear -> Linear -> …
…

Accepts a 2D input tensor. Also accepts 4D input tensor of shape N x C x 1 x 1.
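
A hedged sketch of how such a configurable MLP could be assembled from dims and the use_* flags (the actual VISSL MLP differs in details):

import torch.nn as nn

def build_mlp_sketch(dims, use_bn=False, use_relu=False, use_dropout=False, use_bias=True):
    layers = []
    for in_dim, out_dim in zip(dims[:-1], dims[1:]):
        layers.append(nn.Linear(in_dim, out_dim, bias=use_bias))
        if use_bn:
            layers.append(nn.BatchNorm1d(out_dim))
        if use_relu:
            layers.append(nn.ReLU(inplace=True))
        if use_dropout:
            layers.append(nn.Dropout())
    return nn.Sequential(*layers)

# Example: build_mlp_sketch([8192, 1000]) attaches nn.Linear(8192, 1000, bias=True).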

__init__(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool = False, use_relu: bool = False, use_dropout: bool = False, use_bias: bool = True)[source]
Parameters
  • model_config (AttrDict) – dictionary config.MODEL in the config file

  • use_bn (bool) – whether to attach BatchNorm after Linear layer

  • use_relu (bool) – whether to attach ReLU after (Linear (-> BN optional))

  • use_dropout (bool) – whether to attach Dropout after (Linear (-> BN -> relu optional))

  • use_bias (bool) – whether the Linear layer should have bias or not

  • dims (List[int]) – dimensions of the linear layer. Example: [8192, 1000] attaches nn.Linear(8192, 1000, bias=True).

scale_weights(model_config)[source]
forward(batch: torch.Tensor)[source]
Parameters

batch (torch.Tensor) – 2D torch tensor or 4D tensor of shape N x C x 1 x 1

Returns

out (torch.Tensor) – 2D output torch tensor

class vissl.models.heads.SiameseConcatView(model_config: vissl.utils.hydra_config.AttrDict, num_towers: int)[source]

Bases: torch.nn.modules.module.Module

This head is useful for dealing with Siamese models which have multiple towers. For an input of type (N * num_towers) x C, this head can convert the output to N x (num_towers * C).

This head is used in case of PIRL https://arxiv.org/abs/1912.01991 and Jigsaw https://arxiv.org/abs/1603.09246 approaches.
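
A sketch of the reshape this head performs, assuming the towers of each sample occupy consecutive rows of the batch:

import torch

def siamese_concat_sketch(batch: torch.Tensor, num_towers: int) -> torch.Tensor:
    # (N * num_towers) x C  ->  N x (num_towers * C)
    batch = torch.flatten(batch, start_dim=1)  # also handles ... x C x 1 x 1 input
    return batch.reshape(-1, num_towers * batch.shape[1])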

__init__(model_config: vissl.utils.hydra_config.AttrDict, num_towers: int)[source]
Parameters
  • model_config (AttrDict) – dictionary config.MODEL in the config file

  • num_towers (int) – number of towers in siamese model

forward(batch: torch.Tensor)[source]
Parameters

batch (torch.Tensor) – 2D torch tensor (N * num_towers) x C or 4D tensor of shape (N * num_towers) x C x 1 x 1

Returns

out (torch.Tensor) – 2D output torch tensor N x (C * num_towers)

class vissl.models.heads.SwAVPrototypesHead(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool, num_clusters: int, use_bias: bool = True, return_embeddings: bool = True, skip_last_bn: bool = True, normalize_feats: bool = True)[source]

Bases: torch.nn.modules.module.Module

SwAV head used in https://arxiv.org/pdf/2006.09882.pdf paper.

The head is composed of 2 parts:
  1. projection of features to lower dimension like 128

  2. feature classification into clusters (also called prototypes)

The projected features are L2 normalized before clustering step.

Input: 4D torch.tensor of shape (N x C x H x W)

Output: List(2D torch.tensor of shape N x num_clusters)
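
A condensed sketch of the two-part head described above (a projection MLP, L2 normalization, then one or more bias-free prototype layers); dimensions follow the docstring example, the trunk output is assumed to be already pooled, and the class name is illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwAVHeadSketch(nn.Module):
    def __init__(self, dims=(2048, 2048, 128), num_clusters=(3000,)):
        super().__init__()
        # Part 1: projection of features to a lower dimension (e.g. 128).
        self.projection = nn.Sequential(
            nn.Linear(dims[0], dims[1]),
            nn.BatchNorm1d(dims[1]),
            nn.ReLU(inplace=True),
            nn.Linear(dims[1], dims[2]),
        )
        # Part 2: one bias-free linear layer per prototype head.
        self.prototypes = nn.ModuleList(
            [nn.Linear(dims[2], k, bias=False) for k in num_clusters]
        )

    def forward(self, feats: torch.Tensor):
        feats = torch.flatten(feats, start_dim=1)     # N x C x 1 x 1 -> N x C
        feats = self.projection(feats)
        feats = F.normalize(feats, dim=1, p=2)        # L2 normalize before clustering
        return [proto(feats) for proto in self.prototypes]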

__init__(model_config: vissl.utils.hydra_config.AttrDict, dims: List[int], use_bn: bool, num_clusters: int, use_bias: bool = True, return_embeddings: bool = True, skip_last_bn: bool = True, normalize_feats: bool = True)[source]
Parameters
  • model_config (AttrDict) – dictionary config.MODEL in the config file

  • dims (List[int]) –

    dimensions of the linear layers. Must have length at least 2. Example: [2048, 2048, 128] attaches

    Linear(2048, 2048) -> BN -> ReLU -> Linear(2048, 128)

  • use_bn (bool) – whether to attach BatchNorm after Linear layer

  • num_clusters (List[int]) –

    number of prototypes or clusters. Typically 3000. Example: num_clusters=[3000] will attach 1 prototype head;

    num_clusters=[3000, 3000] will attach 2 prototype heads.

  • use_bias (bool) – whether the Linear layer should have bias or not

  • return_embeddings (bool) – whether to return the projected embeddings or not

  • skip_last_bn (bool) –

    whether to attach BN -> ReLU at the end of the projection head.

    Example: [2048, 2048, 128] with skip_last_bn=True attaches Linear(2048, 2048) -> BN -> ReLU -> Linear(2048, 128)

    [2048, 2048, 128] with skip_last_bn=False attaches Linear(2048, 2048) -> BN -> ReLU -> Linear(2048, 128) -> BN -> ReLU

    This could be particularly useful when performing full finetuning on hidden layers.

forward(batch: torch.Tensor)[source]
Parameters

batch (4D torch.tensor) – shape (N x C x H x W)

Returns

List(2D torch.tensor of shape N x num_clusters)

vissl.models.trunks module

vissl.models.trunks.register_model_trunk(name: str)[source]

Registers Self-Supervision Model Trunks.

This decorator allows VISSL to add custom model trunks, even if the model trunk itself is not part of VISSL. To use it, apply this decorator to a model trunk class, like this:

@register_model_trunk('my_model_trunk_name')
class MyModelTrunk(nn.Module):
    ...

To get a model trunk from a configuration file, see get_model_trunk().

vissl.models.trunks.get_model_trunk(name: str)[source]

Given the model trunk name, construct the trunk if it’s registered with VISSL.
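
A hedged sketch of the registry pattern these two functions implement (the real VISSL registry includes extra validation; names with the _sketch suffix are illustrative):

MODEL_TRUNK_REGISTRY = {}

def register_model_trunk_sketch(name: str):
    def register(cls):
        if name in MODEL_TRUNK_REGISTRY:
            raise ValueError(f"Model trunk '{name}' is already registered")
        MODEL_TRUNK_REGISTRY[name] = cls
        return cls
    return register

def get_model_trunk_sketch(name: str):
    if name not in MODEL_TRUNK_REGISTRY:
        raise KeyError(f"Unknown model trunk: {name}")
    return MODEL_TRUNK_REGISTRY[name]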