Building Models¶

The model in VISSL is split into trunk that computes features and head that computes outputs (projections, classifications etc).

VISSL supports several types of Heads and several types of trunks. Overall, the following use cases are supported by VISSL models:

Model producing single output as in standard supervised ImageNet training
Model producing multiple outputs (Multi-task)
Model producing multiple outputs from different features (layers) from the trunk (useful in linear evaluation of features from several model layers)
Model that accepts multiple inputs (e.g. image and patches as in PIRL appraoch).
Model where the trunk is frozen and head is trained
Model that supports multiple resolutions inputs as in SwAV
Model that is completely frozen and features are extracted.

Trunks¶

VISSL supports many trunks including AlexNet (variants for approaches like Jigsaw, Colorization, RotNet, DeepCluster etc), ResNets, ResNeXt, RegNets, EfficientNet.

To set the trunk, user needs to specify the trunk name in MODEL.TRUNK.NAME.

Examples of trunks:

Using ResNe(X)ts trunk:

MODEL:
  TRUNK:
    NAME: resnet
    TRUNK_PARAMS:
      RESNETS:
        DEPTH: 50
        WIDTH_MULTIPLIER: 1
        NORM: BatchNorm    # BatchNorm | LayerNorm
        GROUPS: 1
        ZERO_INIT_RESIDUAL: False
        WIDTH_PER_GROUP: 64
        # Colorization model uses stride=1 for last layer to retain higher spatial resolution
        # for the pixel-wise task. Torchvision default is stride=2 and all other models
        # use this so we set the default as 2.
        LAYER4_STRIDE: 2

Using RegNets trunk: We follow RegNets defined in ClassyVision directly and users can either use a pre-defined ClassyVision RegNet config or define their own.
- for example, to create a new RegNet config for RegNet-256Gf model (1.3B params):
```
MODEL:
  TRUNK:
    NAME: regnet
    TRUNK_PARAMS:
      REGNET:
        depth: 27
        w_0: 640
        w_a: 230.83
        w_m: 2.53
        group_width: 373
```
- To use a pre-defined RegNet config in classy vision example: RegNetY-16gf
```
MODEL:
  TRUNK:
    NAME: regnet_y_16gf
```

Heads¶

This function creates the heads needed by the module. The head is specified by setting MODEL.HEAD.PARAMS in the configuration file.

The MODEL.HEAD.PARAMS is a list of Pairs containing parameters for (multiple) heads.

Pair[0] = Name of Head.
Pair[1] = kwargs passed to head constructor.

Example of [“name”, kwargs] MODEL.HEAD.PARAMS=["mlp", {"dims": [2048, 128]}]

Types of Heads one can specify¶

Case1: Simple Head containing single module - Single Input, Single output

MODEL:
  HEAD:
    PARAMS: [
        ["mlp", {"dims": [2048, 128]}]
    ]

Case2: Complex Head containing chain of head modules - Single Input, Single output

MODEL:
  HEAD:
    PARAMS: [
        ["mlp", {"dims": [2048, 1000], "use_bn": False, "use_relu": False}],
        ["siamese_concat_view", {"num_towers": 9}],
        ["mlp", {"dims": [9000, 128]}]
    ]

Case3: Multiple Heads (example 2 heads) - Single input, multiple output: can be used for multi-task learning

MODEL:
  HEAD:
    PARAMS: [
        # head 0
        [
            ["mlp", {"dims": [2048, 128]}]
        ],
        # head 1
        [
            ["mlp", {"dims": [2048, 1000], "use_bn": False, "use_relu": False}],
            ["siamese_concat_view", {"num_towers": 9}],
            ["mlp", {"dims": [9000, 128]}],
        ]
    ]

Case4: Multiple Heads (example 5 simple heads) - Single input, multiple output:: For example, used in linear evaluation of models

MODEL:
  HEAD:
    PARAMS: [
        ["eval_mlp", {"in_channels": 64, "dims": [9216, 1000]}],
        ["eval_mlp", {"in_channels": 256, "dims": [9216, 1000]}],
        ["eval_mlp", {"in_channels": 512, "dims": [8192, 1000]}],
        ["eval_mlp", {"in_channels": 1024, "dims": [9216, 1000]}],
        ["eval_mlp", {"in_channels": 2048, "dims": [8192, 1000]}],
    ]

Applying heads on multiple trunk features¶

By default, the head operates on the trunk output (single or multiple output). However, one can explicitly specify the input to heads mapping in the list MODEL.MULTI_INPUT_HEAD_MAPPING. This is used in PIRL training.

Assumptions:

This assumes that the same trunk is used to extract features for the different types of inputs.
One head only operates on one kind of input, Every individual head can contain several layers as in Case2 above.

MODEL.MULTI_INPUT_HEAD_MAPPING specifies Input -> Trunk Features mapping. Like in the single input case, the heads can operate on features from different layers. In this case, we specify MODEL.MULTI_INPUT_HEAD_MAPPING to be a list like:

MODEL:
  MULTI_INPUT_HEAD_MAPPING: [
        ["input_key", [list of features heads is applied on]]
  ]

For example: for a model that applies two heads on images and one head on patches:

MODEL:
    MULTI_INPUT_HEAD_MAPPING: [
        ["images", ["res5", "res4"]],
        ["patches", ["res3"]
    ],