vissl.losses package

vissl.losses.simclr_info_nce_loss

class vissl.losses.simclr_info_nce_loss.SimclrInfoNCELoss(loss_config: vissl.utils.hydra_config.AttrDict, device: str = 'gpu')[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

This is the loss proposed in the SimCLR paper (https://arxiv.org/abs/2002.05709). See the paper for details on the loss.

Config params:

temperature (float): the temperature to be applied on the logits
buffer_params:
    world_size (int): total number of trainers in training
    embedding_dim (int): output dimensions of the features projected
    effective_batch_size (int): total batch size used (includes positives)
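
A hedged sketch of how such a config could be assembled and passed to from_config; the field names follow the config params above, but the values and the nested-dict construction of AttrDict are illustrative assumptions, not a verified VISSL recipe.

    from vissl.utils.hydra_config import AttrDict
    from vissl.losses.simclr_info_nce_loss import SimclrInfoNCELoss

    # Illustrative values; assumes AttrDict accepts a nested python dict.
    loss_config = AttrDict({
        "temperature": 0.1,
        "buffer_params": {
            "world_size": 8,                # total number of trainers
            "embedding_dim": 128,           # projection head output dim
            "effective_batch_size": 4096,   # includes positives
        },
    })
    loss = SimclrInfoNCELoss.from_config(loss_config)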

classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Instantiates SimclrInfoNCELoss from configuration.

Parameters

loss_config – configuration for the loss

Returns

SimclrInfoNCELoss instance.

forward(output, target)[source]
class vissl.losses.simclr_info_nce_loss.SimclrInfoNCECriterion(buffer_params, temperature: float)[source]

Bases: torch.nn.modules.module.Module

The criterion corresponding to the SimCLR loss as defined in the paper https://arxiv.org/abs/2002.05709.

Parameters
  • temperature (float) – the temperature to be applied on the logits

  • buffer_params –
      world_size (int): total number of trainers in training
      embedding_dim (int): output dimensions of the features projected
      effective_batch_size (int): total batch size used (includes positives)

precompute_pos_neg_mask()[source]

We precompute the positive and negative masks to speed up the loss calculation

forward(embedding: torch.Tensor)[source]

Calculate the loss. Operates on embeddings tensor.

static gather_embeddings(embedding: torch.Tensor)[source]

Do a gather over all embeddings, so we can compute the loss. Final shape is like: (batch_size * num_gpus) x embedding_dim
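
For intuition, here is a minimal single-process sketch of the InfoNCE/NT-Xent computation this criterion performs. It assumes the two augmented views of image i sit at rows i and i + batch_size of the gathered embedding tensor; the actual layout in VISSL is determined by the collator and the buffer params.

    import torch
    import torch.nn.functional as F

    def info_nce_sketch(embedding: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
        # embedding: (2 * batch_size) x embedding_dim, rows i and i + batch_size are positives
        embedding = F.normalize(embedding, dim=1)
        n = embedding.shape[0]
        sim = embedding @ embedding.t() / temperature   # all pairwise similarities
        sim.fill_diagonal_(float("-inf"))               # never contrast a sample with itself
        pos_index = (torch.arange(n) + n // 2) % n      # index of each row's positive
        return F.cross_entropy(sim, pos_index)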

vissl.losses.multicrop_simclr_info_nce_loss

class vissl.losses.multicrop_simclr_info_nce_loss.MultiCropSimclrInfoNCELoss(loss_config: vissl.utils.hydra_config.AttrDict, device: str = 'gpu')[source]

Bases: vissl.losses.simclr_info_nce_loss.SimclrInfoNCELoss

Expanded version of the SimCLR loss. The SimCLR loss works only on 2 positives. We expand the loss to work for more positives following the multi-crop augmentation proposed in SwAV paper. See SwAV paper https://arxiv.org/abs/2006.09882 for the multi-crop augmentation details.

Config params:

temperature (float): the temperature to be applied on the logits
num_crops (int): number of positives used
buffer_params:
    world_size (int): total number of trainers in training
    embedding_dim (int): output dimensions of the features projected
    effective_batch_size (int): total batch size used (includes positives)

class vissl.losses.multicrop_simclr_info_nce_loss.MultiCropSimclrInfoNCECriterion(buffer_params, temperature: float, num_crops: int)[source]

Bases: vissl.losses.simclr_info_nce_loss.SimclrInfoNCECriterion

The criterion corresponding to the expanded SimCLR loss (as defined in the paper https://arxiv.org/abs/2002.05709) using the multi-crop augmentation proposed in the SwAV paper. The multi-crop augmentation allows using more positives per image.

Parameters
  • temperature (float) – the temperature to be applied on the logits

  • num_crops (int) – number of positives

  • buffer_params –
      world_size (int): total number of trainers in training
      embedding_dim (int): output dimensions of the features projected
      effective_batch_size (int): total batch size used (includes positives)

precompute_pos_neg_mask()[source]

We precompute the positive and negative masks to speed up the loss calculation
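
A hedged sketch of what a precomputed positive mask can look like with multi-crop. It assumes a crop-major layout where row c * batch_size + i is crop c of image i; the real layout and the corresponding negative mask follow the buffer params.

    import torch

    def multicrop_pos_mask_sketch(batch_size: int, num_crops: int) -> torch.Tensor:
        total = batch_size * num_crops
        image_id = torch.arange(total) % batch_size             # image index of every row
        mask = image_id.unsqueeze(0) == image_id.unsqueeze(1)   # same image => positives
        mask.fill_diagonal_(False)                               # a crop is not its own positive
        return mask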

forward(embedding: torch.Tensor)[source]

Calculate the loss. Operates on embeddings tensor.

vissl.losses.swav_loss

class vissl.losses.swav_loss.SwAVLoss(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

This loss is proposed by the SwAV paper https://arxiv.org/abs/2006.09882 by Caron et al. See the paper for more details about the loss.

Config params:

embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
use_double_precision (bool): whether to use double precision for the loss.
    This could be a good idea to avoid NaNs.
normalize_last_layer (bool): whether to normalize the last layer
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
temp_hard_assignment_iters (int): number of initial iterations for which to use hard assignment
output_dir (str): for dumping the debugging info in case loss becomes NaN
queue:
    queue_length (int): number of features to store and use in the scores
    start_iter (int): when to start using the queue for the scores
    local_queue_length (int): length of the queue per gpu

classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Instantiates SwAVLoss from configuration.

Parameters

loss_config – configuration for the loss

Returns

SwAVLoss instance.

forward(output: torch.Tensor, target: torch.Tensor)[source]
class vissl.losses.swav_loss.SwAVCriterion(temperature: float, crops_for_assign: List[int], num_crops: int, num_iters: int, epsilon: float, use_double_prec: bool, num_prototypes: List[int], local_queue_length: int, embedding_dim: int, temp_hard_assignment_iters: int, output_dir: str)[source]

Bases: torch.nn.modules.module.Module

This criterion is used by the SwAV paper https://arxiv.org/abs/2006.09882 by Caron et al. See the paper for more details about the loss.

Config params:

embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
temp_hard_assignment_iters (int): number of initial iterations for which to use hard assignment
output_dir (str): for dumping the debugging info in case loss becomes NaN
local_queue_length (int): length of the queue per gpu

distributed_sinkhornknopp(Q: torch.Tensor)[source]

Apply the distributed Sinkhorn-Knopp optimization on the scores matrix to find the assignments.
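
A minimal single-process sketch of the Sinkhorn-Knopp normalization, following the public SwAV reference implementation; the distributed version in VISSL additionally all_reduces the sums across trainers.

    import torch

    @torch.no_grad()
    def sinkhorn_knopp_sketch(scores: torch.Tensor, epsilon: float = 0.05,
                              num_iters: int = 3) -> torch.Tensor:
        # scores: num_samples x num_prototypes; returns soft assignments Q (rows sum to 1)
        Q = torch.exp(scores / epsilon).t()       # num_prototypes x num_samples
        Q /= Q.sum()
        K, B = Q.shape
        for _ in range(num_iters):
            Q /= Q.sum(dim=1, keepdim=True)       # balance mass across prototypes
            Q /= K
            Q /= Q.sum(dim=0, keepdim=True)       # normalize each sample's assignment
            Q /= B
        return (Q * B).t()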

forward(scores: torch.Tensor, head_id: int)[source]
update_emb_queue(emb)[source]
compute_queue_scores(head)[source]
initialize_queue()[source]

vissl.losses.bce_logits_multiple_output_single_target

class vissl.losses.bce_logits_multiple_output_single_target.BCELogitsMultipleOutputSingleTargetLoss(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

__init__(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Initializer for the summed binary cross-entropy (with logits) loss. For a single tensor, this is equivalent to the binary cross-entropy loss. For a list of tensors, this computes the sum of the binary cross-entropy losses for each tensor in the list against the target.

Config params:

reduction: specifies reduction to apply to the output, optional
normalize_output: whether to L2 normalize the outputs
world_size: total number of gpus in training. automatically inferred by vissl

classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Instantiates BCELogitsMultipleOutputSingleTargetLoss from configuration.

Parameters

loss_config – configuration for the loss

Returns

BCELogitsMultipleOutputSingleTargetLoss instance.

forward(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]

For each output and single target, loss is calculated. The returned loss value is the sum loss across all outputs.
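
A hedged sketch of this behaviour, assuming plain binary cross-entropy with logits and no output normalization.

    import torch
    import torch.nn.functional as F
    from typing import List, Union

    def sum_bce_with_logits_sketch(output: Union[torch.Tensor, List[torch.Tensor]],
                                   target: torch.Tensor) -> torch.Tensor:
        outputs = [output] if isinstance(output, torch.Tensor) else output
        target = target.float()   # typically a (multi-hot) label tensor
        return sum(F.binary_cross_entropy_with_logits(out, target) for out in outputs)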

vissl.losses.swav_momentum_loss

class vissl.losses.swav_momentum_loss.SwAVMomentumLoss(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

This loss extends the SwAV loss proposed in paper https://arxiv.org/abs/2006.09882 by Caron et al. The loss combines the benefits of using the SwAV approach with the momentum encoder as used in MoCo.

Config params:

momentum (float): for the momentum encoder
momentum_eval_mode_iter_start (int): from what iteration should the momentum encoder
    network be in eval mode
embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
use_double_precision (bool): whether to use double precision for the loss.
    This could be a good idea to avoid NaNs.
normalize_last_layer (bool): whether to normalize the last layer
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
queue:
    queue_length (int): number of features to store and use in the scores
    start_iter (int): when to start using the queue for the scores
    local_queue_length (int): length of the queue per gpu
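
For context, the momentum parameter above controls an exponential moving average of the online encoder's weights. Below is a generic sketch of that EMA rule; the encoder and momentum_encoder modules are hypothetical, and the update itself typically happens in the training loop or hooks rather than inside the loss.

    import torch

    @torch.no_grad()
    def momentum_update_sketch(encoder: torch.nn.Module,
                               momentum_encoder: torch.nn.Module,
                               momentum: float = 0.99) -> None:
        # EMA: param_momentum = m * param_momentum + (1 - m) * param_online
        for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
            p_m.data.mul_(momentum).add_(p.data, alpha=1.0 - momentum)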

classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Instantiates SwAVMomentumLoss from configuration.

Parameters

loss_config – configuration for the loss

Returns

SwAVMomentumLoss instance.

initialize_queue()[source]
load_state_dict(state_dict, *args, **kwargs)[source]

Restore the loss state given a checkpoint

Parameters

state_dict (serialized via torch.save) –

forward(output: torch.Tensor, *args, **kwargs)[source]
distributed_sinkhornknopp(Q: torch.Tensor)[source]

Apply the distributed Sinkhorn-Knopp optimization on the scores matrix to find the assignments.

update_emb_queue()[source]
compute_queue_scores(head)[source]

vissl.losses.moco_loss

class vissl.losses.moco_loss.MoCoLossConfig(embedding_dim, queue_size, momentum, temperature)[source]

Bases: vissl.losses.moco_loss._MoCoLossConfig

Settings for the MoCo loss

static defaults() → vissl.losses.moco_loss.MoCoLossConfig[source]
class vissl.losses.moco_loss.MoCoLoss(config: vissl.losses.moco_loss.MoCoLossConfig)[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

This is the loss which was proposed in the “Momentum Contrast for Unsupervised Visual Representation Learning” paper, from Kaiming He et al. See http://arxiv.org/abs/1911.05722 for details and https://github.com/facebookresearch/moco for a reference implementation, reused here

Config params:

embedding_dim (int): head output dimension
queue_size (int): number of elements in the queue
momentum (float): encoder momentum value for the update
temperature (float): temperature to use on the logits

classmethod from_config(config: vissl.losses.moco_loss.MoCoLossConfig)[source]

Instantiates MoCoLoss from configuration.

Parameters

config – configuration for the loss

Returns

MoCoLoss instance.

forward(query: torch.Tensor, *args, **kwargs) → torch.Tensor[source]

Given the encoder queries, the key and the queue of the previous queries, compute the cross entropy loss for this batch

Parameters

query – output of the encoder given the current batch

Returns

loss
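
A hedged sketch of the logits and targets this forward pass builds, following the reference MoCo implementation linked above; the tensor names and shapes here are assumptions.

    import torch

    def moco_logits_sketch(query: torch.Tensor, key: torch.Tensor,
                           queue: torch.Tensor, temperature: float = 0.2):
        # query, key: N x embedding_dim (L2-normalized); queue: embedding_dim x queue_size
        l_pos = torch.einsum("nc,nc->n", query, key).unsqueeze(-1)   # N x 1 (the positive)
        l_neg = torch.einsum("nc,ck->nk", query, queue)              # N x queue_size (negatives)
        logits = torch.cat([l_pos, l_neg], dim=1) / temperature
        labels = torch.zeros(logits.shape[0], dtype=torch.long)      # positive is always index 0
        return logits, labels                                        # feed to F.cross_entropy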

load_state_dict(state_dict, *args, **kwargs)[source]

Restore the loss state given a checkpoint

Parameters

state_dict (serialized via torch.save) –

vissl.losses.nce_loss

class vissl.losses.nce_loss.NCELossWithMemory(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

Distributed version of the NCE loss. It performs an "all_gather" to gather the allocated buffers, like the memory, on a single gpu. For this, the PyTorch distributed backend is used. If using NCCL, one must ensure that all the buffers are on GPU. This class supports training using both NCE and CrossEntropy (InfoNCE).

This loss is used by NPID (https://arxiv.org/pdf/1805.01978.pdf), NPID++ and PIRL (https://arxiv.org/abs/1912.01991) approaches.

Written by: Ishan Misra (imisra@fb.com)

Config params:

norm_embedding (bool): whether to normalize embeddings
temperature (float): the temperature to apply to logits
norm_constant (int): Z parameter in the NCEAverage
update_mem_with_emb_index (int): in case we have multiple embeddings used in the
    nce loss, specify which embedding to use to update the memory
loss_type (str): options are "nce" | "cross_entropy". Using cross_entropy turns
    the loss into the InfoNCE loss.
loss_weights (List[float]): if the NCE loss is computed between multiple pairs,
    a loss weight per term can be used to weight different pair contributions differently
negative_sampling_params:
    num_negatives (int): how many negatives to contrast with
    type (str): how to select the negatives. options: "random"
memory_params:
    memory_size (int): number of training samples, as all the samples are stored in memory
    embedding_dim (int): the projection head output dimension
    momentum (int): momentum to use to update the memory
    norm_init (bool): whether to L2 normalize the initialized memory bank
    update_mem_on_forward (bool): whether to update memory on the forward pass
num_train_samples (int): number of unique samples in the training dataset

classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Instantiates NCELossWithMemory from configuration.

Parameters

loss_config – configuration for the loss

Returns

NCELossWithMemory instance.

forward(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]

For each output and single target, loss is calculated.

sync_memory()[source]

Sync memory across all processes before first forward pass. Only needed in the distributed case. After the first forward pass, the update_memory function in NCEAverage does a gather over all embeddings, so memory stays in sync. Doing a gather over embeddings is O(batch size). Syncing memory is O(num items in memory). Generally, batch size << num items in memory. So, we prefer doing the syncs in update_memory.

update_memory(embedding, y)[source]
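
A hedged sketch of what a momentum memory-bank update of this kind typically looks like; the memory layout and the re-normalization step are assumptions, and the exact update lives in NCEAverage.update_memory.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def update_memory_sketch(memory: torch.Tensor, embedding: torch.Tensor,
                             y: torch.Tensor, momentum: float = 0.5) -> None:
        # memory: num_train_samples x embedding_dim; y: indices of the samples in this batch
        updated = momentum * memory[y] + (1.0 - momentum) * embedding
        memory[y] = F.normalize(updated, dim=1)   # keep the bank L2-normalized
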
class vissl.losses.nce_loss.NCEAverage(memory_params, negative_sampling_params, T=0.07, Z=-1, loss_type='nce')[source]

Bases: torch.nn.modules.module.Module

Computes the scores of the model embeddings against the 'positive' and 'negative' samples from the Memory Bank. This class does NOT compute the actual loss, just the scores, i.e., inner products followed by normalizations/exponentiation etc.

forward(embedding, y, idx=None, update_memory_on_forward=None)[source]
compute_partition_function(out)[source]
do_negative_sampling(embedding, y, num_negatives)[source]
setup_negative_sampling(negative_sampling_params)[source]
init_memory(memory_params)[source]
update_memory(embedding, y)[source]
class vissl.losses.nce_loss.AliasMethod(probs)[source]

Bases: torch.nn.modules.module.Module

A fast way to sample from a multinomial distribution. Faster than torch.multinomial or np.multinomial. The setup (__init__) for this class is slow; however, draw (the actual sampling) is fast.

draw(N)[source]

Draw N samples from the multinomial distribution.

Parameters

N – number of samples

Returns

samples
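
The alias method behind this class is standard; a self-contained sketch, independent of VISSL's implementation, is shown below.

    import torch

    def alias_setup(probs: torch.Tensor):
        # Build the alias tables once (the slow part, analogous to __init__).
        K = probs.numel()
        prob = probs.clone() * K
        alias = torch.zeros(K, dtype=torch.long)
        small = [i for i in range(K) if prob[i] < 1.0]
        large = [i for i in range(K) if prob[i] >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            alias[s] = l
            prob[l] = prob[l] + prob[s] - 1.0
            (small if prob[l] < 1.0 else large).append(l)
        for i in small + large:               # leftovers are exactly 1 up to rounding
            prob[i] = 1.0
        return prob, alias

    def alias_draw(prob: torch.Tensor, alias: torch.Tensor, n: int) -> torch.Tensor:
        # O(1) per sample: pick a column, then accept it or take its alias.
        K = prob.numel()
        k = torch.randint(K, (n,))
        accept = torch.rand(n) < prob[k]
        return torch.where(accept, k, alias[k])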

class vissl.losses.nce_loss.NumpySampler(high)[source]

Bases: object

draw(num_negatives)[source]
class vissl.losses.nce_loss.NCECriterion(nLem)[source]

Bases: torch.nn.modules.module.Module

forward(x, targets)[source]

vissl.losses.deepclusterv2_loss

class vissl.losses.deepclusterv2_loss.DeepClusterV2Loss(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

Loss used for the DeepClusterV2 approach, as provided in the SwAV paper https://arxiv.org/abs/2006.09882.

Config params:

DROP_LAST (bool): automatically inferred from DATA.TRAIN.DROP_LAST
BATCHSIZE_PER_REPLICA (int): 256  # automatically inferred from DATA.TRAIN.BATCHSIZE_PER_REPLICA
num_crops (int): 2  # automatically inferred from DATA.TRAIN.TRANSFORMS
temperature (float): 0.1
num_clusters (List[int]): [3000, 3000, 3000]
kmeans_iters (int): 10
crops_for_mb: [0]
embedding_dim: 128
num_train_samples (int): -1  # @auto-filled
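
A hedged sketch of the per-head objective: cross-entropy between temperature-scaled cosine scores against the prototypes and the pseudo-labels produced offline by k-means (cluster_memory). The tensor shapes are assumptions.

    import torch
    import torch.nn.functional as F

    def deepclusterv2_head_loss_sketch(emb: torch.Tensor, prototypes: torch.Tensor,
                                       assignments: torch.Tensor,
                                       temperature: float = 0.1) -> torch.Tensor:
        # emb: N x embedding_dim; prototypes: num_clusters x embedding_dim;
        # assignments: N pseudo-labels from k-means
        scores = F.normalize(emb, dim=1) @ F.normalize(prototypes, dim=1).t()
        return F.cross_entropy(scores / temperature, assignments)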

classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Instantiates DeepClusterV2Loss from configuration.

Parameters

loss_config – configuration for the loss

Returns

DeepClusterV2Loss instance.

forward(output: torch.Tensor, idx: int)[source]
init_memory(dataloader, model)[source]
update_memory_bank(emb, idx)[source]
cluster_memory()[source]

vissl.losses.cross_entropy_multiple_output_single_target

class vissl.losses.cross_entropy_multiple_output_single_target.CrossEntropyMultipleOutputSingleTargetLoss(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Bases: classy_vision.losses.classy_loss.ClassyLoss

Initializer for the sum cross-entropy loss. For a single tensor, this is equivalent to the cross-entropy loss. For a list of tensors, this computes the sum of the cross-entropy losses for each tensor in the list against the target.

Config params:

weight: weight of sample, optional
ignore_index: sample should be ignored for loss, optional
reduction: specifies reduction to apply to the output, optional
temperature: specify temperature for softmax. Default 1.0

classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)[source]

Instantiates CrossEntropyMultipleOutputSingleTargetLoss from configuration.

Parameters

loss_config – configuration for the loss

Returns

CrossEntropyMultipleOutputSingleTargetLoss instance.

forward(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)[source]

For each output and single target, loss is calculated. The returned loss value is the sum loss across all outputs.
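
A hedged sketch of that behaviour, assuming a plain softmax cross-entropy per output with the configured temperature applied to the logits.

    import torch
    import torch.nn.functional as F
    from typing import List, Union

    def sum_cross_entropy_sketch(output: Union[torch.Tensor, List[torch.Tensor]],
                                 target: torch.Tensor,
                                 temperature: float = 1.0) -> torch.Tensor:
        outputs = [output] if isinstance(output, torch.Tensor) else output
        return sum(F.cross_entropy(out / temperature, target) for out in outputs)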