vissl.losses package
vissl.losses.simclr_info_nce_loss
-
class vissl.losses.simclr_info_nce_loss.SimclrInfoNCELoss(loss_config: vissl.utils.hydra_config.AttrDict, device: str = 'gpu')
Bases: classy_vision.losses.classy_loss.ClassyLoss
This is the loss proposed in the SimCLR paper (https://arxiv.org/abs/2002.05709). See the paper for details on the loss.
- Config params:
temperature (float): the temperature to be applied on the logits
buffer_params:
    world_size (int): total number of trainers in training
    embedding_dim (int): output dimension of the feature projections
    effective_batch_size (int): total batch size used (includes positives)
-
classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)
Instantiates SimclrInfoNCELoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
SimclrInfoNCELoss instance.
-
class vissl.losses.simclr_info_nce_loss.SimclrInfoNCECriterion(buffer_params, temperature: float)
Bases: torch.nn.modules.module.Module
The criterion corresponding to the SimCLR loss as defined in the paper https://arxiv.org/abs/2002.05709.
- Parameters
temperature (float) – the temperature to be applied on the logits
buffer_params –
    world_size (int): total number of trainers in training
    embedding_dim (int): output dimension of the feature projections
    effective_batch_size (int): total batch size used (includes positives)
-
precompute_pos_neg_mask()
Precomputes the positive and negative masks to speed up the loss calculation.
-
forward(embedding: torch.Tensor)
Calculates the loss. Operates on the embeddings tensor.
-
static gather_embeddings(embedding: torch.Tensor)
Does a gather over all embeddings so that the loss can be computed. The final shape is (batch_size * num_gpus) x embedding_dim.
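The interplay between the precomputed masks and the temperature-scaled loss can be illustrated with a minimal single-process sketch in plain Python. It is not the VISSL implementation: the function names, the list-based data layout, and the assumption that the two views of image k sit at rows k and k + num_images are all hypothetical, chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_pos_neg_masks(num_images):
    """Return (pos, neg) 0/1 masks for two views per image.

    Assumed layout (hypothetical): rows 0..num_images-1 hold the first view
    of each image, rows num_images..2*num_images-1 hold the second view.
    """
    total = 2 * num_images
    pos = [[0] * total for _ in range(total)]
    neg = [[1] * total for _ in range(total)]
    for i in range(total):
        partner = (i + num_images) % total
        pos[i][partner] = 1
        neg[i][partner] = 0  # the positive is not a negative
        neg[i][i] = 0        # a sample is never contrasted with itself
    return pos, neg


def info_nce_loss(embeddings, temperature):
    """Mean InfoNCE loss over all rows of `embeddings`."""
    emb = [l2_normalize(e) for e in embeddings]
    total = len(emb)
    pos_mask, neg_mask = build_pos_neg_masks(total // 2)
    loss = 0.0
    for i in range(total):
        # Exponentiated, temperature-scaled similarity to every row.
        sims = [
            math.exp(sum(a * b for a, b in zip(emb[i], emb[j])) / temperature)
            for j in range(total)
        ]
        pos = sum(s * m for s, m in zip(sims, pos_mask[i]))
        neg = sum(s * m for s, m in zip(sims, neg_mask[i]))
        loss += -math.log(pos / (pos + neg))
    return loss / total
```

Because the masks depend only on the batch layout, they can be built once and reused on every call, which is the motivation given for precomputing them.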
vissl.losses.multicrop_simclr_info_nce_loss
-
class vissl.losses.multicrop_simclr_info_nce_loss.MultiCropSimclrInfoNCELoss(loss_config: vissl.utils.hydra_config.AttrDict, device: str = 'gpu')
Bases: vissl.losses.simclr_info_nce_loss.SimclrInfoNCELoss
Expanded version of the SimCLR loss. The SimCLR loss works only on 2 positives. This loss extends it to more positives, following the multi-crop augmentation proposed in the SwAV paper. See the SwAV paper https://arxiv.org/abs/2006.09882 for details of the multi-crop augmentation.
- Config params:
temperature (float): the temperature to be applied on the logits
num_crops (int): number of positives used
buffer_params:
    world_size (int): total number of trainers in training
    embedding_dim (int): output dimension of the feature projections
    effective_batch_size (int): total batch size used (includes positives)
-
class vissl.losses.multicrop_simclr_info_nce_loss.MultiCropSimclrInfoNCECriterion(buffer_params, temperature: float, num_crops: int)
Bases: vissl.losses.simclr_info_nce_loss.SimclrInfoNCECriterion
The criterion corresponding to the expanded SimCLR loss (as defined in the paper https://arxiv.org/abs/2002.05709) using the multi-crop augmentation proposed in the SwAV paper. The multi-crop augmentation allows using more positives per image.
- Parameters
temperature (float) – the temperature to be applied on the logits
num_crops (int) – number of positives
buffer_params –
    world_size (int): total number of trainers in training
    embedding_dim (int): output dimension of the feature projections
    effective_batch_size (int): total batch size used (includes positives)
-
precompute_pos_neg_mask()
Precomputes the positive and negative masks to speed up the loss calculation.
-
forward(embedding: torch.Tensor)
Calculates the loss. Operates on the embeddings tensor.
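With more than two crops per image, every other crop of the same image counts as a positive. The sketch below shows one way such masks could be built in plain Python. The function name and the row layout (crop c of image k at row c * num_images + k) are assumptions made for illustration and are not taken from the VISSL source.

```python
def build_multicrop_masks(num_images, num_crops):
    """Positive/negative 0/1 masks when each image has `num_crops` crops.

    Assumed layout (hypothetical): crop c of image k sits at row
    c * num_images + k, so two rows show the same image exactly when
    their indices are equal modulo num_images.
    """
    total = num_images * num_crops
    pos = [[0] * total for _ in range(total)]
    neg = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i == j:
                continue  # a crop is neither its own positive nor negative
            if i % num_images == j % num_images:
                pos[i][j] = 1  # another crop of the same image
            else:
                neg[i][j] = 1  # a crop of a different image
    return pos, neg
```

With num_crops=2 each row has exactly one positive, which matches the two-positive SimCLR setting.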
vissl.losses.swav_loss
-
class vissl.losses.swav_loss.SwAVLoss(loss_config: vissl.utils.hydra_config.AttrDict)
Bases: classy_vision.losses.classy_loss.ClassyLoss
This loss is proposed by the SwAV paper https://arxiv.org/abs/2006.09882 by Caron et al. See the paper for more details about the loss.
- Config params:
embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
use_double_precision (bool): whether to use double precision for the loss. This could be a good idea to avoid NaNs.
normalize_last_layer (bool): whether to normalize the last layer
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
temp_hard_assignment_iters (int): whether to do hard assignment for the initial few iterations
output_dir (str): for dumping the debugging info in case loss becomes NaN
queue:
    queue_length (int): number of features to store and used in the scores
    start_iter (int): when to start using the queue for the scores
    local_queue_length (int): length of queue per gpu
-
classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)
Instantiates SwAVLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
SwAVLoss instance.
-
forward(output: torch.Tensor, target: torch.Tensor)
-
class vissl.losses.swav_loss.SwAVCriterion(temperature: float, crops_for_assign: List[int], num_crops: int, num_iters: int, epsilon: float, use_double_prec: bool, num_prototypes: List[int], local_queue_length: int, embedding_dim: int, temp_hard_assignment_iters: int, output_dir: str)
Bases: torch.nn.modules.module.Module
This criterion is used by the SwAV paper https://arxiv.org/abs/2006.09882 by Caron et al. See the paper for more details about the loss.
- Config params:
embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
temp_hard_assignment_iters (int): whether to do hard assignment for the initial few iterations
output_dir (str): for dumping the debugging info in case loss becomes NaN
local_queue_length (int): length of queue per gpu
-
distributed_sinkhornknopp(Q: torch.Tensor)
Applies the distributed Sinkhorn optimization on the scores matrix to find the assignments.
-
forward(scores: torch.Tensor, head_id: int)
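To give a feel for what a Sinkhorn-Knopp step does to a scores matrix, here is a small single-process sketch in plain Python. It is an illustration only: the function name and default values are hypothetical, and the all-reduce of row and column sums across workers that a distributed version would need is left out.

```python
import math


def sinkhorn_knopp(scores, epsilon=0.05, num_iters=3):
    """Turn a (samples x prototypes) score matrix into soft assignments.

    Column and row totals are rescaled in turn so that prototypes are used
    roughly equally often. Each returned row is a probability distribution
    over prototypes.
    """
    n, k = len(scores), len(scores[0])
    # Subtracting the global maximum keeps exp() from overflowing; the
    # resulting constant factor cancels in the first normalization.
    # Very small epsilon values can still underflow a whole column to zero.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(num_iters):
        # Give every prototype (column) a total mass of 1 / k.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(n)]
        # Give every sample (row) a total mass of 1 / n.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    # Rows currently sum to 1 / n; rescale them to sum to 1.
    return [[v * n for v in row] for row in q]
```

The epsilon parameter controls how sharp the assignments are: smaller values push each row closer to a one-hot vector.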
vissl.losses.bce_logits_multiple_output_single_target
-
class vissl.losses.bce_logits_multiple_output_single_target.BCELogitsMultipleOutputSingleTargetLoss(loss_config: vissl.utils.hydra_config.AttrDict)
Bases: classy_vision.losses.classy_loss.ClassyLoss
-
__init__(loss_config: vissl.utils.hydra_config.AttrDict)
Initializer for the sum cross-entropy loss. For a single tensor, this is equivalent to the cross-entropy loss. For a list of tensors, this computes the sum of the cross-entropy losses for each tensor in the list against the target.
- Config params:
reduction: specifies reduction to apply to the output, optional
normalize_output: whether to L2 normalize the outputs
world_size: total number of GPUs in training; automatically inferred by VISSL
-
classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)
Instantiates BCELogitsMultipleOutputSingleTargetLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
BCELogitsMultipleOutputSingleTargetLoss instance.
-
forward(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)
For each output and the single target, the loss is calculated. The returned loss value is the sum of the losses across all outputs.
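The "one loss per output, summed" behaviour can be sketched in plain Python as follows. The class name suggests binary cross-entropy computed from logits, so that is what the sketch uses. The function names, the mean reduction over elements, and the flat-list data layout are assumptions for illustration rather than the VISSL code.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over elements, computed from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def summed_bce(outputs, targets):
    """Sum the loss of every output against the same target."""
    if outputs and not isinstance(outputs[0], (list, tuple)):
        outputs = [outputs]  # a single output is treated as a list of one
    return sum(bce_with_logits(out, targets) for out in outputs)
```

Passing one output gives the plain loss, and passing a list of outputs (for example, one per model head) adds up their individual losses against the shared target.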
vissl.losses.swav_momentum_loss
-
class vissl.losses.swav_momentum_loss.SwAVMomentumLoss(loss_config: vissl.utils.hydra_config.AttrDict)
Bases: classy_vision.losses.classy_loss.ClassyLoss
This loss extends the SwAV loss proposed in the paper https://arxiv.org/abs/2006.09882 by Caron et al. The loss combines the benefits of using the SwAV approach with the momentum encoder as used in MoCo.
- Config params:
momentum (float): for the momentum encoder
momentum_eval_mode_iter_start (int): from what iteration should the momentum encoder network be in eval mode
embedding_dim (int): the projection head output dimension
temperature (float): temperature to be applied to the logits
use_double_precision (bool): whether to use double precision for the loss. This could be a good idea to avoid NaNs.
normalize_last_layer (bool): whether to normalize the last layer
num_iters (int): number of sinkhorn algorithm iterations to make
epsilon (float): see the paper for details
num_crops (int): number of crops used
crops_for_assign (List[int]): what crops to use for assignment
num_prototypes (List[int]): number of prototypes
queue:
    queue_length (int): number of features to store and used in the scores
    start_iter (int): when to start using the queue for the scores
    local_queue_length (int): length of queue per gpu
-
classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)
Instantiates SwAVMomentumLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
SwAVMomentumLoss instance.
-
load_state_dict(state_dict, *args, **kwargs)
Restores the loss state from a checkpoint.
- Parameters
state_dict (serialized via torch.save) –
-
forward(output: torch.Tensor, *args, **kwargs)
-
distributed_sinkhornknopp(Q: torch.Tensor)
Applies the distributed Sinkhorn optimization on the scores matrix to find the assignments.
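The momentum encoder mentioned for this loss is, in MoCo, an exponential moving average of the online encoder's parameters. A minimal sketch of that update rule in plain Python is given below. The function name and the flat list of floats standing in for the network parameters are hypothetical simplifications.

```python
def momentum_update(momentum_params, online_params, momentum):
    """Return the exponential moving average of the online parameters.

    Each momentum-encoder parameter keeps a `momentum` share of its old
    value and takes a (1 - momentum) share of the online encoder's value.
    """
    return [
        momentum * m + (1.0 - momentum) * p
        for m, p in zip(momentum_params, online_params)
    ]
```

A momentum close to 1 makes the momentum encoder change slowly, so its outputs stay consistent from one iteration to the next.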
vissl.losses.moco_loss
-
class vissl.losses.moco_loss.MoCoLossConfig(embedding_dim, queue_size, momentum, temperature)
Bases: vissl.losses.moco_loss._MoCoLossConfig
Settings for the MoCo loss
-
static defaults() → vissl.losses.moco_loss.MoCoLossConfig
-
class vissl.losses.moco_loss.MoCoLoss(config: vissl.losses.moco_loss.MoCoLossConfig)
Bases: classy_vision.losses.classy_loss.ClassyLoss
This is the loss proposed in the "Momentum Contrast for Unsupervised Visual Representation Learning" paper by Kaiming He et al. See http://arxiv.org/abs/1911.05722 for details and https://github.com/facebookresearch/moco for a reference implementation, reused here.
- Config params:
embedding_dim (int): head output dimension
queue_size (int): number of elements in queue
momentum (float): encoder momentum value for the update
temperature (float): temperature to use on the logits
-
classmethod from_config(config: vissl.losses.moco_loss.MoCoLossConfig)
Instantiates MoCoLoss from configuration.
- Parameters
config – configuration for the loss
- Returns
MoCoLoss instance.
-
forward(query: torch.Tensor, *args, **kwargs) → torch.Tensor
Given the encoder queries, the key, and the queue of previous keys, computes the cross-entropy loss for this batch.
- Parameters
query – output of the encoder given the current batch
- Returns
loss
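The computation described for forward can be sketched for a single query in plain Python: the matching key supplies the logit for class 0, the queued keys supply the remaining logits, and a cross-entropy is taken against class 0. The function names and the deque-based queue are hypothetical stand-ins used only for illustration, and all vectors are assumed to be L2-normalized already.

```python
import math
from collections import deque


def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def moco_loss(query, key, queue, temperature):
    """Cross-entropy of one query against its key (class 0) and queued keys."""
    logits = [dot(query, key)] + [dot(query, k) for k in queue]
    logits = [v / temperature for v in logits]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum_exp - logits[0]


# A bounded deque drops its oldest entry automatically once it is full.
queue = deque([[0.0, 1.0], [-1.0, 0.0]], maxlen=2)
loss_match = moco_loss([1.0, 0.0], [1.0, 0.0], queue, temperature=0.2)
loss_mismatch = moco_loss([1.0, 0.0], [0.0, 1.0], queue, temperature=0.2)
queue.append([1.0, 0.0])  # enqueue the newest key, evicting [0.0, 1.0]
```

The loss is small when the query agrees with its key and disagrees with the queued keys, and the fixed-size queue plays the role of the queue_size setting.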
vissl.losses.nce_loss
-
class vissl.losses.nce_loss.NCELossWithMemory(loss_config: vissl.utils.hydra_config.AttrDict)
Bases: classy_vision.losses.classy_loss.ClassyLoss
Distributed version of the NCE loss. It performs an "all_gather" to gather the allocated buffers, such as the memory, on a single GPU. For this, the PyTorch distributed backend is used. If using NCCL, one must ensure that all the buffers are on GPU. This class supports training using both NCE and CrossEntropy (InfoNCE).
This loss is used by the NPID (https://arxiv.org/pdf/1805.01978.pdf), NPID++ and PIRL (https://arxiv.org/abs/1912.01991) approaches.
Written by: Ishan Misra (imisra@fb.com)
- Config params:
norm_embedding (bool): whether to normalize embeddings
temperature (float): the temperature to apply to logits
norm_constant (int): Z parameter in the NCEAverage
update_mem_with_emb_index (int): in case multiple embeddings are used in the NCE loss, specifies which embedding to use to update the memory
loss_type (str): options are "nce" | "cross_entropy". Using cross_entropy turns the loss into the InfoNCE loss.
loss_weights (List[float]): if the NCE loss is computed between multiple pairs, a loss weight can be set per term to weight the pair contributions differently
negative_sampling_params:
    num_negatives (int): how many negatives to contrast with
    type (str): how to select the negatives. options "random"
memory_params:
    memory_size (int): number of training samples, as all the samples are stored in memory
    embedding_dim (int): the projection head output dimension
    momentum (int): momentum to use to update the memory
    norm_init (bool): whether to L2 normalize the initialized memory bank
    update_mem_on_forward (bool): whether to update memory on the forward pass
num_train_samples (int): number of unique samples in the training dataset
-
classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)
Instantiates NCELossWithMemory from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
NCELossWithMemory instance.
-
forward(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)
For each output and the single target, the loss is calculated.
-
sync_memory()
Syncs memory across all processes before the first forward pass. Only needed in the distributed case. After the first forward pass, the update_memory function in NCEAverage does a gather over all embeddings, so memory stays in sync. Doing a gather over embeddings is O(batch size). Syncing memory is O(num items in memory). Generally, batch size << num items in memory, so the syncs are preferably done in update_memory.
-
class vissl.losses.nce_loss.NCEAverage(memory_params, negative_sampling_params, T=0.07, Z=-1, loss_type='nce')
Bases: torch.nn.modules.module.Module
Computes the scores of the model embeddings against the "positive" and "negative" samples from the Memory Bank. This class does NOT compute the actual loss, just the scores, i.e., inner products followed by normalization, exponentiation, etc.
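As a rough illustration of "inner products followed by normalization and exponentiation", the sketch below scores one embedding against memory-bank entries in plain Python. The function name, the list-based memory bank, and the treatment of Z as a given positive constant are assumptions; how the class itself handles its default Z=-1 is not covered here.

```python
import math


def nce_scores(embedding, memory_bank, positive_index, negative_indices,
               T=0.07, Z=1.0):
    """Score one embedding against its positive and negatives from memory.

    Returns exp(<embedding, memory_bank[i]> / T) / Z for the positive
    first, followed by one value per negative. No loss is computed.
    """
    def score(index):
        inner = sum(a * b for a, b in zip(embedding, memory_bank[index]))
        return math.exp(inner / T) / Z

    return [score(positive_index)] + [score(i) for i in negative_indices]
```

A separate loss (NCE or cross-entropy, depending on loss_type) would then consume these scores.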
vissl.losses.deepclusterv2_loss
-
class vissl.losses.deepclusterv2_loss.DeepClusterV2Loss(loss_config: vissl.utils.hydra_config.AttrDict)
Bases: classy_vision.losses.classy_loss.ClassyLoss
Loss used for the DeepClusterV2 approach as provided in the SwAV paper https://arxiv.org/abs/2006.09882
- Config params:
DROP_LAST (bool): automatically inferred from DATA.TRAIN.DROP_LAST
BATCHSIZE_PER_REPLICA (int): 256 # automatically inferred from DATA.TRAIN.BATCHSIZE_PER_REPLICA
num_crops (int): 2 # automatically inferred from DATA.TRAIN.TRANSFORMS
temperature (float): 0.1
num_clusters (List[int]): [3000, 3000, 3000]
kmeans_iters (int): 10
crops_for_mb: [0]
embedding_dim: 128
num_train_samples (int): -1 # @auto-filled
-
classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)
Instantiates DeepClusterV2Loss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
DeepClusterV2Loss instance.
-
forward(output: torch.Tensor, idx: int)
vissl.losses.cross_entropy_multiple_output_single_target
-
class vissl.losses.cross_entropy_multiple_output_single_target.CrossEntropyMultipleOutputSingleTargetLoss(loss_config: vissl.utils.hydra_config.AttrDict)
Bases: classy_vision.losses.classy_loss.ClassyLoss
Initializer for the sum cross-entropy loss. For a single tensor, this is equivalent to the cross-entropy loss. For a list of tensors, this computes the sum of the cross-entropy losses for each tensor in the list against the target.
- Config params:
weight: weight of sample, optional
ignore_index: sample should be ignored for loss, optional
reduction: specifies reduction to apply to the output, optional
temperature: specify temperature for softmax. Default 1.0
-
classmethod from_config(loss_config: vissl.utils.hydra_config.AttrDict)
Instantiates CrossEntropyMultipleOutputSingleTargetLoss from configuration.
- Parameters
loss_config – configuration for the loss
- Returns
CrossEntropyMultipleOutputSingleTargetLoss instance.
-
forward(output: Union[torch.Tensor, List[torch.Tensor]], target: torch.Tensor)
For each output and the single target, the loss is calculated. The returned loss value is the sum of the losses across all outputs.
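A plain-Python sketch of a temperature-scaled cross-entropy summed over several outputs is shown below. It handles a single sample with an integer class label, and it assumes the temperature divides the logits before the softmax. The function names and that scaling convention are illustrative assumptions, not a description of the VISSL code.

```python
import math


def cross_entropy(logits, target, temperature=1.0):
    """Cross-entropy of one logit vector against an integer class label."""
    scaled = [v / temperature for v in logits]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scaled)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in scaled))
    return log_sum_exp - scaled[target]


def summed_cross_entropy(outputs, target, temperature=1.0):
    """Sum the cross-entropy of every output against the same target."""
    if outputs and not isinstance(outputs[0], (list, tuple)):
        outputs = [outputs]  # a single output is treated as a list of one
    return sum(cross_entropy(out, target, temperature) for out in outputs)
```

With one output this reduces to the ordinary cross-entropy, and with a list of outputs each one contributes its own term to the total.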