vissl.data package

class vissl.data.GenericSSLDataset(cfg, split, dataset_source_map)[source]

Bases: torch.utils.data.dataset.Dataset

Base Self Supervised Learning Dataset Class.

The GenericSSLDataset class is defined to support reading data from multiple data sources. For example: data = [dataset1, dataset2] and the minibatches generated will have the corresponding data from each dataset.

For this reason, we also support labels from multiple sources. For example targets = [dataset1 targets, dataset2 targets].

In order to support multiple data sources, the dataset configuration always has list inputs.

  • DATA_SOURCES, LABEL_SOURCES, DATASET_NAMES, DATA_PATHS, LABEL_PATHS

For several data sources, we also support specifying on what dataset the transforms should be applied. By default, apply the transforms on data from all datasets.

Parameters
  • cfg (AttrDict) – configuration defined by user

  • split (str) – the dataset split for which we are constructing the Dataset object

  • dataset_source_map (Dict[str, Callable]) –

    The dictionary that maps what data sources are supported and what object to use to read data from those sources. For example:

        DATASET_SOURCE_MAP = {
            "disk_filelist": DiskImageDataset,
            "disk_folder": DiskImageDataset,
            "synthetic": SyntheticImageDataset,
        }

load_single_label_file(path)[source]

Load the single label data file. We only support the user specifying numpy label files if the user is specifying a data_filelist source of labels.

To save memory, if mmap_mode is set to True for loading, we try to load the labels in mmap_mode. If it fails, we simply load the labels without mmap.

__getitem__(idx)[source]

Get the input sample for the minibatch for a specified data index. For each data object (if we are loading several datasets in a minibatch), we get the sample consisting of:

  • image data

  • label (if applicable), otherwise idx

  • data_valid: 0 or 1 indicating whether the data is a valid image

  • data_idx: index of the data in the dataset, for book-keeping and debugging

Once the sample data is available, we apply the data transform on the sample.

The final transformed sample is returned to be added into the minibatch.
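For illustration, a returned sample for a single data source could look like the following. This is a sketch with hypothetical values; the exact contents depend on the dataset and config.

    import torch

    # Sketch of one returned sample for a single data source (values hypothetical).
    transformed_image = torch.randn(3, 224, 224)
    sample = {
        "data": [transformed_image],  # one entry per data source
        "label": [7],                 # label if available, otherwise the index
        "data_valid": [1],            # 1 if the image is valid, 0 otherwise
        "data_idx": [42],             # dataset index, for book-keeping/debugging
    }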

__len__()[source]

Size of the dataset. Assumes there is only one data source.

get_image_paths()[source]

Get the image paths for all the data sources.

Returns

image_paths (List[List[str]]) – list containing the list of image paths for each data source.

get_available_splits(dataset_config)[source]

Get the available splits in the dataset config. Not specific to the split for which the SSLDataset is being constructed.

NOTE: this method is deprecated.

num_samples(source_idx=0)[source]

Size of the dataset. Assumes there is only one data source.

get_batchsize_per_replica()[source]

Get the batch size per trainer

get_global_batchsize()[source]

The global batch size across all the trainers

vissl.data.get_data_files(split, dataset_config)[source]
Get the path to the dataset (images and labels).
  1. If the user has explicitly specified the data_sources, we simply use those and don’t do lookup in the datasets registered with VISSL from the dataset catalog.

  2. If the user hasn’t specified the path, look for the dataset in the datasets catalog registered with VISSL. For a given list of datasets and a given partition (train/test), we first verify that we have the dataset and the correct source as specified by the user. Then for each dataset in the list, we get the data path (make sure it exists, sources match). For the label file, the file is optional.

Once we have the dataset original paths, we replace the path with the local paths if the data was copied to local disk.

vissl.data.register_datasets(json_catalog_path)[source]

If the json dataset_catalog file is found, we register the datasets specified in the catalog with VISSL. If the catalog also specifies VOC or COCO datasets, we register them.

Parameters

json_catalog_path (str) – the path to the json dataset catalog

class vissl.data.VisslDatasetCatalog[source]

Bases: object

A catalog that stores information about the datasets and how to obtain them. It contains a mapping from strings (which are names that identify a dataset, e.g. “imagenet1k”) to a dict which contains:

  1. mapping of various data splits (train, test, val) to the data source (path on the disk whether a folder path or a filelist)

  2. source of the data (disk_filelist | disk_folder)

The purpose of having this catalog is to make it easy to choose different datasets, by just using the strings in the config.
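As a sketch of how the catalog could be used (the split-to-path dict layout and the paths below are assumptions for illustration):

    from vissl.data.dataset_catalog import VisslDatasetCatalog

    # Sketch: register a dataset and look it up by name. The dict layout
    # (split -> [data_path, label_path]) and the paths are assumptions.
    VisslDatasetCatalog.register_data(
        "my_imagenet_folder",
        {
            "train": ["/datasets/imagenet/train", "<unused>"],
            "val": ["/datasets/imagenet/val", "<unused>"],
        },
    )
    print(VisslDatasetCatalog.list())
    print(VisslDatasetCatalog.get("my_imagenet_folder"))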

static register_json(json_catalog_path)[source]
Parameters

json_catalog_path (str) – a .json filepath that contains the data to be registered

static register_dict(dict_catalog)[source]
Parameters

dict_catalog (dict) – a dict of datasets to be registered

static register_data(name, data_dict)[source]
Parameters
  • name (str) – the name that identifies a dataset, e.g. “imagenet1k_folder”.

  • data_dict (dict) – the dataset information (data splits, paths, and source) to register under this name.

static get(name)[source]

Get the registered dict and return it.

Parameters

name (str) – the name that identifies a dataset, e.g. “imagenet1k”.

Returns

dict – dataset information (paths, source)

static list() → List[str][source]

List all registered datasets.

Returns

list[str]

static clear()[source]

Remove all registered datasets.

static remove(name)[source]

Remove the dataset registered by name.

static has_data(name)[source]

Check whether the data with name exists.

vissl.data.collators module

vissl.data.collators.register_collator(name)[source]

Registers Self-Supervision data collators.

This decorator allows VISSL to add custom data collators, even if the collator itself is not part of VISSL. To use it, apply this decorator to a collator function, like this:

@register_collator('my_collator_name')
def my_collator_name():
    ...

To get a collator from a configuration file, see get_collator().

vissl.data.collators.get_collator(collator_name, collate_params)[source]

Given the collator name and the collator params, return the collator if registered with VISSL. Also supports pytorch default collators.
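A minimal sketch of registering and then retrieving a custom collator (the collator body below is illustrative, not a VISSL collator):

    import torch
    from vissl.data.collators import get_collator, register_collator

    # Sketch: a trivial custom collator that stacks the first copy of each
    # sample's data (illustrative only).
    @register_collator("stack_first_copy_collator")
    def stack_first_copy_collator(batch):
        return {"data": torch.stack([item["data"][0] for item in batch])}

    # Retrieve it the way a config-driven lookup would; collate_params is
    # assumed to be a dict of extra collator arguments.
    collator = get_collator("stack_first_copy_collator", collate_params={})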

vissl.data.collators.mixup_collator module

vissl.data.collators.mixup_collator.multicrop_mixup_collator(batch)[source]

This collator is used to mix up 2 images at a time: 2*N input images become N images. This collator can handle multi-crop input; for each crop, it mixes up the corresponding crop of the next image.

Input:

    batch = [
        {"data": [img1_0, …, img1_k], …},
        {"data": [img2_0, …, img2_k], …},
        …
        {"data": [img2N_0, …, img2N_k], …},
    ]

Returns: Example output:

    output = [
        {
            "data": [
                torch.tensor([img1_2_0, …, img1_2_k]),
                torch.tensor([img3_4_0, …, img3_4_k]),
                …
            ]
        },
    ]

vissl.data.collators.moco_collator module

vissl.data.collators.moco_collator.moco_collator(batch: List[Dict[str, Any]]) → Dict[str, List[torch.Tensor]][source]

This collator is specific to the MoCo approach: http://arxiv.org/abs/1911.05722

The collator collates the batch for the following input (assuming k copies of each image):

Input:

    batch = [
        {"data": [img1_0, …, img1_k], …},
        {"data": [img2_0, …, img2_k], …},
        …
    ]

Returns: Example output:

    output = [
        {
            "data": torch.tensor([[img1_0, …, img1_k], [img2_0, …, img2_k], …]),
            …
        },
    ]

Dimensions become [num_positives x Batch x C x H x W]
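The reshaping can be illustrated with a small self-contained sketch; this mimics the described output dimensions and is not the VISSL implementation:

    import torch

    # Sketch: collate k-copy samples into [num_positives x Batch x C x H x W].
    batch = [{"data": [torch.randn(3, 4, 4) for _ in range(2)]} for _ in range(8)]
    num_positives = len(batch[0]["data"])
    output = torch.stack(
        [torch.stack([sample["data"][k] for sample in batch]) for k in range(num_positives)]
    )
    print(output.shape)  # torch.Size([2, 8, 3, 4, 4])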

vissl.data.collators.multicrop_collator module

vissl.data.collators.multicrop_collator.multicrop_collator(batch)[source]

This collator is used in the SwAV approach.

The collator collates the batch for the following input (assuming k copies of each image):

Input:

    batch = [
        {"data": [img1_0, …, img1_k], …},
        {"data": [img2_0, …, img2_k], …},
        …
    ]

Returns: Example output:

    output = [
        {
            "data": torch.tensor([[img1_0, …, imgN_0], …, [img1_k, …, imgN_k]]),
            …
        },
    ]

vissl.data.collators.patch_and_image_collator module

vissl.data.collators.patch_and_image_collator.patch_and_image_collator(batch)[source]

This collator is used in the PIRL approach.

batch contains two keys, “data” and “label”:
  • data is a list of N+1 elements: the first element is the “image” and the remaining N are patches.

  • label is an integer (image index in the dataset)

We collate this to:

    image: batch_size tensor containing images
    patches: N * batch_size tensor containing patches

vissl.data.collators.siamese_collator module

vissl.data.collators.siamese_collator.siamese_collator(batch)[source]

This collator is used in the Jigsaw approach.

Input:

    batch = [
        {"data": [img1,], "label": [lbl1,]},  # img1
        {"data": [img2,], "label": [lbl2,]},  # img2
        …
        {"data": [imgN,], "label": [lblN,]},  # imgN
    ]

where:

    img{x} is a tensor of size: num_towers x C x H x W
    lbl{x} is an integer

Returns: Example output:

    output = [
        {
            "data": torch.tensor([img1_0, …, imgN_0]),
            …
        },
    ]

where the output is of dimension: (N * num_towers) x C x H x W

vissl.data.collators.simclr_collator module

vissl.data.collators.simclr_collator.simclr_collator(batch)[source]

This collator is used in the SimCLR approach.

The collator collates the batch for the following input (each image has k copies):

    input:  [[img1_0, …, img1_k], [img2_0, …, img2_k], …, [imgN_0, …, imgN_k]]
    output: [img1_0, img2_0, …, imgN_0, img1_1, img2_1, …]

Input:

    batch = [
        {"data": [img1_0, …, img1_k], "label": [lbl1,]},  # img1
        {"data": [img2_0, …, img2_k], "label": [lbl2,]},  # img2
        …
        {"data": [imgN_0, …, imgN_k], "label": [lblN,]},  # imgN
    ]

where:

    img{x} is a tensor of size: C x H x W
    lbl{x} is an integer

Returns: Example output:

    output = [
        {
            "data": torch.tensor([img1_0, img2_0, …, img1_1, img2_1, …]),
            …
        },
    ]
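The interleaving can be sketched as follows (illustrative only, not the VISSL implementation):

    import torch

    # Sketch: interleave k copies of N images into [img1_0, img2_0, …, img1_1, …].
    batch = [{"data": [torch.randn(3, 4, 4) for _ in range(2)]} for _ in range(4)]
    data = torch.stack([torch.stack(sample["data"]) for sample in batch])  # N x k x C x H x W
    interleaved = data.transpose(0, 1).reshape(-1, *data.shape[2:])        # (k * N) x C x H x W
    print(interleaved.shape)  # torch.Size([8, 3, 4, 4])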

vissl.data.collators.targets_one_hot_default_collator module

vissl.data.collators.targets_one_hot_default_collator.convert_to_one_hot(pos_lbl, neg_lbl, num_classes: int) → torch.Tensor[source]

This function converts target class indices to one-hot vectors, given the number of classes.

The one-hot vector contains 1 for positive labels, 0 for negative labels, and -1 for ignore labels.
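A minimal sketch of this convention (the argument roles below are assumptions, not the exact VISSL semantics):

    import torch

    # Sketch: 1 for positives, -1 for ignored classes, 0 everywhere else.
    def one_hot_with_ignore(pos_lbl, ignore_lbl, num_classes: int) -> torch.Tensor:
        out = torch.zeros(num_classes)
        out[torch.tensor(pos_lbl)] = 1.0
        out[torch.tensor(ignore_lbl)] = -1.0
        return out

    print(one_hot_with_ignore([1, 3], [6], num_classes=8))
    # tensor([ 0.,  1.,  0.,  1.,  0.,  0., -1.,  0.])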

vissl.data.collators.targets_one_hot_default_collator.targets_one_hot_default_collator(batch, num_classes: int)[source]

The collator collates the batch for the following input:

Input:

    input: [[img0, …, imgk]]
    label: [
        [[1, 3, 6], [4, 9]],
        [[1, 5], [6, 8, 10, 11]],
        …
    ]

Output:

    output: [img0, img0, …]
    label: [[0, 1, 0, 1, …, -1, 0, 0, 1], [0, 1, 0, 0, 0, 1, 0], …]

vissl.data.ssl_transforms module

class vissl.data.ssl_transforms.SSLTransformsWrapper(indices, **args)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

VISSL wraps around transforms so that they work with the multi-modal input. VISSL supports batches that come from several datasets and sources, hence the input batch (images, labels) is always a list.

To apply the user-defined transforms, VISSL takes “indices” as input, which defines on what dataset/source data in the sample the transform should be applied. For example, assuming the input sample is:

    {
        "data": [dataset1_imgX, dataset2_imgY],
        "label": [dataset1_lblX, dataset2_lblY]
    }

and the transform is:

    TRANSFORMS:
      - name: RandomGrayscale
        p: 0.2
        indices: 0

then the transform is applied only on dataset1_imgX. If however the indices are either not specified or set to 0, 1 then the transform is applied on both dataset1_imgX and dataset2_imgY.

Since this structure of data is introduced by vissl, the SSLTransformsWrapper takes care of dealing with the multi-modality input by wrapping the original transforms (pytorch transforms or custom transforms defined by user) and calling each transform on each index.

VISSL also supports _TRANSFORMS_WITH_LABELS transforms that modify the label or are used to generate the labels used in self-supervised learning tasks like Jigsaw. When the transforms in _TRANSFORMS_WITH_LABELS are called, the new label is also returned besides the transformed image.

VISSL also supports _TRANSFORMS_WITH_COPIES, which are transforms that generate several copies of an image. Common examples of self-supervised training methods that do this are SimCLR, SwAV, and MoCo. When a transform from _TRANSFORMS_WITH_COPIES is used, the SSLTransformsWrapper will flatten the transform output. For example, for the input [img1], if we apply ImgReplicatePil to replicate the image 2 times:

    SSLTransformsWrapper(
        ImgReplicatePil(num_times=2), [img1]
    )

will output [img1_1, img1_2] instead of the nested list [[img1_1, img1_2]].

The benefit of this is that the next set of transforms specified by the user can now operate on img1_1 and img1_2, as the input becomes multi-modal in nature.

VISSL also supports _TRANSFORMS_WITH_GROUPING, which essentially means that a single transform should be applied on the full multi-modal input together instead of separately. This is a common transform used in BYOL. For example:

    SSLTransformsWrapper(
        ImgPilMultiCropRandomApply(
            RandomApply, prob=[0.0, 0.2]
        ), [img1_1, img1_2]
    )

will apply RandomApply on img1_1 with prob=0.0 and on img1_2 with prob=0.2.

__init__(indices, **args)[source]
Parameters
  • indices (List[int]) (Optional) – the indices in the input list on which the transform should be applied; the input is always a list. Example: a minibatch of size=2 looks like [[img1], [img2]]. If indices is not specified, the transform is applied to all the multi-modal input.

  • args (dict) – the arguments that the transform takes

__call__(sample)[source]

Apply each transform on the specified indices of each entry in the input sample.

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.SSLTransformsWrapper[source]
vissl.data.ssl_transforms.get_transform(input_transforms_list)[source]

Given the list of user specified transforms, return the torchvision.transforms.Compose() version of the transforms. Each transform in the composition is SSLTransformsWrapper which wraps the original transforms to handle multi-modal nature of input.
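A sketch of building a composed pipeline (the per-entry format, a "name" key plus the transform's kwargs, is an assumption based on the TRANSFORMS example above):

    from vissl.data.ssl_transforms import get_transform

    # Sketch: compose wrapped transforms from a config-style list of entries.
    # The entry format ("name" plus transform kwargs) is an assumption.
    transform = get_transform([
        {"name": "ImgReplicatePil", "num_times": 2},
        {"name": "RandomGrayscale", "p": 0.2, "indices": [0]},
    ])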

vissl.data.ssl_transforms.img_patches_tensor module

class vissl.data.ssl_transforms.img_patches_tensor.ImgPatchesFromTensor(num_patches=9, patch_jitter=21)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Create image patches from a torch Tensor or numpy array. This transform was proposed in Jigsaw - https://arxiv.org/abs/1603.09246

Parameters
  • num_patches (int) – how many image patches to create

  • patch_jitter (int) – space to leave between patches

__call__(image)[source]

Input image which is a torch.Tensor object of shape 3 x H x W

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_patches_tensor.ImgPatchesFromTensor[source]

Instantiates ImgPatchesFromTensor from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPatchesFromTensor instance.

vissl.data.ssl_transforms.img_pil_color_distortion module

class vissl.data.ssl_transforms.img_pil_color_distortion.ImgPilColorDistortion(strength)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Apply Random color distortions to the input image. There are multiple different ways of applying these distortions. This implementation follows SimCLR - https://arxiv.org/abs/2002.05709 It randomly distorts the hue, saturation, brightness of an image and can randomly convert the image to grayscale.

__init__(strength)[source]
Parameters

strength (float) – A number used to quantify the strength of the color distortion.

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_color_distortion.ImgPilColorDistortion[source]

Instantiates ImgPilColorDistortion from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilColorDistortion instance.

vissl.data.ssl_transforms.img_pil_gaussian_blur module

class vissl.data.ssl_transforms.img_pil_gaussian_blur.ImgPilGaussianBlur(p, radius_min, radius_max)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Apply Gaussian Blur to the PIL image. Takes the radius and probability of application as parameters.

This transform was used in SimCLR - https://arxiv.org/abs/2002.05709

__init__(p, radius_min, radius_max)[source]
Parameters
  • p (float) – probability of applying gaussian blur to the image

  • radius_min (float) – blur kernel minimum radius used by ImageFilter.GaussianBlur

  • radius_max (float) – blur kernel maximum radius used by ImageFilter.GaussianBlur

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_gaussian_blur.ImgPilGaussianBlur[source]

Instantiates ImgPilGaussianBlur from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilGaussianBlur instance.

vissl.data.ssl_transforms.img_pil_multicrop_random_apply module

class vissl.data.ssl_transforms.img_pil_multicrop_random_apply.ImgPilMultiCropRandomApply(transforms: List[Dict[str, Any]], prob: float)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Apply a list of transforms on multi-crop input. The transforms are Randomly applied to each crop using the specified probability. This is used in BYOL https://arxiv.org/pdf/2006.07733.pdf

Multi-crops are several crops of a given image. This is most commonly used in contrastive learning. For example SimCLR, SwAV approaches use multi-crop input.

__init__(transforms: List[Dict[str, Any]], prob: float)[source]
Parameters
  • transforms (List(transforms)) – List of transforms that should be applied to each crop.

  • prob (List(float)) –

    Probability of RandomApply for the transforms composition on each crop. Example: for the 2 crops in BYOL, for solarization:

    prob = [0.0, 0.2]

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_multicrop_random_apply.ImgPilMultiCropRandomApply[source]

Instantiates ImgPilMultiCropRandomApply from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilMultiCropRandomApply instance.

vissl.data.ssl_transforms.img_pil_random_color_jitter module

class vissl.data.ssl_transforms.img_pil_random_color_jitter.ImgPilRandomColorJitter(strength, prob)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Apply Random color jitter to the input image. It randomly distorts the hue, saturation, brightness of an image.

__init__(strength, prob)[source]
Parameters
  • strength (float) – A number used to quantify the strength of the color distortion.

  • prob (float) – probability of random application

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_random_color_jitter.ImgPilRandomColorJitter[source]

Instantiates ImgPilRandomColorJitter from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilRandomColorJitter instance.

vissl.data.ssl_transforms.img_pil_random_photometric module

class vissl.data.ssl_transforms.img_pil_random_photometric.ImgPilRandomPhotometric(p)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Randomly apply some photometric transforms to an image. This was used in PIRL - https://arxiv.org/abs/1912.01991

The photometric transforms applied includes:

AutoContrast, RandomPosterize, RandomSharpness, RandomSolarize

__init__(p)[source]
Parameters

p (float) – Probability of applying the transforms

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_random_photometric.ImgPilRandomPhotometric[source]

Instantiates ImgPilRandomPhotometric from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilRandomPhotometric instance.

vissl.data.ssl_transforms.img_pil_random_solarize module

class vissl.data.ssl_transforms.img_pil_random_solarize.ImgPilRandomSolarize(prob: float)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Randomly apply solarization transform to an image. This was used in BYOL - https://arxiv.org/abs/2006.07733

__init__(prob: float)[source]
Parameters

prob (float) – Probability of applying the transform

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_random_solarize.ImgPilRandomSolarize[source]

Instantiates ImgPilRandomSolarize from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilRandomSolarize instance.

vissl.data.ssl_transforms.img_pil_to_lab_tensor module

class vissl.data.ssl_transforms.img_pil_to_lab_tensor.ImgPil2LabTensor(indices)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Convert a PIL image to LAB tensor of shape C x H x W This transform was proposed in Colorization - https://arxiv.org/abs/1603.08511

The input image is a PIL Image. We first convert it to an HWC tensor with channel order RGB. We then convert RGB to BGR and use OpenCV to convert the image to LAB. The LAB image is an 8-bit image in range: L [0, 255], A [0, 255], B [0, 255]. We rescale it to: L [0, 100], A [-128, 127], B [-128, 127].

The output is a torch tensor of the image.
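The rescaling step can be sketched as follows, assuming OpenCV's 8-bit LAB encoding (L stored as L*255/100, A and B offset by 128); this is not the VISSL implementation:

    import numpy as np

    # Sketch: rescale OpenCV 8-bit LAB channels to the documented ranges.
    lab_8bit = np.random.randint(0, 256, size=(4, 4, 3)).astype(np.float32)
    L = lab_8bit[..., 0] * (100.0 / 255.0)  # L: [0, 255] -> [0, 100]
    A = lab_8bit[..., 1] - 128.0            # A: [0, 255] -> [-128, 127]
    B = lab_8bit[..., 2] - 128.0            # B: [0, 255] -> [-128, 127]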

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_lab_tensor.ImgPil2LabTensor[source]

Instantiates ImgPil2LabTensor from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPil2LabTensor instance.

vissl.data.ssl_transforms.img_pil_to_multicrop module

class vissl.data.ssl_transforms.img_pil_to_multicrop.ImgPilToMultiCrop(total_num_crops, num_crops, size_crops, crop_scales)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Convert a PIL image to Multi-resolution Crops. The input is a PIL image and output is the list of image crops.

This transform was proposed in SwAV - https://arxiv.org/abs/2006.09882

__init__(total_num_crops, num_crops, size_crops, crop_scales)[source]

Returns total_num_crops square crops of an image. Each crop is a random crop extracted according to the parameters specified in size_crops and crop_scales. For ease of use, one can specify num_crops which removes the need to repeat parameters.

Parameters
  • total_num_crops (int) – Total number of crops to extract

  • num_crops (List or Tuple of ints) – Specifies the number of crops of each type (one entry per crop size)

  • size_crops (List or Tuple of ints) – Specifies the height (height = width) of each patch

  • crop_scales (List or Tuple containing [float, float]) – Scale of the crop

Example usage:

  • (total_num_crops=2, num_crops=[1, 1], size_crops=[224, 96], crop_scales=[(0.14, 1.), (0.05, 0.14)]) extracts 2 crops total, of size 224x224 and 96x96

  • (total_num_crops=3, num_crops=[1, 2], size_crops=[224, 96], crop_scales=[(0.14, 1.), (0.05, 0.14)]) extracts 3 crops total: 1 of size 224x224 and 2 of size 96x96
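For instance, the second configuration above could be instantiated as follows (a sketch; the result is expected to be a list of PIL crops):

    from PIL import Image
    from vissl.data.ssl_transforms.img_pil_to_multicrop import ImgPilToMultiCrop

    # Sketch: 1 crop of 224x224 and 2 crops of 96x96 from one image.
    transform = ImgPilToMultiCrop(
        total_num_crops=3,
        num_crops=[1, 2],
        size_crops=[224, 96],
        crop_scales=[(0.14, 1.0), (0.05, 0.14)],
    )
    crops = transform(Image.new("RGB", (256, 256)))  # expected: list of 3 crops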

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_multicrop.ImgPilToMultiCrop[source]

Instantiates ImgPilToMultiCrop from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilToMultiCrop instance.

vissl.data.ssl_transforms.img_pil_to_patches_and_image module

class vissl.data.ssl_transforms.img_pil_to_patches_and_image.ImgPilToPatchesAndImage(crop_scale_image=(0.08, 1.0), crop_size_image=224, crop_scale_patches=(0.6, 1.0), crop_size_patches=255, permute_patches=True, num_patches=9)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Convert an input PIL image to Patches and Image This transform was proposed in PIRL - https://arxiv.org/abs/1912.01991.

Input:

PIL Image

Returns

list containing N+1 elements
  • zeroth element: a RandomResizedCrop of the image

  • remainder: N patches extracted uniformly from a RandomResizedCrop

__init__(crop_scale_image=(0.08, 1.0), crop_size_image=224, crop_scale_patches=(0.6, 1.0), crop_size_patches=255, permute_patches=True, num_patches=9)[source]
Parameters
  • crop_scale_image (tuple of floats) – scale for RandomResizedCrop of image

  • crop_size_image (int) – size for RandomResizedCrop of image

  • crop_scale_patches (tuple of floats) – scale for RandomResizedCrop of patches

  • crop_size_patches (int) – size for RandomResizedCrop of patches

  • permute_patches (bool) – permute the patches in any order

  • num_patches (int) – number of patches to create. Should be a perfect square.

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_patches_and_image.ImgPilToPatchesAndImage[source]

Instantiates ImgPilToPatchesAndImage from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilToPatchesAndImage instance.

vissl.data.ssl_transforms.img_pil_to_raw_tensor module

class vissl.data.ssl_transforms.img_pil_to_raw_tensor.ImgPilToRawTensor[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Convert a PIL image to a raw tensor, for when we don’t want to apply the default division by 255 performed by torchvision.transforms.ToTensor().

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_pil_to_raw_tensor.ImgPilToRawTensor[source]

Instantiates ImgPilToRawTensor from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgPilToRawTensor instance.

vissl.data.ssl_transforms.img_pil_to_tensor module

class vissl.data.ssl_transforms.img_pil_to_tensor.ImgToTensor[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

The Transform that overrides the PyTorch transform to provide better transformation speed.

# credits: mannatsingh@fb.com

vissl.data.ssl_transforms.img_replicate_pil module

class vissl.data.ssl_transforms.img_replicate_pil.ImgReplicatePil(num_times: int = 2)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Adds the same image to the batch K times, so that the batch size becomes N*K. Use the simclr_collator to convert into batches.

This transform is useful when generating multiple copies of the same image, for example, when training contrastive methods.
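A minimal usage sketch:

    from PIL import Image
    from vissl.data.ssl_transforms.img_replicate_pil import ImgReplicatePil

    # Sketch: replicate one PIL image twice, e.g. before simclr_collator batching.
    replicate = ImgReplicatePil(num_times=2)
    copies = replicate(Image.new("RGB", (224, 224)))  # expected: [copy_0, copy_1]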

__init__(num_times: int = 2)[source]
Parameters

num_times (int) – how many times should the image be replicated.

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_replicate_pil.ImgReplicatePil[source]

Instantiates ImgReplicatePil from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgReplicatePil instance.

vissl.data.ssl_transforms.img_rotate_pil module

class vissl.data.ssl_transforms.img_rotate_pil.ImgRotatePil(num_angles=4, num_rotations_per_img=1)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

Apply rotation to a PIL Image. Samples rotation angle from a set of predefined rotation angles.

Predefined rotation angles are sampled at equal intervals in the [0, 360) angle space where the number of angles is specified by num_angles.

This transform was used in RotNet - https://arxiv.org/abs/1803.07728

__init__(num_angles=4, num_rotations_per_img=1)[source]
Parameters
  • num_angles (int) – Number of angles in the [0, 360) space

  • num_rotations_per_img (int) – Number of rotations to apply to each image.

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.img_rotate_pil.ImgRotatePil[source]

Instantiates ImgRotatePil from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ImgRotatePil instance.

vissl.data.ssl_transforms.pil_photometric_transforms_lib module

class vissl.data.ssl_transforms.pil_photometric_transforms_lib.TransformObject[source]

Bases: object

Helper object that prints information about the transformation; other transforms can inherit from it.

class vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomValueApplier(min_v, max_v, root_transform, vtype='float', closed_interval=False)[source]

Bases: vissl.data.ssl_transforms.pil_photometric_transforms_lib.TransformObject

__init__(min_v, max_v, root_transform, vtype='float', closed_interval=False)[source]

Applies a transform by sampling a random value between [min_v, max_v]

Parameters
  • min_v (float or int) – minimum value

  • max_v (float or int) – maximum value

  • root_transform (transform object) – transform that will be applied. must accept a value as input.

  • vtype (string) – value type - either “float” or “int”

  • closed_interval (bool) – sample from [min_v, max_v] (when True) or [min_v, max_v) when False

sample_value()[source]

Randomly sample a value between min_v and max_v, depending on whether the value type is float or int and whether to sample from an open or closed interval.

vissl.data.ssl_transforms.pil_photometric_transforms_lib.Sharpness(img, v)[source]

Applies PIL.ImageEnhance.Sharpness to the image

vissl.data.ssl_transforms.pil_photometric_transforms_lib.Solarize(img, v)[source]

Applies PIL.ImageOps.solarize to the image

vissl.data.ssl_transforms.pil_photometric_transforms_lib.Posterize(img, v)[source]

Applies PIL.ImageOps.posterize to the image

vissl.data.ssl_transforms.pil_photometric_transforms_lib.AutoContrast(img, _)[source]

Applies PIL.ImageOps.autocontrast to the image

class vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomSharpnessTransform(min_v=0.1, max_v=1.9, root_transform=<function Sharpness>, vtype='float')[source]

Bases: vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomValueApplier

Randomly apply the Sharpness transformation with the random value selected from an interval.

__init__(min_v=0.1, max_v=1.9, root_transform=<function Sharpness>, vtype='float')[source]
Parameters
  • min_v (float) – minimum value

  • max_v (float) – maximum value

  • root_transform (transform object) – transform that will be applied. must accept a value as input.

  • vtype (string) – value type - “float”

class vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomPosterizeTransform(min_v=4, max_v=8, root_transform=<function Posterize>, vtype='int')[source]

Bases: vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomValueApplier

__init__(min_v=4, max_v=8, root_transform=<function Posterize>, vtype='int')[source]
Parameters
  • min_v (int) – minimum value

  • max_v (int) – maximum value

  • root_transform (transform object) – transform that will be applied. must accept a value as input.

  • vtype (string) – value type - “int”

class vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomSolarizeTransform(min_v=0, max_v=256, root_transform=<function Solarize>, vtype='int')[source]

Bases: vissl.data.ssl_transforms.pil_photometric_transforms_lib.RandomValueApplier

__init__(min_v=0, max_v=256, root_transform=<function Solarize>, vtype='int')[source]
Parameters
  • min_v (int) – minimum value

  • max_v (int) – maximum value

  • root_transform (transform object) – transform that will be applied. must accept a value as input.

  • vtype (string) – value type - “int”

class vissl.data.ssl_transforms.pil_photometric_transforms_lib.AutoContrastTransform[source]

Bases: vissl.data.ssl_transforms.pil_photometric_transforms_lib.TransformObject

Wraps the AutoContrast method

vissl.data.ssl_transforms.shuffle_img_patches module

class vissl.data.ssl_transforms.shuffle_img_patches.ShuffleImgPatches(perm_file: str)[source]

Bases: classy_vision.dataset.transforms.classy_transform.ClassyTransform

This transform is used to shuffle the list of tensors (usually image patches of shape C x H x W) according to a randomly selected permutation from a pre-defined set of permutations.

This is a common operation used in Jigsaw approach https://arxiv.org/abs/1603.09246

__init__(perm_file: str)[source]
Parameters

perm_file (string) – path to the file containing pre-defined permutations.

__call__(input_patches)[source]

The interface __call__ is used to transform the input data. It should contain the actual implementation of data transform.

Parameters

input_patches (List[torch.tensor]) – list of torch tensors

classmethod from_config(config: Dict[str, Any]) → vissl.data.ssl_transforms.shuffle_img_patches.ShuffleImgPatches[source]

Instantiates ShuffleImgPatches from configuration.

Parameters

config (Dict) – arguments for the transform

Returns

ShuffleImgPatches instance.

vissl.data.data_helper module

vissl.data.data_helper.get_mean_image(crop_size)[source]

Helper function that returns a gray PIL image of the size specified by user.

Parameters

crop_size (int) – used to generate (crop_size x crop_size x 3) image.

Returns

img – PIL Image
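Usage sketch:

    from vissl.data.data_helper import get_mean_image

    # Sketch: a gray 224 x 224 PIL image, e.g. as a stand-in for a failed load.
    img = get_mean_image(224)
    print(img.size)  # (224, 224)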

class vissl.data.data_helper.StatefulDistributedSampler(dataset, batch_size=None)[source]

Bases: torch.utils.data.distributed.DistributedSampler

A more fine-grained, stateful DistributedSampler that uses both the training iteration and the epoch for shuffling data. PyTorch’s DistributedSampler only uses the epoch for shuffling and starts sampling data from the start. When training on very large data, we train for only one epoch, and when we resume training, we want to resume the data sampler from the training iteration.

__init__(dataset, batch_size=None)[source]

Initializes the StatefulDistributedSampler instance. The random seed is set based on the epoch and the data is shuffled. To start the sampling, use start_iter (set to 0, or set by checkpoint resuming) to sample data from the remaining images.

Parameters
  • dataset (Dataset) – Pytorch dataset that sampler will shuffle

  • batch_size (int) – batch size we want the sampler to sample

set_start_iter(start_iter)[source]

Set the iteration number from which the sampling should start. This is used to find the marker in the data permutation order from where the sampler should start sampling.
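A sketch of resuming mid-epoch (assumes torch.distributed is initialized, as required by DistributedSampler; the start iteration would typically be restored from a checkpoint):

    import torch
    from vissl.data.data_helper import StatefulDistributedSampler

    # Sketch: resume sampling from a saved training iteration.
    dataset = torch.utils.data.TensorDataset(torch.randn(1000, 3, 8, 8))
    sampler = StatefulDistributedSampler(dataset, batch_size=32)
    sampler.set_start_iter(100)  # e.g. restored from a checkpoint
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)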

class vissl.data.data_helper.QueueDataset(queue_size)[source]

Bases: torch.utils.data.dataset.Dataset

This class helps deal with invalid images in the dataset by using two queues. One queue is used to enqueue seen and valid images from previous batches. The other queue is used to dequeue. The class is implemented such that the same batch will never have duplicate images. If we can’t dequeue a valid image, we return None for that instance.

Parameters

queue_size – size of the queue (ideally set it to batch_size). Both queues will be of the same size.

on_sucess(sample)[source]

If we encounter a successful image and the queue is not full, we store it in the queue. One further consideration: if the image is very large, we don’t add it to the queue, since otherwise CPU memory would grow a lot.

on_failure()[source]

If there was a failure in getting the original image, we look into the queue to see if any valid seen image is available. If yes, we dequeue and use this image in place of the failed image.

vissl.data.dataloader_sync_gpu_wrapper module

class vissl.data.dataloader_sync_gpu_wrapper.DataloaderSyncGPUWrapper(dataloader: Iterable)[source]

Bases: classy_vision.dataset.dataloader_wrapper.DataloaderWrapper

Dataloader which wraps another dataloader and moves the data to GPU asynchronously, so as to overlap the cost of copying data from CPU to GPU with the previous model iteration.

vissl.data.ssl_dataset module

class vissl.data.ssl_dataset.GenericSSLDataset(cfg, split, dataset_source_map)[source]

Bases: torch.utils.data.dataset.Dataset

Base Self Supervised Learning Dataset Class.

The GenericSSLDataset class is defined to support reading data from multiple data sources. For example: data = [dataset1, dataset2] and the minibatches generated will have the corresponding data from each dataset.

For this reason, we also support labels from multiple sources. For example targets = [dataset1 targets, dataset2 targets].

In order to support multiple data sources, the dataset configuration always has list inputs.

  • DATA_SOURCES, LABEL_SOURCES, DATASET_NAMES, DATA_PATHS, LABEL_PATHS

For several data sources, we also support specifying on what dataset the transforms should be applied. By default, apply the transforms on data from all datasets.

Parameters
  • cfg (AttrDict) – configuration defined by user

  • split (str) – the dataset split for which we are constructing the Dataset object

  • dataset_source_map (Dict[str, Callable]) –

    The dictionary that maps what data sources are supported and what object to use to read data from those sources. For example:

        DATASET_SOURCE_MAP = {
            "disk_filelist": DiskImageDataset,
            "disk_folder": DiskImageDataset,
            "synthetic": SyntheticImageDataset,
        }

load_single_label_file(path)[source]

Load the single label data file. We only support the user specifying numpy label files if the user is specifying a data_filelist source of labels.

To save memory, if mmap_mode is set to True for loading, we try to load the labels in mmap_mode. If it fails, we simply load the labels without mmap.

__getitem__(idx)[source]

Get the input sample for the minibatch for a specified data index. For each data object (if we are loading several datasets in a minibatch), we get the sample consisting of:

  • image data

  • label (if applicable), otherwise idx

  • data_valid: 0 or 1 indicating whether the data is a valid image

  • data_idx: index of the data in the dataset, for book-keeping and debugging

Once the sample data is available, we apply the data transform on the sample.

The final transformed sample is returned to be added into the minibatch.

__len__()[source]

Size of the dataset. Assumes there is only one data source.

get_image_paths()[source]

Get the image paths for all the data sources.

Returns

image_paths (List[List[str]]) – list containing the list of image paths for each data source.

get_available_splits(dataset_config)[source]

Get the available splits in the dataset config. Not specific to the split for which the SSLDataset is being constructed.

NOTE: this method is deprecated.

num_samples(source_idx=0)[source]

Size of the dataset. Assumes there is only one data source.

get_batchsize_per_replica()[source]

Get the batch size per trainer

get_global_batchsize()[source]

The global batch size across all the trainers

vissl.data.disk_dataset module

class vissl.data.disk_dataset.DiskImageDataset(cfg, data_source, path, split, dataset_name)[source]

Bases: vissl.data.data_helper.QueueDataset

Base Dataset class for loading images from Disk. Can load a predefined list of images or all images inside a folder.

Inherits from QueueDataset class in VISSL to provide better handling of the invalid images by replacing them with the valid and seen images.

Parameters
  • cfg (AttrDict) – configuration defined by user

  • data_source (string) – data source either of “disk_filelist” or “disk_folder”

  • path (string) –

    can be either of the following:

    1. A .npy file containing a list of filepaths. In this case data_source = “disk_filelist”

    2. A folder such that folder/split contains images. In this case data_source = “disk_folder”

  • split (string) – specify split for the dataset. Usually train/val/test. Used to read images if reading from a folder path and retrieve settings for that split from the config path.

  • dataset_name (string) – name of dataset. For information only.

NOTE: This dataset class only returns images (not labels or other metadata). To load labels you must specify them in LABEL_SOURCES (see ssl_dataset.py). LABEL_SOURCES follows a similar convention as the dataset and can either be a filelist or a torchvision ImageFolder compatible folder:

  1. Store labels in a numpy file.

  2. Store images in a nested directory structure so that the torchvision ImageFolder dataset can infer the labels.

num_samples()[source]

Size of the dataset

get_image_paths()[source]

Get paths of all images in the datasets. See load_data()

__len__()[source]

Size of the dataset

__getitem__(idx)[source]
  • We do delayed loading of data to reduce the memory size due to pickling of dataset across dataloader workers.

  • Loads the data if not already loaded.

  • Sets and initializes the queue if not already initialized

  • Depending on the data source (folder or filelist), get the image. If using the QueueDataset and image is valid, save the image in queue if not full. Otherwise return a valid seen image from the queue if queue is not empty.

vissl.data.synthetic_dataset module

class vissl.data.synthetic_dataset.SyntheticImageDataset(cfg, path, split, dataset_name, data_source='synthetic')[source]

Bases: torch.utils.data.dataset.Dataset

Synthetic dataset class. The mean image is always returned. This dataset is recommended for testing purposes only.

Parameters
  • path (string) – can be “” [not used]

  • split (string) – specify split for the dataset. Usually train/val/test. Used to read images if reading from a folder path and retrieve settings for that split from the config path [not used]

  • dataset_name (string) – name of dataset. For information only. [not used]

  • data_source (string, Optional) – data source (“synthetic”) [not used]

num_samples()[source]

Size of the dataset

__len__()[source]

Size of the dataset

__getitem__(idx)[source]

Simply return the mean dummy image of the specified size and mark it as a success.

vissl.data.dataset_catalog module

Data and labels file for various datasets.

class vissl.data.dataset_catalog.VisslDatasetCatalog[source]

Bases: object

A catalog that stores information about the datasets and how to obtain them. It contains a mapping from strings (which are names that identify a dataset, e.g. “imagenet1k”) to a dict which contains:

  1. mapping of various data splits (train, test, val) to the data source (path on the disk whether a folder path or a filelist)

  2. source of the data (disk_filelist | disk_folder)

The purpose of having this catalog is to make it easy to choose different datasets, by just using the strings in the config.

static register_json(json_catalog_path)[source]
Parameters

json_catalog_path (str) – a .json filepath that contains the data to be registered

static register_dict(dict_catalog)[source]
Parameters

dict_catalog (dict) – a dict of datasets to be registered

static register_data(name, data_dict)[source]
Parameters
  • name (str) – the name that identifies a dataset, e.g. “imagenet1k_folder”.

  • data_dict (dict) – the dataset information (data splits, paths, and source) to register under this name.

static get(name)[source]

Get the registered dict and return it.

Parameters

name (str) – the name that identifies a dataset, e.g. “imagenet1k”.

Returns

dict – dataset information (paths, source)

static list() → List[str][source]

List all registered datasets.

Returns

list[str]

static clear()[source]

Remove all registered datasets.

static remove(name)[source]

Remove the dataset registered by name.

static has_data(name)[source]

Check whether the data with name exists.

vissl.data.dataset_catalog.get_local_path(input_file, dest_dir)[source]

If user specified copying data to a local directory, get the local path where the data files were copied.

  • If input_file is just a file, we return dest_dir/filename

  • If the input_file is a directory, then we check if the environment is SLURM and use slurm_dir or otherwise dest_dir to look up whether the copy_complete file is available. If available, we return the directory.

  • If both of the above fail, we return the input_file as is.

vissl.data.dataset_catalog.get_local_output_filepaths(input_files, dest_dir)[source]

If we have copied the files to local disk as specified in the config, we return those local paths. Otherwise return the original paths.

vissl.data.dataset_catalog.check_data_exists(data_files)[source]

Check that the input data files exist. If the data_files is a list, we iteratively check for each file in the list.

vissl.data.dataset_catalog.register_pascal_voc()[source]

Register the PASCAL VOC 2007 and 2012 datasets to the data catalog. We first look up the paths of these datasets in the dataset catalog; if the paths exist, we register them, otherwise we remove voc_data from the catalog registry.

vissl.data.dataset_catalog.register_coco()[source]

Register the COCO 2014 datasets to the data catalog. We first look up the paths of these datasets in the dataset catalog; if the paths exist, we register them, otherwise we remove coco2014_folder from the catalog registry.

vissl.data.dataset_catalog.register_datasets(json_catalog_path)[source]

If the json dataset_catalog file is found, we register the datasets specified in the catalog with VISSL. If the catalog also specifies VOC or COCO datasets, we register them.

Parameters

json_catalog_path (str) – the path to the json dataset catalog

vissl.data.dataset_catalog.get_data_files(split, dataset_config)[source]
Get the path to the dataset (images and labels).
  1. If the user has explicitly specified the data_sources, we simply use those and don’t do lookup in the datasets registered with VISSL from the dataset catalog.

  2. If the user hasn’t specified the path, look for the dataset in the datasets catalog registered with VISSL. For a given list of datasets and a given partition (train/test), we first verify that we have the dataset and the correct source as specified by the user. Then for each dataset in the list, we get the data path (make sure it exists, sources match). For the label file, the file is optional.

Once we have the dataset original paths, we replace the path with the local paths if the data was copied to local disk.