Using Custom Datasets

VISSL makes it easy to add custom datasets. Using a new custom dataset has two requirements:

Requirement 1: The dataset name must be registered with VisslDatasetCatalog.
Requirement 2: The data source must be supported by VISSL. By default, VISSL reads data from disk. If your data is loaded from a different data source, add that data source to VISSL by following the documentation.

Follow the steps below to register and use the new dataset:

Step 1: Register the dataset with VISSL. Given a dataset named my_new_dataset_name and the paths to its train and test splits, register it as follows:

from vissl.data.dataset_catalog import VisslDatasetCatalog

VisslDatasetCatalog.register_data(name="my_new_dataset_name", data_dict={"train": ... , "test": ...})
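To show what registration amounts to conceptually, here is a minimal, self-contained sketch of a name-to-paths catalog. DatasetCatalogSketch is a hypothetical stand-in, not VISSL's VisslDatasetCatalog, and the per-split layout ([data_path, label_path]) and the placeholder paths are assumptions made for illustration only.

```python
from typing import Dict, List


class DatasetCatalogSketch:
    """Hypothetical in-memory catalog; NOT VISSL's implementation."""

    # Maps dataset name -> {split name -> list of paths}.
    _registered: Dict[str, Dict[str, List[str]]] = {}

    @classmethod
    def register_data(cls, name: str, data_dict: Dict[str, List[str]]) -> None:
        # Reject non-dict input and duplicate names so each name maps to one dataset.
        if not isinstance(data_dict, dict):
            raise TypeError("data_dict must be a dict of split -> paths")
        if name in cls._registered:
            raise KeyError(f"dataset {name!r} is already registered")
        cls._registered[name] = data_dict

    @classmethod
    def has_data(cls, name: str) -> bool:
        return name in cls._registered

    @classmethod
    def get(cls, name: str) -> Dict[str, List[str]]:
        if name not in cls._registered:
            raise KeyError(f"dataset {name!r} is not registered")
        return cls._registered[name]


# Placeholder paths; the [data_path, label_path] layout per split is an assumption.
DatasetCatalogSketch.register_data(
    name="my_new_dataset_name",
    data_dict={
        "train": ["/path/to/train_images", "/path/to/train_labels"],
        "test": ["/path/to/test_images", "/path/to/test_labels"],
    },
)
```

Refusing duplicate names keeps lookups by name unambiguous when a config later refers to the dataset.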

Note

VISSL also supports registering datasets via a custom JSON file or a Python dict. Please see our documentation on Using dataset_catalog.json.
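The JSON-file and dict routes can be pictured as bulk versions of single-dataset registration. The sketch below is hypothetical: register_dict and register_json are illustrative helpers operating on a plain dict registry, and the file layout (dataset name, then split, then [data_path, label_path]) is an assumption, not a specification of VISSL's dataset_catalog.json.

```python
import json
import os
import tempfile
from typing import Dict, List

# dataset name -> split -> list of paths (assumed layout)
Catalog = Dict[str, Dict[str, List[str]]]


def register_dict(registry: Catalog, catalog: Catalog) -> None:
    """Hypothetical helper: add every dataset in `catalog` to `registry`, refusing duplicates."""
    for name, splits in catalog.items():
        if name in registry:
            raise KeyError(f"dataset {name!r} is already registered")
        registry[name] = splits


def register_json(registry: Catalog, json_path: str) -> None:
    """Hypothetical helper: read a JSON catalog file and register its entries."""
    with open(json_path) as f:
        register_dict(registry, json.load(f))


# Placeholder paths for illustration only.
example_catalog: Catalog = {
    "my_new_dataset_name": {
        "train": ["/path/to/train_images", "/path/to/train_labels"],
        "test": ["/path/to/test_images", "/path/to/test_labels"],
    }
}

registry: Catalog = {}
with tempfile.TemporaryDirectory() as tmp:
    # Write the catalog to a temporary JSON file, then load it back.
    path = os.path.join(tmp, "dataset_catalog.json")
    with open(path, "w") as f:
        json.dump(example_catalog, f, indent=2)
    register_json(registry, path)
```

Routing the JSON path through the dict helper means both entry points share the same duplicate check.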

Step 2 (Optional): If the dataset requires a data source other than disk or the supported disk formats (disk_folder or disk_filelist), add the new data source to VISSL. Follow our documentation on Adding new dataset.
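As a rough illustration of what "adding a data source" involves, the sketch below maps a DATA_SOURCES string to a loader class through a small registry. Everything here is hypothetical: DATA_SOURCE_MAP, register_data_source, and InMemoryDataSource are illustrative names, and the (sample, is_success) return convention is an assumption. Consult the linked documentation for the interface VISSL actually expects.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: DATA_SOURCES string -> loader class.
DATA_SOURCE_MAP: Dict[str, type] = {}


def register_data_source(name: str) -> Callable[[type], type]:
    """Hypothetical decorator that records a loader class under `name`."""

    def wrap(cls: type) -> type:
        if name in DATA_SOURCE_MAP:
            raise KeyError(f"data source {name!r} is already registered")
        DATA_SOURCE_MAP[name] = cls
        return cls

    return wrap


@register_data_source("my_data_source")
class InMemoryDataSource:
    """Toy data source serving samples from a list (stand-in for a database, bucket, etc.)."""

    def __init__(self, samples: List[Any]):
        self.samples = samples

    def num_samples(self) -> int:
        return len(self.samples)

    def __len__(self) -> int:
        return self.num_samples()

    def __getitem__(self, idx: int) -> Tuple[Any, bool]:
        # Return (sample, is_success) so a failed read can be skipped
        # instead of crashing the data loader (assumed convention).
        try:
            return self.samples[idx], True
        except IndexError:
            return None, False
```

For example, DATA_SOURCE_MAP["my_data_source"](["img0", "img1"])[1] returns ("img1", True), while an out-of-range index returns (None, False).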
Step 3: Test your dataset by setting its data source and dataset name in your training config, for example:

DATA:
  TRAIN:
    DATA_SOURCES: [my_data_source]
    DATASET_NAMES: [my_new_dataset_name]
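Before launching a run, it can help to sanity-check that the config refers only to registered datasets and known data sources. The sketch below is hypothetical: check_data_config is an illustrative helper, the YAML fragment above is written out as the dict a YAML loader would produce, and pairing DATA_SOURCES with DATASET_NAMES by position is an assumption.

```python
from typing import Any, Dict, List, Set


def check_data_config(
    cfg: Dict[str, Any], registered_datasets: Set[str], known_sources: Set[str]
) -> List[str]:
    """Hypothetical check of cfg["DATA"]["TRAIN"]; an empty list means no problems found."""
    train = cfg["DATA"]["TRAIN"]
    sources = train.get("DATA_SOURCES", [])
    names = train.get("DATASET_NAMES", [])
    problems = []
    # Assumption: sources and names are paired by position, so lengths must match.
    if len(sources) != len(names):
        problems.append(f"{len(sources)} DATA_SOURCES but {len(names)} DATASET_NAMES")
    for source in sources:
        if source not in known_sources:
            problems.append(f"unknown data source: {source}")
    for name in names:
        if name not in registered_datasets:
            problems.append(f"dataset not registered: {name}")
    return problems


# The YAML fragment above, as a Python dict.
cfg = {
    "DATA": {
        "TRAIN": {
            "DATA_SOURCES": ["my_data_source"],
            "DATASET_NAMES": ["my_new_dataset_name"],
        }
    }
}

problems = check_data_config(
    cfg,
    registered_datasets={"my_new_dataset_name"},
    known_sources={"disk_folder", "disk_filelist", "my_data_source"},
)
```

With the dataset registered and the source known, problems is an empty list. Omitting either one from the sets yields a descriptive message instead.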