vissl.engines package¶
vissl.engines.train module¶
vissl.engines.train.train_main(cfg: vissl.utils.hydra_config.AttrDict, dist_run_id: str, checkpoint_path: str, checkpoint_folder: str, local_rank: int = 0, node_id: int = 0, hook_generator: Callable[[Any], List[classy_vision.hooks.classy_hook.ClassyHook]] = <function default_hook_generator>)[source]¶

Sets up and executes training workflow per machine.

Parameters
cfg (AttrDict) – user-specified input config that has the optimizer, loss, meters, etc. settings relevant to the training
dist_run_id (str) – For multi-gpu training with PyTorch, we have to specify how the GPUs are going to rendezvous. This requires specifying the communication method (file or tcp) and the unique rendezvous run_id that is specific to one run. We recommend:
1) for 1 node: use init_method=tcp and run_id=auto
2) for multi-node: use init_method=tcp and specify run_id={master_node}:{port}
checkpoint_path (str) – path to the checkpoint if the training is being resumed from one. tools/run_distributed_engines.py automatically looks for the checkpoint in the checkpoint directory.
checkpoint_folder (str) – which directory to use for checkpointing. tools/run_distributed_engines.py creates the directory based on the user input in the yaml config file.
local_rank (int) – ID of the current device on the machine. If using GPUs, local_rank = GPU number on the current machine.
node_id (int) – ID of the current machine. Starts from 0. Valid for multi-gpu.
hook_generator (Callable) – The utility function that prepares all the hooks that will be used in training based on user selection. Some basic hooks are used by default.
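Below is a minimal usage sketch of calling train_main directly for single-machine, single-GPU training. It mirrors what tools/run_distributed_engines.py normally does for you; the config name, rendezvous string, checkpoint folder, and the Hydra compose helpers (hydra.experimental in Hydra 1.0) are illustrative assumptions, not values mandated by this API.

    from hydra.experimental import compose, initialize_config_module

    from vissl.engines.train import train_main
    from vissl.utils.hydra_config import convert_to_attrdict

    # Compose the training config the way the launcher script does; the config
    # name in the override below is an illustrative assumption.
    with initialize_config_module(config_module="vissl.config"):
        composed = compose(
            "defaults", overrides=["config=pretrain/simclr/simclr_8node_resnet"]
        )
    args, cfg = convert_to_attrdict(composed)

    train_main(
        cfg=cfg,                                     # AttrDict with optimizer/loss/meters settings
        dist_run_id="localhost:54321",               # tcp rendezvous: {master_node}:{port}
        checkpoint_path=None,                        # not resuming from a checkpoint (illustrative)
        checkpoint_folder="/tmp/vissl_checkpoints",  # where checkpoints will be written
        local_rank=0,                                # GPU 0 on this machine
        node_id=0,                                   # single machine, so node 0
    )

For multi-node runs, the same call would be made on every machine with the appropriate node_id and local_rank, and a dist_run_id of the {master_node}:{port} form shared by all workers.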
vissl.engines.extract_features module¶
vissl.engines.extract_features.extract_main(cfg: vissl.utils.hydra_config.AttrDict, dist_run_id: str, local_rank: int = 0, node_id: int = 0)[source]¶

Sets up and executes feature extraction workflow per machine.

Parameters
cfg (AttrDict) – user-specified input config that has the optimizer, loss, meters, etc. settings relevant to the training
dist_run_id (str) – For multi-gpu training with PyTorch, we have to specify how the GPUs are going to rendezvous. This requires specifying the communication method (file or tcp) and the unique rendezvous run_id that is specific to one run. We recommend:
1) for 1 node: use init_method=tcp and run_id=auto
2) for multi-node: use init_method=tcp and specify run_id={master_node}:{port}
local_rank (int) – ID of the current device on the machine. If using GPUs, local_rank = GPU number on the current machine.
node_id (int) – ID of the current machine. Starts from 0. Valid for multi-gpu.
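Below is a minimal usage sketch of calling extract_main on one machine with one GPU. It assumes cfg has already been composed into an AttrDict (as in the train_main sketch above) from a yaml config whose settings point at a trained model to extract features from; the rendezvous string is an illustrative tcp {master_node}:{port} value.

    from vissl.engines.extract_features import extract_main

    # `cfg` is assumed to be an AttrDict composed from a yaml config as in the
    # train_main sketch above, configured for feature extraction from a trained model.
    extract_main(
        cfg=cfg,
        dist_run_id="localhost:54321",  # tcp rendezvous shared by all workers of this run
        local_rank=0,                   # GPU 0 on this machine
        node_id=0,                      # single machine, so node 0
    )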