# Command-Line Interface of the Train Module
PhysioEx provides a fast and customizable way to train, evaluate, and save state-of-the-art models for different physiological signal analysis tasks on different physiological signal datasets. This functionality is provided by the `train`, `test_model`, and `finetune` commands of this repository.
## train (CLI)
Training script for training and testing a model. This script allows you to train a model using specified configurations and parameters.

### Usage

```
$ train [PARAMS]
```

Use `train -h` or `train --help` to access the command documentation.
### Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `--model` | str | The model to train; can be a YAML file if the model is not registered, in which case the file should contain the model configuration details. | `"chambon2018"` |
| `--checkpoint_dir` | str | Path to the directory where the model checkpoints will be saved. | `None` |
| `--datasets` | list | List of dataset names to train the model on. | `['mass']` |
| `--selected_channels` | list | Channels to train the model on. Channels refer to the data modalities (e.g. EEG, EOG) used for training. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use; it determines how the model's performance is measured during training. | `"cel"` |
| `--max_epoch` | int | Maximum number of training epochs. An epoch is one complete pass through the training dataset. | `20` |
| `--num_validations` | int | Number of validation steps per epoch, used to evaluate the model's performance on a validation set during training. | `10` |
| `--batch_size` | int | Batch size for training, i.e. the number of samples processed before the model's weights are updated. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored; if `None`, the home directory is used. | `None` |
| `--test` | bool | If set, the model is tested on the validation set after training. | `False` |
| `--aggregate` | bool | If set, the test results are aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Whether a high-performance computing setup is used; must be set when the datasets have been compressed into `.h5` format with the `compress_datasets` command. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed training; only used when `--hpc` is set. In SLURM this value needs to be coherent with `--ntasks-per-node` (or `ppn` in Torque). | `1` |
| `--config` | str | Path to a configuration file storing the options to train the model with; the configuration file can override command-line arguments. | `None` |
### Example
The basic usage is as follows:
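For instance, a minimal sketch using only the flags documented above (`./checkpoints` is an arbitrary example path):

```shell
# Train the default chambon2018 model on the mass dataset using the EEG
# channel, save checkpoints under ./checkpoints, and test after training.
train --model chambon2018 \
  --datasets mass \
  --selected_channels EEG \
  --checkpoint_dir ./checkpoints \
  --test
```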
or you can specify a yaml file containing the configuration details:
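A hypothetical configuration file might look like the following (the file name `my_config.yaml` is an arbitrary example; the keys mirror the command-line options documented above):

```yaml
# my_config.yaml -- hypothetical example; keys mirror the CLI options above
model: tinysleepnet
datasets:
  - mass
selected_channels:
  - EEG
sequence_length: 21
loss: cel
max_epoch: 20
batch_size: 32
checkpoint_dir: ./checkpoints
```

which you would then pass as `train --config my_config.yaml`.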
### Notes
- Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
- The script supports both single-node and multi-node training setups.
- The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.
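When `--hpc` and `--num_nodes` are used, the job is typically launched through a workload manager. A hypothetical SLURM submission script is sketched below; the job name, resource counts, and flags are placeholders to adapt to your cluster, keeping the `--ntasks-per-node` note from the parameter table in mind:

```shell
#!/bin/bash
#SBATCH --job-name=physioex-train
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1   # adjust per the --num_nodes note above

# Datasets must first be compressed to .h5 with `compress_datasets`.
srun train --model chambon2018 \
  --datasets mass \
  --hpc \
  --num_nodes 2
```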
## test_model (CLI)
Testing script for evaluating a model. This script allows you to test a pre-trained model using specified configurations and parameters.

### Usage

```
$ test_model [PARAMS]
```

Use `test_model -h` or `test_model --help` to access the command documentation.
### Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `--model` | str | The model to test; can be a YAML file if the model is not registered, in which case the file should contain the model configuration details. | `"chambon2018"` |
| `--checkpoint_path` | str | Path to a specific checkpoint file to load the model state from. | `None` |
| `--datasets` | list | List of dataset names to test the model on. | `['mass']` |
| `--selected_channels` | list | Channels to test the model on. Channels refer to the data modalities (e.g. EEG, EOG) used for testing. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use; it determines how the model's performance is measured during testing. | `"cel"` |
| `--batch_size` | int | Batch size for testing, i.e. the number of samples processed at a time. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored; if `None`, the home directory is used. | `None` |
| `--aggregate` | bool | If set, the test results are aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Whether a high-performance computing setup is used; must be set when the datasets have been compressed into `.h5` format with the `compress_datasets` command. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed testing; only used when `--hpc` is set. In SLURM this value needs to be coherent with `--ntasks-per-node` (or `ppn` in Torque). | `1` |
| `--config` | str | Path to a configuration file storing the options to test the model with; the configuration file can override command-line arguments. | `None` |
### Example

This command tests the `tinysleepnet` model using the CrossEntropy loss (`cel`):
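A sketch of such an invocation (the checkpoint path is a hypothetical placeholder for a trained `tinysleepnet` checkpoint):

```shell
# Evaluate a trained tinysleepnet checkpoint with the CrossEntropy loss.
test_model --model tinysleepnet \
  --loss cel \
  --checkpoint_path ./checkpoints/tinysleepnet.ckpt
```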
### Notes
- Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
- The script supports both single-node and multi-node testing setups.
- The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.
## finetune (CLI)
Finetuning script for training and testing a model. This script allows you to fine-tune a pre-trained model using specified configurations and parameters.

### Usage

```
$ finetune [PARAMS]
```

Use `finetune -h` or `finetune --help` to access the command documentation.
### Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `--model` | str | The model to train; can be a YAML file if the model is not registered, in which case the file should contain the model configuration details. | `"chambon2018"` |
| `--learning_rate` | float | Learning rate for the model. A smaller learning rate is often used for fine-tuning to avoid large updates that could disrupt the pre-trained weights. | `1e-7` |
| `--checkpoint_path` | str | Path to a specific checkpoint file to resume training from; if `None`, PhysioEx searches its pretrained models. | `None` |
| `--checkpoint_dir` | str | Directory where the new fine-tuned model checkpoints will be stored during training. | `None` |
| `--datasets` | list | List of dataset names to train the model on. | `['mass']` |
| `--selected_channels` | list | Channels to train the model on. Channels refer to the data modalities (e.g. EEG, EOG) used for training. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use; it determines how the model's performance is measured during training. | `"cel"` |
| `--max_epoch` | int | Maximum number of training epochs. An epoch is one complete pass through the training dataset. | `20` |
| `--num_validations` | int | Number of validation steps per epoch, used to evaluate the model's performance on a validation set during training. | `10` |
| `--batch_size` | int | Batch size for training, i.e. the number of samples processed before the model's weights are updated. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored; if `None`, the home directory is used. | `None` |
| `--test` | bool | If set, the model is tested on the validation set after training. | `False` |
| `--aggregate` | bool | If set, the test results are aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Whether a high-performance computing setup is used; must be set when the datasets have been compressed into `.h5` format with the `compress_datasets` command. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed training; only used when `--hpc` is set. In SLURM this value needs to be coherent with `--ntasks-per-node` (or `ppn` in Torque). | `1` |
| `--config` | str | Path to a configuration file storing the options to train the model with; the configuration file can override command-line arguments. | `None` |
### Example

This command fine-tunes the `tinysleepnet` model using the CrossEntropy loss (`cel`), with a sequence length of 21 and the `EEG` channel, starting from the specified checkpoint:
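A sketch of such an invocation (the checkpoint path is a hypothetical placeholder):

```shell
# Fine-tune tinysleepnet from an existing checkpoint with a small
# learning rate, using the documented defaults where possible.
finetune --model tinysleepnet \
  --loss cel \
  --sequence_length 21 \
  --selected_channels EEG \
  --checkpoint_path ./checkpoints/tinysleepnet.ckpt
```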
### Notes
- Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
- The script supports both single-node and multi-node training setups.
- The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.