
Command-Line-Interface of the Train Module

PhysioEx provides a fast and customizable way to train, evaluate and save state-of-the-art models for different physiological signal analysis tasks and datasets. This functionality is exposed through the train, test_model and finetune commands provided by this repository.


train CLI

Training script for training and testing a model.

This script allows you to train a model using specified configurations and parameters.

Usage

$ train [PARAMS]

You can use the train -h, --help command to access the command documentation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `--model` | str | Model to train. Can be the name of a registered model, or the path to a YAML file containing the model configuration details if the model is not registered. | `"chambon2018"` |
| `--checkpoint_dir` | str | Path to the directory where the model checkpoints will be saved. | `None` |
| `--datasets` | list | Names of the datasets to train the model on. | `['mass']` |
| `--selected_channels` | list | Channels to train the model on. Channels refer to the data modalities (e.g., EEG, EOG) used for training. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use. The loss function determines how the model's performance is measured during training. | `"cel"` |
| `--max_epoch` | int | Maximum number of epochs for training. An epoch is one complete pass through the training dataset. | `20` |
| `--num_validations` | int | Number of validation steps performed in each epoch. Validation steps evaluate the model's performance on a validation set during training. | `10` |
| `--batch_size` | int | Batch size for training, i.e. the number of samples processed before the model's weights are updated. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored. If None, the home directory is used. | `None` |
| `--test` | bool | Test the model after training. If specified, the model will be tested on the validation set after training. | `False` |
| `--aggregate` | bool | Aggregate the results of the test. If specified, the test results will be aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Enable high-performance computing setups. Must be set when the datasets have been compressed into .h5 format with the compress_datasets command. Use this option if you are running the script on a high-performance computing cluster. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed training; only used when hpc is True. In Slurm this value needs to be coherent with '--ntasks-per-node', or with 'ppn' in Torque. | `1` |
| `--config` | str | Path to the configuration file storing the options to train the model with. The configuration file can override command-line arguments. | `None` |

Example

The basic usage is as follows:

train --model chambon2018 --datasets mass --checkpoint_dir ./checkpoints --max_epoch 20 --batch_size 32

or you can specify a yaml file containing the configuration details:

model_package: physioex.train.networks.seqsleepnet
model_class: SeqSleepNet
module_config:
    seq_len: 21
    in_channels: 1
    loss_call: cel # in this case you can pass the loss call as a string
    loss_params: {}
preprocessing: xsleepnet
target_transform: get_mid_label
# check the train documentation for more details

train --model my_model_config.yaml --datasets mass hmc --checkpoint_dir ./checkpoints --max_epoch 20 --batch_size 32
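The model_package and model_class keys in the YAML configuration above name a Python module and a class inside it. How PhysioEx resolves them internally is not shown on this page; the sketch below only illustrates the usual importlib pattern for this kind of configuration. The helper resolve_model_class is hypothetical, and a standard-library module stands in for the model package so the snippet runs without PhysioEx installed.

```python
import importlib


def resolve_model_class(config):
    """Return the class named by `model_class` inside the module `model_package`.

    Hypothetical helper: it illustrates a common pattern and is not
    taken from the PhysioEx code base.
    """
    module = importlib.import_module(config["model_package"])
    return getattr(module, config["model_class"])


# Stand-in configuration: with PhysioEx installed, the values from the YAML
# example (physioex.train.networks.seqsleepnet / SeqSleepNet) would go here.
demo_config = {"model_package": "collections", "model_class": "OrderedDict"}

model_class = resolve_model_class(demo_config)
print(model_class)
```

A misspelled module name raises ModuleNotFoundError and a misspelled class name raises AttributeError, so typos in the YAML file surface before any training starts.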
Notes
  • Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
  • The script supports both single-node and multi-node training setups.
  • The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.
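The --config description states that the configuration file can override command-line arguments. A minimal sketch of that precedence is given below, assuming the YAML file has already been loaded into a dictionary whose keys match the option names. The parser covers only a few of the documented options, and build_parser and merge_config are hypothetical helpers, not PhysioEx functions.

```python
import argparse


def build_parser():
    """Reduced parser covering a few of the documented train options."""
    parser = argparse.ArgumentParser(prog="train")
    parser.add_argument("--model", type=str, default="chambon2018")
    parser.add_argument("--datasets", nargs="+", default=["mass"])
    parser.add_argument("--max_epoch", type=int, default=20)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser


def merge_config(args, config):
    """Start from the parsed arguments and let config values take precedence."""
    merged = vars(args).copy()
    for key, value in config.items():
        if key not in merged:
            raise KeyError(f"unknown option in configuration file: {key}")
        merged[key] = value
    return merged


# Command-line arguments as they would appear after `train`.
args = build_parser().parse_args(["--model", "tinysleepnet", "--batch_size", "64"])

# Stand-in for the content of a YAML configuration file (assumed key names).
config = {"batch_size": 128, "datasets": ["mass", "hmc"]}

options = merge_config(args, config)
print(options)
```

In this sketch the batch size given on the command line (64) is replaced by the value from the configuration (128), while options absent from the configuration keep their command-line or default values.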

test CLI

Testing script for evaluating a model.

This script allows you to test a pre-trained model using specified configurations and parameters.

Usage

$ test_model [PARAMS]

You can use the test_model -h, --help command to access the command documentation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `--model` | str | Model to test. Can be the name of a registered model, or the path to a YAML file containing the model configuration details if the model is not registered. | `"chambon2018"` |
| `--checkpoint_path` | str | Path to the specific checkpoint file from which the model state is loaded. | `None` |
| `--datasets` | list | Names of the datasets to test the model on. | `['mass']` |
| `--selected_channels` | list | Channels to test the model on. Channels refer to the data modalities (e.g., EEG, EOG) used for testing. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use. The loss function determines how the model's performance is measured during testing. | `"cel"` |
| `--batch_size` | int | Batch size for testing, i.e. the number of samples processed together in each batch. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored. If None, the home directory is used. | `None` |
| `--aggregate` | bool | Aggregate the results of the test. If specified, the test results will be aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Enable high-performance computing setups. Must be set when the datasets have been compressed into .h5 format with the compress_datasets command. Use this option if you are running the script on a high-performance computing cluster. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed testing; only used when hpc is True. In Slurm this value needs to be coherent with '--ntasks-per-node', or with 'ppn' in Torque. | `1` |
| `--config` | str | Path to the configuration file storing the options to test the model with. The configuration file can override command-line arguments. | `None` |

Example
$ test_model --model tinysleepnet --loss cel --sequence_length 21 --selected_channels EEG --checkpoint_path /path/to/checkpoint

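This command tests the tinysleepnet model using the CrossEntropy Loss (cel), with a sequence length of 21 and the EEG channel, loading the model state from the specified checkpoint.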
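When several checkpoints or dataset combinations have to be evaluated, the test_model command line can be assembled programmatically. The sketch below is one possible way to do this; it assumes that list options take space-separated values (as in the train example with --datasets mass hmc) and that boolean options are passed as bare flags. The helper build_test_command is hypothetical and not part of PhysioEx.

```python
import shlex


def build_test_command(model, checkpoint_path, datasets=("mass",),
                       selected_channels=("EEG",), sequence_length=21,
                       loss="cel", aggregate=False):
    """Assemble a test_model invocation as a list of arguments."""
    cmd = [
        "test_model",
        "--model", model,
        "--checkpoint_path", checkpoint_path,
        "--datasets", *datasets,
        "--selected_channels", *selected_channels,
        "--sequence_length", str(sequence_length),
        "--loss", loss,
    ]
    if aggregate:
        # Assumption: boolean options are enabled by passing the bare flag.
        cmd.append("--aggregate")
    return cmd


cmd = build_test_command(
    "tinysleepnet", "/path/to/checkpoint",
    datasets=["mass", "hmc"], aggregate=True,
)

# Print the command for inspection; with PhysioEx installed it could be
# executed with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems when a checkpoint path contains spaces, and shlex.join produces a copy-pasteable string for logs.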

Notes
  • Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
  • The script supports both single-node and multi-node testing setups.
  • The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.

finetune CLI

Finetuning script for training and testing a model.

This script allows you to fine-tune a pre-trained model using specified configurations and parameters.

Usage

$ finetune [PARAMS]

You can use the finetune -h, --help command to access the command documentation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `--model` | str | Model to fine-tune. Can be the name of a registered model, or the path to a YAML file containing the model configuration details if the model is not registered. | `"chambon2018"` |
| `--learning_rate` | float | Learning rate for the model. A smaller learning rate is often used for fine-tuning to avoid large updates that could disrupt the pre-trained weights. | `1e-7` |
| `--checkpoint_path` | str | Path to the checkpoint file of the model to fine-tune. If None, PhysioEx searches among its pretrained models. | `None` |
| `--checkpoint_dir` | str | Directory where the new fine-tuned model checkpoints are saved during training. | `None` |
| `--datasets` | list | Names of the datasets to train the model on. | `['mass']` |
| `--selected_channels` | list | Channels to train the model on. Channels refer to the data modalities (e.g., EEG, EOG) used for training. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use. The loss function determines how the model's performance is measured during training. | `"cel"` |
| `--max_epoch` | int | Maximum number of epochs for training. An epoch is one complete pass through the training dataset. | `20` |
| `--num_validations` | int | Number of validation steps performed in each epoch. Validation steps evaluate the model's performance on a validation set during training. | `10` |
| `--batch_size` | int | Batch size for training, i.e. the number of samples processed before the model's weights are updated. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored. If None, the home directory is used. | `None` |
| `--test` | bool | Test the model after training. If specified, the model will be tested on the validation set after training. | `False` |
| `--aggregate` | bool | Aggregate the results of the test. If specified, the test results will be aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Enable high-performance computing setups. Must be set when the datasets have been compressed into .h5 format with the compress_datasets command. Use this option if you are running the script on a high-performance computing cluster. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed training; only used when hpc is True. In Slurm this value needs to be coherent with '--ntasks-per-node', or with 'ppn' in Torque. | `1` |
| `--config` | str | Path to the configuration file storing the options to train the model with. The configuration file can override command-line arguments. | `None` |

Example

$ finetune --model tinysleepnet --loss cel --sequence_length 21 --selected_channels EEG --checkpoint_path /path/to/checkpoint
This command fine-tunes the tinysleepnet model using the CrossEntropy Loss (cel), with a sequence length of 21 and the EEG channel, starting from the specified checkpoint.
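The --learning_rate description notes that a small learning rate avoids large updates that could disrupt the pre-trained weights. The sketch below illustrates this with a single plain gradient-descent step on one scalar weight. It is only an arithmetic illustration under that assumption: the optimizer actually used by PhysioEx is not described on this page, and sgd_step is a hypothetical helper.

```python
def sgd_step(weight, grad, learning_rate):
    """One plain gradient-descent update: w <- w - lr * grad."""
    return weight - learning_rate * grad


pretrained_weight = 0.5  # illustrative value of a pre-trained parameter
grad = 2.0               # illustrative gradient on the new dataset

for lr in (1e-3, 1e-7):
    updated = sgd_step(pretrained_weight, grad, lr)
    shift = abs(updated - pretrained_weight)
    print(f"lr={lr:g}: weight moves by {shift:.1e}")
```

For the same gradient, the size of the update scales linearly with the learning rate, so the documented default of 1e-7 keeps each step very close to the pre-trained value.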

Notes
  • Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
  • The script supports both single-node and multi-node training setups.
  • The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.
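The --num_validations and --batch_size options together determine how often validation runs within an epoch. As a rough sketch, assuming validation is triggered at evenly spaced training steps, the interval can be estimated as follows. The helper validation_interval is hypothetical, and PhysioEx may compute the interval differently.

```python
def validation_interval(num_train_samples, batch_size, num_validations):
    """Number of training steps between two validation runs.

    Assumes evenly spaced validations within one epoch; both results
    are clamped to at least 1 so tiny datasets still validate.
    """
    steps_per_epoch = max(1, num_train_samples // batch_size)
    return max(1, steps_per_epoch // num_validations)


# Illustrative numbers: 32,000 training samples with the documented
# defaults --batch_size 32 and --num_validations 10.
interval = validation_interval(32_000, batch_size=32, num_validations=10)
print(f"validate every {interval} training steps")
```

With these illustrative numbers an epoch has 1000 training steps, so validation would run every 100 steps.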