# Command-Line Interface of the Train Module
PhysioEx provides a fast and customizable way to train, evaluate, and save state-of-the-art models for different physiological signal analysis tasks on different physiological signal datasets. This functionality is provided by the `train`, `test_model`, and `finetune` commands of this repository.
## train (CLI)
Training script for training and testing a model. This script allows you to train a model using specified configurations and parameters.

### Usage

```
$ train [PARAMS]
```

Use `train -h` or `train --help` to access the command documentation.
### Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `--model` | str | The model to train; can be a YAML file if the model is not registered, in which case the file should contain the model configuration details. | `"chambon2018"` |
| `--checkpoint_dir` | str | Path to the directory where the model checkpoints will be saved. | `None` |
| `--datasets` | list | List of dataset names to train the model on. | `['mass']` |
| `--selected_channels` | list | Channels to train the model on. Channels refer to the data modalities (e.g. EEG, EOG) used for training. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use; it determines how the model's performance is measured during training. | `"cel"` |
| `--max_epoch` | int | Maximum number of training epochs. An epoch is one complete pass through the training dataset. | `20` |
| `--num_validations` | int | Number of validation steps per epoch, used to evaluate the model's performance on a validation set during training. | `10` |
| `--batch_size` | int | Batch size for training, i.e. the number of samples processed before the model's weights are updated. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored; if `None`, the home directory is used. | `None` |
| `--test` | bool | If set, the model is tested on the validation set after training. | `False` |
| `--aggregate` | bool | If set, the test results are aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Whether a high-performance computing setup is used; must be set when the datasets have been compressed into `.h5` format with the `compress_datasets` command. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed training; only used when `--hpc` is set. In SLURM this value needs to be coherent with `--ntasks-per-node` (or `ppn` in Torque). | `1` |
| `--config` | str | Path to a configuration file storing the options to train the model with; the configuration file can override command-line arguments. | `None` |
### Example
The basic usage is as follows:
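For instance, a minimal sketch using only the flags documented above (`./checkpoints` is an arbitrary example path):

```shell
# Train the default chambon2018 model on the mass dataset using the EEG
# channel, save checkpoints under ./checkpoints, and test after training.
train --model chambon2018 \
  --datasets mass \
  --selected_channels EEG \
  --checkpoint_dir ./checkpoints \
  --test
```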
or you can specify a yaml file containing the configuration details:
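A hypothetical configuration file might look like the following (the file name `my_config.yaml` is an arbitrary example; the keys mirror the command-line options documented above):

```yaml
# my_config.yaml -- hypothetical example; keys mirror the CLI options above
model: tinysleepnet
datasets:
  - mass
selected_channels:
  - EEG
sequence_length: 21
loss: cel
max_epoch: 20
batch_size: 32
checkpoint_dir: ./checkpoints
```

which you would then pass as `train --config my_config.yaml`.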
### Notes
- Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
- The script supports both single-node and multi-node training setups.
- The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.
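When `--hpc` and `--num_nodes` are used, the job is typically launched through a workload manager. A hypothetical SLURM submission script is sketched below; the job name, resource counts, and flags are placeholders to adapt to your cluster, keeping the `--ntasks-per-node` note from the parameter table in mind:

```shell
#!/bin/bash
#SBATCH --job-name=physioex-train
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1   # adjust per the --num_nodes note above

# Datasets must first be compressed to .h5 with `compress_datasets`.
srun train --model chambon2018 \
  --datasets mass \
  --hpc \
  --num_nodes 2
```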
## test_model (CLI)
Testing script for evaluating a model. This script allows you to test a pre-trained model using specified configurations and parameters.

### Usage

```
$ test_model [PARAMS]
```

Use `test_model -h` or `test_model --help` to access the command documentation.
### Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `--model` | str | The model to test; can be a YAML file if the model is not registered, in which case the file should contain the model configuration details. | `"chambon2018"` |
| `--checkpoint_path` | str | Path to a specific checkpoint file to load the model state from. | `None` |
| `--datasets` | list | List of dataset names to test the model on. | `['mass']` |
| `--selected_channels` | list | Channels to test the model on. Channels refer to the data modalities (e.g. EEG, EOG) used for testing. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use; it determines how the model's performance is measured during testing. | `"cel"` |
| `--batch_size` | int | Batch size for testing, i.e. the number of samples processed at a time. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored; if `None`, the home directory is used. | `None` |
| `--aggregate` | bool | If set, the test results are aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Whether a high-performance computing setup is used; must be set when the datasets have been compressed into `.h5` format with the `compress_datasets` command. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed testing; only used when `--hpc` is set. In SLURM this value needs to be coherent with `--ntasks-per-node` (or `ppn` in Torque). | `1` |
| `--config` | str | Path to a configuration file storing the options to test the model with; the configuration file can override command-line arguments. | `None` |
### Example

This command tests the `tinysleepnet` model using the CrossEntropy loss (`cel`):
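A sketch of such an invocation (the checkpoint path is a hypothetical placeholder for a trained `tinysleepnet` checkpoint):

```shell
# Evaluate a trained tinysleepnet checkpoint with the CrossEntropy loss.
test_model --model tinysleepnet \
  --loss cel \
  --checkpoint_path ./checkpoints/tinysleepnet.ckpt
```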
### Notes
- Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
- The script supports both single-node and multi-node testing setups.
- The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.
## finetune (CLI)
Finetuning script for training and testing a model. This script allows you to fine-tune a pre-trained model using specified configurations and parameters.

### Usage

```
$ finetune [PARAMS]
```

Use `finetune -h` or `finetune --help` to access the command documentation.
### Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `--model` | str | The model to train; can be a YAML file if the model is not registered, in which case the file should contain the model configuration details. | `"chambon2018"` |
| `--learning_rate` | float | Learning rate for the model. A smaller learning rate is often used for fine-tuning to avoid large updates that could disrupt the pre-trained weights. | `1e-7` |
| `--checkpoint_path` | str | Path to a specific checkpoint file to resume training from; if `None`, PhysioEx searches its pretrained models. | `None` |
| `--checkpoint_dir` | str | Directory where the new fine-tuned model checkpoints will be stored during training. | `None` |
| `--datasets` | list | List of dataset names to train the model on. | `['mass']` |
| `--selected_channels` | list | Channels to train the model on. Channels refer to the data modalities (e.g. EEG, EOG) used for training. | `['EEG']` |
| `--sequence_length` | int | Sequence length for the model, i.e. the number of time steps in each input sequence. | `21` |
| `--loss` | str | Loss function to use; it determines how the model's performance is measured during training. | `"cel"` |
| `--max_epoch` | int | Maximum number of training epochs. An epoch is one complete pass through the training dataset. | `20` |
| `--num_validations` | int | Number of validation steps per epoch, used to evaluate the model's performance on a validation set during training. | `10` |
| `--batch_size` | int | Batch size for training, i.e. the number of samples processed before the model's weights are updated. | `32` |
| `--data_folder` | str | Absolute path of the directory where the PhysioEx datasets are stored; if `None`, the home directory is used. | `None` |
| `--test` | bool | If set, the model is tested on the validation set after training. | `False` |
| `--aggregate` | bool | If set, the test results are aggregated across multiple datasets. | `False` |
| `--hpc` | bool | Whether a high-performance computing setup is used; must be set when the datasets have been compressed into `.h5` format with the `compress_datasets` command. | `False` |
| `--num_nodes` | int | Number of nodes to use for distributed training; only used when `--hpc` is set. In SLURM this value needs to be coherent with `--ntasks-per-node` (or `ppn` in Torque). | `1` |
| `--config` | str | Path to a configuration file storing the options to train the model with; the configuration file can override command-line arguments. | `None` |
### Example

This command fine-tunes the `tinysleepnet` model using the CrossEntropy loss (`cel`), with a sequence length of 21 and the `EEG` channel, starting from the specified checkpoint:
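A sketch of such an invocation (the checkpoint path is a hypothetical placeholder):

```shell
# Fine-tune tinysleepnet from an existing checkpoint with a small
# learning rate, using the documented defaults where possible.
finetune --model tinysleepnet \
  --loss cel \
  --sequence_length 21 \
  --selected_channels EEG \
  --checkpoint_path ./checkpoints/tinysleepnet.ckpt
```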
### Notes
- Ensure that the datasets are properly formatted and stored in the specified data folder using the preprocess script.
- The script supports both single-node and multi-node training setups.
- The configuration file, if provided, should be in YAML format and contain valid key-value pairs for the script options.