Run experiments for EEG-based emotion recognition for most popular datasets and models with just one command
EEG-based emotion recognition has become an increasingly popular research direction in recent years. Although many researchers are working on this task, running experiments is still difficult. The main challenges relate to datasets and models. Running experiments on a new dataset means the researcher has to implement loading for it in PyTorch or TensorFlow, find out exactly how the dataset was recorded and saved, and much more. Running experiments on a new model is also tricky, because the researcher has to implement it from scratch in PyTorch or TensorFlow. This process takes a lot of time and effort, is prone to bugs, and does little to advance the researcher's actual work. To solve this problem, make the process easier, and bring more researchers into this field, we created EEGain. It is a novel framework for EEG-based emotion recognition that runs experiments on different datasets and models easily, with one command. You can implement your custom models and datasets too.
Models currently implemented in EEGain:
- EEGNet
- TSception
- DeepConvNet
- ShallowConvNet
- RANDOM_most_occurring (for testing a random baseline using the most occurring class as the output)
- RANDOM_class_distribution (for testing random baseline using class distribution based output)
Datasets currently implemented in EEGain:
- DEAP
- MAHNOB
- AMIGOS
- DREAMER
- Seed
- SeedIV
More models and datasets are coming.
You can run the code directly in Google Colab. First, clone the repo with this command:
!git clone https://github.com/EmotionLab/EEGain.git
Then you can run it with this command:
!python3 run_cli.py \
--model_name=ShallowConvNet \
--data_name=MAHNOB \
--data_path='...' \
--data_config=MAHNOBConfig \
--split_type="LOTO" \
--num_classes=2 \
--sampling_r=128 \
--channels=32 \
--window=4 \
--overlap=0 \
--label_type="V" \
--num_epochs=200 \
--batch_size=32 \
--lr=0.001 \
--weight_decay=0 \
--label_smoothing=0.01 \
--dropout_rate=0.5 \
--train_val_split=0.8 \
--random_seed=2025 \
--log_dir="logs/" \
--overal_log_file="log_file_name.txt" \
--log_predictions=True \
--log_predictions_dir=".../..."
Here you can change some important arguments. For example, to switch datasets you only need to change five arguments: data_name, data_path, data_config, num_classes, and channels.
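As a rough illustration of the dataset-dependent arguments, the sketch below assembles the `run_cli.py` argument list in Python. The channel and class counts come from the argument table in this README; `DATASET_SHAPES` and `build_command` are hypothetical names used only for this example and are not part of EEGain. The config name is passed in by the caller because this README only shows `MAHNOBConfig` and `DEAPConfig`.

```python
# Hypothetical helper, not part of EEGain: it only assembles CLI arguments.
# Channel and class counts are the ones listed in this README's argument table.
DATASET_SHAPES = {
    "MAHNOB": {"channels": 32, "num_classes": 2},
    "DEAP": {"channels": 32, "num_classes": 2},
    "AMIGOS": {"channels": 14, "num_classes": 2},
    "DREAMER": {"channels": 14, "num_classes": 2},
    "Seed": {"channels": 62, "num_classes": 3},
    "SeedIV": {"channels": 62, "num_classes": 4},
}


def build_command(data_name, data_path, data_config, model_name="ShallowConvNet"):
    """Return the argument list for run_cli.py for one dataset."""
    shape = DATASET_SHAPES[data_name]  # raises KeyError for unknown datasets
    return [
        "python3",
        "run_cli.py",
        f"--model_name={model_name}",
        f"--data_name={data_name}",
        f"--data_path={data_path}",
        f"--data_config={data_config}",
        f"--num_classes={shape['num_classes']}",
        f"--channels={shape['channels']}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    cmd = build_command("DEAP", "your_path_to/data_preprocessed_python", "DEAPConfig")
    print(" ".join(cmd))
```

The remaining arguments (split type, epochs, logging, and so on) do not depend on the dataset and can be appended to the returned list unchanged.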
You can view the results in TensorBoard using the logs stored in the logs/ directory.
- Clone the repo.
- Enter the EEGain folder and run `pip install .`
- Change run_cli.sh based on dataset/splitting/model requirements.
- Run run_cli.sh.
- To check the results:
  - run `tensorboard --logdir ./logs`
  - logs will be saved in the log.log file as well
You can adapt the arguments in the sh file to your needs:
| Argument | Description |
|---|---|
| `--model_name` | Selects a model. The predefined models are TSception, EEGNet, DeepConvNet, ShallowConvNet, RANDOM_most_occurring (random baseline that outputs the most occurring class), and RANDOM_class_distribution (random baseline whose output is based on the class distribution). NOTE: The RANDOM_most_occurring model always predicts the most occurring class in the training set, so it is not recommended for F1-score calculations. For F1-score calculations, use the RANDOM_class_distribution model, which predicts a random class based on the class distribution. You can add your custom model as well. |
| `--data_name` | Chooses a custom name or the name of a predefined dataset. The predefined datasets are DEAP, MAHNOB, AMIGOS, DREAMER, Seed, and SeedIV. You can add your custom dataset as well. |
| `--data_path` | Specifies the directory where the data files are saved. You can check the exact path for each dataset in the "Key Arguments to Modify" section below. |
| `--data_config` | Specifies the dataset config to load, with the default arguments present in the config.py file. |
| `--log_dir` | Specifies the directory where the log files will be saved. |
| `--overal_log_file` | Specifies the name of the log file that will be created. |
| `--label_type` | Specifies whether data is separated into classes based on valence or arousal. This argument has no effect on the Seed and Seed IV datasets because these datasets have fixed splits based on categorical labels. Options: V (valence), A (arousal). |
| `--num_epochs` | Sets the number of epochs for training. |
| `--batch_size` | Specifies the batch size for training. |
| `--lr` | Specifies the learning rate for training. |
| `--sampling_r` | Specifies the sampling rate of the EEG data. |
| `--window` | Specifies the length of the EEG segments (in seconds). |
| `--overlap` | Specifies the overlap between the EEG segments (in seconds). |
| `--weight_decay` | Specifies the weight decay ratio for regularization. |
| `--label_smoothing` | Smoothing factor applied to the labels to make them less extreme. |
| `--dropout_rate` | Probability at which outputs of the layer are dropped out. |
| `--num_classes` | Specifies the number of classes of the classification problem. Set this argument to 2 for AMIGOS, MAHNOB, DEAP, and DREAMER; 3 for SEED; and 4 for SEED IV. |
| `--channels` | Specifies the number of channels for the dataset. Set this argument to 14 for AMIGOS and DREAMER, 32 for MAHNOB and DEAP, and 62 for SEED and SEED IV. |
| `--split_type` | Specifies the type of train-test splitting. There are three types: LOTO (leave one trial out; use this split for the person-dependent task), LOSO (leave one subject out; use this split for the person-independent task), and LOSO_Fixed (creates a fixed 75/25 train-test split that is mandatory for the person-independent task). |
| `--train_val_split` | Specifies the training and validation split for the data (default value = 0.8). |
| `--random_seed` | Sets the random seed value to ensure reproducibility (default value = 2025). |
| `--log_predictions` | Specifies whether to log the predictions of the chosen model and dataset combination for the test sets. Set this argument to True to log the predictions; otherwise leave it out or set it to False. |
| `--log_predictions_dir` | Specifies the directory where the logged predictions will be stored in CSV format. |
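To make the interaction between `--sampling_r`, `--window`, and `--overlap` concrete, here is a minimal sketch of how segment boundaries could be derived from them. It is not EEGain's implementation: `segment_bounds` is a hypothetical name, and dropping a trailing partial window is an assumption of this example.

```python
def segment_bounds(num_samples, sampling_r=128, window=4, overlap=0):
    """Return (start, end) sample indices for each full segment.

    window and overlap are given in seconds, sampling_r in Hz.
    Assumption: a trailing partial window is discarded.
    """
    win = int(window * sampling_r)               # samples per segment
    step = int((window - overlap) * sampling_r)  # hop between segment starts
    if win <= 0 or step <= 0:
        raise ValueError("window must be positive and larger than overlap")
    return [(start, start + win) for start in range(0, num_samples - win + 1, step)]


if __name__ == "__main__":
    # A 60-second recording at 128 Hz, cut into 4-second windows without overlap.
    bounds = segment_bounds(60 * 128)
    print(len(bounds), bounds[0], bounds[-1])
```

With the defaults used in this README (128 Hz, 4-second windows, no overlap), each segment holds 512 samples and consecutive segments do not share any samples.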
MAHNOB Setup:
- Data Path: Use "your_path_to/mahnob-hci-raw/Sessions", which should contain the session folders. Each session folder must have a .xml file for labels and a .bdf file.
- Data Name: MAHNOB
- Channels: 32
- Number of Classes: 2
- For this dataset, `self.inception_window = [0.25, 0.125, 0.0625]` is automatically set for the TSception model to replicate the TSception (2022) paper.
DEAP Setup:
- Data Path: Structure your directory as "your_path_to/data_preprocessed_python", which should contain .dat files.
- Data Name: DEAP
- Channels: 32
- Number of Classes: 2
AMIGOS Setup:
- Data Path: Organize your path as "your_path_to/AMIGOS/", which should lead to a Physiological recordings folder, then to a "Matlab Preprocessed Data" folder containing .mat files.
- Data Name: AMIGOS
- Channels: 14
- Number of Classes: 2
DREAMER Setup:
- Data Path: Ensure your file path is "your_path_to/DREAMER.mat".
- Data Name: DREAMER
- Channels: 14
- Number of Classes: 2
Seed Setup:
- Data Path: Use the structure "your_path_to/Preprocessed_EEG", which should contain .mat files, a channel-order.xlsx file, and a label.mat file.
- Data Name: Seed
- Channels: 62
- Number of Classes: 3
SeedIV Setup:
- Data Path: Ensure your directory structure follows "your_path_to/eeg_raw_data", containing three session folders. Each session folder must include .mat files. The "eeg_raw_data" folder should also contain "Channel Order.xlsx" and "ReadMe.txt" files.
- Data Name: SeedIV
- Channels: 62
- Number of Classes: 4
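Before launching a long run, it can help to confirm that the data path contains the kind of files described above. The following is a small, generic sketch using only the standard library; `count_files` is a hypothetical helper, not an EEGain function, and it does not validate file contents.

```python
from pathlib import Path


def count_files(data_path, suffix):
    """Count files ending in `suffix` anywhere under `data_path`."""
    root = Path(data_path)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory")
    return sum(1 for p in root.rglob(f"*{suffix}") if p.is_file())


if __name__ == "__main__":
    # Example for DEAP, whose data path should contain .dat files.
    try:
        print(count_files("your_path_to/data_preprocessed_python", ".dat"))
    except FileNotFoundError as err:
        print(err)
```

The same call with `.mat` or `.bdf` covers the other layouts; a count of zero suggests that `--data_path` points at the wrong level of the directory tree.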
The following table shows the pre-processing done on each dataset:
| Dataset | Cropping | Channels Dropped | Band-pass Filter | Notch Filter | Ground Truth |
|---|---|---|---|---|---|
| MAHNOB-HCI | 30 secs pre and post-baseline | EXG1, EXG2, EXG3, EXG4, EXG5, EXG6, EXG7, EXG8, GSR1, GSR2, Erg1, Erg2, Resp, Temp, Status | [0.3Hz, 45Hz] | 50Hz | ≤ 4.5 |
| DEAP | 3 secs pre-baseline | EXG1, EXG2, EXG3, EXG4, GSR1, Plet, Resp, Temp | - | 50Hz | ≤ 4.5 |
| AMIGOS | - | ECG_Right, ECG_Left, GSR | - | 50Hz | ≤ 4.5 |
| DREAMER | - | - | [0.3Hz, 45Hz] | 50Hz | ≤ 3 |
| SEED | - | - | [0.3Hz, 45Hz] | 50Hz | - |
| SEED IV | - | - | [0.3Hz, 45Hz] | 50Hz | - |
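The "Ground Truth" column gives the rating threshold used to turn continuous valence or arousal ratings into two classes. A minimal sketch of that step follows; `binarize_ratings` is a hypothetical name, and encoding ratings at or below the threshold as class 0 and the rest as class 1 is an assumption of this example rather than something stated in this README.

```python
def binarize_ratings(ratings, threshold=4.5):
    """Map each rating to 0 if it is <= threshold, otherwise to 1.

    Assumption: 0 stands for the low class and 1 for the high class.
    """
    return [0 if rating <= threshold else 1 for rating in ratings]


if __name__ == "__main__":
    print(binarize_ratings([1, 4.5, 4.6, 9]))      # threshold used for MAHNOB-HCI, DEAP, AMIGOS
    print(binarize_ratings([3, 4], threshold=3))   # threshold used for DREAMER
```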
All datasets were resampled to a sampling rate of 128 Hz. The signal is split into segments using a window size of 4 seconds with an overlap of 0. All experiments were run for 200 epochs with a batch size of 32. The learning rate was 0.001 with no weight decay. For the cross-entropy loss function, a label smoothing of 0.01 was used, as we found that it slightly increased accuracy (≈1%) for some models. For training, the data was split by subject into 80% training and 20% validation sets. In each fold of the LOSO loop, we select the model with the best accuracy on the validation set for evaluation on the test subject. To ensure reproducibility, we ran all experiments with a random seed of 2025.
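The LOSO protocol with a by-subject 80/20 train/validation split can be sketched as follows. This is an illustration under stated assumptions, not EEGain's code: `loso_folds` is a hypothetical name, and reshuffling the remaining subjects in every fold with a single seeded generator is a choice made for this example.

```python
import random


def loso_folds(subject_ids, train_val_split=0.8, random_seed=2025):
    """Yield (train, val, test) subject lists, one fold per held-out subject."""
    rng = random.Random(random_seed)          # seeded for reproducibility
    subjects = sorted(set(subject_ids))
    for test_subject in subjects:
        rest = [s for s in subjects if s != test_subject]
        rng.shuffle(rest)                     # split the remaining subjects at random
        n_train = int(len(rest) * train_val_split)
        yield rest[:n_train], rest[n_train:], [test_subject]


if __name__ == "__main__":
    for train, val, test in loso_folds(range(1, 12)):
        print("test:", test, "val:", sorted(val), "train:", sorted(train))
```

In each fold, the model with the best validation accuracy would then be evaluated once on the held-out test subject, as described above.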
(⭐= Best performance)
| Task | Model | Accuracy | F1 | F1 Weighted |
|---|---|---|---|---|
| Arousal | TSception | 0.54 ± 0.11 ⭐ | 0.49 ± 0.24 ⭐ | 0.51 ± 0.12 |
| | EEGNet | 0.52 ± 0.11 | 0.43 ± 0.21 | 0.49 ± 0.12 |
| | DeepConvNet | 0.54 ± 0.13 ⭐ | 0.44 ± 0.27 | 0.49 ± 0.16 |
| | ShallowConvNet | 0.52 ± 0.11 | 0.44 ± 0.21 | 0.49 ± 0.12 |
| | Trivial Baseline | 0.34 ± 0.11 | 0.48 ± 0.11 | 0.52 ± 0.04 ⭐ |
| Valence | TSception | 0.53 ± 0.07 | 0.52 ± 0.16 | 0.50 ± 0.08 |
| | EEGNet | 0.56 ± 0.08 | 0.58 ± 0.15 ⭐ | 0.53 ± 0.10 |
| | DeepConvNet | 0.56 ± 0.08 | 0.56 ± 0.18 | 0.53 ± 0.11 |
| | ShallowConvNet | 0.57 ± 0.08 ⭐ | 0.57 ± 0.18 | 0.54 ± 0.10 ⭐ |
| | Trivial Baseline | 0.55 ± 0.10 | 0.53 ± 0.06 | 0.50 ± 0.03 |
| Task | Model | Accuracy | F1 | F1 Weighted |
|---|---|---|---|---|
| Arousal | TSception | 0.54 ± 0.09 | 0.56 ± 0.20 | 0.53 ± 0.10 ⭐ |
| | EEGNet | 0.53 ± 0.13 | 0.53 ± 0.21 | 0.50 ± 0.13 |
| | DeepConvNet | 0.57 ± 0.14 | 0.52 ± 0.31 | 0.49 ± 0.16 |
| | ShallowConvNet | 0.56 ± 0.13 | 0.57 ± 0.26 | 0.49 ± 0.13 |
| | Trivial Baseline | 0.59 ± 0.15 ⭐ | 0.58 ± 0.08 ⭐ | 0.53 ± 0.04 ⭐ |
| Valence | TSception | 0.51 ± 0.08 | 0.55 ± 0.16 | 0.47 ± 0.08 |
| | EEGNet | 0.53 ± 0.10 | 0.56 ± 0.21 | 0.45 ± 0.12 |
| | DeepConvNet | 0.51 ± 0.11 | 0.47 ± 0.28 | 0.41 ± 0.13 |
| | ShallowConvNet | 0.53 ± 0.08 | 0.61 ± 0.18 ⭐ | 0.45 ± 0.12 |
| | Trivial Baseline | 0.57 ± 0.09 ⭐ | 0.56 ± 0.05 | 0.51 ± 0.03 ⭐ |
| Task | Model | Accuracy | F1 | F1 Weighted |
|---|---|---|---|---|
| Arousal | TSception | 0.58 ± 0.18 | 0.64 ± 0.24 | 0.57 ± 0.20 |
| | EEGNet | 0.60 ± 0.23 | 0.69 ± 0.24 | 0.56 ± 0.25 |
| | DeepConvNet | 0.56 ± 0.21 | 0.65 ± 0.24 | 0.55 ± 0.22 |
| | ShallowConvNet | 0.59 ± 0.23 | 0.70 ± 0.23 ⭐ | 0.55 ± 0.25 |
| | Trivial Baseline | 0.66 ± 0.26 ⭐ | 0.62 ± 0.19 | 0.59 ± 0.11 ⭐ |
| Valence | TSception | 0.53 ± 0.10 | 0.56 ± 0.18 | 0.51 ± 0.13 |
| | EEGNet | 0.55 ± 0.11 | 0.59 ± 0.19 | 0.50 ± 0.14 |
| | DeepConvNet | 0.55 ± 0.11 | 0.56 ± 0.18 | 0.52 ± 0.13 ⭐ |
| | ShallowConvNet | 0.55 ± 0.13 | 0.60 ± 0.20 ⭐ | 0.51 ± 0.15 |
| | Trivial Baseline | 0.56 ± 0.14 ⭐ | 0.55 ± 0.07 | 0.51 ± 0.05 |
| Task | Model | Accuracy | F1 | F1 Weighted |
|---|---|---|---|---|
| Arousal | TSception | 0.47 ± 0.07 | 0.43 ± 0.14 | 0.46 ± 0.08 |
| | EEGNet | 0.46 ± 0.09 | 0.47 ± 0.19 ⭐ | 0.42 ± 0.13 |
| | DeepConvNet | 0.48 ± 0.11 | 0.41 ± 0.18 | 0.45 ± 0.12 |
| | ShallowConvNet | 0.48 ± 0.08 | 0.41 ± 0.18 | 0.45 ± 0.10 |
| | Trivial Baseline | 0.52 ± 0.15 ⭐ | 0.46 ± 0.07 | 0.51 ± 0.02 ⭐ |
| Valence | TSception | 0.60 ± 0.06 ⭐ | 0.42 ± 0.15 ⭐ | 0.57 ± 0.07 ⭐ |
| | EEGNet | 0.60 ± 0.08 ⭐ | 0.26 ± 0.20 | 0.53 ± 0.11 |
| | DeepConvNet | 0.60 ± 0.09 ⭐ | 0.38 ± 0.20 | 0.56 ± 0.10 |
| | ShallowConvNet | 0.60 ± 0.08 ⭐ | 0.35 ± 0.19 | 0.56 ± 0.10 |
| | Trivial Baseline | 0.59 ± 0.09 | 0.40 ± 0.04 | 0.51 ± 0.03 |
| Task | Model | Accuracy | F1 | F1 Weighted |
|---|---|---|---|---|
| Categorical | TSception | 0.48 ± 0.07 | 0.46 ± 0.08 | 0.46 ± 0.08 |
| | EEGNet | 0.46 ± 0.06 | 0.44 ± 0.06 | 0.44 ± 0.06 |
| | DeepConvNet | 0.55 ± 0.08 ⭐ | 0.52 ± 0.10 ⭐ | 0.52 ± 0.10 ⭐ |
| | ShallowConvNet | 0.49 ± 0.06 | 0.47 ± 0.07 | 0.47 ± 0.07 |
| | Trivial Baseline | 0.34 ± 0.00 | 0.34 ± 0.02 | 0.34 ± 0.02 |
| Task | Model | Accuracy | F1 | F1 Weighted |
|---|---|---|---|---|
| Categorical | TSception | 0.40 ± 0.08 | 0.33 ± 0.10 | 0.35 ± 0.10 |
| | EEGNet | 0.32 ± 0.03 | 0.20 ± 0.06 | 0.22 ± 0.05 |
| | DeepConvNet | 0.45 ± 0.09 ⭐ | 0.42 ± 0.11 ⭐ | 0.42 ± 0.11 ⭐ |
| | ShallowConvNet | 0.37 ± 0.06 | 0.32 ± 0.06 | 0.33 ± 0.07 |
| | Trivial Baseline | 0.31 ± 0.00 | 0.24 ± 0.03 | 0.24 ± 0.03 |
The results in this section come from the LOTO experiments on the DEAP dataset that were conducted to replicate the TSception paper.
To run the LOTO experiments on DEAP with the TSception model, follow these instructions:
- In the `helpers.py` file, uncomment the four extra channels ("Oz", "Pz", "Fz", "Cz") in DEAPConfig that are dropped in the TSception paper.
- Comment out the notch filtering in DEAPConfig as well.
- Use the following `run_cli.sh` file:
#!/bin/bash
python run_cli.py \
--model_name=TSception \
--data_name=DEAP \
--data_path='path_to_DEAP/data_preprocessed_python' \
--data_config=DEAPConfig \
--split_type="LOTO" \
--num_classes=2 \
--ground_truth_threshold=5 \
--sampling_r=128 \
--window=4 \
--overlap=0 \
--label_type="A" \
--num_epochs=500 \
--batch_size=64 \
--lr=0.001 \
--weight_decay=0 \
--label_smoothing=0 \
--dropout_rate=0.5 \
--channels=28 \
--train_val_split=0.8 \
--random_seed=2021 \
--log_dir="logs/..." \
--overal_log_file="DEAP_TSception_A_LOTO.txt" \
--log_predictions=True \
--log_predictions_dir="Logged_predictions/DEAP_TSception_A_LOTO"
| Method | Arousal ACC | Arousal F1 | Valence ACC | Valence F1 |
|---|---|---|---|---|
| SVM | 62.00% | 58.30% | 57.60% | 56.30% |
| UL | 62.34% | 60.44% | 56.25% | 61.25% |
| CSP | 58.26% | -- | 57.59% | -- |
| FBCSP | 59.13% | -- | 59.19% | -- |
| FgMDM | 60.04% | -- | 58.87% | -- |
| TSC | 60.04% | -- | 59.47% | -- |
| FBFgMDM | 60.30% | -- | 61.01% | -- |
| FBTSC | 60.60% | -- | 61.09% | -- |
| TSception | 63.75% | 63.35% | 62.27% | 65.37% |
| TSception (ours) | 60.67% | 61.40% | 59.32% | 62.49% |
| Trivial Baseline (ours) | 62.73% | 55.82% | 50.31% | 53.81% |
NOTE: In the trivial baseline results, the accuracy values are calculated using the RANDOM_most_occurring model and the F1 values are calculated using the RANDOM_class_distribution model.
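The two trivial baselines can be illustrated with a short standard-library sketch. The function names below are hypothetical and this is not EEGain's implementation; it only mirrors the behaviour described in this README: one baseline always predicts the most occurring training class, the other samples predictions according to the training class distribution.

```python
import random
from collections import Counter


def most_occurring_baseline(train_labels, n_test):
    """Predict the most frequent training class for every test sample."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test


def class_distribution_baseline(train_labels, n_test, random_seed=2025):
    """Sample predictions with probabilities proportional to training class counts."""
    counts = Counter(train_labels)
    classes = sorted(counts)
    weights = [counts[c] for c in classes]
    rng = random.Random(random_seed)  # seeded so repeated runs give the same predictions
    return rng.choices(classes, weights=weights, k=n_test)


if __name__ == "__main__":
    train = [0, 1, 1, 1, 0]
    print(most_occurring_baseline(train, 4))
    print(class_distribution_baseline(train, 10))
```

Because the first baseline never predicts any class other than the majority one, the F1 score of every other class is zero, which is why the class-distribution baseline is the one used for the F1 columns.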
This code repository is licensed under the CC BY 4.0 License.
@misc{kukhilava2025evaluationeegemotionrecognition,
title={Evaluation in EEG Emotion Recognition: State-of-the-Art Review and Unified Framework},
author={Natia Kukhilava and Tatia Tsmindashvili and Rapael Kalandadze and Anchit Gupta and Sofio Katamadze and François Brémond and Laura M. Ferrari and Philipp Müller and Benedikt Emanuel Wirth},
year={2025},
eprint={2505.18175},
archivePrefix={arXiv},
primaryClass={eess.SP},
url={https://arxiv.org/abs/2505.18175},
}



