This repository contains a Brain Language Model implemented for a research project in my graduate program. The project implements fMRI-to-text translation using prompt tuning: a small brain adapter projects fMRI embeddings into the text embedding space of a frozen LLM. The notebooks test various configurations for the brain adapter.
See the paper for more details.
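Conceptually, the brain adapter is a small trainable projection that maps an fMRI feature vector to a handful of "soft prompt" embeddings consumed by the frozen LLM. Below is a minimal pure-Python sketch of that idea; the dimensions, initialization, and function names are illustrative, not taken from the paper:

```python
import random

random.seed(0)

# Illustrative sizes only; the real feature and embedding dims differ
D_FMRI, D_LLM, N_TOKENS = 4, 6, 2

# Trainable adapter weights: one linear map from fMRI features
# to N_TOKENS soft-prompt embeddings for the frozen LLM
W = [[random.gauss(0, 0.02) for _ in range(N_TOKENS * D_LLM)]
     for _ in range(D_FMRI)]

def brain_adapter(fmri_vec):
    """Project one fMRI feature vector into N_TOKENS prompt embeddings."""
    # Matrix-vector product: W is D_FMRI x (N_TOKENS * D_LLM)
    flat = [sum(f * w for f, w in zip(fmri_vec, col)) for col in zip(*W)]
    # Reshape the flat output into N_TOKENS rows of D_LLM values
    return [flat[t * D_LLM:(t + 1) * D_LLM] for t in range(N_TOKENS)]

prompt = brain_adapter([random.gauss(0, 1) for _ in range(D_FMRI)])
print(len(prompt), len(prompt[0]))  # 2 6
```

During training, only `W` would be updated; the LLM's own weights stay frozen, which is what makes this a prompt-tuning approach.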
Install the necessary dependencies:

```shell
pip install -q -r environment.txt
```
If you do not already have `HF_TOKEN` set as an environment variable, run `cp example.env .env` and add your Hugging Face token.
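The resulting `.env` file only needs the token line; the format below is an assumption based on the `example.env` template, and the value is a placeholder, not a real token:

```shell
# .env — read by the notebooks for Hugging Face authentication
HF_TOKEN=hf_your_token_here
```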
The data should be stored at the root of the project in a narratives_subset/ directory.
Use the following steps to clone the dataset and build a new subset:
Install git-annex to pull a clone of the narratives dataset:

```shell
sudo apt-get -qq update
sudo apt-get -qq install git git-annex
```
Note: a git user name and email must be configured for datalad. To check whether they are configured:

```shell
git config --global user.name
git config --global user.email
```

If these return nothing, configure them:

```shell
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
```
Run the following script with optional args to clone the data and set up the data subset:

```shell
python3 get_narratives_subset.py --clean_root
```
The script has the following optional args:
- `--path` (str, default=`/narratives`): the path to the git-annex clone of the narratives directory. The script clones to this path if it does not yet exist.
- `--out` (str, default=`narratives_subset`): the output directory for the subset.
- `--clean_root`: set this flag to clean up the git-annex clone of the narratives directory after creating the subset.
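The CLI described above can be sketched with `argparse`; this is an assumed reconstruction of the script's argument handling, not the actual implementation in `get_narratives_subset.py`:

```python
import argparse

def build_parser():
    # Flags and defaults mirror the README's documented options
    p = argparse.ArgumentParser(
        description="Clone the narratives dataset and build a subset")
    p.add_argument("--path", type=str, default="/narratives",
                   help="path to the git-annex clone; cloned here if missing")
    p.add_argument("--out", type=str, default="narratives_subset",
                   help="output directory for the subset")
    p.add_argument("--clean_root", action="store_true",
                   help="remove the git-annex clone after creating the subset")
    return p

# Example: no flags → documented defaults
args = build_parser().parse_args([])
print(args.path, args.out, args.clean_root)  # /narratives narratives_subset False
```

For instance, `python3 get_narratives_subset.py --path /tmp/narratives --clean_root` would clone into `/tmp/narratives`, write the subset to `narratives_subset/`, and then delete the clone.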
Alternatively, you can download the data from Google Drive. However, this approach may take longer.
Run the notebooks after installing dependencies and downloading data.
There is a separate notebook for each subject corresponding to the experiments below. Note: all `*_tune.ipynb` notebooks include hyperparameter tuning done with `sub-052` and a smaller subset of stories. See the paper for a more detailed description of the brain adapter used in each experiment.
- Experiment 01A-B: PCA/MLP Brain Adapter with and without pretraining stage.
- Experiment 02A-B: 3D CNN Brain Adapter with and without pretraining stage.
- Experiment 03A-B: 3D CNN + MLP Brain Adapter with and without pretraining stage.