GitHub - lichen-lab/MDeep

Microbiome-based deep learning method for predicting continuous and binary outcomes

We develop a deep learning prediction method, the Microbiome-based Deep Learning Method (MDeep), for microbiome-based prediction. MDeep uses convolutional kernels to capture signals from microbial species with close evolutionary relationships within a local receptive field. In addition, MDeep uses different convolutional layers to capture different taxonomic ranks (e.g., species, genus, family). Together, the convolutional layers and their kernels capture microbiome signals at multiple taxonomic levels while encouraging the local smoothing induced by the phylogenetic tree.
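To make the idea of a local receptive field concrete, here is a minimal sketch in plain Python. It is not MDeep's implementation: it assumes the OTUs have already been ordered so that neighbouring entries are phylogenetically close, and the function name `conv1d_valid`, the toy abundances, and the averaging kernel are all illustrative.

```python
def conv1d_valid(values, kernel, stride):
    """Slide `kernel` over `values` with the given stride (no padding)."""
    width = len(kernel)
    outputs = []
    for start in range(0, len(values) - width + 1, stride):
        window = values[start:start + width]
        outputs.append(sum(v * k for v, k in zip(window, kernel)))
    return outputs


# Toy abundances for 8 OTUs, assumed to be ordered so that adjacent
# entries are close on the phylogenetic tree (hypothetical values).
abundances = [4.0, 2.0, 0.0, 0.0, 1.0, 3.0, 5.0, 7.0]

# An averaging kernel summarises each group of 4 neighbouring OTUs.
kernel = [0.25, 0.25, 0.25, 0.25]

pooled = conv1d_valid(abundances, kernel, stride=2)
print(pooled)  # [1.5, 1.0, 4.0]
```

Each output value depends only on a small group of adjacent OTUs. Stacking such layers lets later layers summarise progressively larger groups, which is the intuition behind matching layers to taxonomic ranks.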

Requirements and Installation

MDeep is implemented in TensorFlow. Both CPU and GPU modes are supported. For detailed instructions on installing TensorFlow-GPU, please check the guide on the official TensorFlow website.

Python 3.6
TensorFlow == 1.12.0
numpy >= 1.15.4
scipy >= 1.2.1
scikit-learn >= 0.20.3
seaborn >= 0.9.0
matplotlib >= 3.1.0

Download MDeep:

git clone https://github.com/alfredyewang/MDeep

Install requirements:

pip3 install -r requirements.txt

Usage

You can list the hyperparameters for MDeep with the help option:

usage: MDeep.py [-h] [--train] [--evaluation] [--test]
               [--data_dir <data_directory>] [--test_file TEST_FILE]
               [--correlation_file CORRELATION_FILE] [--model_dir MODEL_DIR]
               [--result_dir <data_directory>] [--outcome_type OUTCOME_TYPE]
               [--batch_size BATCH_SIZE] [--max_epoch MAX_EPOCH]
               [--learning_rate LEARNING_RATE] [--dropout_rate DROPOUT_RATE]
               [--L2_regularizer L2_REGULARIZER]
               [--window_size WINDOW_SIZE [WINDOW_SIZE ...]]
               [--kernel_size KERNEL_SIZE [KERNEL_SIZE ...]]
               [--strides STRIDES [STRIDES ...]]

A Phylogeny-regularized Convolutional NeuralNetwork for Microbiome-based
Predictions

optional arguments:
  -h, --help            show this help message and exit
  --train               Use this option for train model
  --evaluation          Use this option for evaluate model
  --test                Use this option for test model
  --data_dir <data_directory>
                        The data directory for training and evaluation
  --test_file TEST_FILE
                        The unlabelled test file
  --correlation_file CORRELATION_FILE
                        The correlation matrix for unlabelled test file
  --model_dir MODEL_DIR
                        The directory to save or restore the trained models.
  --result_dir <data_directory>
                        The directory to save test / evaluation result
  --outcome_type OUTCOME_TYPE
                        The outcome type
  --batch_size BATCH_SIZE
                        The batch size for training
  --max_epoch MAX_EPOCH
                        The max epoch for training
  --learning_rate LEARNING_RATE
                        The learning rate for training
  --dropout_rate DROPOUT_RATE
                        The dropout rate for training
  --L2_regularizer L2_REGULARIZER
                        The L2 lambda
  --window_size WINDOW_SIZE [WINDOW_SIZE ...]
                        The window size for convolutional layers
  --kernel_size KERNEL_SIZE [KERNEL_SIZE ...]
                        The kernel size for convolutional layers
  --strides STRIDES [STRIDES ...]
                        The strides size for convolutional layers
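The `[WINDOW_SIZE ...]` notation in the help text means these options accept one or more values. The sketch below shows how such options behave with `argparse`. It covers only a subset of the flags, and the argument types are assumptions made for illustration rather than taken from `MDeep.py`.

```python
import argparse


def build_parser():
    """Parser covering a subset of the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="MDeep.py")
    parser.add_argument("--train", action="store_true")
    parser.add_argument("--outcome_type", type=str)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--learning_rate", type=float)
    # nargs="+" collects one or more values into a list.
    parser.add_argument("--window_size", type=int, nargs="+")
    parser.add_argument("--kernel_size", type=int, nargs="+")
    parser.add_argument("--strides", type=int, nargs="+")
    return parser


args = build_parser().parse_args(
    "--train --outcome_type continous --batch_size 16 --learning_rate 5e-3 "
    "--window_size 8 8 8 --kernel_size 64 64 32 --strides 4 4 4".split()
)
print(args.window_size, args.kernel_size, args.strides)
# [8, 8, 8] [64, 64, 32] [4, 4, 4]
```

The example commands in this README pass the same number of values to `--window_size`, `--kernel_size`, and `--strides`, which suggests one value per convolutional layer. Note that the commands spell the outcome type `continous`, so that spelling is kept here.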

Example

USA Human Gut Microbiome data (Continuous-Outcome)

Train the model

The USA Human Gut Microbiome data contains 308 samples with 1087 OTUs. For a detailed description, please check our paper.

python3 src/MDeep.py --train --data_dir data/USA --model_dir model --outcome_type continous --batch_size 16 --max_epoch 2000 --learning_rate 5e-3 --dropout_rate 0.5 --window_size 8 8 8 --kernel_size 64 64 32 --strides 4 4 4

After training, the trained model will be saved to the model directory.
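As a rough guide to how `--window_size 8 8 8` and `--strides 4 4 4` shrink the 1087 OTU features, the sketch below applies the standard 1-D convolution output-length formulas layer by layer. It assumes `--window_size` is the window width along the OTU axis. The README does not state which padding mode MDeep uses, so both common conventions are shown. The helper `output_lengths` is hypothetical.

```python
import math


def output_lengths(n_inputs, windows, strides, padding="valid"):
    """Feature length after each layer under a given padding convention."""
    lengths = []
    length = n_inputs
    for window, stride in zip(windows, strides):
        if padding == "valid":
            # No padding: only windows that fit entirely are used.
            length = (length - window) // stride + 1
        else:
            # "same" padding: the length depends on the stride only.
            length = math.ceil(length / stride)
        lengths.append(length)
    return lengths


print(output_lengths(1087, [8, 8, 8], [4, 4, 4], "valid"))  # [270, 66, 15]
print(output_lengths(1087, [8, 8, 8], [4, 4, 4], "same"))   # [272, 68, 17]
```

Under either convention, three layers with stride 4 reduce the 1087 inputs to fewer than 20 positions.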

Evaluate the well-trained model

python3 src/MDeep.py --evaluation --data_dir data/USA --result_dir result/USA --model_dir model --outcome_type continous --batch_size 16 --max_epoch 2000 --learning_rate 5e-3 --dropout_rate 0.5 --window_size 8 8 8 --kernel_size 64 64 32 --strides 4 4 4

The program will evaluate the trained model, draw an R-squared figure, and save it to the result directory.
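For readers who want to check an R-squared value by hand, here is a minimal sketch of the coefficient of determination in plain Python. The README does not say which definition the evaluation figure reports (the squared Pearson correlation is another common choice), so treat this as one possible reading. The arrays are made-up values.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed outcomes and model predictions.
y_true = [0.0, 2.0, 2.0, 4.0]
y_pred = [1.0, 2.0, 2.0, 5.0]

print(r_squared(y_true, y_pred))  # 0.75
```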

Test the model with unlabelled data

python3 src/MDeep.py --test --test_file data/USA/X_test.npy  --correlation_file data/USA/c.npy --result_dir result/USA --model_dir model --outcome_type continous --batch_size 16 --max_epoch 2000 --learning_rate 5e-3 --dropout_rate 0.5 --window_size 8 8 8 --kernel_size 64 64 32 --strides 4 4 4

The program will take the unlabelled test file and save the prediction results to the result directory.
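The three USA commands above repeat the same hyperparameter flags and differ only in the mode-specific options. The sketch below builds each command from a shared list so the flags stay in sync. The names `SHARED_FLAGS`, `MODE_FLAGS`, and `build_command` are hypothetical. All flag values are copied from the commands in this README.

```python
# Flags shared by the three USA example commands in this README.
SHARED_FLAGS = [
    "--model_dir", "model",
    "--outcome_type", "continous",
    "--batch_size", "16",
    "--max_epoch", "2000",
    "--learning_rate", "5e-3",
    "--dropout_rate", "0.5",
    "--window_size", "8", "8", "8",
    "--kernel_size", "64", "64", "32",
    "--strides", "4", "4", "4",
]

# Options that differ between training, evaluation, and testing.
MODE_FLAGS = {
    "train": ["--train", "--data_dir", "data/USA"],
    "evaluation": ["--evaluation", "--data_dir", "data/USA",
                   "--result_dir", "result/USA"],
    "test": ["--test", "--test_file", "data/USA/X_test.npy",
             "--correlation_file", "data/USA/c.npy",
             "--result_dir", "result/USA"],
}


def build_command(mode):
    """Return the argument list for one of the three modes."""
    return ["python3", "src/MDeep.py"] + MODE_FLAGS[mode] + SHARED_FLAGS


for mode in MODE_FLAGS:
    print(" ".join(build_command(mode)))
```

To run a step from Python, the list returned by `build_command` can be passed to `subprocess.run(..., check=True)` from the repository root.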

Malawian Twin pairs Human Gut Microbiome data (Binary-Outcome)

Train the model

The Malawian Twin pairs Human Gut Microbiome data contains 995 samples with 2291 OTUs.

python3 src/MDeep.py --train --data_dir data/Malawiantwin_pairs --model_dir model --outcome_type binary --batch_size 32 --max_epoch 500 --learning_rate 1e-4 --dropout_rate 0.5 --window_size 128 4 --kernel_size 32 32 --strides 64 2

Evaluate the well-trained model

python3 src/MDeep.py --evaluation --data_dir data/Malawiantwin_pairs --result_dir result/Malawiantwin_pairs --model_dir model --outcome_type binary --batch_size 32 --max_epoch 500 --learning_rate 1e-4 --dropout_rate 0.5 --window_size 128 4 --kernel_size 32 32 --strides 64 2

The program will draw a ROC figure and save it to the result directory.
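A ROC curve is usually summarised by the area under it (AUC). The sketch below computes AUC in plain Python as the fraction of positive/negative pairs in which the positive sample receives the higher score. It is an illustration with made-up labels and scores, not the code MDeep uses to draw its figure.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical binary labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(labels, scores))  # 0.75
```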

Test the model with unlabelled data

python3 src/MDeep.py --test --test_file data/Malawiantwin_pairs/X_test.npy --correlation_file data/Malawiantwin_pairs/c.npy --result_dir result/Malawiantwin_pairs --model_dir model --outcome_type binary --batch_size 32 --max_epoch 500 --learning_rate 1e-4 --dropout_rate 0.5 --window_size 128 4 --kernel_size 32 32 --strides 64 2
