Skip to content

repo for DynamicBind: Predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model

License

Notifications You must be signed in to change notification settings

xpluspro/DynamicBind

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DynAlloBind

Source code and benchmark for the paper Dynamics-Inspired Generative Discovery of Allosteric Ligands Reveals HCAR1 as a Therapeutic Target in Inflammation

The DynAlloBind model was developed by fine-tuning the original DynamicBind4 framework on a strategically augmented training set. We began with the complete DynamicBind dataset and expanded it by applying the identical data collection pipeline to encompass all relevant Protein Data Bank (PDB) depositions through the end of 2023. This extended set was further supplemented with a small, curated collection of additional GPCR–ligand complexes to ensure comprehensive coverage.

Setup Environment

Create a new environment for inference. While in the project directory run

conda env create -f environment.yml

Or you setup step by step:

conda create -n dynamicbind python=3.10

Activate the environment

conda activate dynamicbind

Install

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c conda-forge rdkit
conda install pyg  pyyaml  biopython -c pyg
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
pip install e3nn  fair-esm spyrmsd

Create a new environment for structural Relaxation.

conda create --name relax python=3.8

Activate the environment

conda activate relax

Install

conda install -c conda-forge openmm pdbfixer libstdcxx-ng openmmforcefields openff-toolkit ambertools=22 compilers biopython

Checkpoints Download

Download and unzip the workdir.zip containing the model checkpoint from https://drive.google.com/file/d/1-GTtaEFavlYkPpsC9S60HzmZ6TjO3kj4/view?usp=drive_link

Inference

Dynamic Docking

By default: 40 poses will be predicted, poses will be ranked (rank1 is the best-scoring pose, rank40 the lowest), relax processes are included.

Inputs:

  1. Protein (PDB File): protein.pdb
    • Automatically cleaned to remove non-standard amino acids, water molecules, or small molecules.
  2. Ligand (CSV File): ligand.csv
    • Must contain a column named 'ligand' listing smiles.
  3. Number of Animations:
    • outputs intermediate pkl data, not the final animation PDB. (After --savings_per_complex, default is 40)
  4. Frames in Animation/inference_steps:
    • default is 20.

Additional Options:

  • --header: Name of the result folder.
  • --device: GPU device ID.
  • --python: Python environment for inference.
  • --relax_python: Python environment for relaxation.
  • --num_workers: Number of processes for final output relaxation.

Example Command:

python run_single_protein_inference.py ./data/8y6y.pdb ./data/8y6y.csv --savings_per_complex 40 --inference_steps 20 --header test --device $1 --python /path/to/dynamicbind/python --relax_python /path/to/relax/python

Docking Outputs

The results of the docking step, typically found in the results/test folder, include:

  1. Affinity Score for Each Complex: affinity_prediction.csv
  2. Pose Score and Conformation of Each Animation: Example files like rank1_ligand_lddt0.63_affinity5.67_relaxed.sdf (where 0.63 is the pose score) and corresponding protein .pdb files.
  3. Data for Animation Generation: Such as rank1_reverseprocess_data_list.pkl and rank2_reverseprocess_data_list.pkl.

Movie Generation

Inputs:

  1. Data from Docking Output: Indicated by paths like results/test/index0_idx_0/. The notation "1+2" implies that movies for rank1 and rank2 poses are needed.
  2. Number of Animations: Specified by the user (default is "1").

Example command for generating movies:

python movie_generation.py results/test/index0_idx_0/ 1+2 --device $1 --python /path/to/dynamicbind/python --relax_python /path/to/relax/python

Outputs:

  • Final Animation PDB Files: Located in results/test_8y6y/index0_idx_0/, with files like rank1_receptor_reverseprocess_relaxed.pdb and rank1_ligand_reverseprocess_relaxed.pdb.

High-Throughput Screening (HTS)

Example command for HTS:

python run_single_protein_inference.py protein.pdb ligand.csv --hts --savings_per_complex 3 --inference_steps 20 --header test --device $1 --python /path/to/dynamicbind/python --relax_python /path/to/relax/python

HTS Output files:

  • complete_affinity_prediction.csv
  • affinity_prediction.csv

How to Run Our Benchmark

  • Use AlphaFold2/3 to predict the protein structure.
  • Use the predicted protein structure to run DynAlloBind for prediction as described above.
  • Compare the results with the ground truth to calculate the RMSD.

Reference

@article{lu2024dynamicbind,
  title={DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model},
  author={Lu, Wei and Zhang, Jixian and Huang, Weifeng and Zhang, Ziqiao and Jia, Xiangyu and Wang, Zhenyu and Shi, Leilei and Li, Chengtao and Wolynes, Peter G and Zheng, Shuangjia},
  journal={Nature Communications},
  volume={15},
  number={1},
  pages={1071},
  year={2024},
  publisher={Nature Publishing Group UK London}
}

About

repo for DynamicBind: Predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 87.9%
  • Python 11.9%
  • Other 0.2%