Authors: Marcin Gradowski, Marianna Krysińska
The script auto_HMM_beta.py automates the processing of protein sequences in FASTA format. It splits sequences into shorter and longer ones based on a specified length threshold, clusters similar sequences using CD-HIT, aligns them with Clustal Omega, and builds Hidden Markov Model (HMM) profiles using HMMER.
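The splitting step can be sketched as follows. This is a minimal illustration, not the script's actual code: it uses a small pure-Python FASTA reader instead of Biopython, and the names `read_fasta` and `split_by_length` are hypothetical. It also assumes that sequences strictly shorter than the cutoff count as short; the script may treat the boundary differently.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records: Iterable[Record], cutoff: int = 50):
    """Partition records into (short, long) lists.

    Assumption: len(seq) < cutoff means "short"; everything else is "long".
    """
    short: List[Record] = []
    long_: List[Record] = []
    for header, seq in records:
        (short if len(seq) < cutoff else long_).append((header, seq))
    return short, long_
```

With an open file handle `fh`, `split_by_length(read_fasta(fh), cutoff=50)` would return the two groups that the later clustering and alignment steps operate on.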
- Python 3.6+
- Biopython
- CD-HIT
- Clustal Omega (clustalo)
- HMMER (hmmbuild)
- Clone the repository:
  git clone https://github.com/username/auto_HMM_beta.git
  cd auto_HMM_beta
- Install Python dependencies:
  pip install biopython
- Ensure the following tools are installed and accessible in your PATH:
  cd-hit, clustalo, hmmbuild
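One way to confirm that the external tools are reachable before a run is to look each of them up on PATH. The sketch below uses only the standard library; the helper name `missing_tools` is hypothetical and is not part of the script.

```python
import shutil

# Executable names as listed in the requirements above.
REQUIRED_TOOLS = ("cd-hit", "clustalo", "hmmbuild")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when all three programs are found; any names it does return still need to be installed or added to PATH.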
python auto_HMM_beta.py
Default parameters:
- cutoff (sequence length): 50
- cd_hit_threshold (CD-HIT similarity threshold): 0.80
- cd_hit_word_length (CD-HIT word length): 5
You can modify these parameters directly in the script, or through command-line arguments if the script supports them.
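If command-line arguments are not yet implemented, they could be exposed along these lines. This is only a sketch: the flag names `--cutoff`, `--cd-hit-threshold` and `--cd-hit-word-length` are assumptions chosen to mirror the parameter names above, and the defaults are the ones listed in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose defaults match the documented parameters."""
    parser = argparse.ArgumentParser(
        description="Split, cluster, align and build HMM profiles from FASTA input."
    )
    parser.add_argument("--cutoff", type=int, default=50,
                        help="sequence length threshold for the short/long split")
    parser.add_argument("--cd-hit-threshold", type=float, default=0.80,
                        help="CD-HIT similarity threshold")
    parser.add_argument("--cd-hit-word-length", type=int, default=5,
                        help="CD-HIT word length")
    return parser
```

`build_parser().parse_args()` then yields `args.cutoff`, `args.cd_hit_threshold` and `args.cd_hit_word_length`, which fall back to the defaults when no flags are given.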
/ShortsAndLong/ # Results of sequence splitting
/Clustered/ # CD-HIT clustering outputs
/ClustalO/ # Aligned FASTA files
/HMM/ # Generated HMM profiles
HMM_names.txt # List of HMM profile names and metadata
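To show how the output directories relate to the three external tools, the sketch below builds one command line per stage. The function name `pipeline_commands`, the file naming scheme and the choice of options are assumptions for illustration and may differ from what the script actually runs. The options used are `-i`, `-o`, `-c` and `-n` for CD-HIT, `-i`, `-o`, `--outfmt` and `--force` for Clustal Omega, and `-n` for hmmbuild, which takes the output HMM file before the alignment file.

```python
from pathlib import Path


def pipeline_commands(fasta, name, threshold=0.80, word_length=5):
    """Return argv lists for clustering, alignment and HMM building.

    `fasta` is an input file (for example one produced by the split step)
    and `name` is a hypothetical base name for the outputs.
    """
    clustered = Path("Clustered") / f"{name}.fasta"
    aligned = Path("ClustalO") / f"{name}.aln.fasta"
    hmm = Path("HMM") / f"{name}.hmm"
    return [
        # CD-HIT: cluster similar sequences at the given identity threshold.
        ["cd-hit", "-i", str(fasta), "-o", str(clustered),
         "-c", str(threshold), "-n", str(word_length)],
        # Clustal Omega: align the cluster representatives, FASTA output.
        ["clustalo", "-i", str(clustered), "-o", str(aligned),
         "--outfmt=fasta", "--force"],
        # hmmbuild: build a named profile from the alignment.
        ["hmmbuild", "-n", name, str(hmm), str(aligned)],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, once the output directories exist.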
BSD 2-Clause "Simplified" License