# First, install hifiasm (requires g++ and zlib) and add it to your PATH
git clone https://github.com/chhylp123/hifiasm
cd hifiasm && make
nano ~/.bashrc
export PATH="/<your_dir>/hifiasm:$PATH"
source ~/.bashrc
# Then, install minimap2 and HiFiCCL (requires Python 3.10 and the pysam package)
git clone https://github.com/lh3/minimap2
cd minimap2 && make
nano ~/.bashrc
export PATH="/<your_dir>/minimap2:$PATH"
source ~/.bashrc
git clone https://github.com/zjjbuqi/HiFiCCL.git
pip install pysam
# or, alternatively, create a conda environment with the pinned versions:
conda create -n hificcl python=3.10.11 pysam=0.21.0 -c conda-forge
conda activate hificcl
# [optional] To use the optional mode of HiFiCCL, also install minigraph and add it to your PATH.
git clone https://github.com/lh3/minigraph
cd minigraph && make
nano ~/.bashrc
export PATH="/<your_dir>/minigraph:$PATH"
source ~/.bashrc
# Assembly under the main mode of HiFiCCL with low coverage HiFi reads
python /<your_path>/hificcl.py -o ./ -t 30 -f <absolute_path/Input.fasta> -r <absolute_path/T2T-reference.fasta>
# Assembly under the optional mode of HiFiCCL with low coverage HiFi reads
python /<your_path>/hificcl.py -m p -o ./ -t 30 -f <absolute_path/Input.fasta> -r <absolute_path/T2T-reference.fasta> -R <absolute_path/Pan-reference.gfa>
- minimap2
- hifiasm or flye or lja
- pysam 0.21.0
- python 3.10.11
- minigraph (optional)
- Getting Started
- Dependency
- Introduction
- Why HiFiCCL?
- Usage
- Results
- Tutorial
- Contact
- Citing HiFiCCL
- License
Population genomics using short-read resequencing captures single nucleotide polymorphisms and small insertions and deletions but struggles with structural variants (SVs), leading to a loss of heritability in genome-wide association studies. In recent years, long-read sequencing has improved pangenome construction for key eukaryotic species, addressing this issue to some extent. Sufficient-coverage high-fidelity (HiFi) data for population genomics are often prohibitively expensive, which limits their use in large-scale populations and across a broader range of eukaryotic species and creates an urgent need for robust low-coverage assemblies. However, current assemblers underperform under such conditions. To address this, we propose HiFiCCL, the first assembly framework specifically designed for low-coverage HiFi reads, using a reference-guided, chromosome-by-chromosome assembly approach. We demonstrate that HiFiCCL improves the ultra-low-coverage assembly performance of existing assemblers and outperforms state-of-the-art assemblers on human and plant datasets. Tested on 45 human datasets (~5x coverage), HiFiCCL combined with hifiasm reduces the length of misassembled contigs relative to hifiasm by an average of 21.19% and up to 38.58%. These improved assemblies enhance large germline structural variant detection, reduce chromosome-level mis-scaffolding, enable more accurate pangenome graph construction, and improve the detection of rare and somatic structural variants based on the pangenome graph under low-coverage conditions.
- HiFiCCL improves the assembly performance of different assemblers, such as Hifiasm, HiFlye, and LJA, under low-coverage conditions.
- HiFiCCL's improvement in assembly performance is also reflected in better assembly-based SV detection, particularly for challenging medically relevant SVs.
- After scaffolding the HiFiCCL assemblies, inter-chromosomal mis-scaffolding was significantly reduced compared to the base assemblers.
- HiFiCCL generalizes well: on 45 low-coverage human datasets, it achieved statistically better assembly quality than Hifiasm.
- On human datasets at about 5x coverage, HiFiCCL runs faster than Hifiasm while using a comparable amount of memory.
A typical HiFiCCL command line looks like:
python /<your_path>/hificcl.py -f /your_dir/HG002_5x.fasta -r /your_dir/CHM13v2.0.fasta -o <your_dir> -t 32
Here, -f specifies the input reads, -r specifies the linear reference genome used to guide the assembly, -t sets the number of CPUs in use, and -o specifies the output directory. The primary contigs are written to output.fasta.
HiFiCCL uses Hifiasm as the default base assembler, but you can specify a different assembler using the -a option, such as -a flye or -a lja, provided that these base assemblers are already installed and added to the system path.
HiFiCCL also generates assembly results for each chromosome. For more details, refer to the [tutorial] section.
HiFiCCL can also use a pangenome graph alongside the linear reference genome to guide assembly.
python /<your_path>/hificcl.py -m p -f /your_dir/HG002_5x.fasta -r /your_dir/CHM13v2.0.fasta -R /your_dir/hprc-v1.0-minigraph-chm13.gfa -o <your_dir> -t 32
In this mode, use the -m option to select the optional mode (-m p) and the -R option to specify the pangenome graph used for assembly guidance.
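When HiFiCCL is driven from a pipeline script, the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of HiFiCCL: the helper name `build_hificcl_command` is hypothetical, and it only uses flags that appear in the usage text of this README (-f, -r, -R, -m, -o, -t, -a).

```python
# Hypothetical helper (not shipped with HiFiCCL): build the argument list
# for hificcl.py from the flags documented in this README.
SUPPORTED_ASSEMBLERS = {"hifiasm", "flye", "lja"}


def build_hificcl_command(script, reads, reference, out_dir,
                          threads=10, assembler="hifiasm", pan_graph=None):
    """Return the command as a list suitable for subprocess.run()."""
    if assembler not in SUPPORTED_ASSEMBLERS:
        raise ValueError(f"unsupported assembler: {assembler}")
    cmd = [
        "python", script,
        "-f", reads,          # input HiFi reads (FASTA)
        "-r", reference,      # linear reference genome (FASTA)
        "-o", out_dir,        # output directory
        "-t", str(threads),   # number of threads
        "-a", assembler,      # base assembler
    ]
    if pan_graph is not None:
        # Optional mode: also guide the assembly with a pangenome graph (GFA).
        cmd += ["-m", "p", "-R", pan_graph]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_hificcl_command(
        "hificcl.py", "HG002_5x.fasta", "CHM13v2.0.fasta", "out", threads=32)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.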
HiFiCCL will generate the alignment information of the reads to the reference genome, which is written to prefix.map_to_reference.sam, and the pairwise alignment information between the reads, which is written to prefix.all_vs_all.paf. Additionally, it will output the assembly results for different chromosomes, as well as the merged assembly results. For more details, refer to the [tutorial] section.
The following table shows the statistics of HiFiCCL combined with Hifiasm (0.19.5-r592).
| Dataset | Genome size | Cov. | Asm options | Wall time | Maximum resident set size | NG50 |
|---|---|---|---|---|---|---|
| HG002 | 3.1Gb | ×5 | -t20 --primary | 5.0h | 55G | 199.0Kb |
| NA19240 | 3.1Gb | ×5 | -t20 --primary | 4.1h | 53G | 123.4Kb |
| Rice | 390Mb | ×5 | -t20 --primary | 0.67h | 19.0G | 61.0Kb |
| Arabidopsis thaliana | 135Mb | ×5 | -t20 --primary | 0.2h | 17.4G | 90.4Kb |
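For readers unfamiliar with the NG50 column, the sketch below shows the commonly used definition: the contig length at which the contigs, taken from longest to shortest, first cover half of the genome size (rather than half of the assembly size, as in N50). This is an illustration under that assumption, not the evaluation script used to produce the table, and the function name `ng50` is hypothetical.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50 for a list of contig lengths and an assumed genome size.

    Returns 0 when the contigs together cover less than half of the genome.
    """
    half_genome = genome_size / 2
    covered = 0
    # Accumulate contigs from longest to shortest until half the genome is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half_genome:
            return length
    return 0


if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(ng50([50, 40, 30, 20, 10], genome_size=200))
```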
HiFiCCL relies on minimap2 for alignment and various assemblers for assembly. If you intend to use HiFiCCL with Hifiasm for assembly, you first need to visit minimap2's GitHub page [https://github.com/lh3/minimap2], install minimap2, and add it to your system path to ensure the tool can be invoked directly by typing minimap2 in the command line. Then, you need to visit Hifiasm's GitHub page [https://github.com/chhylp123/hifiasm], install Hifiasm, and similarly add it to your system path. If you wish to use other base assemblers, follow the same process. Then, install HiFiCCL directly from GitHub. You will need to install the pysam package. The command is as follows:
git clone https://github.com/zjjbuqi/HiFiCCL.git
pip install pysam
Similarly, you need to add HiFiCCL to the system path.
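Before a first run, it can help to confirm that the external tools are actually reachable on the system path. The sketch below is not part of HiFiCCL; the helper name `missing_tools` and the tool lists are assumptions based on the dependencies named in this README (minimap2 and hifiasm for the default setup, minigraph for the optional mode).

```python
import shutil

# Assumed from the dependency list in this README; adjust if you use flye or lja.
REQUIRED_TOOLS = ["minimap2", "hifiasm"]
OPTIONAL_TOOLS = ["minigraph"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED_TOOLS):
        print(f"required tool not found on PATH: {name}")
    for name in missing_tools(OPTIONAL_TOOLS):
        print(f"optional tool not found on PATH: {name}")
```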
usage: hificcl.py [-h] [-m model [str]] [-r linear_reference_genome] [-R Pangenome_graph] -f input_reads [file] [-p prefix [str]] [-t threads [str]]
[--overwrite] [-a assembler [str]] [--hifiasmoption [str]] [--ljaoption [str]] [--flyeoption [str]] [--minimapoption1 [str]]
[--minimapoption2 [str]] [--max_hang [int]] [--int_frac [float]] [--minigraphoption [float]] [-o output_dir [str]] [-W weight [int]]
[-N number [int]] [--iterations [int]] [--process [int]] [-v]
options:
-h, --help show this help message and exit
-m model [str] Select the reference genome, normal or pan-genome. Please enter n or p! [n]
-r Linear_reference_genome [file]
Linear reference genome file, FASTA format.
-R Pangenome_graph [file] Pan-reference genome file, GFA format. [optional]
-f input_reads [file]
(*Required) raw reads file, FASTA format.
-p prefix [str] The prefix used on generated files, default: hificcl
-t threads [str] Use number of threads, default: 10
--overwrite Overwrite existing alignment file instead of reuse.
-a assembler [str] Specify assembler (support hifiasm, lja, and flye), default: hifiasm.
--hifiasmoption [str]
Pass additional parameters to hifiasm program for primary assembly [default --primary].
--ljaoption [str] Pass additional parameters to lja program for primary assembly [default --diploid].
--flyeoption [str] Pass additional parameters to flye program for primary
--minimapoption1 [str]
Pass additional parameters to minimap2 program for all-vs-all mapping.
--minimapoption2 [str]
Pass additional parameters to minimap2 program for mapping to reference.
--max_hang [int] Maximum overhang length [1000]. An overhang is an unmapped region that should be mapped given a true overlap or true containment. If
the overhang is too long, the mapping is considered an internal match and will be ignored.
--int_frac [float] Minimal ratio of mapping length to mapping+overhang length for a mapping considered a containment or an overlap [0.8].
--minigraphoption [float]
Pass additional parameters to minigraph program for mapping to pan-reference-genome.
-o output_dir [str] Output files path [default current directory]
-W weight [int] Lainning drop weight [0.75].
-N number [int] Supporting the numebr of lainning reads [3]
--iterations [int] Chromosome label correction rounds [200]
--process [int] Number of processes used, with options of 1 or 2 [1]
-v, --version The version of HiFiCCL
Example: python /<your_path>/hificcl.py -r <Linear_Reference.fasta> -f <your_input.fasta> -t <threads> -o <your_dir>
I hope this tool proves helpful for your research!
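The --max_hang and --int_frac descriptions above can be made concrete with a small example. The sketch below follows the wording of the help text and the overlap classification commonly used by miniasm-style assemblers; it is an illustration only, not HiFiCCL's actual implementation. It assumes both reads are on the same strand, and the function name and return labels are hypothetical.

```python
def classify_mapping(b1, e1, l1, b2, e2, l2, max_hang=1000, int_frac=0.8):
    """Classify a same-strand mapping between read 1 and read 2.

    (b1, e1, l1): start, end and total length of the mapped interval on read 1.
    (b2, e2, l2): the same for read 2.
    """
    # Unmapped sequence on both sides that should have aligned
    # if this were a true overlap or containment.
    overhang = min(b1, b2) + min(l1 - e1, l2 - e2)
    map_len = max(e1 - b1, e2 - b2)

    # Too much overhang, or too small a mapped fraction: internal match, ignored.
    if overhang > max_hang or map_len < (map_len + overhang) * int_frac:
        return "internal_match"
    if b1 <= b2 and l1 - e1 <= l2 - e2:
        return "read1_contained"
    if b1 >= b2 and l1 - e1 >= l2 - e2:
        return "read2_contained"
    if b1 > b2:
        return "read1_suffix_overlaps_read2_prefix"
    return "read2_suffix_overlaps_read1_prefix"


if __name__ == "__main__":
    # Toy coordinates: the last 4 kb of read 1 matches the first 4 kb of read 2.
    print(classify_mapping(6000, 10000, 10000, 0, 4000, 12000))
```

With the defaults from the help text (max_hang 1000, int_frac 0.8), a mapping in the middle of both reads is rejected as an internal match, while end-to-end matches are kept as overlaps or containments.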
HiFiCCL will generate the alignment information of the reads to the reference genome, which is written to *prefix*.map_to_reference.sam, and the pairwise alignment information between the reads, which is written to *prefix*.all_vs_all.paf. Additionally, it will output the assembly results for different chromosomes, as well as the merged assembly results. The chr_by_chr_reads file contains the results after applying the chromosome binning algorithm. chr* files represent the assembly results for different chromosomes. output.fasta is the final assembly result, used for assembly evaluation. The optional mode will also output the *prefix*.gaf file, which represents the alignment information of sequences to the pangenome graph.
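The *prefix*.all_vs_all.paf file can be inspected with a few lines of Python. The sketch below assumes the file follows the standard 12-column PAF layout produced by minimap2; the names `PafRecord`, `parse_paf_line`, and `read_paf` are hypothetical helpers, not part of HiFiCCL, and the sample line is made up for illustration.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns (optional tags are ignored)."""
    query_name: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_line(line):
    """Parse one tab-separated PAF line into a PafRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]), fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]),
    )


def read_paf(path):
    """Yield PafRecord objects from a PAF file, skipping blank lines."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_paf_line(line)


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    sample = "readA\t10000\t6000\t10000\t+\treadB\t12000\t0\t4000\t3900\t4000\t0\ttp:A:S"
    print(parse_paf_line(sample))
```

With the default prefix from the usage text, the file would be named hificcl.all_vs_all.paf, so `read_paf("hificcl.all_vs_all.paf")` iterates over the read-to-read alignments.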
The analyses included in the paper and the commands used can be found in paper_analysis.txt. The assembly results referenced in the manuscript are available on Zenodo and can be accessed using the DOI: 10.5281/zenodo.17204429.
If you experience any problems or have suggestions, please create an issue or a pull request.
If you use HiFiCCL in your work, please cite:
Jiang Z, Pan W, Gao R, et al. Reference-Guided Chromosome-by-Chromosome de novo Assembly at Scale Using Low-Coverage High-Fidelity Long-Reads with HiFiCCL. Adv Sci (Weinh). Published online December 25, 2025. doi:10.1002/advs.202515308
The project is licensed under the MIT License.