Kmers.jl provides the Kmer <: BioSequence type, which implements the concept of a
k-mer, a biological sequence of exactly length k.
K-mers are used frequently in bioinformatics because, when k is small and known at compile time, these sequences can be efficiently represented as integers and stored directly in CPU registers, allowing for much more efficient computation than arbitrary-length sequences.
In Kmers.jl, the Kmer type is parameterized by its length, and its data is stored in an NTuple. This makes Kmers bitstypes and highly efficient.
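To make the "sequences as integers" idea concrete, here is a minimal sketch in plain Julia that packs a short DNA string into a single UInt64 using two bits per base. The helper `pack_dna` and the encoding A=0, C=1, G=2, T=3 are assumptions made for illustration. They are not part of Kmers.jl and do not necessarily match its internal layout.

```julia
# Hypothetical helper (not part of Kmers.jl): pack up to 32 DNA bases
# into one UInt64, using 2 bits per base.
function pack_dna(s::AbstractString)
    length(s) <= 32 || throw(ArgumentError("at most 32 bases fit in 64 bits"))
    bits = UInt64(0)
    for c in s
        # Assumed encoding for this sketch: A=0, C=1, G=2, T=3.
        code = c == 'A' ? 0x00 :
               c == 'C' ? 0x01 :
               c == 'G' ? 0x02 :
               c == 'T' ? 0x03 :
               throw(ArgumentError("unsupported base: $c"))
        # Shift the bases seen so far left and append the new one.
        bits = (bits << 2) | code
    end
    return bits
end

packed = pack_dna("TGAT")            # bit pattern 11 10 00 11
same = packed == pack_dna("TGAT")    # equality is a single integer comparison
```

Once a sequence fits in an integer like this, comparison and hashing work on whole machine words rather than looping over individual bases.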
Conceptually, one may use the following analogy:
BioSequence is like AbstractVector
LongSequence is like Vector
Kmer is like SVector from StaticArrays
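The analogy can be checked roughly as follows. This is a sketch that assumes a Kmers.jl 1.x release in which the `mer"..."d` string macro constructs a DNA k-mer. If your version differs, consult its documentation for the equivalent constructor.

```julia
using BioSequences, Kmers

# Assumption: the `d` flag of the mer"..." macro builds a DNA k-mer (Kmers.jl 1.x).
m = mer"TGAT"d
s = dna"TGAT"          # a LongSequence from BioSequences.jl

m isa BioSequence      # both types share the abstract BioSequence supertype
s isa BioSequence

isbitstype(typeof(m))  # the k-mer is a bitstype, like an SVector
isbitstype(typeof(s))  # the LongSequence is heap-backed, like a Vector

reverse_complement(m)  # generic BioSequence operations work on both
reverse_complement(s)
```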
Kmers.jl is tightly coupled to the
BioSequences.jl package,
and relies on its internals.
Hence, you should expect strict compat bounds on BioSequences.jl.
Kmers are parameterized by their length. That means any operation on a Kmer that changes its length, such as push, pop, slicing, or masking (logical indexing), will be type unstable and hence slow and memory inefficient, unless you write your code in such a way that the compiler can use constant folding.
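The effect can be illustrated without Kmers.jl at all, using plain tuples as a stand-in: like a Kmer, a tuple carries its length in its type. The helper `take_first` below is hypothetical and exists only for this sketch.

```julia
# Plain tuples stand in for k-mers: the length is part of the type.
t = (1, 2, 3, 4)

# Hypothetical helper: keep the first n elements, where n is a run-time value.
take_first(t::Tuple, n::Integer) = t[1:n]

a = take_first(t, 2)
b = take_first(t, 3)

typeof(a) == NTuple{2, Int}   # length 2 is encoded in the type
typeof(b) == NTuple{3, Int}   # length 3 gives a different type
typeof(a) == typeof(b)        # false: the result type depends on the value of n
```

Because the result type depends on the run-time value of `n`, the compiler cannot in general pick a single concrete return type ahead of time. When the new length is a compile-time constant, it can.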
Further, as Kmers are immutable and their operations are aggressively inlined and unrolled,
they become inefficient as they get longer.
For example, reverse-complementing a 32-mer takes 26 ns, compared to 102 ns for the equivalent LongSequence. However, for 512-mers, the LongSequence takes 126 ns, and the Kmer 16 μs!
Kmers.jl is intended for high-performance computing. If you do not need the extra performance that register-stored sequences provide, you might consider using LongSequence from BioSequences.jl instead.
You can install Kmers from the Julia
REPL. Press ] to enter pkg mode, and enter the following:
pkg> add Kmers

If you are interested in the cutting edge of development, please check out the master branch to try new features before release.
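For scripts or other non-interactive setups, the same installation can be done through Julia's Pkg API. A minimal sketch follows. The commented-out line assumes the development branch is named `master`, as stated above.

```julia
using Pkg

# Install the latest registered release of Kmers.jl.
Pkg.add("Kmers")

# Alternatively, track the development branch instead of a release.
# Assumption: the branch is called "master", as this README states.
# Pkg.add(name="Kmers", rev="master")
```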
We appreciate contributions from users including reporting bugs, fixing issues, improving performance and adding new features.
Take a look at the contributing files for detailed contributor and maintainer guidelines, and the code of conduct.
If you have a question about contributing or using BioJulia software, come on over and chat with us on the Julia Slack workspace, or try the Bio category of the Julia Discourse site.