Non-backtracking NFA regex engine

Implementation of a regex engine with stable running time and memory consumption. It runs in O(m*n) time with O(m) memory, where m is the size of the regex, and n is the size of the input.

The project consists of two parts: frontend written in C# and C backend. IT also includes x86-64 JIT compiler that makes matching ~5x faster compared to the standard implementation.

I haven't systematically collected and published benchmarks, but here's what I've found:

JIT produces code ~5x faster than the standard C implementation.
My engine (JIT) matches as fast as .NET and Python on small regexes, but slower than Perl and DFA engines.
Performance degrades linearly with the growth of the regex.
It can handle cases like (a*)*bz with the string aaa...az in linear time, while implementations mentioned above suffer from catastrophic backtracking.
It compiles and matches expressions like (a|b)*a(a|b){n} in linear time, while DFA implementations require exponential run time and memory, since the minimal DFA for that expression has 2^n nodes.

Compared to DFA-based approaches, it is relatively easy to implement extensions for non-backtracking NFAs, such as submatches and assertions. However, backreferences require backtracking.

Basic optimizations in the JIT code generator may significantly increase performance. The current implementation is far from being effective - it creates a number of unnecessary branches, suffers from branch misprediction and CPU stalls.

The reason I separated this project into C# and C parts is that I was doing it for my university course, and the requirement was to make interop between C# program and C library.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
regex-runtime		regex-runtime
regex.Test		regex.Test
regex		regex
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
regex-cs.sln		regex-cs.sln

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Non-backtracking NFA regex engine

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Non-backtracking NFA regex engine

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages