patchouly is an experimental copy-and-patch JIT framework
for building base line JITs in Rust, made available by Rust's
explicit_tail_calls and rust_preserve_none_cc.
Copy-and-patch is a runtime compilation technique
where you precompile tiny machine-code templates (called "stencils"),
then build native code at runtime by copying selected stencils
and patching constants, addresses, and jump targets.
The result, as is demonstrated in the original paper,
is a very fast compilation pipeline (by only copying and patching things)
and decent code quality as a baseline JIT
(by using snippets generated by other compilers under -O3).
There are some tutorials on copy-and-patch online already, and I'm also working on a tiny book on the subject under ./book:
Here is a short outline of how you can build a brainf**k interpreter
using the copy-and-patch technique (taken from our examples/bf crate):
-
Write a code generator, either with Rust macros or a scripting language, to batch-generate stencils for you. We are using the
#[stencil]macro provided by ourpatchouly-macroscrate here. -
Define your stencils:
struct Ptr(*mut u8); impl Ptr { /* your #[inline(always)] code */ } #[stencil] fn add1(mut dp: Ptr, i: usize) { *unsafe { dp.get_mut(i) } += 1; } #[stencil] fn addn(mut dp: Ptr, i: usize, #[hole] n: usize) { *unsafe { dp.get_mut(i) } += n as u8; } #[stencil] fn print(mut dp: Ptr, i: usize, #[hole] print: OutputFn) { print.0(*unsafe { dp.get_mut(i) }); } #[stencil] fn if_zero( dp: Ptr, len: usize, i: usize, #[target] then: _, #[target] or_else: _, #[target] panic: _, ) { if let Some(v) = dp.to_slice(len).get(i) { if *v == 0 { then } else { or_else } } else { panic } }
[!NOTE] We are using a lot of
unsafehere. It's because, to keep the stencils small and fast, we absolutely don't wantpanic!routes in every single stencil. Instead, what I use to ensure safety inexamples/bfis bound-checking only at some regularly insertedcheckstencils and loop stencils, with a bit of extra allocation outside the bounds. -
Generate all register permutations with your code generator, compile the stencils with
--releaseflag, and extract the output from the objects files into something like (or unlike) this://! Generated by patchouly-build use patchouly_core::{relocation::Relocation, Stencil, StencilFamily, StencilLibrary}; pub const BF_STENCIL_LIBRARY: StencilLibrary<4> = StencilLibrary { code: include_bytes!(r#"/somewhere/bf_stencils.bin"#), }; pub const BF_ADD1: StencilFamily<3, 0, 4, 0, 1> = StencilFamily { relocation_data: &[ Relocation::from_bits(0x10002001000b), Relocation::from_bits(0x210002001001a), Relocation::from_bits(0x2ffc2081002c), Relocation::from_bits(0x0), // ... ], stencils: &[ Stencil::from_bits(0x3000000111), Stencil::from_bits(0x4001f00000141), Stencil::from_bits(0x4001f00000160), // ... ], };
-
Now, for whatever IR or AST you have, generate code by copying the stencils over:
// Pseudo-code: let mut block = PatchBlock::new(&BF_STENCIL_LIBRARY); // We use three registers to represent the bf state: let params = (0..3).map(|i| Location::register(0)).collect(); block.add(&BF_ADDN, ¶ms, &['B'])?; block.branch(&BF_IF_ZERO, ¶ms, &[BlockId(1), BlockId(2)])?; let mut block2 = ...
-
After copying all code for all blocks, now that we know the sizes of each block, we now finalize the blocks by allocating the actual memmap, patching in all the exact addresses of each blocks with relocation info extracted from the objects files.
// Pseudo-code: let base = mmap.as_ptr() as usize; relocations.iter().for_each(|(target, reloc)| { reloc.apply(mmap[range], blocks.resolve_target(base, target)); });
[!NOTE] There are different kinds of relocations, like constants, branches, or external function calls. Inter-block branches can only be patched after all blocks are finalized, but many others can be done during your "copying" stage, like constants or next-block offsets.
-
And finally, make the memmap executable, and voila!
let map = memmap2::MmapMut::map_anon(30000).unwrap(); let map = map.make_exec().unwrap(); let run = unsafe { std::mem::transmute::<*const u8, patchouly::RawFn3<()>>(map.as_ptr()) }; let len = 30000; let data = vec![0u8; len + 4096 * 2]; let result = run(&mut (), data[4096..].as_ptr() as usize, len, 0) as isize;
I ran the same benchmarks as Rodrigodd's bf-compiler repo, and here are the results (on an AMD 5600G machine):
| optimized-jit | optimized-cranelift-jit | patchouly | patchouly-sat | |
|---|---|---|---|---|
| mandelbrot | 1.09 | 1.11 | 0.96 | 1.38 |
| factor | 0.35 | 0.34 | 0.42 | 0.52 |
optimized-jit: hand-written assembly with dynasm, with out-of-bound wrappingoptimized-cranelift-jit: cranelift, with some brainf**k optimizations, with out-of-bound wrappingpatchouly: theexamples/bfcrate, using copy-and-patch, with some optimizations for common brainf**k patterns, and with ocasional out-of-bound checking + panic return routespatchouly-sat: likepatchouly, but switched to out-of-bound wrapping for fairness
Please note that that repo is from 2022, and I think cranelift might be quite a bit faster now. But, I feel quite content with being able to compete with the JITs above anyway.
- Some of our examples: examples/README.md
- A work-in-progress book:
This project is very much experimental. And I won't consider building a brainf**k runtime dogfooding. So... have fun, but use it at your own risk. Bug reports and contributions are very welcome!