A static transpiler that converts x86 assembly source code to semantically equivalent ARMv8-A (AArch64) assembly through LLVM IR.
x86 Assembly (.s) → Lexer → Parser → AST → LLVM IR → [Optimization] → ARM Assembly (.s)
TransISA uses traditional compiler techniques to give developers full transparency and control over the translation process. Every intermediate stage (tokens, AST, IR, optimized IR) is inspectable, making the tool suitable for legacy code migration, compiler research, and cross-ISA analysis.
- Full pipeline: Lexer → Parser → AST → LLVM IR → ARM assembly in a single invocation
- Configurable optimization: Three levels (O0, O1, O2) to control the fidelity-vs-efficiency tradeoff
- IR inspection: Dump LLVM IR before and after optimization with --emit-ir
- Cross-platform: Builds and runs on macOS (Apple Silicon / Intel) and Linux
- Tested: Unit tests for lexer, parser, and IR generation via Google Test; CI on both platforms
| Category | Instructions |
|---|---|
| Data movement | mov, lea, push, pop |
| Arithmetic | add, sub, mul, div, inc, dec, neg |
| Bitwise | and, or, xor, shl, shr, sar |
| Comparison | cmp, test |
| Control flow | jmp, je, jne, jg, jge, jl, jle, ja, jae, jb, jbe, jz, jnz, jo, jno, js, jns |
| Procedure | call, ret |
| System | syscall, int |
| Directives | .text, .data, .bss, .global, .asciz, .byte, .word, .long, .quad, resb, resw, resd |
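
The conditional jumps in the table read flags that a preceding cmp or test sets. As a reference for what a translation has to preserve, here is a minimal Python sketch of the standard x86 flag semantics for cmp and of the condition each jump tests. The function names `flags_after_cmp` and `jump_taken` are hypothetical and are not part of TransISA; the sketch takes the two operands as `lhs` and `rhs` and leaves aside which source operand is which in a given assembly syntax.

```python
def flags_after_cmp(lhs: int, rhs: int, bits: int = 64) -> dict:
    """Return ZF, SF, CF, OF as set by the subtraction lhs - rhs that cmp performs."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = lhs & mask, rhs & mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                           # operands are equal
        "SF": bool(result & sign),                   # sign bit of the result
        "CF": a < b,                                 # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),   # signed overflow
    }


def jump_taken(mnemonic: str, flags: dict) -> bool:
    """Evaluate the condition tested by each jump listed in the table."""
    zf, sf, cf, of = flags["ZF"], flags["SF"], flags["CF"], flags["OF"]
    conditions = {
        "jmp": True,
        "je": zf, "jz": zf,
        "jne": not zf, "jnz": not zf,
        "jg": (not zf) and sf == of,   # signed greater
        "jge": sf == of,
        "jl": sf != of,                # signed less
        "jle": zf or sf != of,
        "ja": (not cf) and (not zf),   # unsigned above
        "jae": not cf,
        "jb": cf,                      # unsigned below
        "jbe": cf or zf,
        "jo": of, "jno": not of,
        "js": sf, "jns": not sf,
    }
    return conditions[mnemonic.lower()]


# -1 compared with 1: less when signed, above when unsigned.
f = flags_after_cmp(-1, 1)
print(jump_taken("jl", f))  # True
print(jump_taken("ja", f))  # True
```

The signed and unsigned jumps differ only in which flags they consult, which is why the same cmp can feed either jl or jb.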
- CMake >= 3.16
- LLVM 19 (with AArch64 and X86 backends)
- C++17 compiler (Clang recommended)
- Ninja (optional, recommended)
macOS (Homebrew):
brew install llvm@19 cmake ninja

Ubuntu/Debian:
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo add-apt-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main"
sudo apt-get update
sudo apt-get install -y cmake ninja-build llvm-19 llvm-19-dev clang-19 lld-19

git clone https://github.com/fnhirwa/TransISA.git
cd TransISA
# Configure
cmake -B build -S . -G Ninja -DCMAKE_BUILD_TYPE=Release
# Build
cmake --build build
# Verify
./build/src/TransISA --help

If CMake cannot find LLVM, set the path manually:
# macOS
cmake -B build -S . -G Ninja -DLLVM_DIR=/opt/homebrew/opt/llvm@19/lib/cmake/llvm
# Linux
cmake -B build -S . -G Ninja -DLLVM_DIR=/usr/lib/llvm-19/lib/cmake/llvm

Usage: TransISA <input.s> [options]
Options:
-o <file> Output assembly file (default: output.s)
--opt-level=N Optimization level: 0, 1, 2 (default: 0)
--emit-ir Dump LLVM IR to stderr before and after optimization
--verbose Print tokens, AST, and pipeline stages
--help Show this message
# Transpile x86 to ARM with no optimization (raw lifted output)
./build/src/TransISA benchmarking/x86/hello.s -o hello_arm.s --opt-level=0
# With LLVM O2 optimization
./build/src/TransISA benchmarking/x86/hello.s -o hello_arm_O2.s --opt-level=2
# Inspect the LLVM IR at each stage
./build/src/TransISA benchmarking/x86/add.s -o add_arm.s --opt-level=2 --emit-ir 2> add_ir.ll

# Transpile
./build/src/TransISA benchmarking/x86/hello.s -o hello_arm.s --opt-level=2
# Assemble and link
as -arch arm64 hello_arm.s -o hello_arm.o
ld -macosx_version_min 11.0 -o hello_arm hello_arm.o \
-lSystem -syslibroot $(xcrun --show-sdk-path) -e __start
# Run
./hello_arm

Run a benchmark at all three levels to see the impact of LLVM passes:
for level in 0 1 2; do
./build/src/TransISA benchmarking/x86/add.s \
-o benchmarking/arm/add_O${level}.s \
--opt-level=${level}
done

The benchmarking/ directory contains x86 source programs and a Python script that automates metric collection.
cd benchmarking
python3 analyzefiles.py

The script assembles and links both the x86 and ARM assembly files, then prints a comparison table with:
- Instruction count
- Text section size (bytes)
- Syscall count
- Stack size (bytes)
Requirements: macOS with both x86_64 and arm64 toolchains available (standard on Apple Silicon Macs), plus llvm-objdump and otool.
cd build
ctest --output-on-failure

Or run individual test suites:
./build/tests/test_lexer
./build/tests/test_parser
./build/tests/test_llvm_codegen

TransISA/
├── include/
│ ├── lexer/ # Lexer, Token, Trie headers
│ ├── parser/ # Parser, AST node definitions
│ ├── llvm_ir/ # LLVM IR generator header
│ └── codegen/ # Backend codegen + optimization header
├── src/
│ ├── lexer/ # Trie-based tokenizer
│ ├── parser/ # AST construction from token stream
│ ├── llvm_ir/ # x86 → LLVM IR mapping (1500+ lines)
│ ├── codegen/ # LLVM optimization passes + ARM emission
│ └── main.cpp # CLI entry point
├── tests/ # Google Test suites (lexer, parser, IR gen)
├── benchmarking/
│ ├── x86/ # Source x86 assembly programs
│ ├── arm/ # Transpiled ARM output
│ └── analyzefiles.py # Automated metric collection
├── .github/workflows/ # CI: tests (macOS + Ubuntu) + code quality
├── CMakeLists.txt
└── LICENSE # MIT
- Lexer: Tokenizes x86 assembly using a trie-based classifier. Handles registers, instructions, immediates (decimal/hex/binary/octal), labels, directives, memory operands, and strings.
- Parser: Constructs a hierarchical AST: RootNode → FunctionNode → BasicBlockNode → InstructionNode, with typed operands (RegisterNode, IntLiteralNode, MemoryNode, etc.).
- IR Generator: Walks the AST and emits LLVM IR. x86 registers are modeled as alloca variables. CPU flags (ZF, SF, CF, OF) are explicitly tracked. Stack operations use a simulated memory buffer with pointer arithmetic. Syscalls are translated to platform-appropriate inline assembly.
- Optimizer: Applies LLVM's pass pipeline at the requested level. The mem2reg pass (active at O1+) promotes register allocas to SSA values, which is the most impactful transformation for transpiled code quality.
- Backend: Feeds optimized IR to LLVM's AArch64 code generator for register allocation, instruction selection, and assembly emission.
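
To illustrate the idea behind the lexer's trie-based classifier, here is a minimal Python sketch. It is not the project's C++ implementation: the class names, the token kind strings, the case-insensitive matching, and the particular registers inserted are all assumptions made for this example. Only the mnemonics come from the supported-instruction table.

```python
class TrieNode:
    """One character position in the trie; `kind` is set only at word ends."""

    def __init__(self):
        self.children = {}
        self.kind = None


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, kind: str) -> None:
        """Store `word` and tag its final node with a token kind."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.kind = kind

    def classify(self, word: str):
        """Return the token kind for an exact match, or None otherwise."""
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return None
        return node.kind


# Hypothetical token kinds and a small sample vocabulary.
trie = Trie()
for mnemonic in ("mov", "lea", "push", "pop", "add", "sub", "cmp", "jmp", "jne"):
    trie.insert(mnemonic, "INSTRUCTION")
for register in ("rax", "rbx", "rsp", "eax"):
    trie.insert(register, "REGISTER")

print(trie.classify("MOV"))   # INSTRUCTION
print(trie.classify("rax"))   # REGISTER
print(trie.classify("mo"))    # None (prefix only, not a stored word)
```

A lookup costs one dictionary step per character of the candidate word, regardless of how many mnemonics and registers are stored, and a word that leaves the trie early can be handed on to other token rules such as labels.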
See CONTRIBUTING.md. The project uses clang-format and cmake-format via pre-commit hooks.
pip install pre-commit
pre-commit install

If you use TransISA in your research, please cite:
@inproceedings{nshuti2026transisa,
author = {Nshuti, Felix Hirwa},
title = {{TransISA}: A Static Transpiler for Migrating Legacy x86 Assembly to {ARM} in Scientific Computing},
booktitle = {Proceedings of the 2026 Improving Scientific Software Conference (ISS26)},
year = {2026},
address = {Boulder, CO, USA}
}