TransISA

A static transpiler that converts x86 assembly source code to semantically equivalent ARMv8-A (AArch64) assembly through LLVM IR.

x86 Assembly (.s) → Lexer → Parser → AST → LLVM IR → [Optimization] → ARM Assembly (.s)

TransISA uses traditional compiler techniques to give developers full transparency and control over the translation process. Every intermediate stage (tokens, AST, IR, optimized IR) is inspectable, making the tool suitable for legacy code migration, compiler research, and cross-ISA analysis.

Features

Full pipeline: Lexer → Parser → AST → LLVM IR → ARM assembly in a single invocation
Configurable optimization: Three levels (O0, O1, O2) to control the fidelity-vs-efficiency tradeoff
IR inspection: Dump LLVM IR before and after optimization with --emit-ir
Cross-platform: Builds and runs on macOS (Apple Silicon / Intel) and Linux
Tested: Unit tests for lexer, parser, and IR generation via Google Test; CI on both platforms

Supported x86 Instructions

Category	Instructions
Data movement	`mov`, `lea`, `push`, `pop`
Arithmetic	`add`, `sub`, `mul`, `div`, `inc`, `dec`, `neg`
Bitwise	`and`, `or`, `xor`, `shl`, `shr`, `sar`
Comparison	`cmp`, `test`
Control flow	`jmp`, `je`, `jne`, `jg`, `jge`, `jl`, `jle`, `ja`, `jae`, `jb`, `jbe`, `jz`, `jnz`, `jo`, `jno`, `js`, `jns`
Procedure	`call`, `ret`
System	`syscall`, `int`
Directives	`.text`, `.data`, `.bss`, `.global`, `.asciz`, `.byte`, `.word`, `.long`, `.quad`, `resb`, `resw`, `resd`
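Each conditional jump in the table tests the CPU flags left by a preceding `cmp` or `test`. The sketch below is an illustration in Python, not TransISA's own code: `flags_after_cmp`, `JUMP_TAKEN`, and `jump_taken` are hypothetical names, and the rules are the standard x86 definitions for `cmp dst, src`, which computes `dst - src` and keeps only the flags.

```python
def flags_after_cmp(dst, src, bits=64):
    """Return the ZF, SF, CF, OF flags x86 sets when computing dst - src."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    dst &= mask
    src &= mask
    result = (dst - src) & mask
    return {
        "ZF": result == 0,
        "SF": bool(result & sign_bit),
        "CF": dst < src,  # unsigned borrow
        # Signed overflow: the operands differ in sign and the result's
        # sign differs from dst's sign.
        "OF": bool((dst ^ src) & (dst ^ result) & sign_bit),
    }


# Hypothetical lookup table: mnemonic -> predicate over the flags.
JUMP_TAKEN = {
    "je":  lambda f: f["ZF"],
    "jz":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jnz": lambda f: not f["ZF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],   # signed >
    "jge": lambda f: f["SF"] == f["OF"],                   # signed >=
    "jl":  lambda f: f["SF"] != f["OF"],                   # signed <
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],        # signed <=
    "ja":  lambda f: not f["CF"] and not f["ZF"],          # unsigned >
    "jae": lambda f: not f["CF"],                          # unsigned >=
    "jb":  lambda f: f["CF"],                              # unsigned <
    "jbe": lambda f: f["CF"] or f["ZF"],                   # unsigned <=
    "jo":  lambda f: f["OF"],
    "jno": lambda f: not f["OF"],
    "js":  lambda f: f["SF"],
    "jns": lambda f: not f["SF"],
}


def jump_taken(mnemonic, flags):
    """Decide whether a conditional jump would be taken for the given flags."""
    return JUMP_TAKEN[mnemonic](flags)
```

For example, after `flags_after_cmp(-1, 1)` the signed jump `jl` is taken but the unsigned jump `jb` is not, because -1 is the largest unsigned value at any width.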

Prerequisites

CMake >= 3.16
LLVM 19 (with AArch64 and X86 backends)
C++17 compiler (Clang recommended)
Ninja (optional, recommended)

Installing LLVM 19

macOS (Homebrew):

brew install llvm@19 cmake ninja

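Ubuntu/Debian (the repository line below targets Ubuntu 22.04 "jammy"; substitute your release's codename if it differs):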

wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo add-apt-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main"
sudo apt-get update
sudo apt-get install -y cmake ninja-build llvm-19 llvm-19-dev clang-19 lld-19

Building

git clone https://github.com/fnhirwa/TransISA.git
cd TransISA

# Configure
cmake -B build -S . -G Ninja -DCMAKE_BUILD_TYPE=Release

# Build
cmake --build build

# Verify
./build/src/TransISA --help

If CMake cannot find LLVM, set the path manually:

# macOS
cmake -B build -S . -G Ninja -DLLVM_DIR=/opt/homebrew/opt/llvm@19/lib/cmake/llvm

# Linux
cmake -B build -S . -G Ninja -DLLVM_DIR=/usr/lib/llvm-19/lib/cmake/llvm

Usage

Usage: TransISA <input.s> [options]

Options:
  -o <file>       Output assembly file (default: output.s)
  --opt-level=N   Optimization level: 0, 1, 2 (default: 0)
  --emit-ir       Dump LLVM IR to stderr before and after optimization
  --verbose       Print tokens, AST, and pipeline stages
  --help          Show this message

Basic transpilation

# Transpile x86 to ARM with no optimization (raw lifted output)
./build/src/TransISA benchmarking/x86/hello.s -o hello_arm.s --opt-level=0

# With LLVM O2 optimization
./build/src/TransISA benchmarking/x86/hello.s -o hello_arm_O2.s --opt-level=2

# Inspect the LLVM IR at each stage
./build/src/TransISA benchmarking/x86/add.s -o add_arm.s --opt-level=2 --emit-ir 2> add_ir.ll

Building and running the output on ARM (macOS Apple Silicon)

# Transpile
./build/src/TransISA benchmarking/x86/hello.s -o hello_arm.s --opt-level=2

# Assemble and link
as -arch arm64 hello_arm.s -o hello_arm.o
ld -macosx_version_min 11.0 -o hello_arm hello_arm.o \
   -lSystem -syslibroot $(xcrun --show-sdk-path) -e __start

# Run
./hello_arm

Comparing optimization levels

Run a benchmark at all three levels to see the impact of LLVM passes:

for level in 0 1 2; do
  ./build/src/TransISA benchmarking/x86/add.s \
    -o benchmarking/arm/add_O${level}.s \
    --opt-level=${level}
done

Benchmarking

The benchmarking/ directory contains x86 source programs and a Python script that automates metric collection.

cd benchmarking
python3 analyzefiles.py

This assembles and links both the x86 and ARM assembly files, then prints a comparison table with:

Instruction count
Text section size (bytes)
Syscall count
Stack size (bytes)

Requirements: macOS with both x86_64 and arm64 toolchains available (standard on Apple Silicon Macs), plus llvm-objdump and otool.
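To make the instruction-count and syscall-count metrics concrete, here is a rough text-based approximation in Python. It is not how analyzefiles.py works (that script relies on llvm-objdump and otool). The helper names and the line-classification rules are assumptions made for illustration: directives start with `.`, labels end with `:`, `//` starts a trailing comment, and `syscall`, `int`, and `svc` count as syscalls.

```python
# Assumed set of mnemonics that enter the kernel (x86: syscall/int, AArch64: svc).
SYSCALL_MNEMONICS = {"syscall", "int", "svc"}


def instruction_mnemonics(asm_text):
    """Return the mnemonic of every instruction line in an assembly listing."""
    mnemonics = []
    for raw in asm_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing // comments
        if not line or line[0] in "#;@":      # blank or whole-line comment
            continue
        # Peel off any leading labels such as "loop:".
        while line:
            parts = line.split(None, 1)
            if not parts[0].endswith(":"):
                break
            line = parts[1].strip() if len(parts) > 1 else ""
        if not line or line.startswith("."):  # label-only line or directive
            continue
        mnemonics.append(line.split()[0].lower())
    return mnemonics


def summarize(asm_text):
    """Count instructions and syscalls in an assembly listing."""
    mnemonics = instruction_mnemonics(asm_text)
    return {
        "instructions": len(mnemonics),
        "syscalls": sum(m in SYSCALL_MNEMONICS for m in mnemonics),
    }
```

Counting from the source text ignores anything the assembler or linker adds, so figures from a disassembly of the linked binary can differ.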

Running Tests

cd build
ctest --output-on-failure

Or run individual test suites from the repository root:

./build/tests/test_lexer
./build/tests/test_parser
./build/tests/test_llvm_codegen

Project Structure

TransISA/
├── include/
│   ├── lexer/           # Lexer, Token, Trie headers
│   ├── parser/          # Parser, AST node definitions
│   ├── llvm_ir/         # LLVM IR generator header
│   └── codegen/         # Backend codegen + optimization header
├── src/
│   ├── lexer/           # Trie-based tokenizer
│   ├── parser/          # AST construction from token stream
│   ├── llvm_ir/         # x86 → LLVM IR mapping (1500+ lines)
│   ├── codegen/         # LLVM optimization passes + ARM emission
│   └── main.cpp         # CLI entry point
├── tests/               # Google Test suites (lexer, parser, IR gen)
├── benchmarking/
│   ├── x86/             # Source x86 assembly programs
│   ├── arm/             # Transpiled ARM output
│   └── analyzefiles.py  # Automated metric collection
├── .github/workflows/   # CI: tests (macOS + Ubuntu) + code quality
├── CMakeLists.txt
└── LICENSE              # MIT

How It Works

Lexer: Tokenizes x86 assembly using a trie-based classifier. Handles registers, instructions, immediates (decimal/hex/binary/octal), labels, directives, memory operands, and strings.
Parser: Constructs a hierarchical AST: RootNode → FunctionNode → BasicBlockNode → InstructionNode with typed operands (RegisterNode, IntLiteralNode, MemoryNode, etc.).
IR Generator: Walks the AST and emits LLVM IR. x86 registers are modeled as alloca variables. CPU flags (ZF, SF, CF, OF) are explicitly tracked. Stack operations use a simulated memory buffer with pointer arithmetic. Syscalls are translated to platform-appropriate inline assembly.
Optimizer: Applies LLVM's pass pipeline at the requested level. The mem2reg pass (active at O1+) promotes register allocas to SSA values, which is the most impactful transformation for transpiled code quality.
Backend: Feeds optimized IR to LLVM's AArch64 code generator for register allocation, instruction selection, and assembly emission.
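The trie-based classification used by the lexer stage can be sketched in a few lines of Python. This is a minimal illustration of the technique, not the project's C++ implementation: the class names, the token-kind strings, and the small word lists are all hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.kind = None    # token kind if a known word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, kind):
        """Store a word and the token kind it maps to."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.kind = kind

    def lookup(self, word):
        """Return the stored kind for an exact match, else None."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.kind


def build_classifier():
    """Build a trie over a small, illustrative subset of mnemonics and registers."""
    trie = Trie()
    for mnemonic in ("mov", "lea", "push", "pop", "add", "sub", "cmp", "jmp", "call", "ret"):
        trie.insert(mnemonic, "INSTRUCTION")
    for register in ("rax", "rbx", "rcx", "rdx", "rsp", "rbp", "eax", "ebx"):
        trie.insert(register, "REGISTER")
    return trie


def classify(trie, word):
    """Classify a word case-insensitively; unknown words become identifiers."""
    return trie.lookup(word.lower()) or "IDENTIFIER"
```

A lookup walks one node per character, so classification cost depends on the length of the word rather than on how many mnemonics and registers are stored. A prefix of a known word, such as `mo`, reaches a node with no kind and falls through to `IDENTIFIER`.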

Contributing

See CONTRIBUTING.md. The project uses clang-format and cmake-format via pre-commit hooks.

pip install pre-commit
pre-commit install

Citation

If you use TransISA in your research, please cite:

@inproceedings{nshuti2026transisa,
  author    = {Nshuti, Felix Hirwa},
  title     = {{TransISA}: A Static Transpiler for Migrating Legacy x86 Assembly to {ARM} in Scientific Computing},
  booktitle = {Proceedings of the 2026 Improving Scientific Software Conference (ISS26)},
  year      = {2026},
  address   = {Boulder, CO, USA}
}

License

MIT
