The universal translator for incompatible systems
A high-performance Rust engine that bridges data between incompatible systems using declarative YAML schemas. Transform between 30+ formats (JSON, Binary, CSV, XML, YAML, images, 3D models, audio, video) with zero-copy performance and pre-compiled execution. No code required: just define translation schemas.
When two systems speak different languages, use different protocols, or understand data in fundamentally different ways, they cannot communicate. UDBE is the bridge that translates between these alien systems, enabling data to flow seamlessly from one world to another, regardless of format, structure, or encoding.
In the real world, systems are built in isolation:
- Legacy mainframes speak fixed-width binary
- Modern APIs expect JSON
- Industrial PLCs output Modbus registers
- Cloud services require REST/JSON
- Embedded devices use compact binary protocols
- Enterprise systems use XML/SOAP
These systems are alien to each other: they cannot understand each other's data formats, structures, or protocols. Traditional integration requires:
- Writing custom parsers for each source format
- Writing custom serializers for each target format
- Implementing transformation logic in code
- Maintaining this code for every system combination
Result: Integration code multiplies with every new source/target pair (on the order of N×M custom integrations for N sources and M targets).
UDBE acts as a universal translator between any two systems. You define a translation schema that describes:
- How to read the source system's language (source contract)
- How to write the target system's language (target contract)
- How to translate between them (mapping rules)
The engine becomes the bridge: it speaks both languages fluently and translates data in real-time with zero-copy performance.
Result: Linear complexity, with one schema per translation regardless of system count.
UDBE operates as a three-stage translation pipeline:
Alien System A → [Decoder] → [Translator] → [Encoder] → Alien System B
  (Format X)       (Read)      (Transform)     (Write)      (Format Y)
Stage 1: Decode (Read Source Language)
- Format handler interprets the source system's data format
- Extracts values according to the source contract
- Converts to UDBE's internal type system
Stage 2: Translate (Transform Between Languages)
- StreamProcessor executes pre-compiled translation instructions
- Applies transformations (cast, compute, conditional, etc.)
- Operates on type-safe values in registers
Stage 3: Encode (Write Target Language)
- Format handler serializes values to target system's format
- Structures data according to target contract
- Outputs data the target system understands
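The three stages can be pictured with a small standalone sketch. Everything below is hypothetical and uses only the standard library: the `Decoder` and `Encoder` traits, the `Value` enum, and `run_pipeline` are illustrative names, not UDBE's actual API, and the translate stage is reduced to renaming fields.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the engine's internal values.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Text(String),
}

/// Stage 1: a decoder turns raw source bytes into named values.
pub trait Decoder {
    fn decode(&self, input: &[u8]) -> HashMap<String, Value>;
}

/// Stage 3: an encoder serializes named values for the target.
pub trait Encoder {
    fn encode(&self, fields: &[(String, Value)]) -> String;
}

/// Toy decoder: the first two bytes are a big-endian sensor id.
pub struct TwoByteId;

impl Decoder for TwoByteId {
    fn decode(&self, input: &[u8]) -> HashMap<String, Value> {
        let mut out = HashMap::new();
        if let [hi, lo, ..] = input {
            let id = u16::from_be_bytes([*hi, *lo]);
            out.insert("sensor_id".to_string(), Value::Int(i64::from(id)));
        }
        out
    }
}

/// Toy encoder: one `key=value` line per field.
pub struct KeyValueLines;

impl Encoder for KeyValueLines {
    fn encode(&self, fields: &[(String, Value)]) -> String {
        fields
            .iter()
            .map(|(key, value)| match value {
                Value::Int(n) => format!("{key}={n}"),
                Value::Text(s) => format!("{key}={s}"),
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Stage 2 plus glue: rename source fields to target fields, then encode.
/// Source fields missing from the decoded data are skipped.
pub fn run_pipeline(
    decoder: &dyn Decoder,
    encoder: &dyn Encoder,
    mapping: &[(&str, &str)],
    input: &[u8],
) -> String {
    let decoded = decoder.decode(input);
    let translated: Vec<(String, Value)> = mapping
        .iter()
        .filter_map(|(src, dst)| decoded.get(*src).map(|v| (dst.to_string(), v.clone())))
        .collect();
    encoder.encode(&translated)
}
```

Because `run_pipeline` only sees the two traits, swapping either handler leaves the middle stage untouched, which is the separation the pipeline description relies on.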
Manifest (Translation Schema): The blueprint for translation. It defines:
- Source Contract: How to interpret System A's data
- Target Contract: How to structure data for System B
- Mapping Rules: How to translate between them
Compiler: Converts the schema into optimized translation instructions. Compiled once, executed many times.
StreamProcessor: The translation engine. Executes instructions with:
- Zero-copy decoding: Reads data directly from source
- Register-based translation: Efficient value manipulation
- Zero-copy encoding: Writes data directly to target
Format Handlers: Language interpreters for each system:
- Decoders (StreamValueProvider): Understand source system's language
- Encoders (StreamWriter): Speak target system's language
The compiler generates atomic translation operations:
- Move: Direct field mapping (no transformation needed)
- Cast: Type conversion (translate between type systems)
- Compute: Arithmetic translation (convert units, calculate derived values)
- StringOp: Text manipulation (format strings, concatenate)
- Conditional: Logic translation (apply business rules)
- Write: Direct value assignment
Each instruction is a microsecond-fast atomic operation.
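To make the instruction model concrete, here is a minimal sketch of a register machine that runs a pre-built instruction list. It is an illustration under assumptions: the `Instruction` and `Value` types and the `execute` function are invented for this example, only three of the six operations are modeled, and `Cast` is limited to casting to text.

```rust
use std::collections::HashMap;

/// Hypothetical register value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
    Text(String),
}

/// A reduced instruction set modeled on the operations listed above.
pub enum Instruction {
    /// Copy one register to another without transformation.
    Move { from: String, to: String },
    /// Convert a register to text (the only cast modeled in this sketch).
    Cast { from: String, to: String },
    /// Assign a constant value to a register.
    Write { to: String, value: Value },
}

/// Runs the instructions in order against a shared register map.
pub fn execute(
    program: &[Instruction],
    registers: &mut HashMap<String, Value>,
) -> Result<(), String> {
    for instruction in program {
        match instruction {
            Instruction::Move { from, to } => {
                let value = registers
                    .get(from)
                    .cloned()
                    .ok_or_else(|| format!("missing register: {from}"))?;
                registers.insert(to.clone(), value);
            }
            Instruction::Cast { from, to } => {
                let text = match registers.get(from) {
                    Some(Value::Int(n)) => n.to_string(),
                    Some(Value::Float(x)) => x.to_string(),
                    Some(Value::Text(s)) => s.clone(),
                    None => return Err(format!("missing register: {from}")),
                };
                registers.insert(to.clone(), Value::Text(text));
            }
            Instruction::Write { to, value } => {
                registers.insert(to.clone(), value.clone());
            }
        }
    }
    Ok(())
}
```

The program is plain data, so it can be built once from a schema and replayed for every record; the loop itself never inspects the schema.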
- Universal Bridge: Connect any two systems, regardless of format incompatibility
- Language Interpreters: 30+ format handlers that understand different system languages
- Translation Schemas: Define how to translate between systems in YAML, no code required
- Real-Time Translation: Zero-copy, microsecond-fast translation for high-throughput systems
- Bidirectional: Every system can both send and receive, for true universal communication
- Streaming: Handle continuous data flows between systems in real-time
- Composable: Build complex translations from simple parts
- Rich Translation Logic: Unit conversion, type casting, calculations, conditional logic
- Type-Safe: Strong typing prevents translation errors
- Portable: Minimal dependencies, runs anywhere Rust runs
git clone <repository>
cd udl
cargo build --release
Two systems need to communicate, but one speaks JSON and the other speaks YAML. Build a bridge:
use udl::*;
// Define the translation schema between the two systems
let schema = r#"
source:
  type: json  # System A's language
  fields:
    - name: name
      type: string
      path: name
    - name: age
      type: i64
      path: age
target:
  type: yaml  # System B's language
  fields:
    - name: full_name
      type: string
      path: full_name
    - name: years_old
      type: i64
      path: years_old
mapping:
  - source: name
    target: full_name
  - source: age
    target: years_old
"#;
// Create the bridge
let manifest = Manifest::from_yaml(schema)?;
let mut processor = StreamProcessor::new(manifest)?;
// Connect System A (JSON) and System B (YAML)
let json_data = serde_json::json!({"name": "John", "age": 30});
let reader = JsonFormat::with_data(json_data);
let mut source = SingleChunkSource::new(Box::new(reader));
let mut writer = YamlFormat::new();
// Translate: System A → Bridge → System B
if let Some(output) = processor.process_chunk_and_emit(&mut source, &mut writer)? {
    println!("{}", String::from_utf8_lossy(&output));
}
The bridge translates data from System A's format to System B's format automatically.
UDBE uses an internal type system as the universal language between alien systems:
System A Language → Universal Types → System B Language
    (Binary)        (i64, f64, str)        (JSON)
All systems translate through this common type system:
- Integers: i8, i16, i32, i64, u8, u16, u32, u64
- Floats: f32, f64
- Boolean: bool
- String: String
- Binary: Vec<u8>
This universal language enables any system to communicate with any other system.
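A common type system of this kind is often modeled as a single enum with explicit, fallible casts. The sketch below is an assumption about shape, not UDBE's real definition: the `Value` and `Target` names are hypothetical, only `i64`, `f64`, `bool`, `String`, and `Vec<u8>` are represented, and lossy casts such as float to integer are deliberately rejected.

```rust
/// Hypothetical universal value covering a subset of the listed types.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    I64(i64),
    F64(f64),
    Bool(bool),
    Str(String),
    Bytes(Vec<u8>),
}

/// Cast targets supported by this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Target {
    I64,
    F64,
    Str,
}

/// Converts a value to the requested type, or explains why it cannot.
pub fn cast(value: &Value, target: Target) -> Result<Value, String> {
    match (value, target) {
        (Value::I64(n), Target::I64) => Ok(Value::I64(*n)),
        (Value::I64(n), Target::F64) => Ok(Value::F64(*n as f64)),
        (Value::I64(n), Target::Str) => Ok(Value::Str(n.to_string())),
        (Value::F64(x), Target::F64) => Ok(Value::F64(*x)),
        (Value::F64(x), Target::Str) => Ok(Value::Str(x.to_string())),
        (Value::Bool(b), Target::Str) => Ok(Value::Str(b.to_string())),
        (Value::Str(s), Target::Str) => Ok(Value::Str(s.clone())),
        (Value::Str(s), Target::I64) => s
            .trim()
            .parse::<i64>()
            .map(Value::I64)
            .map_err(|e| format!("cannot cast {s:?} to i64: {e}")),
        (Value::Str(s), Target::F64) => s
            .trim()
            .parse::<f64>()
            .map(Value::F64)
            .map_err(|e| format!("cannot cast {s:?} to f64: {e}")),
        // Everything else (bytes, float -> int, ...) is rejected here.
        (other, target) => Err(format!("unsupported cast: {other:?} -> {target:?}")),
    }
}
```

Returning `Result` keeps a failed cast visible to the caller instead of silently producing a default value.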
Each format handler is a language interpreter:
- Decoders understand one system's language (read its data format)
- Encoders speak another system's language (write in its format)
- Core translator operates in the universal language (format-agnostic)
This architecture means:
- Adding support for a new system = add one interpreter
- Systems can be combined in any way (N×M combinations from N+M interpreters)
- The core translator never needs to know system specifics
The schema is a translation dictionary between two systems:
# "System A says 'sensor_id' at offset 0, System B calls it 'device.id'"
source: sensor_id → target: device.id
# "System A uses u16, System B needs string - translate it"
source: sensor_id (u16) → cast → target: device.id (string)
# "System A has temperature in Fahrenheit, System B wants Celsius - convert it"
source: temp_f (f32) → compute: (temp_f - 32) * 5/9 → target: temp_c (f32)
The schema is:
- Declarative: Describes what translation should happen, not how
- Version-controllable: Track translation changes over time
- Shareable: Same translation works across languages/environments
- Testable: Validate translations independently
The three-stage pipeline ensures clean separation:
- Decode: System A's language → Universal types
- Translate: Universal types → Universal types (with transformations)
- Encode: Universal types → System B's language
Each stage is independent:
- Change System A? Only update the decoder
- Change System B? Only update the encoder
- Change translation logic? Only update the schema
Format handlers access data via pointers and offsets, not by copying:
- Binary formats: Direct memory access at specified offsets
- Structured formats: Parse on-demand, access via paths
- No intermediate buffers: Data flows directly from source to target
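Offset-based access to a binary frame can be sketched with plain slices. The helper names `read_u16` and `read_str` and the `Endianness` enum are hypothetical; the point is that a numeric field is read straight from the input at its offset, and a text field is returned as a borrowed `&str` that points into the original buffer rather than a copy.

```rust
/// Byte order of a multi-byte field.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Endianness {
    Big,
    Little,
}

/// Reads a u16 at `offset`, returning None if the field runs past the input.
pub fn read_u16(data: &[u8], offset: usize, endianness: Endianness) -> Option<u16> {
    let end = offset.checked_add(2)?;
    let field = data.get(offset..end)?;
    let bytes = [field[0], field[1]];
    Some(match endianness {
        Endianness::Big => u16::from_be_bytes(bytes),
        Endianness::Little => u16::from_le_bytes(bytes),
    })
}

/// Borrows `size` bytes at `offset` as UTF-8 text without copying them.
pub fn read_str(data: &[u8], offset: usize, size: usize) -> Option<&str> {
    let end = offset.checked_add(size)?;
    std::str::from_utf8(data.get(offset..end)?).ok()
}
```

Using `get` with a checked range means a malformed or truncated frame yields `None` instead of a panic.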
The processor uses a register-based model:
- Registers: HashMap for field values (fast lookups)
- No heap allocations: In the hot path
- Pre-allocated: Registers cleared and reused per chunk
Schemas are compiled once into instruction sequences:
- No interpretation: Instructions are direct operations
- Optimized order: Compiler can reorder for efficiency
- Type-checked: Invalid operations caught at compile time
Designed for continuous data processing:
- Chunk-based: Process data in chunks, not all at once
- Windowing: Support for sliding windows over streams
- Backpressure: Can handle high-throughput streams
- Memory efficient: Only current chunk in memory
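Chunk-based processing with constant memory might look like the following sketch. It assumes fixed-size 4-byte records holding two big-endian `u16` fields and emits their sum; the function name `process_stream` and the record layout are invented for illustration. One buffer and one register map are allocated up front and reused for every record.

```rust
use std::collections::HashMap;
use std::io::{self, Read};

/// Reads fixed-size records from `reader`, emitting one result per record.
/// Returns the number of complete records processed. A trailing partial
/// record is ignored in this sketch.
pub fn process_stream<R: Read>(mut reader: R, mut emit: impl FnMut(u32)) -> io::Result<usize> {
    const RECORD_SIZE: usize = 4;
    let mut chunk = [0u8; RECORD_SIZE]; // reused for every record
    let mut registers: HashMap<&'static str, u32> = HashMap::with_capacity(2);
    let mut count = 0;

    loop {
        match reader.read_exact(&mut chunk) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        // Clear and refill the registers instead of allocating new ones.
        registers.clear();
        registers.insert("a", u32::from(u16::from_be_bytes([chunk[0], chunk[1]])));
        registers.insert("b", u32::from(u16::from_be_bytes([chunk[2], chunk[3]])));
        emit(registers["a"] + registers["b"]);
        count += 1;
    }
    Ok(count)
}
```

Memory use depends only on the record size, so the same loop works for a file, a socket, or any other `Read` source of arbitrary length.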
Typical performance characteristics:
- Schema compilation: < 1ms for typical schemas
- Per-record transformation: Microseconds (depends on complexity)
- Throughput: Millions of records/second for simple transformations
- Memory: Constant per chunk, not per total data size
UDBE supports 30+ formats across multiple categories. Each format can act as both source (reader) and target (writer), enabling true "everything to everything" transformations.
- JSON: Full support with array indexing (e.g., items.0.name)
- YAML: Nested structures, sequences, mappings
- TOML: Configuration file format
- XML: Element and attribute access
- CSV: Column-based access with header support
- INI: Section-based configuration
- MessagePack: Compact binary format
- CBOR: Concise Binary Object Representation
- BSON: Binary JSON variant
- Binary: Custom offset/size parsing with endianness control (Little/Big)
- Supports all primitive types at arbitrary offsets
- PNG: Read/write with metadata extraction
- JPEG: Read/write with dimension extraction
- GIF: Animated GIF support
- BMP: Windows bitmap format
- WebP: Modern image format
- TIFF: Tagged Image File Format
- ICO: Icon format
- OBJ: Wavefront OBJ format
- STL: Stereolithography format
- PLY: Polygon File Format
- glTF: GL Transmission Format
- STEP: ISO 10303 CAD format
- WAV: Waveform audio format
- MP3: MPEG audio (read-only)
- MP4: MPEG-4 video container (metadata extraction)
- ZIP: Compressed archive format
- TAR: Tape archive format
- DXF: AutoCAD Drawing Exchange Format
- TTF/OTF: TrueType/OpenType fonts (read-only)
- Gerber: PCB manufacturing format
- SVG: Scalable Vector Graphics (composed on XML)
Some formats build on others:
- SVG uses XML internally for parsing/writing
- This demonstrates the composable architecture
Adding new formats is straightforward: implement the StreamValueProvider and StreamWriter traits.
Direct field-to-field copying with optional transformations:
mapping:
  - source: input_field
    target: output_field
    # Optional transform block
Convert between types explicitly:
transform:
  cast: i64  # Convert to i64 (from string, float, etc.)
Supported casts: All numeric types, string, bool. Conversions are validated at execution time.
Perform calculations on numeric fields:
transform:
  compute:
    operator: "+"  # +, -, *, /
    operands: ["field1", "field2", "literal_value"]
Operators:
- Arithmetic: +, -, *, /
- Comparison: >, <, >=, <=, ==, !=
Operands can be:
- Field names (from source or intermediate registers)
- Literal values (numbers as strings)
- Results from previous transformations
Manipulate string data:
transform:
  string_op:
    operator: concat  # concat, upper, lower, trim
    operands: ["str1", "str2", ...]
Operations:
- concat: Join multiple strings
- upper: Convert to uppercase
- lower: Convert to lowercase
- trim: Remove whitespace
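The four string operations map directly onto standard-library methods. The sketch below assumes the operands have already been resolved from field names to their string values; the `string_op` function and its error format are hypothetical.

```rust
/// Applies a string operation to already-resolved operand values.
/// `concat` accepts any number of operands; the others expect exactly one.
pub fn string_op(operator: &str, operands: &[&str]) -> Result<String, String> {
    match (operator, operands) {
        ("concat", parts) => Ok(parts.concat()),
        ("upper", [s]) => Ok(s.to_uppercase()),
        ("lower", [s]) => Ok(s.to_lowercase()),
        ("trim", [s]) => Ok(s.trim().to_string()),
        (op, args) => Err(format!(
            "unsupported string_op {op:?} with {} operand(s)",
            args.len()
        )),
    }
}
```

Matching on the operator and the operand slice together rejects both unknown operators and wrong operand counts in a single fallback arm.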
Implement if-then-else logic:
transform:
  conditional:
    condition:
      operator: ">"  # Comparison operator
      operands: ["field", "threshold"]
    then: "value_if_true"
    else: "value_if_false"  # Optional
The else branch can reference other fields or be omitted (results in null/empty).
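One way to picture the conditional is as a comparison followed by a choice between two branches, where a missing else branch yields no value. This is a sketch under assumptions: `compare` and `conditional` are hypothetical names, operands are taken as already-resolved `f64` values, and `None` stands in for the null/empty result.

```rust
/// Evaluates a comparison operator; None means the operator is unknown.
pub fn compare(operator: &str, left: f64, right: f64) -> Option<bool> {
    Some(match operator {
        ">" => left > right,
        "<" => left < right,
        ">=" => left >= right,
        "<=" => left <= right,
        "==" => left == right,
        "!=" => left != right,
        _ => return None,
    })
}

/// Picks `then` when the condition holds, otherwise the optional else branch.
/// Ok(None) models the "omitted else" case described above.
pub fn conditional<'a>(
    operator: &str,
    left: f64,
    right: f64,
    then: &'a str,
    otherwise: Option<&'a str>,
) -> Result<Option<&'a str>, String> {
    let holds = compare(operator, left, right)
        .ok_or_else(|| format!("unknown operator: {operator}"))?;
    Ok(if holds { Some(then) } else { otherwise })
}
```

Separating the error case (`Err`) from the empty case (`Ok(None)`) keeps a typo in the operator from being mistaken for a legitimately omitted else branch.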
Transformations can reference intermediate results:
# First transformation
- source: raw_value
  target: intermediate_value
  transform:
    compute:
      operator: "*"
      operands: ["raw_value", "2"]
# Second transformation uses intermediate
- source: intermediate_value
  target: final_value
  transform:
    conditional:
      condition:
        operator: ">"
        operands: ["intermediate_value", "100"]
      then: "high"
      else: "low"
Access nested data structures:
# JSON/YAML nested paths
path: user.address.city
# Array indexing
path: items.0.name # First element
path: items.1.price # Second element
# XML paths
path: root.child.@attribute # Element with attribute
path: element.text()  # Text content
For binary formats, specify exact memory locations:
fields:
  - name: sensor_id
    type: u16
    offset: 0        # Byte offset
    size: 2          # Size in bytes
    endianness: big  # big or little
- Zero-Copy Parsing: Data accessed via pointers/offsets, not copied
- Stack-Based Registers: Efficient memory management during execution
- Pre-Compiled Instructions: Schema compiled once, executed many times
- Streaming Support: Process data incrementally without loading everything into memory
- No Heap Allocations: Hot path avoids allocations for maximum speed
The Problem: Factory floor PLCs output Modbus binary registers. Cloud platforms expect JSON REST APIs. They cannot communicate.
The Bridge: UDBE translates Modbus binary → JSON in real-time:
- Decodes Modbus register layouts (offset, size, endianness)
- Translates to universal types
- Encodes to JSON API structure
- Handles unit conversions, scaling, and data validation
Result: Factory data flows directly to cloud dashboards without custom integration code.
The Problem: Mainframes output fixed-width EBCDIC files. Microservices consume JSON. Decades of separation.
The Bridge: UDBE translates fixed-width → JSON:
- Decodes fixed-width layouts (column positions, lengths)
- Handles character encoding (EBCDIC → UTF-8)
- Translates field names and structures
- Encodes to modern JSON APIs
Result: Legacy systems integrate with modern architectures without rewriting.
The Problem: IoT sensors output compact binary protocols. Enterprise systems use XML/SOAP. Different worlds.
The Bridge: UDBE translates binary → XML:
- Decodes binary protocol frames
- Extracts sensor readings (temperature, pressure, etc.)
- Applies transformations (unit conversion, calibration)
- Encodes to enterprise XML schemas
Result: Edge devices communicate with enterprise systems seamlessly.
The Problem: Legacy databases export in proprietary formats. Modern data warehouses expect Parquet/Arrow. Incompatible.
The Bridge: UDBE translates any format → Parquet:
- Decodes source database format
- Transforms schema and data types
- Applies data quality rules
- Encodes to columnar formats
Result: Database migrations without custom ETL pipelines.
The Problem: Media files contain metadata in format-specific structures. Analytics platforms need structured JSON. Mismatch.
The Bridge: UDBE extracts and translates:
- Decodes format-specific metadata (EXIF, ID3, etc.)
- Normalizes to common structure
- Encodes to analytics-ready JSON
Result: Media metadata flows into analytics without format-specific parsers.
The Problem: API v1 uses XML, API v2 uses JSON with different structure. Clients need both. Duplication.
The Bridge: UDBE translates between versions:
- Decodes v1 XML structure
- Transforms field names and types
- Encodes to v2 JSON structure
Result: Single implementation, multiple API versions through translation.
The core principle: The translator doesn't need to know the languages it's translating between.
- Core engine operates in universal types
- Format handlers are isolated interpreters
- Adding a new system = add one interpreter, not rewrite the translator
- Enables N×M system combinations from N+M interpreters
The schema is a contract between two alien systems:
- Defines how System A's language maps to System B's language
- Version-controllable: Track how translations evolve
- Shareable: Same translation works everywhere
- Testable: Validate translations without running systems
Performance is critical when bridging high-throughput systems:
- Decode: Read data directly from source (no copying)
- Translate: Operate on references (minimal copying)
- Encode: Write directly to target (no intermediate buffers)
This enables real-time translation of high-volume data streams.
Systems often communicate continuously, not in batches:
- Process data incrementally (chunk by chunk)
- Constant memory usage (doesn't grow with data size)
- Handle infinite streams (real-time systems)
- Backpressure support (handle rate mismatches)
Prevent translation errors through strong typing:
- Type-checked field mappings
- Safe type conversions (explicit casts)
- Runtime validation of data
- Clear error messages when translation fails
Build complex translations from simple parts:
- Formats can compose (SVG builds on XML)
- Transformations can chain (output of one becomes input of next)
- Schemas can reference other schemas
- Enables complex multi-hop translations
Add new systems without changing core:
- Implement StreamValueProvider to add a decoder
- Implement StreamWriter to add an encoder
- Core translator remains unchanged
- Enables community-contributed format handlers
A translation schema is a contract that defines how to translate between two systems. It has three parts:
Defines how to decode the source system's language:
source:
  type: binary  # System A's language (format)
  fields:
    - name: sensor_id
      type: u16
      offset: 0        # Where in System A's data
      size: 2
      endianness: big  # How System A encodes numbers
Different systems require different decoding:
- Binary systems: Use offset, size, endianness
- Structured systems (JSON/XML/YAML): Use path (e.g., user.address.city)
- Tabular systems (CSV): Use column name
- Array systems: Use indexed paths (e.g., items.0.name)
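Dotted paths such as `user.address.city` and `items.0.name` can be resolved by walking a tree one segment at a time. The sketch below uses a hypothetical `Node` type as a stand-in for parsed JSON or YAML, and a hypothetical `resolve` function; it treats a segment as a key when the current node is a map and as an index when it is a list.

```rust
/// Hypothetical parsed document tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Text(String),
    Number(f64),
    List(Vec<Node>),
    Map(Vec<(String, Node)>),
}

/// Follows a dotted path and returns a reference into the original tree.
/// Returns None for a missing key, a bad index, or a path that descends
/// into a scalar.
pub fn resolve<'a>(root: &'a Node, path: &str) -> Option<&'a Node> {
    let mut node = root;
    for segment in path.split('.') {
        node = match node {
            Node::Map(entries) => {
                &entries.iter().find(|(key, _)| key.as_str() == segment)?.1
            }
            Node::List(items) => items.get(segment.parse::<usize>().ok()?)?,
            _ => return None,
        };
    }
    Some(node)
}
```

The result borrows from `root`, so looking up a field does not clone any part of the document.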
Defines how to encode data for the target system:
target:
  type: json  # System B's language (format)
  fields:
    - name: device_id
      type: string
      path: device.id  # Where in System B's structure
Target fields define:
- Output structure: How System B expects data organized
- Field types: What types System B understands
- Field locations: Where each value goes in System B's format
Connects System A's fields to System B's fields with translation logic:
mapping:
  # Direct translation (same meaning, different names)
  - source: sensor_id
    target: device_id
    transform:
      cast: string  # System A uses u16, System B needs string
  # Unit conversion (same concept, different units)
  - source: temp_fahrenheit
    target: temp_celsius
    transform:
      compute:
        operator: "-"
        operands: ["temp_fahrenheit", "32"]
    # Then multiply by 5/9 (chained via intermediate field)
  # Structure transformation (same data, different organization)
  - source: first_name
    target: full_name
    transform:
      string_op:
        operator: concat
        operands: ["first_name", "last_name"]
  # Business logic translation (different rules)
  - source: raw_score
    target: grade
    transform:
      conditional:
        condition:
          operator: ">="
          operands: ["raw_score", "90"]
        then: "A"
        else: "B"
Translation execution:
- Mappings execute sequentially
- Each mapping can reference results from previous ones
- Use intermediate fields for complex multi-step translations
- All translations happen in the universal type system
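Sequential execution with intermediate fields can be illustrated with the Fahrenheit-to-Celsius conversion, split into binary steps. This is a sketch under assumptions: `ComputeStep`, `operand_value`, and `run_steps` are hypothetical names, registers hold only `f64`, and an operand is treated as a field name if a register with that name exists and as a numeric literal otherwise.

```rust
use std::collections::HashMap;

/// One compute mapping: target = operands[0] <operator> operands[1].
pub struct ComputeStep<'a> {
    pub target: &'a str,
    pub operator: char,
    pub operands: [&'a str; 2],
}

/// Resolves an operand: register lookup first, numeric literal second.
fn operand_value(operand: &str, registers: &HashMap<String, f64>) -> Option<f64> {
    registers
        .get(operand)
        .copied()
        .or_else(|| operand.parse().ok())
}

/// Executes the steps in order; each result is stored under its target
/// name so that later steps can reference it. Returns None on an
/// unresolvable operand or an unknown operator.
pub fn run_steps(steps: &[ComputeStep], registers: &mut HashMap<String, f64>) -> Option<()> {
    for step in steps {
        let left = operand_value(step.operands[0], registers)?;
        let right = operand_value(step.operands[1], registers)?;
        let result = match step.operator {
            '+' => left + right,
            '-' => left - right,
            '*' => left * right,
            '/' => left / right,
            _ => return None,
        };
        registers.insert(step.target.to_string(), result);
    }
    Some(())
}
```

With `temp_f` set to 212, the three steps `minus_32 = temp_f - 32`, `scaled = minus_32 * 5`, and `temp_c = scaled / 9` leave 100 in `temp_c`. Reordering the steps would fail, because `scaled` would reference a register that does not exist yet.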
When systems can't talk, UDBE translates.
UDBE enables communication between systems that were never designed to work together. Legacy mainframes, modern APIs, industrial PLCs, cloud services, embedded devices: all can now exchange data through universal translation schemas.