Universal Data Bridge Engine (UDBE)

The universal translator for incompatible systems

A high-performance Rust engine that bridges data between incompatible systems using declarative YAML schemas. Transform between 30+ formats (JSON, Binary, CSV, XML, YAML, images, 3D models, audio, video) with zero-copy performance and pre-compiled execution. No code required—just define translation schemas.

When two systems speak different languages, use different protocols, or understand data in fundamentally different ways, they cannot communicate. UDBE is the bridge that translates between these alien systems—enabling data to flow seamlessly from one world to another, regardless of format, structure, or encoding.

🌉 The Problem: Alien Systems

In the real world, systems are built in isolation:

Legacy mainframes speak fixed-width binary
Modern APIs expect JSON
Industrial PLCs output Modbus registers
Cloud services require REST/JSON
Embedded devices use compact binary protocols
Enterprise systems use XML/SOAP

These systems are alien to each other—they cannot understand each other's data formats, structures, or protocols. Traditional integration requires:

Writing custom parsers for each source format
Writing custom serializers for each target format
Implementing transformation logic in code
Maintaining this code for every system combination

Result: Integration effort grows quadratically as systems multiply, because every source/target pair needs its own code.

🎯 The Solution: Universal Translation

UDBE acts as a universal translator between any two systems. You define a translation schema that describes:

How to read the source system's language (source contract)
How to write the target system's language (target contract)
How to translate between them (mapping rules)

The engine becomes the bridge—it speaks both languages fluently and translates data in real-time with zero-copy performance.

Result: One declarative schema per translation and one format handler per format, instead of custom code for every pair of systems.

🏗️ How the Bridge Works

The Translation Process

UDBE operates as a three-stage translation pipeline:

Alien System A → [Decoder] → [Translator] → [Encoder] → Alien System B
   (Format X)      (Read)      (Transform)    (Write)      (Format Y)

Stage 1: Decode (Read Source Language)

Format handler interprets the source system's data format
Extracts values according to the source contract
Converts to UDBE's internal type system

Stage 2: Translate (Transform Between Languages)

StreamProcessor executes pre-compiled translation instructions
Applies transformations (cast, compute, conditional, etc.)
Operates on type-safe values in registers

Stage 3: Encode (Write Target Language)

Format handler serializes values to target system's format
Structures data according to target contract
Outputs data the target system understands
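
The three stages can be sketched in plain Rust. This is a minimal illustration and not UDBE's actual API: the Value enum, the Decoder and Encoder traits, and the KeyValueDecoder and JsonLineEncoder types are hypothetical names invented for this example, and the translate stage is reduced to a rename table.

```rust
use std::collections::BTreeMap;

/// Hypothetical internal value type (a small subset of the universal types).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

/// Stage 1: turn source data into named values.
trait Decoder {
    fn decode(&self, input: &str) -> BTreeMap<String, Value>;
}

/// Stage 3: turn named values into the target representation.
trait Encoder {
    fn encode(&self, fields: &BTreeMap<String, Value>) -> String;
}

/// Decodes `key=value;key=value` text; numeric values become `Value::Int`.
struct KeyValueDecoder;

impl Decoder for KeyValueDecoder {
    fn decode(&self, input: &str) -> BTreeMap<String, Value> {
        input
            .split(';')
            .filter_map(|pair| pair.split_once('='))
            .map(|(k, v)| {
                let value = v
                    .parse::<i64>()
                    .map(Value::Int)
                    .unwrap_or_else(|_| Value::Text(v.to_string()));
                (k.to_string(), value)
            })
            .collect()
    }
}

/// Encodes fields as a flat JSON object (no string escaping; illustration only).
struct JsonLineEncoder;

impl Encoder for JsonLineEncoder {
    fn encode(&self, fields: &BTreeMap<String, Value>) -> String {
        let body: Vec<String> = fields
            .iter()
            .map(|(k, v)| match v {
                Value::Int(n) => format!("\"{k}\":{n}"),
                Value::Text(s) => format!("\"{k}\":\"{s}\""),
            })
            .collect();
        format!("{{{}}}", body.join(","))
    }
}

/// Stage 2 (translate) as a list of (source name, target name) renames.
fn bridge(dec: &dyn Decoder, enc: &dyn Encoder, renames: &[(&str, &str)], input: &str) -> String {
    let source = dec.decode(input);
    let mut target = BTreeMap::new();
    for (from, to) in renames {
        if let Some(value) = source.get(*from) {
            target.insert(to.to_string(), value.clone());
        }
    }
    enc.encode(&target)
}
```

Calling bridge with the renames name → full_name and age → years_old on the input name=John;age=30 yields {"full_name":"John","years_old":30}. The bridge function never inspects the source or target format, which is the separation the three stages describe.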

Core Components

Manifest (Translation Schema): The blueprint for translation. It defines:

Source Contract: How to interpret System A's data
Target Contract: How to structure data for System B
Mapping Rules: How to translate between them

Compiler: Converts the schema into optimized translation instructions. Compiled once, executed many times.

StreamProcessor: The translation engine. Executes instructions with:

Zero-copy decoding: Reads data directly from source
Register-based translation: Efficient value manipulation
Zero-copy encoding: Writes data directly to target

Format Handlers: Language interpreters for each system:

Decoders (StreamValueProvider): Understand source system's language
Encoders (StreamWriter): Speak target system's language

Translation Instructions

The compiler generates atomic translation operations:

Move: Direct field mapping (no transformation needed)
Cast: Type conversion (translate between type systems)
Compute: Arithmetic translation (convert units, calculate derived values)
StringOp: Text manipulation (format strings, concatenate)
Conditional: Logic translation (apply business rules)
Write: Direct value assignment

Each instruction is a microsecond-fast atomic operation.
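
A sketch of how such instructions might be executed against a register map. The variant names follow the list above, but the field layout, the Value enum, and the execute and read functions are assumptions made for this example. Cast is limited to integer-to-text and Compute to multiplication by a constant to keep it short.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    Text(String),
}

/// Hypothetical, simplified instruction set.
enum Instruction {
    /// Copy a register under a new name.
    Move { from: String, to: String },
    /// Convert an integer register to its decimal text form.
    Cast { from: String, to: String },
    /// Multiply a numeric register by a constant factor.
    Compute { from: String, to: String, factor: f64 },
    /// Store a constant value.
    Write { to: String, value: Value },
}

/// Looks up a register or reports which name was missing.
fn read<'a>(regs: &'a HashMap<String, Value>, name: &str) -> Result<&'a Value, String> {
    regs.get(name).ok_or_else(|| format!("missing register {name:?}"))
}

/// Runs the instructions in order; each one writes exactly one register.
fn execute(program: &[Instruction], regs: &mut HashMap<String, Value>) -> Result<(), String> {
    for instruction in program {
        let (to, value) = match instruction {
            Instruction::Move { from, to } => (to, read(regs, from)?.clone()),
            Instruction::Cast { from, to } => match read(regs, from)? {
                Value::Int(n) => (to, Value::Text(n.to_string())),
                other => return Err(format!("cannot cast {other:?} to text")),
            },
            Instruction::Compute { from, to, factor } => match read(regs, from)? {
                Value::Int(n) => (to, Value::Float(*n as f64 * factor)),
                Value::Float(x) => (to, Value::Float(x * factor)),
                other => return Err(format!("cannot scale {other:?}")),
            },
            Instruction::Write { to, value } => (to, value.clone()),
        };
        regs.insert(to.clone(), value);
    }
    Ok(())
}
```

Because every instruction reads registers and writes one register, a compiled schema becomes a flat list that can be replayed for each record without re-reading the YAML.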

✨ Key Capabilities

🌉 Universal Bridge: Connect any two systems, regardless of format incompatibility
🗣️ Language Interpreters: 30+ format handlers that understand different system languages
📋 Translation Schemas: Define how to translate between systems in YAML—no code required
⚡ Real-Time Translation: Zero-copy, microsecond-fast translation for high-throughput systems
🔄 Bidirectional: Most formats can be both read and written, so the same system can act as a source or a target
🌊 Streaming: Handle continuous data flows between systems in real-time
🧩 Composable: Build complex translations from simple parts
🔧 Rich Translation Logic: Unit conversion, type casting, calculations, conditional logic
🎯 Type-Safe: Strong typing prevents translation errors
📦 Portable: Minimal dependencies, runs anywhere Rust runs

🚀 Quick Start: Building Your First Bridge

Installation

git clone <repository>
cd udl
cargo build --release

Example: Bridging JSON and YAML Systems

Two systems need to communicate, but one speaks JSON and the other speaks YAML. Build a bridge:

use udl::*;

// Requires serde_json as a dependency. Assumes the crate's error types
// implement std::error::Error so that `?` can box them.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define the translation schema between the two systems
    let schema = r#"
source:
  type: json  # System A's language
  fields:
    - name: name
      type: string
      path: name
    - name: age
      type: i64
      path: age

target:
  type: yaml  # System B's language
  fields:
    - name: full_name
      type: string
      path: full_name
    - name: years_old
      type: i64
      path: years_old

mapping:
  - source: name
    target: full_name
  - source: age
    target: years_old
"#;

    // Create the bridge: parse the schema, then compile it into a processor
    let manifest = Manifest::from_yaml(schema)?;
    let mut processor = StreamProcessor::new(manifest)?;

    // Connect System A (JSON) and System B (YAML)
    let json_data = serde_json::json!({"name": "John", "age": 30});
    let reader = JsonFormat::with_data(json_data);
    let mut source = SingleChunkSource::new(Box::new(reader));
    let mut writer = YamlFormat::new();

    // Translate: System A → Bridge → System B
    if let Some(output) = processor.process_chunk_and_emit(&mut source, &mut writer)? {
        println!("{}", String::from_utf8_lossy(&output));
    }
    Ok(())
}

The bridge translates data from System A's format to System B's format automatically.

💡 Core Concepts: Universal Translation

The Universal Language

UDBE uses an internal type system as the universal language between alien systems:

System A Language → Universal Types → System B Language
  (Binary)          (i64, f64, str)      (JSON)

All systems translate through this common type system:

Integers: i8, i16, i32, i64, u8, u16, u32, u64
Floats: f32, f64
Boolean: bool
String: String
Binary: Vec<u8>

This universal language enables any system to communicate with any other system.

Format Interpreters

Each format handler is a language interpreter:

Decoders understand one system's language (read its data format)
Encoders speak another system's language (write in its format)
Core translator operates in the universal language (format-agnostic)

This architecture means:

Adding support for a new system = add one interpreter
Systems can be combined in any way (N×M combinations from N+M interpreters)
The core translator never needs to know system specifics

Schema as Translation Dictionary

The schema is a translation dictionary between two systems:

# "System A says 'sensor_id' at offset 0, System B calls it 'device.id'"
source: sensor_id → target: device.id

# "System A uses u16, System B needs string - translate it"
source: sensor_id (u16) → cast → target: device.id (string)

# "System A has temperature in Fahrenheit, System B wants Celsius - convert it"
source: temp_f (f32) → compute: (temp_f - 32) * 5/9 → target: temp_c (f32)

The schema is:

Declarative: Describes what translation should happen, not how
Version-controllable: Track translation changes over time
Shareable: Same translation works across languages/environments
Testable: Validate translations independently
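
As a worked example of the Fahrenheit-to-Celsius rule above, the same arithmetic in Rust. The function name is hypothetical; only the formula (temp_f - 32) * 5/9 comes from the schema.

```rust
/// Mirrors the schema rule `(temp_f - 32) * 5/9`.
/// Floating-point literals matter here: with integers, `5 / 9` evaluates to 0.
fn fahrenheit_to_celsius(temp_f: f32) -> f32 {
    (temp_f - 32.0) * 5.0 / 9.0
}
```

Checking known points such as 212 °F = 100 °C, 32 °F = 0 °C, and -40 °F = -40 °C is one way to validate a translation rule independently of the systems on either side.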

Translation Pipeline

The three-stage pipeline ensures clean separation:

Decode: System A's language → Universal types
Translate: Universal types → Universal types (with transformations)
Encode: Universal types → System B's language

Each stage is independent:

Change System A? Only update the decoder
Change System B? Only update the encoder
Change translation logic? Only update the schema

⚡ Performance Characteristics

Zero-Copy Parsing

Format handlers access data via pointers and offsets, not by copying:

Binary formats: Direct memory access at specified offsets
Structured formats: Parse on-demand, access via paths
No intermediate buffers: Data flows directly from source to target
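
A minimal sketch of what access "via pointers and offsets" can look like in Rust: a view struct whose fields borrow from the input buffer. The RecordView type, the parse_record function, and the id:payload framing are invented for this example and are not taken from UDBE.

```rust
/// A record view that borrows from the input buffer instead of copying it.
struct RecordView<'a> {
    id: &'a str,
    payload: &'a [u8],
}

/// Splits `buf` at the first `:` byte; both parts point into `buf`.
/// Returns `None` if there is no separator or the id is not valid UTF-8.
fn parse_record(buf: &[u8]) -> Option<RecordView<'_>> {
    let sep = buf.iter().position(|&b| b == b':')?;
    let id = std::str::from_utf8(&buf[..sep]).ok()?;
    Some(RecordView { id, payload: &buf[sep + 1..] })
}
```

The lifetime parameter ties the view to the buffer, so the compiler rejects any use of the view after the buffer is freed or overwritten.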

Stack-Based Execution

The processor uses a register-based model:

Registers: HashMap for field values (fast lookups)
No heap allocations: In the hot path
Pre-allocated: Registers cleared and reused per chunk
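
The clear-and-reuse pattern can be sketched as follows. The Registers type is hypothetical, and the choice to borrow field names from the schema (so that setting a register does not allocate a new key) is an assumption of this example, not a description of UDBE's internals.

```rust
use std::collections::HashMap;

/// Hypothetical register file, allocated once and reused for every chunk.
/// Keys borrow field names from the schema, so `set` does not allocate strings.
struct Registers<'schema> {
    slots: HashMap<&'schema str, i64>,
}

impl<'schema> Registers<'schema> {
    /// Reserves room for the expected number of fields up front.
    fn with_capacity(fields: usize) -> Self {
        Registers { slots: HashMap::with_capacity(fields) }
    }

    /// Empties the registers; `HashMap::clear` keeps the allocated memory.
    fn reset(&mut self) {
        self.slots.clear();
    }

    fn set(&mut self, name: &'schema str, value: i64) {
        self.slots.insert(name, value);
    }

    fn get(&self, name: &str) -> Option<i64> {
        self.slots.get(name).copied()
    }
}
```

Calling reset between chunks drops the old values but keeps the table's capacity, so steady-state processing does not need to grow the map again.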

Pre-Compiled Instructions

Schemas are compiled once into instruction sequences:

No interpretation: Instructions are direct operations
Optimized order: Compiler can reorder for efficiency
Type-checked: Invalid operations caught when the schema is compiled, before any data is processed

Streaming Architecture

Designed for continuous data processing:

Chunk-based: Process data in chunks, not all at once
Windowing: Support for sliding windows over streams
Backpressure: Copes with rate mismatches between fast producers and slower consumers
Memory efficient: Only current chunk in memory
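
Two standard-library building blocks illustrate the chunking and windowing ideas. Both functions are hypothetical helpers written for this example; they show the general technique rather than UDBE's streaming API.

```rust
use std::io::{self, Read};

/// Reads `input` through one reusable buffer of `chunk_len` bytes and
/// returns (number of chunks, total bytes). Memory use does not depend
/// on the length of the input.
fn process_in_chunks<R: Read>(mut input: R, chunk_len: usize) -> io::Result<(usize, usize)> {
    let mut buf = vec![0u8; chunk_len];
    let (mut chunks, mut total) = (0usize, 0usize);
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        // A real pipeline would decode and translate `&buf[..n]` here.
        chunks += 1;
        total += n;
    }
    Ok((chunks, total))
}

/// Moving average over a sliding window of `width` samples.
/// `width` must be non-zero (`slice::windows` panics on 0).
fn moving_average(samples: &[f64], width: usize) -> Vec<f64> {
    samples
        .windows(width)
        .map(|w| w.iter().sum::<f64>() / width as f64)
        .collect()
}
```

Any type that implements Read (a file, a socket, an in-memory slice) can be passed to process_in_chunks, which is what keeps only the current chunk in memory.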

Benchmarks

Typical performance characteristics:

Schema compilation: < 1ms for typical schemas
Per-record transformation: Microseconds (depends on complexity)
Throughput: Millions of records/second for simple transformations
Memory: Constant per chunk, not per total data size

🎯 Supported Formats

UDBE supports 30+ formats across multiple categories. Most formats can act as both source (reader) and target (writer); a few are read-only or limited to metadata extraction, as noted in the lists below.

Structured Data Formats

JSON: Full support with array indexing (e.g., items.0.name)
YAML: Nested structures, sequences, mappings
TOML: Configuration file format
XML: Element and attribute access
CSV: Column-based access with header support
INI: Section-based configuration

Binary Serialization Formats

MessagePack: Compact binary format
CBOR: Concise Binary Object Representation
BSON: Binary JSON variant

Binary Raw Formats

Binary: Custom offset/size parsing with endianness control (little/big); supports all primitive types at arbitrary offsets

Image Formats

PNG: Read/write with metadata extraction
JPEG: Read/write with dimension extraction
GIF: Animated GIF support
BMP: Windows bitmap format
WebP: Modern image format
TIFF: Tagged Image File Format
ICO: Icon format

3D Model Formats

OBJ: Wavefront OBJ format
STL: Stereolithography format
PLY: Polygon File Format
glTF: GL Transmission Format
STEP: ISO 10303 CAD format

Audio/Video Formats

WAV: Waveform audio format
MP3: MPEG audio (read-only)
MP4: MPEG-4 video container (metadata extraction)

Archive Formats

ZIP: Compressed archive format
TAR: Tape archive format

Specialized Formats

DXF: AutoCAD Drawing Exchange Format
TTF/OTF: TrueType/OpenType fonts (read-only)
Gerber: PCB manufacturing format
SVG: Scalable Vector Graphics (built on the XML handler)

Format Composition

Some formats build on others:

SVG uses XML internally for parsing/writing
This demonstrates the composable architecture

Adding new formats is straightforward—implement StreamValueProvider and StreamWriter traits.

🔧 Transformation Operations

Field Mapping

Direct field-to-field copying with optional transformations:

mapping:
  - source: input_field
    target: output_field
    # Optional transform block

Type Casting

Convert between types explicitly:

transform:
  cast: i64  # Convert to i64 (from string, float, etc.)

Supported casts: All numeric types, string, bool. Conversions are validated at execution time.
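
A sketch of what execution-time validation of a cast to i64 could look like. The Value enum and the cast_to_i64 function are hypothetical, and the specific rules (reject fractional floats, trim whitespace in strings, map bool to 0 or 1) are assumptions chosen for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    Text(String),
    Bool(bool),
}

/// Casts a value to i64, returning an error instead of silently losing data.
fn cast_to_i64(value: &Value) -> Result<i64, String> {
    match value {
        Value::Int(n) => Ok(*n),
        // Accept only whole numbers that are safely inside the i64 range
        // (9.0e18 is a conservative bound below i64::MAX).
        Value::Float(f) if f.is_finite() && f.fract() == 0.0 && f.abs() < 9.0e18 => Ok(*f as i64),
        Value::Float(f) => Err(format!("{f} has no exact i64 representation")),
        Value::Text(s) => s.trim().parse::<i64>().map_err(|e| format!("{s:?}: {e}")),
        Value::Bool(b) => Ok(i64::from(*b)),
    }
}
```

Returning a Result makes a failed conversion an explicit error at the record where it occurs, rather than a corrupted value downstream.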

Arithmetic Operations

Perform calculations on numeric fields:

transform:
  compute:
    operator: "+"  # +, -, *, /
    operands: ["field1", "field2", "literal_value"]

Operators:

Arithmetic: +, -, *, /
Comparison: >, <, >=, <=, ==, !=

Operands can be:

Field names (from source or intermediate registers)
Literal values (numbers as strings)
Results from previous transformations
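
One way to resolve operands and evaluate a compute block is sketched below. The resolve and compute functions are hypothetical. Two behaviors are assumptions of this sketch rather than documented UDBE semantics: an operand is treated as a register name first and as a numeric literal second, and more than two operands are folded left to right.

```rust
use std::collections::HashMap;

/// Resolves an operand: a register if one exists, otherwise a numeric literal.
fn resolve(operand: &str, regs: &HashMap<String, f64>) -> Result<f64, String> {
    if let Some(v) = regs.get(operand) {
        return Ok(*v);
    }
    operand
        .parse::<f64>()
        .map_err(|_| format!("unknown operand {operand:?}"))
}

/// Applies one arithmetic operator across the operands, left to right.
fn compute(operator: &str, operands: &[&str], regs: &HashMap<String, f64>) -> Result<f64, String> {
    let mut values = operands.iter().map(|o| resolve(o, regs));
    let mut acc = values.next().ok_or("compute needs at least one operand")??;
    for next in values {
        let rhs = next?;
        acc = match operator {
            "+" => acc + rhs,
            "-" => acc - rhs,
            "*" => acc * rhs,
            "/" if rhs != 0.0 => acc / rhs,
            "/" => return Err("division by zero".to_string()),
            other => return Err(format!("unsupported operator {other:?}")),
        };
    }
    Ok(acc)
}
```

With registers field1 = 10 and field2 = 5, the operands ["field1", "field2", "2.5"] under "+" evaluate to 17.5.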

String Operations

Manipulate string data:

transform:
  string_op:
    operator: concat  # concat, upper, lower, trim
    operands: ["str1", "str2", ...]

Operations:

concat: Join multiple strings
upper: Convert to uppercase
lower: Convert to lowercase
trim: Remove whitespace
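
The four operators map directly onto standard-library string methods. The string_op function below is a hypothetical sketch that takes already-resolved operand values; it assumes the single-operand operators act on the first operand.

```rust
/// Applies one of the four string operators to already-resolved operands.
/// `upper`, `lower`, and `trim` use the first operand (empty string if none).
fn string_op(operator: &str, operands: &[&str]) -> Result<String, String> {
    let first = operands.first().copied().unwrap_or("");
    match operator {
        "concat" => Ok(operands.concat()),
        "upper" => Ok(first.to_uppercase()),
        "lower" => Ok(first.to_lowercase()),
        "trim" => Ok(first.trim().to_string()),
        other => Err(format!("unsupported string operator {other:?}")),
    }
}
```

Note that concat in this sketch joins operands with no separator, so producing "John Smith" from two name fields needs a literal " " operand between them.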

Conditional Logic

Implement if-then-else logic:

transform:
  conditional:
    condition:
      operator: ">"  # Comparison operator
      operands: ["field", "threshold"]
    then: "value_if_true"
    else: "value_if_false"  # Optional

The else branch can reference other fields or be omitted (results in null/empty).
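
A sketch of evaluating a conditional block once its operands have been resolved to numbers. The conditional function is hypothetical; it models the optional else branch with Option, so an omitted branch yields None.

```rust
/// Evaluates `left <operator> right` and selects a branch.
/// When the condition fails and no else branch is given, the result is `None`.
fn conditional<'a>(
    left: f64,
    operator: &str,
    right: f64,
    then: &'a str,
    otherwise: Option<&'a str>,
) -> Result<Option<&'a str>, String> {
    let holds = match operator {
        ">" => left > right,
        "<" => left < right,
        ">=" => left >= right,
        "<=" => left <= right,
        "==" => left == right,
        "!=" => left != right,
        other => return Err(format!("unsupported comparison {other:?}")),
    };
    Ok(if holds { Some(then) } else { otherwise })
}
```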

Chained Transformations

Transformations can reference intermediate results:

# First transformation
- source: raw_value
  target: intermediate_value
  transform:
    compute:
      operator: "*"
      operands: ["raw_value", "2"]

# Second transformation uses intermediate
- source: intermediate_value
  target: final_value
  transform:
    conditional:
      condition:
        operator: ">"
        operands: ["intermediate_value", "100"]
      then: "high"
      else: "low"
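
The two mappings above can be traced by hand. The classify function below is a hypothetical walk-through that hard-codes both steps to show how the second mapping reads the register written by the first.

```rust
use std::collections::HashMap;

/// Runs the two chained mappings in order and returns
/// (intermediate_value, final_value).
fn classify(raw_value: f64) -> (f64, &'static str) {
    let mut regs: HashMap<&str, f64> = HashMap::new();
    regs.insert("raw_value", raw_value);

    // First mapping: intermediate_value = raw_value * 2
    let intermediate = regs["raw_value"] * 2.0;
    regs.insert("intermediate_value", intermediate);

    // Second mapping: "high" if intermediate_value > 100, otherwise "low"
    let final_value = if regs["intermediate_value"] > 100.0 { "high" } else { "low" };
    (intermediate, final_value)
}
```

For raw_value = 60 the intermediate value is 120 and the result is "high". For raw_value = 50 the intermediate value is exactly 100, which is not greater than 100, so the result is "low".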

Path Navigation

Access nested data structures:

# JSON/YAML nested paths
path: user.address.city

# Array indexing
path: items.0.name  # First element
path: items.1.price # Second element

# XML paths
path: root.child.@attribute  # Element with attribute
path: element.text()         # Text content
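
Dot-separated paths with numeric array indices can be resolved with a short fold over the path segments. The Node type and lookup function are hypothetical stand-ins for a parsed JSON or YAML tree; the XML forms (@attribute, text()) are not covered by this sketch.

```rust
use std::collections::BTreeMap;

/// Minimal tree type standing in for parsed JSON or YAML.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    List(Vec<Node>),
    Map(BTreeMap<String, Node>),
}

/// Follows a dot-separated path; numeric segments index into lists.
/// Returns `None` as soon as a segment does not match the tree.
fn lookup<'a>(root: &'a Node, path: &str) -> Option<&'a Node> {
    path.split('.').try_fold(root, |node, segment| match node {
        Node::Map(fields) => fields.get(segment),
        Node::List(items) => items.get(segment.parse::<usize>().ok()?),
        Node::Text(_) => None,
    })
}
```

With a tree holding a two-element items list, items.0.name and items.1.name return the first and second element's name, and an out-of-range index such as items.5.name returns None rather than panicking.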

Binary Field Access

For binary formats, specify exact memory locations:

fields:
  - name: sensor_id
    type: u16
    offset: 0        # Byte offset
    size: 2          # Size in bytes
    endianness: big  # big or little
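
Reading the sensor_id field described above comes down to slicing two bytes at the offset and interpreting them with the declared byte order. The Endianness enum and read_u16 function are hypothetical names; u16::from_be_bytes and u16::from_le_bytes are standard-library functions.

```rust
#[derive(Clone, Copy)]
enum Endianness {
    Big,
    Little,
}

/// Reads a u16 at `offset`, returning `None` if the buffer is too short.
fn read_u16(buf: &[u8], offset: usize, endianness: Endianness) -> Option<u16> {
    // `get` with a range performs the bounds check without panicking.
    let bytes = buf.get(offset..offset.checked_add(2)?)?;
    let pair = [bytes[0], bytes[1]];
    Some(match endianness {
        Endianness::Big => u16::from_be_bytes(pair),
        Endianness::Little => u16::from_le_bytes(pair),
    })
}
```

For the bytes 0x12 0x34 at offset 0, big-endian decoding gives 0x1234 and little-endian decoding gives 0x3412, which is why the schema has to state the byte order explicitly.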

⚡ Performance Characteristics

Zero-Copy Parsing: Data accessed via pointers/offsets, not copied
Stack-Based Registers: Efficient memory management during execution
Pre-Compiled Instructions: Schema compiled once, executed many times
Streaming Support: Process data incrementally without loading everything into memory
No Heap Allocations: Hot path avoids allocations for maximum speed

🌍 Real-World Scenarios: Bridging Alien Systems

Industrial Systems ↔ Cloud Platforms

The Problem: Factory floor PLCs output Modbus binary registers. Cloud platforms expect JSON REST APIs. They cannot communicate.

The Bridge: UDBE translates Modbus binary → JSON in real-time:

Decodes Modbus register layouts (offset, size, endianness)
Translates to universal types
Encodes to JSON API structure
Handles unit conversions, scaling, and data validation

Result: Factory data flows directly to cloud dashboards without custom integration code.

Legacy Mainframes ↔ Modern Microservices

The Problem: Mainframes output fixed-width EBCDIC files. Microservices consume JSON. Decades of separation.

The Bridge: UDBE translates fixed-width → JSON:

Decodes fixed-width layouts (column positions, lengths)
Handles character encoding (EBCDIC → UTF-8)
Translates field names and structures
Encodes to modern JSON APIs

Result: Legacy systems integrate with modern architectures without rewriting.

Embedded Devices ↔ Enterprise Systems

The Problem: IoT sensors output compact binary protocols. Enterprise systems use XML/SOAP. Different worlds.

The Bridge: UDBE translates binary → XML:

Decodes binary protocol frames
Extracts sensor readings (temperature, pressure, etc.)
Applies transformations (unit conversion, calibration)
Encodes to enterprise XML schemas

Result: Edge devices communicate with enterprise systems seamlessly.

Database Exports ↔ Data Warehouses

The Problem: Legacy databases export in proprietary formats. Modern data warehouses expect Parquet/Arrow. Incompatible.

The Bridge: UDBE translates any format → Parquet:

Decodes source database format
Transforms schema and data types
Applies data quality rules
Encodes to columnar formats

Result: Database migrations without custom ETL pipelines.

Media Systems ↔ Analytics Platforms

The Problem: Media files contain metadata in format-specific structures. Analytics platforms need structured JSON. Mismatch.

The Bridge: UDBE extracts and translates:

Decodes format-specific metadata (EXIF, ID3, etc.)
Normalizes to common structure
Encodes to analytics-ready JSON

Result: Media metadata flows into analytics without format-specific parsers.

API Version Translation

The Problem: API v1 uses XML, API v2 uses JSON with different structure. Clients need both. Duplication.

The Bridge: UDBE translates between versions:

Decodes v1 XML structure
Transforms field names and types
Encodes to v2 JSON structure

Result: Single implementation, multiple API versions through translation.

🏛️ Design Principles: Building the Universal Bridge

1. Universal Translation, Not Format-Specific Code

The core principle: The translator doesn't need to know the languages it's translating between.

Core engine operates in universal types
Format handlers are isolated interpreters
Adding a new system = add one interpreter, not rewrite the translator
Enables N×M system combinations from N+M interpreters

2. Schema as Translation Contract

The schema is a contract between two alien systems:

Defines how System A's language maps to System B's language
Version-controllable: Track how translations evolve
Shareable: Same translation works everywhere
Testable: Validate translations without running systems

3. Zero-Copy Translation

Performance is critical when bridging high-throughput systems:

Decode: Read data directly from source (no copying)
Translate: Operate on references (minimal copying)
Encode: Write directly to target (no intermediate buffers)

This enables real-time translation of high-volume data streams.

4. Streaming by Design

Systems often communicate continuously, not in batches:

Process data incrementally (chunk by chunk)
Constant memory usage (doesn't grow with data size)
Handle infinite streams (real-time systems)
Backpressure support (handle rate mismatches)
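
Backpressure can be illustrated with a bounded channel from the standard library. This sketch shows the general mechanism only; the run_bounded function is hypothetical, and whether UDBE uses channels internally is not stated in this document.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sends `count` records through a channel that buffers at most `capacity`
/// items and returns the sum the consumer received.
fn run_bounded(count: u32, capacity: usize) -> u64 {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let producer = thread::spawn(move || {
        for record in 0..count {
            // `send` blocks while the channel is full, so the producer
            // cannot run ahead of the consumer by more than `capacity`.
            tx.send(record).expect("consumer hung up");
        }
        // `tx` is dropped here, which ends the consumer's iteration.
    });
    let total: u64 = rx.iter().map(u64::from).sum();
    producer.join().expect("producer panicked");
    total
}
```

The bounded buffer is what keeps memory constant when the source is faster than the target: the producer waits instead of queueing an unbounded backlog.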

5. Type-Safe Translation

Prevent translation errors through strong typing:

Type-checked field mappings
Safe type conversions (explicit casts)
Runtime validation of data
Clear error messages when translation fails

6. Compositional Architecture

Build complex translations from simple parts:

Formats can compose (SVG builds on XML)
Transformations can chain (output of one becomes input of next)
Schemas can reference other schemas
Enables complex multi-hop translations

7. Extensibility Without Modification

Add new systems without changing core:

Implement StreamValueProvider to add a decoder
Implement StreamWriter to add an encoder
Core translator remains unchanged
Enables community-contributed format handlers

📖 Translation Schema Reference

A translation schema is a contract that defines how to translate between two systems. It has three parts:

1. Source Contract: Understanding System A

Defines how to decode the source system's language:

source:
  type: binary  # System A's language (format)
  fields:
    - name: sensor_id
      type: u16
      offset: 0        # Where in System A's data
      size: 2
      endianness: big   # How System A encodes numbers

Different systems require different decoding:

Binary systems: Use offset, size, endianness
Structured systems (JSON/XML/YAML): Use path (e.g., user.address.city)
Tabular systems (CSV): Use column name
Array systems: Use indexed paths (e.g., items.0.name)

2. Target Contract: Speaking System B's Language

Defines how to encode data for the target system:

target:
  type: json  # System B's language (format)
  fields:
    - name: device_id
      type: string
      path: device.id  # Where in System B's structure

Target fields define:

Output structure: How System B expects data organized
Field types: What types System B understands
Field locations: Where each value goes in System B's format

3. Mapping Rules: The Translation Dictionary

Connects System A's fields to System B's fields with translation logic:

mapping:
  # Direct translation (same meaning, different names)
  - source: sensor_id
    target: device_id
    transform:
      cast: string  # System A uses u16, System B needs string

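  # Unit conversion (same concept, different units): (F - 32) * 5 / 9,
  # chained through intermediate fields (temp_offset and temp_scaled
  # are illustrative names)
  - source: temp_fahrenheit
    target: temp_offset
    transform:
      compute:
        operator: "-"
        operands: ["temp_fahrenheit", "32"]
  - source: temp_offset
    target: temp_scaled
    transform:
      compute:
        operator: "*"
        operands: ["temp_offset", "5"]
  - source: temp_scaled
    target: temp_celsius
    transform:
      compute:
        operator: "/"
        operands: ["temp_scaled", "9"]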

  # Structure transformation (same data, different organization)
  - source: first_name
    target: full_name
    transform:
      string_op:
        operator: concat
        operands: ["first_name", "last_name"]

  # Business logic translation (different rules)
  - source: raw_score
    target: grade
    transform:
      conditional:
        condition:
          operator: ">="
          operands: ["raw_score", "90"]
        then: "A"
        else: "B"

Translation execution:

Mappings execute sequentially
Each mapping can reference results from previous ones
Use intermediate fields for complex multi-step translations
All translations happen in the universal type system

When systems can't talk, UDBE translates. 🌉

UDBE enables communication between systems that were never designed to work together. Legacy mainframes, modern APIs, industrial PLCs, cloud services, embedded devices—all can now exchange data through universal translation schemas.

h4kbas/udbe
