Universal Data Bridge Engine: A high-performance Rust engine that bridges data between incompatible systems using declarative YAML schemas.


h4kbas/udbe


Universal Data Bridge Engine (UDBE)

The universal translator for incompatible systems

A high-performance Rust engine that bridges data between incompatible systems using declarative YAML schemas. Transform between 30+ formats (JSON, Binary, CSV, XML, YAML, images, 3D models, audio, video) with zero-copy performance and pre-compiled execution. No code required: just define translation schemas.

When two systems speak different languages, use different protocols, or understand data in fundamentally different ways, they cannot communicate. UDBE is the bridge that translates between these alien systems, enabling data to flow seamlessly from one world to another, regardless of format, structure, or encoding.

🌉 The Problem: Alien Systems

In the real world, systems are built in isolation:

  • Legacy mainframes speak fixed-width binary
  • Modern APIs expect JSON
  • Industrial PLCs output Modbus registers
  • Cloud services require REST/JSON
  • Embedded devices use compact binary protocols
  • Enterprise systems use XML/SOAP

These systems are alien to each other: they cannot understand each other's data formats, structures, or protocols. Traditional integration requires:

  • Writing custom parsers for each source format
  • Writing custom serializers for each target format
  • Implementing transformation logic in code
  • Maintaining this code for every system combination

Result: Every source/target pair needs its own integration code, so the effort grows roughly quadratically as systems multiply.

🎯 The Solution: Universal Translation

UDBE acts as a universal translator between any two systems. You define a translation schema that describes:

  1. How to read the source system's language (source contract)
  2. How to write the target system's language (target contract)
  3. How to translate between them (mapping rules)

The engine becomes the bridge: it speaks both languages fluently and translates data in real time with zero-copy performance.

Result: Linear complexity, with one schema per translation regardless of system count.

🏗️ How the Bridge Works

The Translation Process

UDBE operates as a three-stage translation pipeline:

Alien System A → [Decoder] → [Translator] → [Encoder] → Alien System B
  (Format X)      (Read)     (Transform)     (Write)      (Format Y)

Stage 1: Decode (Read Source Language)

  • Format handler interprets the source system's data format
  • Extracts values according to the source contract
  • Converts to UDBE's internal type system

Stage 2: Translate (Transform Between Languages)

  • StreamProcessor executes pre-compiled translation instructions
  • Applies transformations (cast, compute, conditional, etc.)
  • Operates on type-safe values in registers

Stage 3: Encode (Write Target Language)

  • Format handler serializes values to target system's format
  • Structures data according to target contract
  • Outputs data the target system understands

Core Components

Manifest (Translation Schema): The blueprint for translation. It defines:

  • Source Contract: How to interpret System A's data
  • Target Contract: How to structure data for System B
  • Mapping Rules: How to translate between them

Compiler: Converts the schema into optimized translation instructions. Compiled once, executed many times.

StreamProcessor: The translation engine. Executes instructions with:

  • Zero-copy decoding: Reads data directly from source
  • Register-based translation: Efficient value manipulation
  • Zero-copy encoding: Writes data directly to target

Format Handlers: Language interpreters for each system:

  • Decoders (StreamValueProvider): Understand source system's language
  • Encoders (StreamWriter): Speak target system's language

Translation Instructions

The compiler generates atomic translation operations:

  • Move: Direct field mapping (no transformation needed)
  • Cast: Type conversion (translate between type systems)
  • Compute: Arithmetic translation (convert units, calculate derived values)
  • StringOp: Text manipulation (format strings, concatenate)
  • Conditional: Logic translation (apply business rules)
  • Write: Direct value assignment

Each instruction is a microsecond-fast atomic operation.
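
One way to picture these instructions is as a small enum executed in order against a map of registers. The sketch below is illustrative only and is not UDBE's implementation: `Value`, `Instr`, `get`, and `run` are hypothetical names, and only a few instruction kinds are modelled.

```rust
use std::collections::HashMap;

// Hypothetical universal value type (assumption; the real type system is richer).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    I64(i64),
    F64(f64),
    Str(String),
}

// Hypothetical instructions loosely mirroring Move, Cast, Compute and Write.
enum Instr {
    Move { from: String, to: String },
    CastToStr { from: String, to: String },
    Mul { from: String, factor: f64, to: String },
    Write { to: String, value: Value },
}

// Look up a register or report which field is missing.
fn get<'a>(regs: &'a HashMap<String, Value>, name: &str) -> Result<&'a Value, String> {
    regs.get(name).ok_or_else(|| format!("missing field: {}", name))
}

// Execute the instructions sequentially against the register map.
fn run(program: &[Instr], regs: &mut HashMap<String, Value>) -> Result<(), String> {
    for instr in program {
        match instr {
            Instr::Move { from, to } => {
                let v = get(regs, from)?.clone();
                regs.insert(to.clone(), v);
            }
            Instr::CastToStr { from, to } => {
                let s = match get(regs, from)? {
                    Value::I64(n) => n.to_string(),
                    Value::F64(x) => x.to_string(),
                    Value::Str(s) => s.clone(),
                };
                regs.insert(to.clone(), Value::Str(s));
            }
            Instr::Mul { from, factor, to } => {
                let x = match get(regs, from)? {
                    Value::I64(n) => *n as f64,
                    Value::F64(x) => *x,
                    Value::Str(_) => return Err(format!("{} is not numeric", from)),
                };
                regs.insert(to.clone(), Value::F64(x * factor));
            }
            Instr::Write { to, value } => {
                regs.insert(to.clone(), value.clone());
            }
        }
    }
    Ok(())
}

fn main() {
    let mut regs = HashMap::new();
    regs.insert("sensor_id".to_string(), Value::I64(42));
    let program = vec![
        Instr::CastToStr { from: "sensor_id".into(), to: "device_id".into() },
        Instr::Write { to: "origin".into(), value: Value::Str("plc".into()) },
    ];
    run(&program, &mut regs).expect("program should run");
    println!("{:?}", regs.get("device_id"));
}
```

Because the program is a plain list of instructions, it can be built once from a schema and replayed for every record.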

✨ Key Capabilities

  • 🌉 Universal Bridge: Connect any two systems, regardless of format incompatibility
  • 🗣️ Language Interpreters: 30+ format handlers that understand different system languages
  • 📋 Translation Schemas: Define how to translate between systems in YAML, no code required
  • ⚡ Real-Time Translation: Zero-copy, microsecond-fast translation for high-throughput systems
  • 🔄 Bidirectional: Every system can both send and receive, enabling true universal communication
  • 🌊 Streaming: Handle continuous data flows between systems in real time
  • 🧩 Composable: Build complex translations from simple parts
  • 🔧 Rich Translation Logic: Unit conversion, type casting, calculations, conditional logic
  • 🎯 Type-Safe: Strong typing prevents translation errors
  • 📦 Portable: Minimal dependencies, runs anywhere Rust runs

🚀 Quick Start: Building Your First Bridge

Installation

git clone <repository>
cd udl
cargo build --release

Example: Bridging JSON and YAML Systems

Two systems need to communicate, but one speaks JSON and the other speaks YAML. Build a bridge:

use udl::*;

// Define the translation schema between the two systems
let schema = r#"
source:
  type: json  # System A's language
  fields:
    - name: name
      type: string
      path: name
    - name: age
      type: i64
      path: age

target:
  type: yaml  # System B's language
  fields:
    - name: full_name
      type: string
      path: full_name
    - name: years_old
      type: i64
      path: years_old

mapping:
  - source: name
    target: full_name
  - source: age
    target: years_old
"#;

// Create the bridge
let manifest = Manifest::from_yaml(schema)?;
let mut processor = StreamProcessor::new(manifest)?;

// Connect System A (JSON) and System B (YAML)
let json_data = serde_json::json!({"name": "John", "age": 30});
let reader = JsonFormat::with_data(json_data);
let mut source = SingleChunkSource::new(Box::new(reader));
let mut writer = YamlFormat::new();

// Translate: System A → Bridge → System B
if let Some(output) = processor.process_chunk_and_emit(&mut source, &mut writer)? {
    println!("{}", String::from_utf8_lossy(&output));
}

The bridge translates data from System A's format to System B's format automatically.

💡 Core Concepts: Universal Translation

The Universal Language

UDBE uses an internal type system as the universal language between alien systems:

System A Language → Universal Types → System B Language
    (Binary)        (i64, f64, str)        (JSON)

All systems translate through this common type system:

  • Integers: i8, i16, i32, i64, u8, u16, u32, u64
  • Floats: f32, f64
  • Boolean: bool
  • String: String
  • Binary: Vec<u8>

This universal language enables any system to communicate with any other system.

Format Interpreters

Each format handler is a language interpreter:

  • Decoders understand one system's language (read its data format)
  • Encoders speak another system's language (write in its format)
  • Core translator operates in the universal language (format-agnostic)

This architecture means:

  • Adding support for a new system = add one interpreter
  • Systems can be combined in any way (N×M combinations from N+M interpreters)
  • The core translator never needs to know system specifics
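
The N+M idea can be sketched with two small traits. This is an illustration under assumptions, not UDBE's API: the `Decoder` and `Encoder` traits, the `Record` alias, and both toy formats are hypothetical, and the signatures of the real `StreamValueProvider` and `StreamWriter` traits are not shown in this document.

```rust
use std::collections::BTreeMap;

// Simplified record in which every value is a string (assumption).
type Record = BTreeMap<String, String>;

// Hypothetical interpreter traits standing in for the real format handlers.
trait Decoder {
    fn decode(&self, input: &str) -> Result<Record, String>;
}
trait Encoder {
    fn encode(&self, record: &Record) -> String;
}

// Toy format 1: one "key=value" pair per line.
struct KeyValueLines;
// Toy format 2: a header row plus one data row, comma-separated, no quoting.
struct CsvRow;

impl Decoder for KeyValueLines {
    fn decode(&self, input: &str) -> Result<Record, String> {
        let mut record = Record::new();
        for line in input.lines().filter(|l| !l.trim().is_empty()) {
            let (key, value) = line
                .split_once('=')
                .ok_or_else(|| format!("bad line: {}", line))?;
            record.insert(key.trim().to_string(), value.trim().to_string());
        }
        Ok(record)
    }
}

impl Encoder for KeyValueLines {
    fn encode(&self, record: &Record) -> String {
        let lines: Vec<String> = record.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
        lines.join("\n")
    }
}

impl Encoder for CsvRow {
    fn encode(&self, record: &Record) -> String {
        let header: Vec<&str> = record.keys().map(String::as_str).collect();
        let row: Vec<&str> = record.values().map(String::as_str).collect();
        format!("{}\n{}", header.join(","), row.join(","))
    }
}

// The bridge only sees the traits, never a concrete format.
fn bridge(decoder: &dyn Decoder, encoder: &dyn Encoder, input: &str) -> Result<String, String> {
    Ok(encoder.encode(&decoder.decode(input)?))
}

fn main() {
    let input = "name=John\nage=30";
    println!("{}", bridge(&KeyValueLines, &CsvRow, input).unwrap());
}
```

Adding a third encoder here would make it usable with every existing decoder without touching `bridge`, which is the property the list above describes.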

Schema as Translation Dictionary

The schema is a translation dictionary between two systems:

# "System A says 'sensor_id' at offset 0, System B calls it 'device.id'"
source: sensor_id → target: device.id

# "System A uses u16, System B needs string - translate it"
source: sensor_id (u16) → cast → target: device.id (string)

# "System A has temperature in Fahrenheit, System B wants Celsius - convert it"
source: temp_f (f32) → compute: (temp_f - 32) * 5/9 → target: temp_c (f32)

The schema is:

  • Declarative: Describes what translation should happen, not how
  • Version-controllable: Track translation changes over time
  • Shareable: Same translation works across languages/environments
  • Testable: Validate translations independently

Translation Pipeline

The three-stage pipeline ensures clean separation:

  1. Decode: System A's language → Universal types
  2. Translate: Universal types → Universal types (with transformations)
  3. Encode: Universal types → System B's language

Each stage is independent:

  • Change System A? Only update the decoder
  • Change System B? Only update the encoder
  • Change translation logic? Only update the schema

⚑ Performance Characteristics

Zero-Copy Parsing

Format handlers access data via pointers and offsets, not by copying:

  • Binary formats: Direct memory access at specified offsets
  • Structured formats: Parse on-demand, access via paths
  • No intermediate buffers: Data flows directly from source to target

Stack-Based Execution

The processor uses a register-based model:

  • Registers: HashMap for field values (fast lookups)
  • No heap allocations: In the hot path
  • Pre-allocated: Registers cleared and reused per chunk
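
The "cleared and reused per chunk" pattern can be sketched with a standard `HashMap`, whose `clear` method keeps the allocated memory for reuse. This is a minimal illustration, not the engine's code: `process_chunk`, the field names, and the doubling step are assumptions.

```rust
use std::collections::HashMap;

// Reuse one register map across chunks instead of allocating a new one each time.
// Keys are static field names, so inserting them does not allocate strings.
fn process_chunk(regs: &mut HashMap<&'static str, f64>, raw: f64) -> f64 {
    regs.clear(); // drops old values but keeps the map's capacity
    regs.insert("raw", raw);
    let scaled = regs["raw"] * 2.0;
    regs.insert("scaled", scaled);
    regs["scaled"]
}

fn main() {
    let mut regs = HashMap::with_capacity(8);
    for raw in [1.5, 10.0, 0.25] {
        println!("{}", process_chunk(&mut regs, raw));
    }
}
```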

Pre-Compiled Instructions

Schemas are compiled once into instruction sequences:

  • No interpretation: Instructions are direct operations
  • Optimized order: Compiler can reorder for efficiency
  • Type-checked: Invalid operations caught at compile time

Streaming Architecture

Designed for continuous data processing:

  • Chunk-based: Process data in chunks, not all at once
  • Windowing: Support for sliding windows over streams
  • Backpressure: Can handle high-throughput streams
  • Memory efficient: Only current chunk in memory
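
Chunk-based processing with constant memory can be sketched using only the standard library. The example assumes a hypothetical stream of 2-byte big-endian records and a made-up `sum_records` function; it does not use UDBE's own streaming types.

```rust
use std::io::{self, Read};

// Read fixed-size records one at a time into a single reusable buffer.
// Returns (record count, sum of values). A trailing partial record is ignored.
fn sum_records<R: Read>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut chunk = [0u8; 2]; // the only record buffer, reused for every chunk
    let mut count: u64 = 0;
    let mut total: u64 = 0;
    loop {
        match reader.read_exact(&mut chunk) {
            Ok(()) => {
                count += 1;
                total += u16::from_be_bytes(chunk) as u64;
            }
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
    }
    Ok((count, total))
}

fn main() -> io::Result<()> {
    let data = [0u8, 1, 0, 2, 0, 3];
    let (count, total) = sum_records(&data[..])?;
    println!("{} records, total {}", count, total);
    Ok(())
}
```

Memory use depends on the record size, not on the length of the stream, so the same loop works for a file, a socket, or an unbounded source.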

Benchmarks

Typical performance characteristics:

  • Schema compilation: < 1ms for typical schemas
  • Per-record transformation: Microseconds (depends on complexity)
  • Throughput: Millions of records/second for simple transformations
  • Memory: Constant per chunk, not per total data size

🎯 Supported Formats

UDBE supports 30+ formats across multiple categories. Each format can act as both source (reader) and target (writer), enabling true "everything to everything" transformations.

Structured Data Formats

  • JSON: Full support with array indexing (e.g., items.0.name)
  • YAML: Nested structures, sequences, mappings
  • TOML: Configuration file format
  • XML: Element and attribute access
  • CSV: Column-based access with header support
  • INI: Section-based configuration

Binary Serialization Formats

  • MessagePack: Compact binary format
  • CBOR: Concise Binary Object Representation
  • BSON: Binary JSON variant

Binary Raw Formats

  • Binary: Custom offset/size parsing with endianness control (Little/Big)
  • Supports all primitive types at arbitrary offsets

Image Formats

  • PNG: Read/write with metadata extraction
  • JPEG: Read/write with dimension extraction
  • GIF: Animated GIF support
  • BMP: Windows bitmap format
  • WebP: Modern image format
  • TIFF: Tagged Image File Format
  • ICO: Icon format

3D Model Formats

  • OBJ: Wavefront OBJ format
  • STL: Stereolithography format
  • PLY: Polygon File Format
  • glTF: GL Transmission Format
  • STEP: ISO 10303 CAD format

Audio/Video Formats

  • WAV: Waveform audio format
  • MP3: MPEG audio (read-only)
  • MP4: MPEG-4 video container (metadata extraction)

Archive Formats

  • ZIP: Compressed archive format
  • TAR: Tape archive format

Specialized Formats

  • DXF: AutoCAD Drawing Exchange Format
  • TTF/OTF: TrueType/OpenType fonts (read-only)
  • Gerber: PCB manufacturing format
  • SVG: Scalable Vector Graphics (composed on XML)

Format Composition

Some formats build on others:

  • SVG uses XML internally for parsing/writing
  • This demonstrates the composable architecture

Adding new formats is straightforward: implement the StreamValueProvider and StreamWriter traits.

🔧 Transformation Operations

Field Mapping

Direct field-to-field copying with optional transformations:

mapping:
  - source: input_field
    target: output_field
    # Optional transform block

Type Casting

Convert between types explicitly:

transform:
  cast: i64  # Convert to i64 (from string, float, etc.)

Supported casts: All numeric types, string, bool. Conversions are validated at execution time.
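
Validation at execution time means a cast can fail for a particular value even when the schema is well-formed. The sketch below shows what such a check might look like for a cast to i64; the `Value` enum, the `cast_to_i64` function, and the truncation rule for floats are assumptions, not documented UDBE behavior.

```rust
// Hypothetical value type covering a few of the universal types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    I64(i64),
    F64(f64),
    Bool(bool),
    Str(String),
}

// Cast to i64, rejecting values that cannot be represented.
fn cast_to_i64(value: &Value) -> Result<i64, String> {
    match value {
        Value::I64(n) => Ok(*n),
        Value::Bool(b) => Ok(*b as i64),
        Value::F64(x) => {
            // Reject NaN, infinities, and values outside the i64 range.
            if x.is_finite() && *x >= i64::MIN as f64 && *x < i64::MAX as f64 {
                Ok(x.trunc() as i64)
            } else {
                Err(format!("cannot cast {} to i64", x))
            }
        }
        Value::Str(s) => s
            .trim()
            .parse::<i64>()
            .map_err(|e| format!("cannot cast {:?} to i64: {}", s, e)),
    }
}

fn main() {
    println!("{:?}", cast_to_i64(&Value::Str("42".to_string())));
    println!("{:?}", cast_to_i64(&Value::Str("forty-two".to_string())));
}
```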

Arithmetic Operations

Perform calculations on numeric fields:

transform:
  compute:
    operator: "+"  # +, -, *, /
    operands: ["field1", "field2", "literal_value"]

Operators:

  • Arithmetic: +, -, *, /
  • Comparison: >, <, >=, <=, ==, !=

Operands can be:

  • Field names (from source or intermediate registers)
  • Literal values (numbers as strings)
  • Results from previous transformations
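
Since literals are written as strings, each operand has to be resolved either as a field or as a number. One plausible resolution rule is sketched below, with field names taking precedence over literals; that precedence, the left-to-right evaluation, and the `resolve` and `compute` names are assumptions rather than documented behavior.

```rust
use std::collections::HashMap;

// Resolve an operand: first as a register (field) name, then as a numeric literal.
fn resolve(operand: &str, regs: &HashMap<String, f64>) -> Result<f64, String> {
    if let Some(v) = regs.get(operand) {
        return Ok(*v);
    }
    operand
        .parse::<f64>()
        .map_err(|_| format!("unknown field or invalid literal: {}", operand))
}

// Apply an arithmetic operator across the operands from left to right.
fn compute(operator: &str, operands: &[&str], regs: &HashMap<String, f64>) -> Result<f64, String> {
    let mut values = operands.iter().map(|o| resolve(o, regs));
    let mut acc = values.next().ok_or_else(|| "no operands".to_string())??;
    for value in values {
        let v = value?;
        acc = match operator {
            "+" => acc + v,
            "-" => acc - v,
            "*" => acc * v,
            "/" if v != 0.0 => acc / v,
            "/" => return Err("division by zero".to_string()),
            other => return Err(format!("unsupported operator: {}", other)),
        };
    }
    Ok(acc)
}

fn main() {
    let mut regs = HashMap::new();
    regs.insert("field1".to_string(), 10.0);
    regs.insert("field2".to_string(), 5.0);
    println!("{:?}", compute("+", &["field1", "field2", "2.5"], &regs));
}
```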

String Operations

Manipulate string data:

transform:
  string_op:
    operator: concat  # concat, upper, lower, trim
    operands: ["str1", "str2", ...]

Operations:

  • concat: Join multiple strings
  • upper: Convert to uppercase
  • lower: Convert to lowercase
  • trim: Remove whitespace
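
The four operations map directly onto Rust's standard string methods. The sketch below assumes the operands have already been resolved to string values, and that `upper`, `lower`, and `trim` take exactly one operand; `string_op` is a hypothetical function, not part of UDBE's API.

```rust
// Apply a string operator to already-resolved operand values.
fn string_op(operator: &str, operands: &[&str]) -> Result<String, String> {
    match (operator, operands) {
        ("concat", _) => Ok(operands.concat()),
        ("upper", [s]) => Ok(s.to_uppercase()),
        ("lower", [s]) => Ok(s.to_lowercase()),
        ("trim", [s]) => Ok(s.trim().to_string()),
        _ => Err(format!(
            "unsupported operator {:?} with {} operand(s)",
            operator,
            operands.len()
        )),
    }
}

fn main() {
    println!("{:?}", string_op("concat", &["John", " ", "Doe"]));
    println!("{:?}", string_op("trim", &["  padded  "]));
}
```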

Conditional Logic

Implement if-then-else logic:

transform:
  conditional:
    condition:
      operator: ">"  # Comparison operator
      operands: ["field", "threshold"]
    then: "value_if_true"
    else: "value_if_false"  # Optional

The else branch can reference other fields or be omitted (results in null/empty).
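
The semantics described here can be sketched as a comparison followed by a choice between two branches, with `None` standing in for the null/empty result of an omitted else branch. The function names and the restriction to numeric operands are assumptions made for illustration.

```rust
// Evaluate a comparison operator on two numeric operands.
fn compare(operator: &str, lhs: f64, rhs: f64) -> Result<bool, String> {
    match operator {
        ">" => Ok(lhs > rhs),
        "<" => Ok(lhs < rhs),
        ">=" => Ok(lhs >= rhs),
        "<=" => Ok(lhs <= rhs),
        "==" => Ok(lhs == rhs),
        "!=" => Ok(lhs != rhs),
        other => Err(format!("unsupported operator: {}", other)),
    }
}

// Return the `then` value when the condition holds, otherwise the optional
// else value. `otherwise` is used as the name because `else` is a keyword.
fn conditional(
    operator: &str,
    lhs: f64,
    rhs: f64,
    then: &str,
    otherwise: Option<&str>,
) -> Result<Option<String>, String> {
    if compare(operator, lhs, rhs)? {
        Ok(Some(then.to_string()))
    } else {
        Ok(otherwise.map(|s| s.to_string()))
    }
}

fn main() {
    println!("{:?}", conditional(">", 120.0, 100.0, "high", Some("low")));
    println!("{:?}", conditional(">", 80.0, 100.0, "high", None));
}
```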

Chained Transformations

Transformations can reference intermediate results:

# First transformation
- source: raw_value
  target: intermediate_value
  transform:
    compute:
      operator: "*"
      operands: ["raw_value", "2"]

# Second transformation uses intermediate
- source: intermediate_value
  target: final_value
  transform:
    conditional:
      condition:
        operator: ">"
        operands: ["intermediate_value", "100"]
      then: "high"
      else: "low"
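
Chaining works because mappings run in order and each result is stored where later mappings can read it. The same two steps are written out by hand below as a rough sketch; `run_chain` and the use of a plain `HashMap` for registers are assumptions.

```rust
use std::collections::HashMap;

// Run the two mappings in order, storing the intermediate result in a register.
fn run_chain(raw_value: f64) -> (f64, &'static str) {
    let mut regs: HashMap<&'static str, f64> = HashMap::new();
    regs.insert("raw_value", raw_value);

    // First mapping: intermediate_value = raw_value * 2
    let intermediate = regs["raw_value"] * 2.0;
    regs.insert("intermediate_value", intermediate);

    // Second mapping: reads the register written by the first one,
    // so it must run after it.
    let final_value = if regs["intermediate_value"] > 100.0 { "high" } else { "low" };
    (intermediate, final_value)
}

fn main() {
    println!("{:?}", run_chain(60.0));
    println!("{:?}", run_chain(50.0));
}
```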

Path Navigation

Access nested data structures:

# JSON/YAML nested paths
path: user.address.city

# Array indexing
path: items.0.name  # First element
path: items.1.price # Second element

# XML paths
path: root.child.@attribute  # Element with attribute
path: element.text()         # Text content
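
Dotted paths with numeric segments can be resolved by walking a nested structure one segment at a time. The sketch below uses a hypothetical `Node` tree and `navigate` function; it covers map keys and array indices only, not the XML `@attribute` or `text()` forms.

```rust
use std::collections::HashMap;

// Hypothetical nested document: text leaves, lists, and string-keyed maps.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    List(Vec<Node>),
    Map(HashMap<String, Node>),
}

// Follow a dotted path such as "items.0.name"; None if any segment is missing.
fn navigate<'a>(root: &'a Node, path: &str) -> Option<&'a Node> {
    let mut current = root;
    for segment in path.split('.') {
        current = match current {
            Node::Map(map) => map.get(segment)?,
            Node::List(items) => items.get(segment.parse::<usize>().ok()?)?,
            Node::Text(_) => return None, // cannot descend into a leaf
        };
    }
    Some(current)
}

fn main() {
    let mut item = HashMap::new();
    item.insert("name".to_string(), Node::Text("widget".to_string()));
    let mut root = HashMap::new();
    root.insert("items".to_string(), Node::List(vec![Node::Map(item)]));
    let root = Node::Map(root);

    println!("{:?}", navigate(&root, "items.0.name"));
    println!("{:?}", navigate(&root, "items.3.name"));
}
```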

Binary Field Access

For binary formats, specify exact memory locations:

fields:
  - name: sensor_id
    type: u16
    offset: 0        # Byte offset
    size: 2          # Size in bytes
    endianness: big  # big or little
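
Reading such a field comes down to slicing the buffer at the given offset and interpreting the bytes in the declared byte order. The sketch below uses only the standard library; the `Endianness` enum and `read_u16` function are hypothetical stand-ins for whatever the binary format handler does internally.

```rust
#[derive(Clone, Copy)]
enum Endianness {
    Big,
    Little,
}

// Read a u16 at `offset`, returning None if the buffer is too short.
fn read_u16(buf: &[u8], offset: usize, endianness: Endianness) -> Option<u16> {
    let end = offset.checked_add(2)?;
    let bytes = buf.get(offset..end)?; // bounds-checked borrow, no copy of the buffer
    let pair = [bytes[0], bytes[1]];
    Some(match endianness {
        Endianness::Big => u16::from_be_bytes(pair),
        Endianness::Little => u16::from_le_bytes(pair),
    })
}

fn main() {
    let frame = [0x12u8, 0x34, 0xFF];
    println!("{:?}", read_u16(&frame, 0, Endianness::Big));
    println!("{:?}", read_u16(&frame, 0, Endianness::Little));
}
```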

⚑ Performance Characteristics

  • Zero-Copy Parsing: Data accessed via pointers/offsets, not copied
  • Stack-Based Registers: Efficient memory management during execution
  • Pre-Compiled Instructions: Schema compiled once, executed many times
  • Streaming Support: Process data incrementally without loading everything into memory
  • No Heap Allocations: Hot path avoids allocations for maximum speed

🌍 Real-World Scenarios: Bridging Alien Systems

Industrial Systems ↔ Cloud Platforms

The Problem: Factory floor PLCs output Modbus binary registers. Cloud platforms expect JSON REST APIs. They cannot communicate.

The Bridge: UDBE translates Modbus binary → JSON in real time:

  • Decodes Modbus register layouts (offset, size, endianness)
  • Translates to universal types
  • Encodes to JSON API structure
  • Handles unit conversions, scaling, and data validation

Result: Factory data flows directly to cloud dashboards without custom integration code.

Legacy Mainframes ↔ Modern Microservices

The Problem: Mainframes output fixed-width EBCDIC files. Microservices consume JSON. Decades of separation.

The Bridge: UDBE translates fixed-width → JSON:

  • Decodes fixed-width layouts (column positions, lengths)
  • Handles character encoding (EBCDIC → UTF-8)
  • Translates field names and structures
  • Encodes to modern JSON APIs

Result: Legacy systems integrate with modern architectures without rewriting.

Embedded Devices ↔ Enterprise Systems

The Problem: IoT sensors output compact binary protocols. Enterprise systems use XML/SOAP. Different worlds.

The Bridge: UDBE translates binary → XML:

  • Decodes binary protocol frames
  • Extracts sensor readings (temperature, pressure, etc.)
  • Applies transformations (unit conversion, calibration)
  • Encodes to enterprise XML schemas

Result: Edge devices communicate with enterprise systems seamlessly.

Database Exports ↔ Data Warehouses

The Problem: Legacy databases export in proprietary formats. Modern data warehouses expect Parquet/Arrow. Incompatible.

The Bridge: UDBE translates any format → Parquet:

  • Decodes source database format
  • Transforms schema and data types
  • Applies data quality rules
  • Encodes to columnar formats

Result: Database migrations without custom ETL pipelines.

Media Systems ↔ Analytics Platforms

The Problem: Media files contain metadata in format-specific structures. Analytics platforms need structured JSON. Mismatch.

The Bridge: UDBE extracts and translates:

  • Decodes format-specific metadata (EXIF, ID3, etc.)
  • Normalizes to common structure
  • Encodes to analytics-ready JSON

Result: Media metadata flows into analytics without format-specific parsers.

API Version Translation

The Problem: API v1 uses XML, API v2 uses JSON with different structure. Clients need both. Duplication.

The Bridge: UDBE translates between versions:

  • Decodes v1 XML structure
  • Transforms field names and types
  • Encodes to v2 JSON structure

Result: Single implementation, multiple API versions through translation.

🏛️ Design Principles: Building the Universal Bridge

1. Universal Translation, Not Format-Specific Code

The core principle: The translator doesn't need to know the languages it's translating between.

  • Core engine operates in universal types
  • Format handlers are isolated interpreters
  • Adding a new system = add one interpreter, not rewrite the translator
  • Enables N×M system combinations from N+M interpreters

2. Schema as Translation Contract

The schema is a contract between two alien systems:

  • Defines how System A's language maps to System B's language
  • Version-controllable: Track how translations evolve
  • Shareable: Same translation works everywhere
  • Testable: Validate translations without running systems

3. Zero-Copy Translation

Performance is critical when bridging high-throughput systems:

  • Decode: Read data directly from source (no copying)
  • Translate: Operate on references (minimal copying)
  • Encode: Write directly to target (no intermediate buffers)

This enables real-time translation of high-volume data streams.

4. Streaming by Design

Systems often communicate continuously, not in batches:

  • Process data incrementally (chunk by chunk)
  • Constant memory usage (doesn't grow with data size)
  • Handle infinite streams (real-time systems)
  • Backpressure support (handle rate mismatches)

5. Type-Safe Translation

Prevent translation errors through strong typing:

  • Type-checked field mappings
  • Safe type conversions (explicit casts)
  • Runtime validation of data
  • Clear error messages when translation fails

6. Compositional Architecture

Build complex translations from simple parts:

  • Formats can compose (SVG builds on XML)
  • Transformations can chain (output of one becomes input of next)
  • Schemas can reference other schemas
  • Enables complex multi-hop translations

7. Extensibility Without Modification

Add new systems without changing core:

  • Implement StreamValueProvider to add a decoder
  • Implement StreamWriter to add an encoder
  • Core translator remains unchanged
  • Enables community-contributed format handlers

📖 Translation Schema Reference

A translation schema is a contract that defines how to translate between two systems. It has three parts:

1. Source Contract: Understanding System A

Defines how to decode the source system's language:

source:
  type: binary  # System A's language (format)
  fields:
    - name: sensor_id
      type: u16
      offset: 0        # Where in System A's data
      size: 2
      endianness: big   # How System A encodes numbers

Different systems require different decoding:

  • Binary systems: Use offset, size, endianness
  • Structured systems (JSON/XML/YAML): Use path (e.g., user.address.city)
  • Tabular systems (CSV): Use column name
  • Array systems: Use indexed paths (e.g., items.0.name)

2. Target Contract: Speaking System B's Language

Defines how to encode data for the target system:

target:
  type: json  # System B's language (format)
  fields:
    - name: device_id
      type: string
      path: device.id  # Where in System B's structure

Target fields define:

  • Output structure: How System B expects data organized
  • Field types: What types System B understands
  • Field locations: Where each value goes in System B's format

3. Mapping Rules: The Translation Dictionary

Connects System A's fields to System B's fields with translation logic:

mapping:
  # Direct translation (same meaning, different names)
  - source: sensor_id
    target: device_id
    transform:
      cast: string  # System A uses u16, System B needs string

  # Unit conversion (same concept, different units)
  - source: temp_fahrenheit
    target: temp_celsius
    transform:
      compute:
        operator: "-"
        operands: ["temp_fahrenheit", "32"]
      # Then multiply by 5/9 (chained via intermediate field)

  # Structure transformation (same data, different organization)
  - source: first_name
    target: full_name
    transform:
      string_op:
        operator: concat
        operands: ["first_name", "last_name"]

  # Business logic translation (different rules)
  - source: raw_score
    target: grade
    transform:
      conditional:
        condition:
          operator: ">="
          operands: ["raw_score", "90"]
        then: "A"
        else: "B"

Translation execution:

  • Mappings execute sequentially
  • Each mapping can reference results from previous ones
  • Use intermediate fields for complex multi-step translations
  • All translations happen in the universal type system

When systems can't talk, UDBE translates. 🌉

UDBE enables communication between systems that were never designed to work together. Legacy mainframes, modern APIs, industrial PLCs, cloud services, and embedded devices can all now exchange data through universal translation schemas.
