Feature Request: Support Clustering in hudi-rs #400

@CTTY

Feature Description

Clustering is a core optimization feature in Apache Hudi, widely used to manage small files and improve query performance.

I’d love to see support for clustering in hudi-rs, which could handle this efficiently thanks to Rust’s performance. This would enable production-grade optimization workflows in native Rust pipelines.

Why this matters

  • Performance: A native Rust implementation is expected to make compute-intensive operations like clustering much more performant.
  • Ease of migration: Users who already run clustering as a standalone job should be able to migrate to hudi-rs clustering easily.
  • Ecosystem trend: Similar efforts are emerging in other formats, e.g. Iceberg compaction.

Suggested Scope

Initial support might include:

  • Reading clustering plans
  • Executing clustering as a standalone action
  • Supporting inline clustering in write paths (optional follow-up)
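To make the planning side of this scope more concrete, here is a minimal sketch of how small files might be grouped into clustering groups by size. It is only an illustration: `FileSlice`, `plan_clustering_groups`, and both threshold parameters are hypothetical names, not existing hudi-rs APIs, and the greedy packing is a simplification that does not claim to match Hudi's actual plan strategies.

```rust
/// Hypothetical stand-in for a Hudi file slice; NOT a hudi-rs type.
#[derive(Debug, Clone, PartialEq)]
struct FileSlice {
    file_id: String,
    size_bytes: u64,
}

/// Greedily packs files smaller than `small_file_limit` into groups.
/// A group is closed as soon as adding the next file would push its
/// total size above `max_bytes_per_group`. Both thresholds are assumed
/// parameters chosen for illustration.
fn plan_clustering_groups(
    slices: &[FileSlice],
    small_file_limit: u64,
    max_bytes_per_group: u64,
) -> Vec<Vec<FileSlice>> {
    let mut groups = Vec::new();
    let mut current: Vec<FileSlice> = Vec::new();
    let mut current_bytes = 0u64;

    // Only files below the small-file limit are clustering candidates.
    for slice in slices.iter().filter(|s| s.size_bytes < small_file_limit) {
        // Close the current group if this file would overflow it.
        if !current.is_empty() && current_bytes + slice.size_bytes > max_bytes_per_group {
            groups.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += slice.size_bytes;
        current.push(slice.clone());
    }

    // Flush the last, partially filled group.
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

Executing a plan would then mean reading each group, rewriting it into fewer and larger files, and committing the result, which is why write support is a prerequisite.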

Prerequisites

  • hudi-rs doesn't have write support yet. It needs to be able to write and commit data before we implement more complicated table services such as clustering.

Additional context

No response
