byte_array’s Substack

Computer Science behind Lakehouse Table Formats #1: RUM Conjecture
Exploring how different format choices affect read, update, and storage overheads
Mar 13, 2025 • byte[array]
Doing range gets on cloud storage for fun and profit
The thing that stands between good and great cloud read performance
Nov 28, 2023 • byte[array]
Corrections in data lakehouse table format comparisons
A live document to serve as a point of reference for corrections for inaccuracies for different comparative studies of Hudi, Delta Lake, or…
Apr 20, 2022 • byte[array]
Reliable ingestion from AWS S3 using Hudi
In this post we will talk about a new DeltaStreamer source that reliably and efficiently processes new data files as they arrive in AWS S3
Sep 2, 2021 • byte[array]
Apache Hudi — The Streaming Data Lake Platform
This blog is a repost of the original blog here
Jul 27, 2021 • byte[array]
Streaming Responsibly Into the Data Lake
How Apache Hudi maintains optimum sized files
Mar 15, 2021 • byte[array]
Optimize Data Lake layout using Clustering in Apache Hudi
This blog is a repost of this Hudi blog on Medium.
Jan 28, 2021 • byte[array]
Employing the right indexes for fast updates, deletes in Apache Hudi
This blog is a repost of this Hudi blog on Medium.
Dec 19, 2020 • byte[array]
Substack for under-the-hood learnings on data systems and architecture.
