byte_array’s Substack
Computer Science behind Lakehouse Table Formats #1: RUM Conjecture
Exploring how different format choices affect Read, Update, and Storage overheads
Mar 13, 2025
•
byte[array]
Doing range gets on cloud storage for fun and profit
The thing that stands between good and great cloud read performance
Nov 28, 2023
•
byte[array]
Corrections in data lakehouse table format comparisons
A live document serving as a point of reference for corrections to inaccuracies in comparative studies of Hudi, Delta Lake, or…
Apr 20, 2022
•
byte[array]
Reliable ingestion from AWS S3 using Hudi
In this post we will talk about a new deltastreamer source that reliably and efficiently processes new data files as they arrive in AWS S3.
Sep 2, 2021
•
byte[array]
Apache Hudi — The Streaming Data Lake Platform
This blog is a repost of the original blog here
Jul 27, 2021
•
byte[array]
Streaming Responsibly Into the Data Lake
How Apache Hudi maintains optimally sized files
Mar 15, 2021
•
byte[array]
Optimize Data Lake layout using Clustering in Apache Hudi
This blog is a repost of this Hudi blog on medium.
Jan 28, 2021
•
byte[array]
Employing the right indexes for fast updates, deletes in Apache Hudi
This blog is a repost of this Hudi blog on medium.
Dec 19, 2020
•
byte[array]
byte_array’s Substack
Substack for under-the-hood learnings on data systems and architecture.