3 releases

Uses new Rust 2024

0.1.2 Nov 9, 2025
0.1.1 Nov 8, 2025
0.1.0 Nov 8, 2025

#575 in Database interfaces

AGPL-3.0-only

150KB
3K SLoC

flint

A lighter SQL database

Use case

Use Flint if you need OLTP with the low-latency writes of a single-writer SQL database, plus high availability and read replication.

Use something like Cassandra or ScyllaDB if you need maximum write throughput over single, denormalized tables (no JOINs) sharded across nodes.

If you do not fall within these two categories, you may need to reconsider your schema requirements, or your use case may fall into a specialized category with specific requirements: geographically distributed writes (CockroachDB or YugabyteDB), analytics (ClickHouse), or flexible schemas (MongoDB). There are many more application-specific databases, and you will likely know which to choose if your requirements call for them.

Architecture

Flint follows a variation of the heap disk format that I've dubbed Log-structured Segmented Heap (LSH). The heap is split into 2MB segments, each belonging to exactly one table. Writes first go to an in-memory table plus a write-ahead log (WAL). In high-throughput scenarios where a queue of in-memory tables builds up, we append new segments to the heap file, permanently growing the file. In all other cases, a background vacuum process queues up segments sorted by recency of last write (for temporal locality) and by the number of free slots in each segment; we can then write inserted and updated tuples into these "old" segments, reclaiming dead tuple slots. Additionally, between segments (2MB) and pages (4KB), segments are split into 64KB blocks. The block is the compressible unit and the unit of writes from flushed in-memory tables. For point inserts and updates we can also do single-page writes to uncompressed pages.
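
For reference, the size hierarchy above as a minimal sketch (the constants are the sizes stated in this section; the names are illustrative, not Flint's internal types):

  // Size hierarchy from the paragraph above: segment > block > page.
  const SEGMENT_SIZE: usize = 2 * 1024 * 1024; // 2MB, each segment belongs to one table
  const BLOCK_SIZE: usize = 64 * 1024;         // 64KB, unit of compression and of memtable flushes
  const PAGE_SIZE: usize = 4 * 1024;           // 4KB, unit of point writes to uncompressed data

  fn main() {
      println!("blocks per segment: {}", SEGMENT_SIZE / BLOCK_SIZE); // 32
      println!("pages per block:    {}", BLOCK_SIZE / PAGE_SIZE);    // 16
  }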

This architecture hopes to take advantage of the write throughput of LSM designs (in-memory tables + WAL) while avoiding the write amplification induced by the leveling and compaction of SSTables. It also keeps the traditional heap structure's advantage of superior point queries, and leaves room for optimizations such as compression and Postgres-style HOT tuple locality.

Indexes

Indexing follows MySQL's primary-key indirection approach instead of updating every index on every tuple update. At a slight cost in read latency, we keep the higher write throughput gained from the LSH architecture.
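
A minimal sketch of that indirection, assuming hypothetical types (a table with one secondary index on an email column):

  use std::collections::BTreeMap;

  // Illustrative sketch of primary-key indirection; types and fields are
  // assumptions, not Flint's actual index structures.
  struct TupleLocation { segment: u32, block: u16, slot: u16 }

  struct Table {
      /// Only the primary index points at physical tuple locations.
      primary: BTreeMap<u64, TupleLocation>,
      /// Secondary indexes map the indexed value to the primary key instead.
      by_email: BTreeMap<String, u64>,
  }

  impl Table {
      /// A secondary lookup takes two steps: value -> primary key -> location.
      fn find_by_email(&self, email: &str) -> Option<&TupleLocation> {
          let pk = self.by_email.get(email)?;
          self.primary.get(pk)
      }

      /// Relocating a tuple (e.g. when vacuum rewrites a segment) only touches
      /// the primary index; secondary indexes are left alone.
      fn relocate(&mut self, pk: u64, new_location: TupleLocation) {
          self.primary.insert(pk, new_location);
      }
  }

The extra hop is the read-latency cost mentioned above; the payoff is that flushes, vacuums, and updates never have to rewrite every secondary index.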

MVCC

Similarly to Postgres, Flint performs tuple-level MVCC. All tuples are immutable once written, until they are automatically vacuumed.
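
As a rough illustration of what tuple-level versioning implies (fields modeled on Postgres's xmin/xmax; this is an assumption, not Flint's on-disk header):

  // Illustrative tuple-level MVCC metadata; not Flint's actual tuple header.
  struct TupleHeader {
      xmin: u64,         // transaction that created this tuple version
      xmax: Option<u64>, // transaction that deleted or superseded it, if any
  }

  /// Simplified visibility check: a snapshot taken at `snapshot_xid` sees the
  /// tuple if it was created at or before the snapshot and not yet deleted by
  /// then. (Real MVCC also consults commit status; omitted here.)
  fn visible(h: &TupleHeader, snapshot_xid: u64) -> bool {
      h.xmin <= snapshot_xid && h.xmax.map_or(true, |xmax| xmax > snapshot_xid)
  }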

Compression

Flint will implement LZ4 and Zstd compression at the 64KB block level. Blocks go through two phases when compression is enabled. Phase 1: an uncompressed block fills to 64KB, at which point it becomes eligible for compression on future writes. We assume that if an entire block is written at once there is a large burst of writes in flight, so we defer compression to a later time. Phase 2: the block is compressed along with new inserts. The block is filled only up to a set maximum that leaves enough buffer space for future updates. Updates can have the same uncompressed size but differing compression ratios, so space is sacrificed to avoid overwrites that grow beyond a block's fixed 64KB size (note: blocks are always padded to 64KB, whether compressed or not).

Phases 1 and 2 make up a block's transition period. Once compressed, a block stays compressed (hot blocks defer compression for this reason), and reads and writes on it are always at block-level granularity. A compressed block maintains its buffer space so it can remain mutable; its dead slots are vacuumed and reused for new point updates.
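
A sketch of that life cycle (the enum and the fill target are assumptions used for illustration, not Flint's actual tuning):

  const BLOCK_SIZE: usize = 64 * 1024;
  // Assumed ~75% fill target, leaving headroom for updates whose compression
  // ratio turns out worse than the original tuples'.
  const COMPRESS_FILL_TARGET: usize = (BLOCK_SIZE * 3) / 4;

  enum BlockPhase {
      /// Accepting point writes as single 4KB page writes; not yet full.
      Uncompressed { used: usize },
      /// Phase 1: filled to 64KB in one flush, so compression is deferred to a
      /// later, presumably calmer, write.
      FullUncompressed,
      /// Phase 2 and beyond: compressed, and it stays compressed; reads and
      /// writes now happen at block granularity.
      Compressed { compressed_len: usize },
  }

  /// Phase 2 fill rule: keep inserting while the compressed size stays under
  /// the target, so later updates still fit in the fixed, padded 64KB block.
  fn accept_compressed_insert(compressed_len_after_insert: usize) -> bool {
      compressed_len_after_insert <= COMPRESS_FILL_TARGET
  }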

Replication (future improvements)

The standard deployment model is a single-writer database with optional read replicas. This follows the common practice (used by Postgres) of streaming WAL changes to read replicas. It requires zero HA infrastructure and is the default solution for multi-instance deployments.

Failover and HA

Optionally, failover with a lease-based, single-writer deployment is available. Flint keeps the low-latency writes of the single-writer model while gaining the HA benefits of multi-master deployments. The cost is a brief period of downtime and dropped queries (still much shorter than alternatives such as Postgres + Patroni). This designates 3-5 nodes in a membership Raft quorum, which elects a primary write instance. (MAYBE) Each logical database is assigned a Raft group and can designate any of the nodes as its primary write instance, distributing logical databases among the Flint instances for the best write throughput.
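
To make the shape of such a deployment concrete, a purely hypothetical configuration sketch (names and fields are assumptions, not Flint's API):

  use std::time::Duration;

  struct ClusterConfig {
      /// 3-5 nodes forming the membership Raft quorum.
      members: Vec<String>,
      /// Write-lease duration; failover downtime is roughly bounded by lease
      /// expiry plus a leader election.
      write_lease: Duration,
  }

  /// (MAYBE, per the note above) each logical database gets its own Raft group
  /// so primaries can be spread across nodes for aggregate write throughput.
  struct LogicalDatabase {
      name: String,
      raft_group_id: u64,
      preferred_primary: Option<String>,
  }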

What Flint is not: an "infinitely" scaling multi-master database. Flint targets the 95% of use cases where a single node can handle your writes and storage, but you want simple-to-manage HA and read replication to meet read-throughput demands. You get fast writes, HA, read scaling, and operational sanity, at the cost of geographically dispersed local writes and of storage scaling within a single deployment.

Heterogeneous Deployments (Advanced)

Each logical database can be configured with a different replication factor (RF). In a 5-node deployment, your primary database can have an RF of 5, so that it has 4 read replicas and failover nodes at any given time. Another logical database may not have the same read or failover requirements (RF 1), but it may require greater storage. You can deploy 5 nodes, 4 of which meet the storage capacity needs of the primary database, and 1 that meets the storage capacity needs of the primary plus the secondary logical database. Thus you have two logical databases within the same control plane with different HA, read replica, and capacity constraints.

Example:

You don't buy hardware then figure out where databases go. You decide:

db_users: 10TB, RF=3
db_analytics: 5TB, RF=1
db_events: 2TB, RF=2

Then compute required nodes:

db_users needs 3 nodes of at least 10TB each
db_analytics needs 1 node of at least 5TB
db_events needs 2 nodes of at least 2TB

Overlap these assignments:

Node 1: 10TB (db_users primary) + 2TB (db_events primary) = 12TB minimum
Node 2: 10TB (db_users replica) + 2TB (db_events replica) = 12TB minimum
Node 3: 10TB (db_users replica) = 10TB minimum
Node 4: 5TB (db_analytics) = 5TB minimum

This requires upfront planning, but has the benefit of deterministic hardware procurement. Any future "scaling" is vertical scaling of nodes following this calculation.
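
The node sizing above is just a sum over the copies placed on each node; a small sketch of that arithmetic, with the placement hard-coded to mirror the example:

  // Worked version of the example above: each node's minimum capacity is the
  // sum of the database copies assigned to it.
  fn main() {
      // node -> copies placed on it as (database, size in TB)
      let placement: [(&str, &[(&str, u64)]); 4] = [
          ("Node 1", &[("db_users", 10), ("db_events", 2)]),
          ("Node 2", &[("db_users", 10), ("db_events", 2)]),
          ("Node 3", &[("db_users", 10)]),
          ("Node 4", &[("db_analytics", 5)]),
      ];

      // Prints 12, 12, 10, and 5 TB respectively.
      for (node, copies) in placement {
          let minimum: u64 = copies.iter().map(|&(_, tb)| tb).sum();
          println!("{node}: {minimum}TB minimum");
      }
  }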

Source Layout

  Server
    └─> Handler
          └─> Executor
                ├─> Parser (SQLAST)
                ├─> Planner (AST → Plan)
                ├─> Storage Engine (data)
                └─> execute_plan(Plan, Storage) → Response
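
Roughly, the flow through those components looks like the sketch below (every name and signature is a placeholder, not Flint's actual code):

  struct SqlAst;
  struct Plan;
  struct Storage;
  struct Response;

  fn parse(_sql: &str) -> SqlAst { SqlAst }    // Parser (SQLAST)
  fn plan(_ast: SqlAst) -> Plan { Plan }       // Planner (AST -> Plan)
  fn execute_plan(_plan: Plan, _storage: &mut Storage) -> Response { Response }

  /// Handler -> Executor: each statement is parsed, planned, then executed
  /// against the storage engine.
  fn handle(sql: &str, storage: &mut Storage) -> Response {
      let ast = parse(sql);
      let query_plan = plan(ast);
      execute_plan(query_plan, storage)
  }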

Todo

  • Support for variable-length primary and secondary keys on indexes; Flint currently only supports fixed-length values.
  • Proper serialization of segments/blocks
  • Hash indexes
  • MVCC for indexes (once UPDATE and DELETE are implemented)

Dependencies

~14MB
~253K SLoC