Store git objects and refs in PostgreSQL tables. Standard git push/clone work against the database through a libgit2-based backend.
Two implementations of the core functions: pure SQL (PL/pgSQL, works on any managed Postgres with pgcrypto) and a C extension (native git_oid type, faster SHA1 and tree parsing via OpenSSL).
The fastest way to try gitgres. Builds everything and starts Postgres with the schema and both extensions loaded:
docker build -t gitgres .
docker run --rm -it gitgres
From inside the container:
./backend/gitgres-backend init "dbname=gitgres user=postgres" myrepo
./backend/gitgres-backend push "dbname=gitgres user=postgres" myrepo /path/to/repo
psql -U postgres -d gitgres
To run the tests inside the container:
docker exec <container> bash -c "cd /gitgres && make test"
Requires PostgreSQL with pgcrypto, libgit2, and libpq.
brew install libgit2
Create a database and load the schema:
createdb gitgres
make install-sql PGDATABASE=gitgres
Build the libgit2 backend:
make backend
Build the C extension (optional, for performance):
make ext
Initialize a repository in the database:
./backend/gitgres-backend init "dbname=gitgres" myrepo
Push a local git repo into Postgres:
./backend/gitgres-backend push "dbname=gitgres" myrepo /path/to/repo
Clone from Postgres back to disk:
./backend/gitgres-backend clone "dbname=gitgres" myrepo /path/to/dest
List refs stored in the database:
./backend/gitgres-backend ls-refs "dbname=gitgres" myrepo
Import a repo using the shell script (no compilation needed):
./import/gitgres-import.sh /path/to/repo "dbname=gitgres" myrepo
After importing or pushing a repo, refresh the materialized views:
REFRESH MATERIALIZED VIEW commits_view;
REFRESH MATERIALIZED VIEW tree_entries_view;Then query commits like a regular table:
SELECT sha, author_name, authored_at, message
FROM commits_view
ORDER BY authored_at DESC;Walk a tree:
SELECT path, mode, encode(oid, 'hex')
FROM git_ls_tree_r(1, decode('abc123...', 'hex'));make test
Runs 30 Minitest tests against a gitgres_test database. Each test runs in a transaction that rolls back on teardown. Tests cover object hashing (verified against git hash-object), object store CRUD, tree and commit parsing, ref compare-and-swap updates, and a full push/clone roundtrip.
Git objects (commits, trees, blobs, tags) are stored in an objects table with their raw content and a SHA1 OID computed the same way git does: SHA1("<type> <size>\0<content>"). Refs live in a refs table with compare-and-swap updates for safe concurrent access.
The libgit2 backend implements git_odb_backend and git_refdb_backend, the two interfaces libgit2 needs to treat any storage system as a git repository. The backend reads and writes objects and refs through libpq. When receiving a push, it uses libgit2's packfile indexer to extract individual objects from the incoming pack, then stores each one in Postgres.
The C extension adds a proper git_oid type to Postgres (20-byte fixed binary with hex I/O and btree/hash indexing) and C implementations of SHA1 hashing and tree parsing for better performance on large repos.
With git data in Postgres, a git forge doesn't need filesystem storage at all. Forgejo already keeps everything except git repos in the database. Its entire git interaction goes through a single Go package (modules/git) that shells out to the git binary. Replace that package with SQL queries against the gitgres schema and the filesystem dependency disappears. One Postgres instance, one backup, one replication stream.
See forgejo.md for a detailed analysis of Forgejo's git layer and what replacing it would involve.
libgit2-backends -- the official collection of pluggable ODB backends for libgit2. Includes MySQL, SQLite, Redis, and Memcached implementations. No Postgres backend, which is what prompted this project.
gitbase -- a MySQL-compatible SQL interface for querying git repositories, built on go-git. Reads from on-disk repos rather than storing objects in the database. The SQL query layer is the goal rather than the storage layer.
JGit DFS -- JGit's abstract distributed file system storage. Defines the interface for storing git pack files on arbitrary backends. Google's internal git infrastructure builds on this. An earlier attempt (JGit DHT) tried storing objects directly in databases but was abandoned because no database could keep up with the access patterns.
Gitaly -- GitLab's git storage service. An RPC server that wraps git operations rather than replacing the storage layer. Still uses the filesystem for actual object storage.
Dolt -- a SQL database with git-style versioning (branch, merge, diff) built on prolly trees. Comes at the problem from the opposite direction: it's a database that borrowed git's semantics, not git storage backed by a database.