Description
Summary
Add a dynamic GraphQL API that exposes records, metadata, and hook-produced features as a unified query surface. All queryable data lives in typed PostgreSQL tables; PostGraphile auto-generates the query API from the PG schema, including support for PG extension operators (pgvector, tsvector).
Context
- Records have metadata (currently JSON in records.metadata) and features (typed PG tables in the features.* schema)
- Both metadata and feature column definitions are known at convention registration time (from Schema FieldDefinition[] and Hook ColumnDef[])
- Users need cross-table queries: "records where species=human AND cell_classifier.confidence > 0.9 AND similar to this embedding"
- Query scoping by convention is wrong — users care about querying by data shape (schema, features), not by which convention produced the data
Design decisions
Typed PG tables for everything
Generate real PG tables from Schema field definitions (metadata tables) in addition to the existing feature tables from Hook column definitions. No JSON extraction at query time.
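As a rough illustration of what generating a metadata table from field definitions could look like, here is a minimal Python sketch. The `FieldDefinition` dataclass, the `PG_TYPES` mapping, the `metadata` schema name, and the `records (id)` key are all assumptions made for the example; the project's real definitions may differ.

```python
from dataclasses import dataclass

# Hypothetical mapping from schema field types to PostgreSQL column types.
PG_TYPES = {
    "string": "text",
    "integer": "bigint",
    "float": "double precision",
    "boolean": "boolean",
}


@dataclass(frozen=True)
class FieldDefinition:
    """Illustrative stand-in for the project's FieldDefinition."""
    name: str
    type: str
    required: bool = False


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def metadata_table_ddl(schema_name: str, fields: list) -> str:
    """Build a CREATE TABLE statement with one typed column per field."""
    # Assumption: records has a bigint primary key called id.
    columns = ["record_id bigint PRIMARY KEY REFERENCES records (id)"]
    for field in fields:
        column = f"{quote_ident(field.name)} {PG_TYPES[field.type]}"
        if field.required:
            column += " NOT NULL"
        columns.append(column)
    body = ",\n  ".join(columns)
    return f"CREATE TABLE metadata.{quote_ident(schema_name)} (\n  {body}\n);"


ddl = metadata_table_ddl(
    "specimen",
    [FieldDefinition("species", "string", required=True),
     FieldDefinition("age", "integer")],
)
print(ddl)
```

Because every field becomes a real column, filters such as species=human turn into ordinary column comparisons rather than JSON path lookups.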
PostGraphile v5 as GraphQL layer
PostGraphile introspects PG schema and auto-generates a GraphQL API with filtering, pagination, ordering, and relationship traversal. Chosen over pg_graphql because:
- Composable custom operators: Grafast allows plugins that inject pgvector (<=>), tsvector (@@), and pg_trgm operators into the same SQL query as auto-generated filters
- Plugin ecosystem: addPgTableCondition, addPgTableOrderBy, connection-filter plugin
- Maturity: 10 years, ~12.9k stars, active v5 RC development
Auth via reverse proxy
PostGraphile runs as an internal sidecar. FastAPI proxies /graphql:
- FastAPI receives request with JWT
- Verifies JWT, resolves Principal (existing auth pipeline)
- Forwards to PostGraphile with trusted headers (X-OSA-User-Id, X-OSA-Role)
- PostGraphile sets PG session variables via pgSettings
- RLS policies enforce row-level access
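The header-forwarding step is where spoofing has to be prevented, so a small sketch may help. The version below is stdlib-only Python and deliberately leaves out the FastAPI route and HTTP client; the `Principal` dataclass, the `build_forward_headers` helper, and the set of dropped headers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Any client-supplied header with this prefix is discarded before forwarding.
TRUSTED_PREFIX = "x-osa-"
# Credential and hop-by-hop headers that should not reach the sidecar.
DROPPED = {"authorization", "host", "content-length", "connection"}


@dataclass(frozen=True)
class Principal:
    """Illustrative stand-in for the Principal resolved by the auth pipeline."""
    user_id: str
    role: str


def build_forward_headers(incoming: dict, principal: Principal) -> dict:
    """Return the headers to send to PostGraphile for a verified request."""
    forwarded = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in DROPPED
        and not name.lower().startswith(TRUSTED_PREFIX)
    }
    # Trusted headers are set only from the verified principal.
    forwarded["X-OSA-User-Id"] = principal.user_id
    forwarded["X-OSA-Role"] = principal.role
    return forwarded


headers = build_forward_headers(
    {"Authorization": "Bearer t", "x-osa-role": "admin",
     "Content-Type": "application/json"},
    Principal(user_id="u1", role="reader"),
)
print(headers)
```

In this sketch the client's own x-osa-role value never reaches PostGraphile; the role comes from the verified principal alone. That only holds if the sidecar is unreachable except through the proxy.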
Dynamic schema detection
When feature/metadata tables are created via DDL, PostGraphile's watch mode (PG event triggers + LISTEN/NOTIFY) detects the change and rebuilds the GraphQL schema.
Architecture
Client ──JWT──► FastAPI /graphql ──trusted headers──► PostGraphile (internal)
                (auth, proxy)                         (auto-generated GraphQL)
                                                              │
                                                      ┌───────▼────────┐
                                                      │   PostgreSQL   │
                                                      │  records       │
                                                      │  metadata.*    │
                                                      │  features.*    │
                                                      │  pgvector      │
                                                      └────────────────┘
Implementation steps
Phase 1: Typed metadata tables
- Generate metadata PG tables from Schema FieldDefinition[] (same pattern as feature tables)
- Add convention_srn denormalized column to records table
- Add FK constraint from feature tables to records (refactor: add foreign key constraint from feature tables to records #75)
Phase 2: PostGraphile sidecar
- Add PostGraphile v5 to docker-compose (internal service, no host port)
- Configure watch mode for dynamic schema detection
- Add FK / comment directives for feature + metadata tables → records
- FastAPI /graphql proxy route with auth forwarding
- RLS policies on records, metadata, and feature tables
- PostGraphile pgSettings from trusted headers
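To make the link between pgSettings and RLS concrete, the sketch below generates the kind of policy DDL that reads a session variable. It is only an illustration: the `owner_id` column, the `osa.user_id` setting name, and the owner-only read rule are assumptions, since the actual access model is not specified here.

```python
def rls_policy_ddl(table: str,
                   owner_column: str = "owner_id",
                   setting: str = "osa.user_id") -> list:
    """Return statements enabling RLS and adding an owner-only SELECT policy.

    Assumes `table` has a text column `owner_column` (hypothetical) and that
    the session variable `setting` has been set for the current request.
    """
    policy = table.replace(".", "_") + "_owner_read"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # current_setting(name, true) yields NULL instead of raising an error
        # when the variable is unset, so unauthenticated sessions match no rows.
        f"CREATE POLICY {policy} ON {table} FOR SELECT "
        f"USING ({owner_column} = current_setting('{setting}', true));",
    ]


statements = rls_policy_ddl("records")
for statement in statements:
    print(statement)
```

Custom session variables in PostgreSQL need a dotted name such as osa.user_id, which is why the setting carries a prefix.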
Phase 3: PG extension operators
- PostGraphile plugin for pgvector similarity as composable filter + orderBy
- PostGraphile plugin for tsvector full-text search with ts_rank ordering
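The goal of these plugins is that extension operators end up in the same SQL statement as the ordinary filters. The Python sketch below shows only the target SQL shape, not PostGraphile plugin code; the table names (`metadata.specimen`, `features.cell_classifier`, `features.embedding`), the `record_id` join key, and the psycopg-style named placeholders are assumptions for illustration.

```python
def build_search_sql(predicates: list, order_by_similarity: bool = True) -> str:
    """Compose plain filters and a pgvector ordering into one statement."""
    lines = [
        "SELECT r.id",
        "FROM records r",
        "JOIN metadata.specimen m ON m.record_id = r.id",
        "JOIN features.cell_classifier c ON c.record_id = r.id",
        "JOIN features.embedding e ON e.record_id = r.id",
    ]
    if predicates:
        lines.append("WHERE " + "\n  AND ".join(predicates))
    if order_by_similarity:
        # <=> is pgvector's cosine-distance operator; smaller is more similar.
        lines.append("ORDER BY e.embedding <=> %(query_vec)s::vector")
    lines.append("LIMIT %(limit)s")
    return "\n".join(lines)


sql = build_search_sql([
    "m.species = %(species)s",
    "c.confidence > %(min_confidence)s",
])
print(sql)
```

A full-text condition would slot into the same predicate list, for example a tsvector column matched with @@ against plainto_tsquery, so the planner sees all filters and the similarity ordering together.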
Depends on
- refactor: add foreign key constraint from feature tables to records #75 (FK constraint from feature tables to records)
Related
- refactor: migrate vector embedding from post-publication index to pre-publication hook #70 (migrate vector embedding to hook)
- refactor: event system overhaul — consumer groups, decoupled domains, simplified pipeline #68 (event system overhaul)