Skip to content

feat: dynamic GraphQL query API for records and features #76

@rorybyrne

Description

@rorybyrne

Summary

Add a dynamic GraphQL API that exposes records, metadata, and hook-produced features as a unified query surface. All queryable data lives in typed PostgreSQL tables; PostGraphile auto-generates the query API from the PG schema, including support for PG extension operators (pgvector, tsvector).

Context

  • Records have metadata (currently JSON in records.metadata) and features (typed PG tables in features.* schema)
  • Both metadata and feature column definitions are known at convention registration time (from Schema FieldDefinition[] and Hook ColumnDef[])
  • Users need cross-table queries: "records where species=human AND cell_classifier.confidence > 0.9 AND similar to this embedding"
  • Query scoping by convention is wrong — users care about querying by data shape (schema, features), not by which convention produced the data

Design decisions

Typed PG tables for everything

Generate real PG tables from Schema field definitions (metadata tables) in addition to the existing feature tables from Hook column definitions. No JSON extraction at query time.

PostGraphile v5 as GraphQL layer

PostGraphile introspects PG schema and auto-generates a GraphQL API with filtering, pagination, ordering, and relationship traversal. Chosen over pg_graphql because:

  • Composable custom operators — Grafast allows plugins that inject pgvector (<=>), tsvector (@@), pg_trgm operators into the same SQL query as auto-generated filters
  • Plugin ecosystemaddPgTableCondition, addPgTableOrderBy, connection-filter plugin
  • Maturity — 10 years, ~12.9k stars, active v5 RC development

Auth via reverse proxy

PostGraphile runs as an internal sidecar. FastAPI proxies /graphql:

  1. FastAPI receives request with JWT
  2. Verifies JWT, resolves Principal (existing auth pipeline)
  3. Forwards to PostGraphile with trusted headers (X-OSA-User-Id, X-OSA-Role)
  4. PostGraphile sets PG session variables via pgSettings
  5. RLS policies enforce row-level access

Dynamic schema detection

When feature/metadata tables are created via DDL, PostGraphile's watch mode (PG event triggers + LISTEN/NOTIFY) detects the change and rebuilds the GraphQL schema.

Architecture

Client ──JWT──► FastAPI /graphql ──trusted headers──► PostGraphile (internal)
                 (auth, proxy)                         (auto-generated GraphQL)
                                                              │
                                                     ┌───────▼────────┐
                                                     │   PostgreSQL   │
                                                     │  records       │
                                                     │  metadata.*    │
                                                     │  features.*    │
                                                     │  pgvector      │
                                                     └────────────────┘

Implementation steps

Phase 1: Typed metadata tables

Phase 2: PostGraphile sidecar

  • Add PostGraphile v5 to docker-compose (internal service, no host port)
  • Configure watch mode for dynamic schema detection
  • Add FK / comment directives for feature + metadata tables → records
  • FastAPI /graphql proxy route with auth forwarding
  • RLS policies on records, metadata, and feature tables
  • PostGraphile pgSettings from trusted headers

Phase 3: PG extension operators

  • PostGraphile plugin for pgvector similarity as composable filter + orderBy
  • PostGraphile plugin for tsvector full-text search with ts_rank ordering

Depends on

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    design-neededNeeds architectural discussion before implementationfeatureNew functionalityinfrastructureCI, Docker, deployment, migrations

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions