A simplified cloud control plane demonstrating asynchronous project lifecycle management — similar to how platforms like Supabase provision backend resources on demand. Built from scratch with a TypeScript API, Go worker, PostgreSQL as both the data store and message broker, and a React dashboard.
```
┌──────────────────────────────────────────────────────────────────────┐
│                            Docker Network                            │
│                                                                      │
│  ┌────────────────┐   HTTP    ┌──────────────────────────────────┐   │
│  │    Browser     │ ────────► │  React Frontend :5173            │   │
│  │ localhost:5173 │ ◄──────── │  Vite dev server                 │   │
│  └────────────────┘           │  /api/* → proxy                  │   │
│                               └───────────────┬──────────────────┘   │
│                                               │ proxy /api → :3000   │
│                               ┌───────────────▼──────────────────┐   │
│  ┌────────────────┐           │  TypeScript API :3000            │   │
│  │  curl / tools  │ ────────► │  Node.js + Express               │   │
│  │ localhost:3001 │           │                                  │   │
│  └────────────────┘           │  POST /projects                  │   │
│                               │    INSERT row (status=creating)  │   │
│                               │    pg_notify("provisioning_jobs")│   │
│                               │  GET /projects[/:id]             │   │
│                               │  POST /projects/:id/retry        │   │
│                               │  GET /metrics   GET /health      │   │
│                               └───────────────┬──────────────────┘   │
│                                               │ SQL + NOTIFY         │
│                               ┌───────────────▼──────────────────┐   │
│                               │  PostgreSQL 16 :5432             │   │
│                               │    projects table                │   │
│                               │    lifecycle_events table        │   │
│                               │    LISTEN/NOTIFY channel         │   │
│                               └───────────────┬──────────────────┘   │
│                                               │ LISTEN               │
│                               ┌───────────────▼──────────────────┐   │
│                               │  Go Worker                       │   │
│                               │    pq.Listener (dedicated conn)  │   │
│                               │    Claim job atomically          │   │
│                               │    Simulate 2–5s work            │   │
│                               │    80% → ready / 20% → failed    │   │
│                               │    Record lifecycle_events       │   │
│                               └──────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘
```
1. Browser sends POST /api/projects
2. Vite proxy strips /api, forwards to API :3000
3. API inserts row: { status: "creating" }
4. API records lifecycle event: none → creating
5. API calls pg_notify("provisioning_jobs", '{"project_id":"<uuid>"}')
6. API returns 201 immediately — no waiting for the worker
7. Worker receives NOTIFY on its dedicated listener connection
8. Worker: UPDATE projects SET status='provisioning' WHERE id=$1 AND status='creating'
→ 0 rows affected → another worker claimed it → skip (idempotency guard)
9. Worker sleeps 2–5 seconds (simulated provisioning)
10. Worker writes: ready (80%) or failed (20%) + error_reason
11. Worker records lifecycle event: provisioning → ready|failed
12. Browser poll picks up the new status
| Component | Technology |
|---|---|
| Frontend | React 18 + Vite + TypeScript |
| API | Node.js 20 + Express + TypeScript |
| Worker | Go 1.22 + lib/pq |
| Queue | PostgreSQL pg_notify / LISTEN |
| Database | PostgreSQL 16 |
| Orchestration | Docker Compose |
```
                    pg_notify
     ┌─────────────────────────────────────┐
     │                                     │
┌────▼───────┐    Worker claims    ┌───────┴──────────┐
│  creating  │ ──────────────────► │   provisioning   │
└────────────┘                     └────────┬─────────┘
                                            │
                          ┌─────────────────┤
                          │ 80%             │ 20%
                  ┌───────▼───────┐ ┌───────▼───────┐
                  │     ready     │ │    failed     │
                  └───────────────┘ └───────┬───────┘
                                            │
                                       POST /retry
                                            │
                              (resets to creating, inside the
                               same TX as pg_notify — atomic)
```
```bash
# 1. Copy environment file
cp .env.example .env

# 2. Build and start all four services
docker compose up --build

# 3. Open the dashboard
open http://localhost:5173

# 4. Or hit the API directly
curl -s http://localhost:3001/health | jq
```

All endpoints are available at http://localhost:3001 (host) or http://api:3000 (inside Docker).
Create a new project. Returns immediately (status: creating); provisioning is async.
```bash
curl -s -X POST http://localhost:3001/projects \
  -H "Content-Type: application/json" \
  -d '{"name": "my-project"}' | jq
```

Response `201`:

```json
{
  "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "name": "my-project",
  "status": "creating",
  "error_reason": null,
  "created_at": "2024-01-01T00:00:00.000Z",
  "updated_at": "2024-01-01T00:00:00.000Z"
}
```

List all projects. Optional `?status=` filter.
```bash
curl -s http://localhost:3001/projects | jq
curl -s "http://localhost:3001/projects?status=failed" | jq
```

Get a single project by UUID.

```bash
curl -s http://localhost:3001/projects/<uuid> | jq
```

Re-queue a failed project. Only works when status === "failed". Returns 409 otherwise.

```bash
curl -s -X POST http://localhost:3001/projects/<uuid>/retry | jq
```

Live counts by status.

```bash
curl -s http://localhost:3001/metrics | jq
```

Response:

```json
{
  "counts_by_status": { "creating": 0, "provisioning": 1, "ready": 7, "failed": 2 },
  "failed_total": 2
}
```

DB connectivity check. Used by Docker Compose health checks.

```bash
curl -s http://localhost:3001/health | jq
```

The React UI at http://localhost:5173 has three tabs:
| Tab | What it shows |
|---|---|
| Projects | Create projects, watch live status transitions, retry failures |
| Metrics | Success rate, avg provision time, status breakdown bar, failures table |
| Settings | Poll interval (1s–30s), status filter, live API/DB health, stack info |
A project stuck in provisioning with no worker running will remain there indefinitely. There is no heartbeat/timeout mechanism in this implementation — a real system would add a background sweeper that resets stale provisioning rows back to creating after a configurable timeout.
The API uses a connection pool (pg.Pool). Requests that arrive while Postgres is down return 500. At startup, the API retries the initial DB connection up to 15 times (2 s apart) before exiting. Once Postgres recovers, the pool reconnects automatically.
pq.Listener has built-in reconnection with configurable min/max backoff (10 s–60 s). The outer ListenAndProcess loop adds an additional 5 s pause before restarting the listener. Any pg_notify events fired during the outage are lost — PostgreSQL does not buffer undelivered notifications. Projects in creating at the time of the outage will stay creating until manually retried.
Duplicate notifications can happen if the API retries a pg_notify or if two workers are running. The worker's atomic claim guard (UPDATE ... WHERE status='creating') ensures only one worker transitions the project. The second sees rowsAffected=0 and skips silently.
The INSERT and the pg_notify either both succeed or both fail. If the container crashes after committing but before sending the HTTP response, the client gets a network error but the worker still processes the project. The client can poll GET /projects to confirm state.
POST /projects/:id/retry runs the status reset and the pg_notify inside a single transaction. If the transaction rolls back (e.g., DB error), PostgreSQL also discards the notification. No ghost jobs are enqueued.
Chose PostgreSQL because:
- Zero extra infrastructure — the queue is the same DB that stores state, so the notification and the row update are always consistent.
- `pg_notify` fires inside transactions; if the transaction rolls back, the notification is discarded — free at-most-once delivery semantics.
- Perfect for low-to-medium throughput (thousands of jobs/day).
Costs:
- Notifications are not persisted. If no listener is connected when `pg_notify` fires, the message is dropped. A dedicated broker (Kafka, SQS) stores messages durably.
- LISTEN uses a dedicated connection per worker — at very high worker counts this strains Postgres connection limits.
- No backpressure or flow control; a burst of notifications floods workers immediately.
Chose polling because:
- Simple to implement and debug; no persistent connections to manage.
- Adjustable interval (1–30 s) via the Settings tab.
Costs:
- Even at 2 s polling, each browser tab sends 30 requests/min to the API. WebSockets or SSE would push updates only on changes, eliminating redundant requests.
The worker's 2–5 s sleep and random outcome are intentional simplifications. In a real system this would be replaced by calls to a cloud provider SDK (e.g., AWS SDK, GCP client), Terraform, or Kubernetes API.
The API is stateless — multiple replicas behind a load balancer work without coordination. Each replica has its own pool connection to Postgres. The only shared state is the database.
Multiple worker instances can run simultaneously. The idempotency guard (UPDATE ... WHERE status='creating') ensures each project is claimed by exactly one worker. Adding workers increases throughput linearly, up to Postgres connection limits.
At high project creation rates:
- `projects` and `lifecycle_events` grow unboundedly. Add a data-retention job to archive old rows.
- The `GROUP BY status` query in `GET /metrics` does a full table scan. Add a materialized counter (e.g., a `project_counts` table updated by triggers) for O(1) metrics.
- `pg_notify` payload is limited to 8 KB and notifications are not queued when no listener is connected. Use a `pending_jobs` table as a durable outbox for high-reliability requirements.
PostgreSQL defaults to 100 connections. The API pool uses up to 10. Each worker uses 1 query pool connection + 1 listener connection. With the default config you can run ~40 workers before hitting limits. Use PgBouncer in transaction-pooling mode to multiplex many app connections over fewer server connections.
Every component emits JSON logs:
- API — structured JSON via a custom `log()` helper. Every request/response includes `request_id`, `method`, `path`, `status`, `duration_ms`.
- Worker — Go `log/slog` (built-in since Go 1.21). Every state transition logs `project_id`, `worker_id`, `previous_state`, `new_state`.
View live logs:
```bash
docker compose logs -f api
docker compose logs -f worker
```

| Signal | Tool | What to instrument |
|---|---|---|
| Metrics | Prometheus + Grafana | projects_created_total, provisioning_duration_seconds histogram, projects_by_status gauge, API request rate/latency/error rate |
| Tracing | OpenTelemetry + Jaeger | Trace the full path: HTTP request → DB insert → pg_notify → worker LISTEN → final UPDATE; measure each span |
| Alerting | Grafana Alertmanager | Alert when failed rate exceeds threshold, when provisioning queue depth grows, when API p99 latency spikes |
| Error tracking | Sentry | Capture unhandled exceptions in API and worker with full stack traces |
| Health checks | k8s liveness probes | GET /health returns { status, db } — wire into readiness and liveness probes |
The `lifecycle_events` table records every state transition with a timestamp:

```sql
SELECT project_id, previous_state, new_state, occurred_at
FROM lifecycle_events
ORDER BY occurred_at DESC
LIMIT 50;
```

This can feed a real-time audit log UI, or be streamed to a data warehouse via Debezium CDC for analytics.
```bash
# View lifecycle events in the DB
docker compose exec postgres psql -U clouduser -d clouddb \
  -c "SELECT project_id, previous_state, new_state, occurred_at \
      FROM lifecycle_events ORDER BY occurred_at DESC LIMIT 20;"

# Run only the database
docker compose up postgres

# Run API locally (outside Docker)
cd api && npm install && npm run dev

# Run worker locally (outside Docker)
cd worker && go run .
```

```bash
docker compose down

# Also remove the database volume (destroys all data):
docker compose down -v
```