Skip to content

feat: implement unified hook model with SDK, feature tables, and OCI runner#66

Merged
rorybyrne merged 6 commits intomainfrom
064-feat-implement-unified
Feb 25, 2026
Merged

feat: implement unified hook model with SDK, feature tables, and OCI runner#66
rorybyrne merged 6 commits intomainfrom
064-feat-implement-unified

Conversation

@rorybyrne
Copy link
Contributor

Summary

  • Replace the old validator model with a unified hook system where hooks produce typed features stored in per-convention PostgreSQL tables
  • Add Python SDK (sdk/py/) with @hook decorator, convention() declarations, testing harness, manifest generation, and deploy tooling
  • Add feature domain with FeatureStore port and PostgresFeatureStore adapter that creates per-convention PG schemas with typed columns from hook manifests
  • Replace DockerValidatorRunner with OciHookRunner supporting progress tracking, rejection, feature collection, and security hardening
  • Replace legacy CheckResult/CheckStatus with HookResult/HookStatus on ValidationRun; remove dead old validator port
  • Add Alembic migration for hooks JSON column and feature_tables catalog table
  • Convention creation creates feature tables before persisting the row (atomic ordering)
  • Quote all PG identifiers in dynamic DDL to handle convention IDs with hyphens
  • 532 server tests + 76 SDK tests passing

Test plan

  • All 532 server unit tests pass
  • All 76 SDK unit tests pass
  • Ruff lint clean
  • ty type check passes (pre-existing warnings only)
  • Pre-commit hooks pass

…runner

Replace the old validator model with a unified hook system where hooks
produce typed features stored in per-convention PostgreSQL tables. This
includes the Python SDK for authoring hooks, the OCI container runner,
dynamic DDL for feature tables, and the event-driven validation pipeline.

Key changes:
- Add Python SDK (sdk/py/) with @hook decorator, convention declarations,
  testing harness, manifest generation, and deploy tooling
- Add feature domain with FeatureStore port and PostgresFeatureStore adapter
  that creates per-convention PG schemas with typed columns from hook manifests
- Replace old OCI DockerValidatorRunner with OciHookRunner supporting
  progress tracking, rejection, feature collection, and security hardening
- Replace CheckResult/CheckStatus with HookResult/HookStatus on ValidationRun
- Remove dead old validator port (ValidatorRunner, ValidationInputs)
- Update validation API route to serve HookResult-based responses
- Add Alembic migration for hooks JSON column and feature_tables catalog
- Convention creation now creates feature tables before persisting the row
- Quote all PG identifiers in dynamic DDL to handle convention IDs with hyphens
- Validate record_srn presence before feature insertion
- 532 server tests + 76 SDK tests passing
@github-actions
Copy link

github-actions bot commented Feb 21, 2026

Code Coverage

Package Line Rate Complexity Health
. 70% 0
application 0% 0
application.api 100% 0
application.api.rest 0% 0
application.api.v1 82% 0
application.api.v1.routes 10% 0
application.event 100% 0
cli 40% 0
cli.commands 18% 0
cli.util 53% 0
domain 100% 0
domain.auth 100% 0
domain.auth.command 97% 0
domain.auth.event 100% 0
domain.auth.model 92% 0
domain.auth.port 99% 0
domain.auth.query 93% 0
domain.auth.service 90% 0
domain.auth.util 100% 0
domain.auth.util.di 81% 0
domain.curation 100% 0
domain.curation.adapter 100% 0
domain.curation.command 100% 0
domain.curation.event 100% 0
domain.curation.handler 92% 0
domain.curation.model 100% 0
domain.curation.port 100% 0
domain.curation.query 100% 0
domain.curation.service 100% 0
domain.deposition 100% 0
domain.deposition.adapter 100% 0
domain.deposition.command 69% 0
domain.deposition.event 100% 0
domain.deposition.handler 100% 0
domain.deposition.model 98% 0
domain.deposition.port 100% 0
domain.deposition.query 60% 0
domain.deposition.service 97% 0
domain.export 100% 0
domain.export.adapter 100% 0
domain.export.command 100% 0
domain.export.event 100% 0
domain.export.model 100% 0
domain.export.port 100% 0
domain.export.query 100% 0
domain.export.service 100% 0
domain.feature 100% 0
domain.feature.handler 100% 0
domain.feature.model 0% 0
domain.feature.port 100% 0
domain.feature.service 97% 0
domain.feature.util 100% 0
domain.feature.util.di 0% 0
domain.index 100% 0
domain.index.event 100% 0
domain.index.handler 76% 0
domain.index.model 84% 0
domain.index.service 100% 0
domain.record 100% 0
domain.record.adapter 100% 0
domain.record.command 100% 0
domain.record.event 100% 0
domain.record.handler 100% 0
domain.record.model 100% 0
domain.record.port 100% 0
domain.record.query 100% 0
domain.record.service 100% 0
domain.search 100% 0
domain.search.adapter 100% 0
domain.search.command 100% 0
domain.search.event 100% 0
domain.search.model 100% 0
domain.search.port 100% 0
domain.search.query 100% 0
domain.search.service 100% 0
domain.semantics 100% 0
domain.semantics.command 31% 0
domain.semantics.event 100% 0
domain.semantics.handler 100% 0
domain.semantics.model 100% 0
domain.semantics.port 100% 0
domain.semantics.query 0% 0
domain.semantics.service 100% 0
domain.semantics.util 100% 0
domain.semantics.util.di 0% 0
domain.shared 90% 0
domain.shared.authorization 87% 0
domain.shared.model 92% 0
domain.shared.port 100% 0
domain.source 100% 0
domain.source.event 100% 0
domain.source.handler 96% 0
domain.source.model 100% 0
domain.source.port 100% 0
domain.source.schedule 67% 0
domain.source.service 98% 0
domain.validation 100% 0
domain.validation.adapter 100% 0
domain.validation.command 0% 0
domain.validation.event 100% 0
domain.validation.handler 90% 0
domain.validation.model 85% 0
domain.validation.port 100% 0
domain.validation.query 100% 0
domain.validation.service 73% 0
infrastructure 100% 0
infrastructure.auth 0% 0
infrastructure.event 68% 0
infrastructure.http 36% 0
infrastructure.index 0% 0
infrastructure.index.vector 73% 0
infrastructure.messaging 100% 0
infrastructure.oci 52% 0
infrastructure.persistence 71% 0
infrastructure.persistence.adapter 74% 0
infrastructure.persistence.mappers 56% 0
infrastructure.persistence.repository 31% 0
infrastructure.source 0% 0
sdk 100% 0
sdk.index 100% 0
util 100% 0
util.di 25% 0
Summary 62% (3906 / 6295) 0

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 21, 2026

Greptile Summary

This large PR (157 files, +10k/-2k lines) replaces the old validator model with a unified hook system. Hooks are OCI containers that produce typed features stored in per-convention PostgreSQL tables. The PR also adds a Python SDK with @hook decorator, convention declarations, manifest generation, and deploy tooling, plus an OCI source runner for automated data ingestion.

Key changes:

  • New HookDefinition / HookManifest / HookLimits domain models replace ValidatorRef and CheckResult/CheckStatus
  • OciHookRunner replaces DockerValidatorRunner with improved security hardening (ReadonlyRootfs, CapDrop ALL, no-new-privileges, PidsLimit, MemorySwap)
  • New feature domain with PostgresFeatureStore adapter for dynamic DDL table creation from hook manifests
  • OciSourceRunner added for containerized data source execution with network access
  • SourceService refactored from in-process source pulling to OCI container model
  • WorkerPool now dynamically builds schedules from conventions with sources at startup
  • Alembic migration replaces validator_refs with hooks JSON column and adds feature_tables catalog
  • Python SDK (sdk/py/) provides @hook decorator, convention(), testing harness, and deploy CLI

Issues found:

  • Security (critical): CreateConventionHandler auth gate changed from at_least(Role.ADMIN) to public(), allowing unauthenticated users to register conventions that specify arbitrary OCI images for execution
  • Security: Docker socket mounted in production docker-compose.yml without a socket proxy; default JWT secret provided
  • HookManifest.name lacks pattern validation at the model level, relying on downstream _validate_pg_identifier in the feature store
  • PostgresFeatureStore.create_table has a TOCTOU race in its check-then-create pattern

Confidence Score: 2/5

  • The auth downgrade on convention creation is a significant security risk that should be resolved before merging.
  • The architecture and domain modeling are well-structured, and the OCI runner has good security hardening. However, the CreateConventionHandler auth gate change from Role.ADMIN to public() is a critical security issue — it allows unauthenticated users to register conventions that trigger arbitrary OCI container execution. Combined with the Docker socket mount in production compose, this creates a serious attack surface. The remaining issues (TOCTOU race, missing model-level name validation, default JWT secret) are lower severity but should also be addressed.
  • Pay close attention to server/osa/domain/deposition/command/create_convention.py (auth downgrade), deploy/docker-compose.yml (Docker socket + default JWT secret), and server/osa/infrastructure/persistence/feature_store.py (TOCTOU race).

Important Files Changed

Filename Overview
server/osa/domain/deposition/command/create_convention.py Auth gate changed from at_least(Role.ADMIN) to public(), allowing unauthenticated convention creation including OCI image execution specs.
server/osa/infrastructure/oci/runner.py Solid OCI hook runner with good security hardening (ReadonlyRootfs, CapDrop ALL, no network, PidsLimit, MemorySwap). Minor env injection concern with unsanitized hook name.
server/osa/infrastructure/persistence/feature_store.py Dynamic DDL feature table creation with PG identifier validation. TOCTOU race in check-then-create pattern; concurrent calls could bypass ConflictError.
server/osa/infrastructure/oci/source_runner.py OCI source runner with network access (intentional for upstream APIs). Good security hardening with CapDrop ALL and no-new-privileges. Code largely mirrors OciHookRunner.
server/osa/domain/validation/service/validation.py Clean refactor from validator-based to hook-based validation. Sequential hook execution with halt-on-failure. New validate_deposition method consolidates full workflow.
server/osa/domain/deposition/service/convention.py Convention service creates feature tables before persisting (good atomic ordering). Emits ConventionRegistered event for downstream source triggers. Clean integration with FeatureService and Outbox.
server/osa/domain/shared/model/hook.py Hook domain models (ColumnDef, FeatureSchema, HookManifest, HookLimits, HookDefinition). HookManifest.name lacks pattern validation despite being used in PG identifiers and file paths.
server/migrations/versions/add_hooks_and_feature_tables.py Clean Alembic migration: drops validator_refs, adds hooks/source JSON columns, creates feature_tables catalog. Reversible downgrade included.
server/osa/domain/source/service/source.py Major refactor from in-process source pulling to OCI container execution model. Creates depositions from container output records, moves staged files, and emits continuation events for chunked processing.
sdk/py/osa/runtime/entrypoint.py OCI container entrypoint for hooks. Reads record.json, runs hook, writes features.json or rejection to progress.jsonl. Previously flagged field mismatch (reason vs message) has been addressed.
deploy/docker-compose.yml Docker socket mount added for sibling container execution. Default JWT secret in production compose. Both are security concerns that should be addressed before production use.
server/osa/infrastructure/persistence/adapter/storage.py Extended with hook output dirs, feature reading, source staging dirs, and file move operations. Path traversal protection via _safe_path for uploaded files; hook_name used directly in paths relies on upstream validation.

Sequence Diagram

sequenceDiagram
    participant CLI as SDK Deploy CLI
    participant API as OSA Server API
    participant CS as ConventionService
    participant FS as FeatureService
    participant PG as PostgreSQL
    participant Outbox as Event Outbox
    participant Worker as WorkerPool
    participant Docker as Docker Daemon
    participant Hook as Hook Container

    CLI->>CLI: Build hook/source Docker images
    CLI->>API: POST /api/v1/conventions (hooks, source, schema)
    API->>CS: create_convention()
    CS->>FS: create_table(hook_def) for each hook
    FS->>PG: CREATE SCHEMA features + CREATE TABLE + catalog insert
    CS->>PG: INSERT convention row
    CS->>Outbox: ConventionRegistered event

    Note over Worker: On DepositionSubmitted event
    Worker->>CS: validate_deposition(dep_srn)
    CS->>CS: Build HookInputs (record_json, files_dir)
    loop For each HookDefinition
        CS->>Docker: Create + start hook container
        Docker->>Hook: Run with /osa/in (ro), /osa/out (rw)
        Hook->>Hook: Read record.json, write features.json
        Hook-->>Docker: Exit
        Docker-->>CS: Wait result + inspect OOM
        CS->>CS: Parse progress.jsonl, check rejection
    end
    CS->>Outbox: ValidationCompleted or ValidationFailed

    Note over Worker: On RecordPublished event
    Worker->>FS: insert_features_for_record()
    FS->>PG: Read features.json from cold storage
    FS->>PG: Bulk INSERT into features.hook_name table
Loading

Last reviewed commit: c473cbf

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

90 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

result = hook_fn(record)
except Reject as e:
# Write rejection to progress.jsonl
entry = {"status": "rejected", "reason": str(e)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field mismatch - entrypoint writes "reason" but runner reads "message"

The entrypoint writes rejection as {"status": "rejected", "reason": str(e)}, but OciHookRunner._check_rejection() reads entry.message (server/osa/infrastructure/oci/runner.py:212). This means rejections will always return None as the rejection reason.

Suggested change
entry = {"status": "rejected", "reason": str(e)}
entry = {"status": "rejected", "message": str(e)}

…pipeline

Complete the unified hook model with source ingestion support:

SDK:
- Add Source base class, SourceContext, SourceRecord, and SourceSchedule types
- Add convention() function for declaring schemas, hooks, sources, and file
  requirements together in code
- Add osa deploy CLI: builds OCI images for hooks and sources from
  pyproject.toml, registers conventions with the server via REST API
- Add osa-run-hook and osa-run-source console script entry points (replaces
  python -m invocation to avoid double-import issues)
- Add auto-discovery of convention packages via osa.conventions entry point
  group
- Generate Dockerfiles with build-essential and local SDK staging for
  projects using path-based SDK dependencies

Server:
- Add SourceDefinition domain model and SourceRunner port
- Add OciSourceRunner adapter (network-enabled containers for upstream API
  access, bind-mount staging with host path translation for sibling
  containers)
- Add TriggerSourceOnDeploy event handler (ConventionRegistered ->
  SourceRequested)
- Add PullFromSource handler with full deposition creation from source
  records
- Add InsertRecordFeatures handler (deferred feature insertion on record
  publication)
- Add host_data_dir config for Docker sibling container path translation
- Add file storage methods for source staging, hook features, and
  source-to-deposition file moves
- Fix OCI runner staging cleanup (onexc permission handler instead of
  ignore_errors)
- Increase hook error log capture to 2000 chars for debugging
- Make convention creation public (temporary, for osa deploy without auth)
- Add integration tests for convention, deposition, event, and feature store
  repositories
- Remove old in-process source discovery and geo_entrez source plugin
@rorybyrne
Copy link
Contributor Author

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

152 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Standardize rejection key on "message" (was "reason" in CLI)
- Replace RuntimeError with ExternalServiceError in OciSourceRunner
- Split staging into sibling input/ and output/ dirs under work_dir
- Add find_latest_where to filter events by payload field, fixing
  TriggerInitialSourceRun and SourceSchedule global query bug
@rorybyrne
Copy link
Contributor Author

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

156 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

- Move business logic from ValidateDeposition handler into
  ValidationService.validate_deposition() — handler now delegates
  to service instead of orchestrating repos directly
- Move feature insertion logic from InsertRecordFeatures handler
  into FeatureService.insert_features_for_record()
- Delete TriggerInitialSourceRun handler (replaced by
  TriggerSourceOnDeploy in previous commit)
- Move _compute_summary from validation route into
  ValidationRun.summary property on the domain entity
- Add RunStatus.REJECTED to distinguish hook rejections from
  infrastructure failures
- Fix file storage to use dedicated files/ subdirectory instead
  of writing directly to deposition root
- Add per-run isolation for source staging dirs
- Add SQL-level list_with_source to ConventionRepository (replaces
  in-memory filtering)
- Add pg identifier validation and conflict detection to feature
  store DDL
- Include image resolution in timeout scope for OCI hook/source
  runners
- Move httpx import to module level in SDK deploy command
@rorybyrne
Copy link
Contributor Author

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

157 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

OSA_DATABASE__URL: postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-osa}@db:5432/${POSTGRES_DB:-osa}
OSA_DATA_DIR: /data
OSA_LOGGING__LEVEL: ${LOG_LEVEL:-INFO}
OSA_AUTH__JWT__SECRET: ${JWT_SECRET:-change-me-in-production-must-be-32-chars-long}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default JWT secret in production compose

The default JWT secret change-me-in-production-must-be-32-chars-long will be used if JWT_SECRET isn't set. While the name is self-documenting, having any default in the production compose file means a misconfigured deployment silently uses an insecure secret. Consider removing the default so the server fails to start if JWT_SECRET isn't explicitly set.

Introduce PgIdentifier type with regex validation to ensure hook and
column names are safe PostgreSQL identifiers, preventing SQL injection,
path traversal, and environment variable injection attacks
@rorybyrne rorybyrne merged commit b704157 into main Feb 25, 2026
6 checks passed
@rorybyrne rorybyrne deleted the 064-feat-implement-unified branch February 25, 2026 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant