
docs: view schema documentation #1100

Merged
MicBun merged 2 commits into main from docs/schema
Aug 12, 2025

@williamrusdyputra
Contributor

@williamrusdyputra williamrusdyputra commented Aug 12, 2025

resolves: #1099

Summary by CodeRabbit

  • Documentation
    • Added a new "Schema" section to the README linking to detailed database schema docs.
    • Introduced "Database Schema Documentation" explaining the data-streams model (providers, streams, events, taxonomies, metadata), relationships, keys, and soft-delete semantics.
    • Included a schema diagram for quick visual reference.

@williamrusdyputra williamrusdyputra self-assigned this Aug 12, 2025

coderabbitai bot commented Aug 12, 2025

Walkthrough

Adds a new docs/schema.md detailing the database schema and links it from README.md under a new "Schema" section. No code or API changes.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| README Schema Link: README.md | Inserts a "Schema" section linking to ./docs/schema.md after Terminology. |
| Schema Documentation: docs/schema.md, docs/images/schema.png (referenced) | Introduces database schema documentation covering tables, fields, keys, relationships, and soft-delete semantics. |

Sequence Diagram(s)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~6 minutes

Assessment against linked issues

Objective Addressed Explanation
Provide accessible schema documentation (#1099)

Assessment against linked issues: Out-of-scope changes

Suggested reviewers

  • outerlook
  • MicBun

Poem

I thump my feet in docs delight,
A schema blooms in tidy light—
Streams and keys in ordered rows,
Taxonomies where data grows.
With links now clear, I hop to see,
A carrot-shaped ERD for me! 🥕🐇

@williamrusdyputra williamrusdyputra marked this pull request as ready for review August 12, 2025 04:48
@williamrusdyputra
Contributor Author

@MicBun if you have any suggestions about the wording, or if any info is missing, please let me know. Thank you.


holdex bot commented Aug 12, 2025

Time Submission Status

| Member | Status | Time | Action | Last Update |
|--------|--------|------|--------|-------------|
| williamrusdyputra | ✅ Submitted | 30min | Update time | Aug 12, 2025, 4:52 AM |
| MicBun | ✅ Submitted | 15min | Update time | Aug 12, 2025, 5:10 AM |

Member

@MicBun MicBun left a comment

Good 👍

@MicBun MicBun merged commit 91b9df0 into main Aug 12, 2025
6 of 8 checks passed
@MicBun MicBun deleted the docs/schema branch August 12, 2025 04:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (7)
docs/schema.md (6)

22-28: Clarify stream identity and constraints.

You have both id and stream_id. Clarify:

  • Is stream_id a human-readable identifier? Is it globally unique or unique per data_provider_id?
  • Define UNIQUE constraints accordingly (e.g., UNIQUE(data_provider_id, stream_id)).
  • Document valid values for stream_type (enum) and whether it’s constrained.
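The uniqueness and enum questions above can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's `sqlite3`) stands in for the project's actual SQL dialect, the column types are guesses, and the `stream_type` values `'primitive'` and `'composed'` are hypothetical placeholders, not values confirmed by docs/schema.md.

```python
import sqlite3


def make_streams_db() -> sqlite3.Connection:
    """Build an in-memory database with hypothetical provider/stream tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE data_providers (
            id      INTEGER PRIMARY KEY,
            address TEXT NOT NULL UNIQUE
        );
        CREATE TABLE streams (
            id               INTEGER PRIMARY KEY,
            data_provider_id INTEGER NOT NULL REFERENCES data_providers(id),
            stream_id        TEXT NOT NULL,
            -- Assumed enum values; the real set is not documented here.
            stream_type      TEXT NOT NULL
                CHECK (stream_type IN ('primitive', 'composed')),
            -- stream_id is unique per provider, not globally.
            UNIQUE (data_provider_id, stream_id)
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, provider_id: int,
               stream_id: str, stream_type: str) -> bool:
    """Insert a stream; return False if any constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO streams (data_provider_id, stream_id, stream_type) "
                "VALUES (?, ?, ?)",
                (provider_id, stream_id, stream_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, two providers can each own a stream with the same `stream_id`, while a single provider cannot register it twice. If `stream_id` is meant to be globally unique instead, the constraint would be `UNIQUE (stream_id)`.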

41-54: Define taxonomy integrity rules (uniqueness, cycles, effective dating).

To keep hierarchies sound:

  • Enforce no cycles (e.g., via application logic or triggers).
  • Add UNIQUE constraints to prevent duplicates, e.g., UNIQUE(stream_ref, child_stream_ref, start_time) or with group_sequence if required.
  • Clarify whether start_time pairs with an end_time or relies on disabled_at for bitemporal semantics.
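One way to enforce the no-cycles rule in application logic is a reachability check before inserting a new parent/child edge. The sketch below assumes taxonomy rows can be reduced to `(stream_ref, child_stream_ref)` pairs; the function name `would_create_cycle` is hypothetical and not part of the project.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]  # (stream_ref, child_stream_ref)


def would_create_cycle(edges: Iterable[Edge], parent: Hashable,
                       child: Hashable) -> bool:
    """Return True if adding parent -> child would introduce a cycle.

    A cycle appears exactly when `parent` is already reachable from
    `child` through existing edges (or when the two are the same node).
    """
    if parent == child:
        return True

    children = defaultdict(list)
    for p, c in edges:
        children[p].append(c)

    # Iterative depth-first search starting at the proposed child.
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return False
```

A check like this would run over the currently active taxonomy rows (for example, those not soft-deleted via `disabled_at`) before a new row is written. Whether that filter is right depends on the effective-dating semantics the comment above asks the docs to clarify.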

55-69: Metadata model: ensure determinism and efficient lookup.

With a wide KV table:

  • Consider UNIQUE(stream_ref, metadata_key, created_at) if multiple values over time are allowed, or UNIQUE(stream_ref, metadata_key) if only one active at a time.
  • Add indexes: (stream_ref, metadata_key, disabled_at), and possibly partial index where disabled_at IS NULL for fast “active” lookups.
  • If multiple type columns are mutually exclusive, document the rule (only one of value_i/value_f/value_b/value_s/value_ref set).
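The "one active value per key" variant and the mutually exclusive value columns could look roughly like this. It is a sketch only: SQLite is used for illustration, the column types and the `row_id` column are assumptions, and the CHECK expression relies on SQLite treating `IS NOT NULL` results as 0/1 integers (PostgreSQL offers `num_nonnulls(...)` for the same purpose).

```python
import sqlite3

METADATA_SCHEMA = """
CREATE TABLE metadata (
    row_id       INTEGER PRIMARY KEY,
    stream_ref   INTEGER NOT NULL,
    metadata_key TEXT    NOT NULL,
    value_i      INTEGER,
    value_f      REAL,
    value_b      INTEGER,
    value_s      TEXT,
    value_ref    TEXT,
    created_at   INTEGER NOT NULL,
    disabled_at  INTEGER,
    -- Exactly one of the typed value columns must be set.
    CHECK (
        (value_i IS NOT NULL) + (value_f IS NOT NULL) + (value_b IS NOT NULL)
        + (value_s IS NOT NULL) + (value_ref IS NOT NULL) = 1
    )
);
-- Partial unique index: at most one active (not soft-deleted) row per key.
CREATE UNIQUE INDEX metadata_one_active
    ON metadata (stream_ref, metadata_key)
    WHERE disabled_at IS NULL;
"""


def make_metadata_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(METADATA_SCHEMA)
    return conn


def add_metadata(conn, stream_ref, key, created_at, *, value_i=None,
                 value_f=None, value_b=None, value_s=None,
                 value_ref=None) -> bool:
    """Insert a metadata row; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO metadata (stream_ref, metadata_key, created_at, "
                "value_i, value_f, value_b, value_s, value_ref) "
                "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (stream_ref, key, created_at,
                 value_i, value_f, value_b, value_s, value_ref),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The partial index doubles as the fast path for "active" lookups, since it only covers rows where `disabled_at IS NULL`. If multiple simultaneous values per key are intended, the unique index would need to include `created_at` or be dropped.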

10-18: Provider address: document uniqueness and canonical form.

If address is a blockchain address, state:

  • Canonical representation (checksum/encoding, case sensitivity).
  • Uniqueness (UNIQUE(address)).
  • Whether multiple addresses can map to a single provider entity.
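To make the canonical-form question concrete, here is a minimal sketch that assumes `address` is an Ethereum-style 20-byte hex string and that lowercase with a `0x` prefix is the chosen canonical form. Both are assumptions for illustration; the schema docs do not state the encoding, and `canonical_address` is a hypothetical helper.

```python
import re

# Optional "0x" prefix followed by exactly 40 hex digits (20 bytes).
_HEX_ADDRESS = re.compile(r"(?:0x)?([0-9a-fA-F]{40})")


def canonical_address(raw: str) -> str:
    """Normalize an address so that UNIQUE(address) compares like with like."""
    match = _HEX_ADDRESS.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a 20-byte hex address: {raw!r}")
    return "0x" + match.group(1).lower()
```

Normalizing before insert matters because a UNIQUE constraint on a case-sensitive text column would otherwise treat two spellings of the same address as different providers.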

3-5: Add operational details: time zone and referential targets.

  • Specify the canonical time zone (UTC recommended) for all timestamp fields.
  • For FKs, name the exact targets (e.g., stream_ref → streams.id). This avoids ambiguity with stream_id.
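For the time zone point, a small sketch of what "UTC for all timestamp fields" can mean on the client side. It assumes timestamps are stored as integer Unix seconds, which is an assumption and not something the schema docs confirm; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_utc_epoch(moment: datetime) -> int:
    """Convert a time-zone-aware datetime to Unix seconds (UTC-based)."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        # A naive datetime has no defined offset, so refuse to guess.
        raise ValueError("naive datetime: time zone is ambiguous")
    return int(moment.astimezone(timezone.utc).timestamp())


def from_utc_epoch(seconds: int) -> datetime:
    """Read a stored value back as an aware datetime in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Rejecting naive datetimes at the boundary keeps fields such as `created_at` and `disabled_at` comparable regardless of where the writer runs.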

35-40: Revisit composite PK design for primitive_events

Including created_at in the primary key can make upserts and deduplication more complex. Typical time-series tables use:

  • Primary key: (stream_ref, event_time)
  • Secondary indexes:
    • (stream_ref, event_time DESC) for latest‐per‐stream queries
    • (stream_ref, created_at) if you need insertion‐order lookups

Please confirm whether you actually expect multiple records with the same (stream_ref, event_time). If not, remove created_at from the PK (docs/schema.md lines 35–40) and add it as a separate index instead.
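If only one record per `(stream_ref, event_time)` is expected, the simpler key makes upserts straightforward. The sketch below illustrates that alternative, not the schema as documented: SQLite stands in for the real dialect, and the `value` column and integer timestamps are assumptions.

```python
import sqlite3


def make_events_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE primitive_events (
            stream_ref INTEGER NOT NULL,
            event_time INTEGER NOT NULL,
            value      REAL    NOT NULL,  -- hypothetical payload column
            created_at INTEGER NOT NULL,
            PRIMARY KEY (stream_ref, event_time)
        );
        -- created_at is kept out of the PK and indexed separately.
        CREATE INDEX primitive_events_created
            ON primitive_events (stream_ref, created_at);
        """
    )
    return conn


def upsert_event(conn: sqlite3.Connection, stream_ref: int, event_time: int,
                 value: float, created_at: int) -> None:
    """Insert an event, or overwrite the existing one at the same event_time."""
    with conn:
        conn.execute(
            """
            INSERT INTO primitive_events (stream_ref, event_time, value, created_at)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (stream_ref, event_time)
            DO UPDATE SET value = excluded.value,
                          created_at = excluded.created_at
            """,
            (stream_ref, event_time, value, created_at),
        )
```

If the schema intentionally keeps several versions of an event and resolves them by `created_at`, then `created_at` belongs in the key and this overwrite-style upsert would be the wrong model.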

README.md (1)

15-17: Add a one-line description for consistency with other sections.

Other sections include a short intro sentence before the link. Mirror that for uniformity.

Apply this diff:

 ## Schema

-[Schema](./docs/schema.md)
+An overview of the database tables, relationships, and keys:
+
+[Schema](./docs/schema.md)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81b3227 and c675542.

⛔ Files ignored due to path filters (1)
  • docs/images/schema.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • README.md (1 hunks)
  • docs/schema.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: acceptance-test
🔇 Additional comments (1)
README.md (1)

15-17: LGTM: Adding Schema section and link.

The placement and link path look good and improve discoverability.

## Overview
This database schema manages data streams with taxonomies, events, and metadata tracking.

![Schema](/docs/images/schema.png)

💡 Verification agent

🧩 Analysis chain

Fix image path: prefer a relative path over a leading slash.

On GitHub, /docs/images/schema.png resolves from the repository root, but other Markdown renderers and local previews may resolve a leading slash from the site or filesystem root. A path relative to this file is more portable.

Apply this diff:

-![Schema](/docs/images/schema.png)
+![Database schema diagram](./images/schema.png)

Run this script to verify the image actually exists and the link renders relative to docs/schema.md:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Check that the schema doc and image exist:"
fd -a -t f 'schema.md'
fd -a -t f 'schema.png'

echo "Preview the image link target as it would resolve from docs/schema.md:"
# Printing expected relative path
echo "./docs/images/schema.png should exist relative to repo root if you keep it under docs/images/"

Length of output: 585


Fix image path and alt text in docs/schema.md

The image exists at docs/images/schema.png, so from docs/schema.md it should be referenced relatively, not from the site root. Also update the alt text for clarity.

• File: docs/schema.md
• Line: 6

Apply this change:

-![Schema](/docs/images/schema.png)
+![Database schema diagram](./images/schema.png)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-![Schema](/docs/images/schema.png)
+![Database schema diagram](./images/schema.png)
🤖 Prompt for AI Agents
In docs/schema.md around line 6, the image reference uses an absolute root path
and unclear alt text ("Schema"); change the markdown image to use a relative
path to the image file (e.g. ./images/schema.png) and update the alt text to a
clearer description such as "Database schema diagram" so the file renders
correctly from the docs directory and has meaningful alt text.

Comment on lines +13 to +18
| Field | Type | Description |
|-------|------|-------------|
| `id` | PK | Unique identifier |
| `address` | | Provider |
| `created_at` | | Record creation timestamp |


🛠️ Refactor suggestion

Specify column types, nullability, defaults, and constraints.

The “Type” column is empty for multiple fields. This limits the doc’s usefulness when implementing clients or reviewing migrations.

Recommendations:

  • Add concrete DB types (e.g., uuid/text/bigint/timestamp), nullability (NOT NULL/NULL), and default values.
  • Call out unique constraints (e.g., address UNIQUE), and indexes commonly used by queries.

If helpful, I can generate a DDL template aligned with your Kwil/SQL dialect.
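As a rough idea of what a fuller table could contain, the sketch below renders a field table with explicit type, nullability, and constraint columns. The types shown (`uuid`, `text`, `timestamptz`) are placeholders chosen for illustration, not the project's actual column types, and `Column` and `render_table` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    nullable: bool
    constraints: str
    description: str


def render_table(columns: Iterable[Column]) -> str:
    """Render columns as a Markdown table in the style of docs/schema.md."""
    lines = [
        "| Field | Type | Nullable | Constraints | Description |",
        "|-------|------|----------|-------------|-------------|",
    ]
    for col in columns:
        nullable = "NULL" if col.nullable else "NOT NULL"
        lines.append(
            f"| `{col.name}` | {col.sql_type} | {nullable} "
            f"| {col.constraints} | {col.description} |"
        )
    return "\n".join(lines)


# Placeholder metadata for the provider table; the types are assumptions.
EXAMPLE_COLUMNS = [
    Column("id", "uuid", False, "PRIMARY KEY", "Unique identifier"),
    Column("address", "text", False, "UNIQUE", "Provider address"),
    Column("created_at", "timestamptz", False, "", "Record creation timestamp"),
]
```

Generating the table from one structure keeps the documented types and constraints in a single place, which makes it easier to check them against the migrations.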

🤖 Prompt for AI Agents
In docs/schema.md around lines 13 to 18, several fields are missing concrete
column metadata; update the Type column to include precise DB types (e.g.,
uuid/text/timestamp/bigint), explicit nullability (NOT NULL or NULL), and any
default expressions (e.g., auto-generated UUIDs or current_timestamp) for each
field; mark primary key (id) and call out constraints such as address UNIQUE and
any indexes (e.g., frequent query columns like address or created_at) so the
table row shows type, nullability/default, and constraints/index
recommendations.


Development

Successfully merging this pull request may close these issues.

Problem: can't view schema documentation
