Skip to content

chore: enhance AMI with streamlined node configuration and setup#1182

Merged
MicBun merged 2 commits intomainfrom
amiSnapshot
Sep 29, 2025
Merged

chore: enhance AMI with streamlined node configuration and setup#1182
MicBun merged 2 commits intomainfrom
amiSnapshot

Conversation

@MicBun
Copy link
Member

@MicBun MicBun commented Sep 29, 2025

Improve AMI user experience with guided configuration workflow and comprehensive MCP integration support:

Node Configuration Improvements:

  • Replace auto-start behavior with explicit user configuration
  • Add welcome messages on first login with clear setup instructions
  • Support node reconfiguration for MCP-only changes (preserves identity)
  • Block private key changes post-configuration to prevent identity issues

MCP Integration Enhancements:

  • Fix MCP server startup using command parameters instead of environment variables
  • Force restricted access mode for production security
  • Add comprehensive MCP setup guidance in welcome messages
  • Include security group configuration instructions for port 8000
  • Update node update script with proper container recreation

Infrastructure Optimizations:

  • Implement stage-aware stack naming for environment isolation
  • Fix docker network naming to prevent tn_tn-network confusion
  • Remove unused mcp-transport parameter (always use SSE)
  • Update deployment guide with corrected examples and troubleshooting

User Experience:

  • Provide step-by-step MCP setup instructions on first login
  • Clear guidance for security group configuration
  • Simplified command options with better defaults
  • Enhanced documentation for both basic and advanced use cases

These changes ensure users have a smooth, guided experience from instance launch to fully operational node with optional AI integration capabilities.

resolves: https://github.com/trufnetwork/truf-network/issues/1237

image

Summary by CodeRabbit

  • New Features

    • Added safe reconfiguration mode that preserves your existing node identity and prevents accidental private key changes.
    • Improved setup experience with welcome messages and clearer configuration feedback.
    • MCP remains configurable; when enabled, it runs over SSE in restricted mode and is announced on first setup.
  • Changes

    • Update command renamed to tn-node-update; now uses docker compose and force-recreates containers.
    • Node image updated to latest; state-sync trusted provider adjusted.
    • Standardized network naming.
    • Stage-aware deployment naming for clearer environments.

@MicBun MicBun self-assigned this Sep 29, 2025
@coderabbitai
Copy link

coderabbitai bot commented Sep 29, 2025

Warning

Rate limit exceeded

@MicBun has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 41 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6ba9825 and 9c1d7df.

📒 Files selected for processing (1)
  • deployments/infra/stacks/ami_pipeline_stack.go (2 hunks)

Walkthrough

Introduces stage-aware CDK stack naming, updates AMI pipeline provisioning to support reconfiguration mode with private key preservation and service lifecycle controls, and adjusts docker-compose template for tn-node image/tag, state-sync trusted providers, postgres-mcp startup flags, and network naming.

Changes

Cohort / File(s) Summary
CDK stack naming
deployments/infra/ami-cdk.go
Reads context stage and constructs stack name AMI-Pipeline-<stage>-Stack; adds fmt import; retains stack creation behavior.
AMI pipeline reconfigure flow & scripts
deployments/infra/stacks/ami_pipeline_stack.go
Adds RECONFIGURE path when /opt/tn/.env exists; preserves TN_PRIVATE_KEY on reconfigure; blocks key changes; stops services and brings down compose before applying config; updates env file ownership; adds welcome/profile messages; renames update script to tn-node-update; switches to docker compose; logs explicit status messages.
Compose services and networking
deployments/infra/stacks/docker-compose.template.yml
tn-node image set to ghcr.io/trufnetwork/node:latest; updates --state-sync.trusted-providers to reference node-1; postgres-mcp uses command flags --transport=sse --access-mode=restricted --sse-host=0.0.0.0 and removes related env vars; declares network name tn-network.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant EC2 as AMI Instance
  participant Config as Configure Script
  participant Docker as Docker Compose
  participant Services as tn-node Services

  User->>EC2: First boot / login
  EC2->>Config: Run configuration
  alt Reconfigure mode (/.env exists)
    Config->>Services: Stop tn-node
    Config->>Docker: docker compose down
    Config-->>Config: Preserve TN_PRIVATE_KEY
    Config->>EC2: Update .env (CHAIN_ID, MCP, preserved key)
    Config->>Services: Start as needed
    Config-->>User: "Reconfiguration complete"
  else Initial configuration
    Config-->>Config: Use provided PRIVATE_KEY or generate new
    Config->>EC2: Write .env (CHAIN_ID, MCP, TN_PRIVATE_KEY)
    Config->>Services: Start as needed
    Config-->>User: "Configuration complete"
  end
Loading
sequenceDiagram
  autonumber
  participant CDK as CDK App
  participant Ctx as Context
  participant Stack as AMI Pipeline Stack

  CDK->>Ctx: TryGetContext("stage")
  alt Found
    Ctx-->>CDK: stage
  else Not found
    Ctx-->>CDK: "default"
  end
  CDK->>Stack: Create with name "AMI-Pipeline-<stage>-Stack"
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • outerlook

Poem

A rabbit taps keys with a gentle thrum,
New stacks greet stages as deployments come.
Keys kept snug, while services nap,
Then hop back up with a compose clap.
Node trusts node-1, networks align—
Ship it, burrow-safe, and running fine. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Linked Issues Check ⚠️ Warning Although the PR delivers extensive node configuration, MCP integration, and deployment enhancements, it makes no changes to AWS network settings, security groups, VPC configuration, or other infrastructure components required to resolve the AMI connectivity failures described in linked issue #1237. As such, none of the modifications directly address the stated objective of investigating and fixing AWS AMI instances failing to connect to the network. Include the necessary AWS networking updates—such as configuring VPC subnets, security group rules, or network interfaces—to directly resolve the AMI connectivity issue outlined in issue #1237.
Out of Scope Changes Check ⚠️ Warning The pull request introduces broad out-of-scope changes—node reconfiguration scripts, welcome messaging, MCP setup guidance, and stage-aware stack naming—that are unrelated to the network connectivity problem targeted by linked issue #1237. These unrelated alterations increase the scope of the PR beyond the networking fix and obscure the core objective of restoring AMI network access. Split the node configuration and MCP enhancements into separate chore PRs and refocus this pull request exclusively on AWS network configuration changes needed to restore AMI connectivity.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title clearly summarizes the main change of enhancing the AMI with improved node configuration and setup workflows, reflecting the substantial script and configuration updates in the changeset without extraneous details or vague phrasing. It is concise, specific, and immediately communicates the primary developer intent.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@MicBun MicBun marked this pull request as draft September 29, 2025 10:07
Improve AMI user experience with guided configuration workflow and comprehensive MCP integration support:

**Node Configuration Improvements:**
- Replace auto-start behavior with explicit user configuration
- Add welcome messages on first login with clear setup instructions
- Support node reconfiguration for MCP-only changes (preserves identity)
- Block private key changes post-configuration to prevent identity issues

**MCP Integration Enhancements:**
- Fix MCP server startup using command parameters instead of environment variables
- Force restricted access mode for production security
- Add comprehensive MCP setup guidance in welcome messages
- Include security group configuration instructions for port 8000
- Update node update script with proper container recreation

**Infrastructure Optimizations:**
- Implement stage-aware stack naming for environment isolation
- Fix docker network naming to prevent tn_tn-network confusion
- Remove unused mcp-transport parameter (always use SSE)
- Update deployment guide with corrected examples and troubleshooting

**User Experience:**
- Provide step-by-step MCP setup instructions on first login
- Clear guidance for security group configuration
- Simplified command options with better defaults
- Enhanced documentation for both basic and advanced use cases

These changes ensure users have a smooth, guided experience from instance launch to fully operational node with optional AI integration capabilities.

resolves: trufnetwork/truf-network#1237
@MicBun MicBun changed the title chore: connect AMI instance to the network chore: enhance AMI with streamlined node configuration and MCP setup Sep 29, 2025
@MicBun MicBun requested a review from outerlook September 29, 2025 11:20
@MicBun MicBun marked this pull request as ready for review September 29, 2025 11:20
@holdex
Copy link

holdex bot commented Sep 29, 2025

Time Submission Status

Member Status Time Action Last Update
MicBun ✅ Submitted 8h 30min Update time Sep 29, 2025, 1:16 PM
@outerlook ❌ Missing - ⚠️ Submit time -

@MicBun MicBun changed the title chore: enhance AMI with streamlined node configuration and MCP setup chore: enhance AMI with streamlined node configuration and setup Sep 29, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
deployments/infra/stacks/docker-compose.template.yml (1)

28-33: Harden defaults: restrict RPC (26657) from public exposure.

Publishing 26657 on all interfaces is risky in production. Default to localhost; operators can override explicitly.

-      - "26657:26657"
+      - "127.0.0.1:26657:26657"
deployments/infra/stacks/ami_pipeline_stack.go (1)

166-173: Standardize on docker compose (plugin) instead of legacy docker-compose.

The install step uses the Compose v2 plugin; checking docker-compose --version may fail if the v1 shim isn’t present. Prefer docker compose version.

-            - docker-compose --version
+            - docker compose version
🧹 Nitpick comments (5)
deployments/infra/stacks/docker-compose.template.yml (3)

22-22: Avoid :latest for tn-node to keep AMIs reproducible.

Pin to a version or digest and rely on tn-node-update to pull when desired.

-    image: ghcr.io/trufnetwork/node:latest
+    image: ghcr.io/trufnetwork/node:<version-or-digest>

57-57: Add a second state-sync trusted provider for resilience.

Using a single trusted provider is a SPOF. Include both node-1 and node-2 (or another independent provider).

-          /app/kwild setup init ... --state-sync.trusted-providers "4e0b5c...@node-1.mainnet.truf.network:26656" ...
+          /app/kwild setup init ... --state-sync.trusted-providers "4e0b5c...@node-1.mainnet.truf.network:26656,0c830b...@node-2.mainnet.truf.network:26656" ...

83-91: Confirm MCP bind intent.

--sse-host=0.0.0.0 exposes MCP on all interfaces. Given the MOTD suggests opening SG to 0.0.0.0/0, confirm this matches your security posture; otherwise consider binding to 127.0.0.1 by default and documenting how to expose it.

deployments/infra/ami-cdk.go (1)

25-31: Type‑safe stage handling for stack name.

TryGetContext returns interface{}. Coerce to string to avoid surprises and keep %s safe. Also aligns with the stack suffix naming learning. Based on learnings.

-    stage := app.Node().TryGetContext(jsii.String("stage"))
-    if stage == nil {
-        stage = "default"
-    }
-    stackName := fmt.Sprintf("AMI-Pipeline-%s-Stack", stage)
+    ctxStage := app.Node().TryGetContext(jsii.String("stage"))
+    stage := "default"
+    if s, ok := ctxStage.(string); ok && s != "" {
+        stage = s
+    }
+    stackName := fmt.Sprintf("AMI-Pipeline-%s-Stack", stage)
deployments/infra/stacks/ami_pipeline_stack.go (1)

355-379: Update script looks good; consider prune of old images optionally.

pull then up --force-recreate is correct. Optionally add a commented docker image prune -f to help operators reclaim space after updates.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 785becb and 6ba9825.

📒 Files selected for processing (3)
  • deployments/infra/ami-cdk.go (2 hunks)
  • deployments/infra/stacks/ami_pipeline_stack.go (2 hunks)
  • deployments/infra/stacks/docker-compose.template.yml (4 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: outerlook
PR: trufnetwork/node#1170
File: .github/workflows/ami-build.yml:75-75
Timestamp: 2025-09-22T18:35:49.536Z
Learning: In the trufnetwork/node repository, the CDK stack naming convention uses `WithStackSuffix()` which generates names in the format `<stackName>-<stackSuffix>-Stack`. The AMI pipeline stack uses `AMI-Pipeline-default-Stack` because no `stackSuffix` is set in the CDK context, so it defaults to "default".
📚 Learning: 2025-09-22T18:35:49.536Z
Learnt from: outerlook
PR: trufnetwork/node#1170
File: .github/workflows/ami-build.yml:75-75
Timestamp: 2025-09-22T18:35:49.536Z
Learning: In the trufnetwork/node repository, the CDK stack naming convention uses `WithStackSuffix()` which generates names in the format `<stackName>-<stackSuffix>-Stack`. The AMI pipeline stack uses `AMI-Pipeline-default-Stack` because no `stackSuffix` is set in the CDK context, so it defaults to "default".

Applied to files:

  • deployments/infra/ami-cdk.go
📚 Learning: 2025-09-19T18:59:51.942Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: tests/extensions/tn_digest/test_tn_digest.sh:15-15
Timestamp: 2025-09-19T18:59:51.942Z
Learning: In Docker Compose files, service names (like "tn-db:") can remain unchanged even when the underlying image is updated (e.g., from tn-db:local to ghcr.io/trufnetwork/node:local). Service names are internal network identifiers and don't need to match the actual image names.

Applied to files:

  • deployments/infra/stacks/docker-compose.template.yml
📚 Learning: 2025-09-19T18:14:01.428Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: .github/workflows/publish-node-image.yaml:19-19
Timestamp: 2025-09-19T18:14:01.428Z
Learning: The ghcr.io/trufnetwork/tn-db image was never released, so no backwards compatibility or transitional measures are needed when renaming to ghcr.io/trufnetwork/node.

Applied to files:

  • deployments/infra/stacks/docker-compose.template.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: acceptance-test
🔇 Additional comments (2)
deployments/infra/stacks/ami_pipeline_stack.go (2)

380-386: Service enablement timing: good call not enabling on build.

Deferring enablement until tn-node-configure avoids accidental starts with incomplete config.


387-463: MOTD and first‑login guidance: nice UX.

Clear, actionable instructions. Keep.

Copy link
Contributor

@outerlook outerlook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all nice for configuring to our mainnet... The only thing I think could be made easier is to configure non-default networks (e.g. our testnet nodes) or completely new networks (similar to how the default autogenerate does)

But this is not essential, I understand that. For that, currently we:

1/ bootstrap an AMI instance
2/ clear persisted directories (tn-data, pg-data)
3/ get kwild/use kwil service container to generate new files with kwild init
4/ put in the correct path and start again

So up to you if we let this improvement come over next iterations

@MicBun MicBun merged commit 5a33433 into main Sep 29, 2025
5 of 6 checks passed
@MicBun MicBun deleted the amiSnapshot branch September 29, 2025 13:20
@MicBun
Copy link
Member Author

MicBun commented Sep 29, 2025

@outerlook how about separate dev vs prod? Prod should always stick to tn-v2.1, where our mainnet is as it is what customers will use. I don't think we want customers to connect to other networks than ours or create new ones.

@outerlook
Copy link
Contributor

I don't think it's worth the burden of maintaining a new AMI for that. It's okay and desirable to default to production; the only idea is to make it easy to drift to these options, since we do that with some frequency, no need to make that the default

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants