Skip to content

Conversation

@ezeike
Copy link
Contributor

@ezeike ezeike commented Jun 19, 2025

This PR implements an Ethereum transaction Oracle allowing for cross-chain transfers to Canopy.

  • Added Ethereum blockchain interaction layer
  • Implemented ERC20 token caching and validation system
  • Created block provider with transaction monitoring
  • Added order store for managing cross-chain transactions
  • Included extensive test coverage and e2e testing framework
  • Integrated oracle functionality with controller and configuration
  • Added testing infrastructure with Anvil setup and utility scripts

@ezeike ezeike force-pushed the eth-oracle branch 3 times, most recently from aab69bb to 6b456bb Compare June 19, 2025 06:49
@@ -0,0 +1,340 @@
package main
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial Reveiw

Good

  • Code looks really high quality
  • Dynamically extracts the ERC20 info by using eth_call
  • Testing is robust and well done
  • Good abstraction

Concerns

  • Inspecting every ERC20 before checking if it includes the lock order / close order bytes is expensive and dangerous. Could lead to the oracle falling behind. Why not skip any ERC20 transfer that doesn't have 'extra data'?
  • "remove lock order from store if it was not found in the order book" is wrong - it should remove lock from the store if the order was 'locked'
  • Safe block logic and potential re-orgs (non-actionable - just noting)
  • Use of JSON file is non-atomic and very susceptible to corruption. Should use a badgerDB instance instead.
  • Does err := <-sub.Err(): mean that the connection dropped? If not p.connectWithRetry() could potentially cause multiple simultaneous subscribers.
  • I don't think a separate package at 'root' level for testing is necessary. Please move to the oracle package if possible
  • Potentially too much abstraction at the ETH transaction level. Would like to see code footprint reduced a bit if possible

Missing logic:

  • Marking orders included in a 'proposal' as 'pending' to not duplicate orders
  • Smart 'resubmit' logic
  • Smart 'waiting period' for preventing brand new locks and closes from being submitted to allow slow nodes to process orders on Ethereum
  • Removal of native 'lock' and 'close' order logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files are written atomically by writing to a temporary file then moving that file into place. This ensures only a successfully written file is moved into place.

This should be sufficient for safety guarantees and also lends itself to easier administration and debugging.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but speed is a concern - as we can't have the oracle falling behind. Do you forsee this being a bottleneck?

Copy link
Contributor Author

@ezeike ezeike Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but speed is a concern - as we can't have the oracle falling behind. Do you forsee this being a bottleneck?

In general I do not. On my system a benchmark wrote, read and removed 80k orders per second with sync.

How many orders should we be prepared to handle on a node that is very far behind?

One advantage to using BadgerDB though is adding commands to the canopy binary rather than using standard shell tools to query & troubleshoot.

@ezeike ezeike force-pushed the eth-oracle branch 17 times, most recently from 26efc88 to 1a1ccaf Compare June 24, 2025 18:21
@ezeike ezeike force-pushed the eth-oracle branch 9 times, most recently from a54880f to 8f492a1 Compare July 7, 2025 13:05
@ezeike ezeike force-pushed the eth-oracle branch 8 times, most recently from 75511d9 to 462a82d Compare November 23, 2025 00:55
ezeike and others added 20 commits January 26, 2026 20:38
Refactor EthBlockProvider architecture and add comprehensive test coverage

- Refactor EthBlockProvider to use centralized configuration struct
- Add calculateFetchRange function with proper height management and reorg protection
- Simplify processBlocks to use internal safeHeight state
- Add comprehensive table-driven tests for calculateFetchRange function
- Update TestEthBlockProvider_processBlocks to match simplified function signature
- Add transaction data size validation with 1024 byte limit
- Improve error handling and retry logic for transaction processing
- Add extensive documentation and flow diagrams
- Enhance oracle state management and validation
- Add new CLI query commands and RPC endpoints
- Add debug logs for RPC and WebSocket connection closing
- Add info log before connection retry with delay duration
- Add error log when websocket client is not initialized
- Add error log when subscription error is received
- Add info log when oracle is not enabled
Add 18 new Prometheus metrics for oracle monitoring across 5 categories:

Block height metrics:
- canopy_oracle_last_processed_height
- canopy_oracle_confirmation_lag
- canopy_oracle_orders_awaiting_confirmation
- canopy_oracle_reorg_rollback_depth (histogram)

Order lifecycle metrics:
- canopy_oracle_orders_not_in_orderbook_total
- canopy_oracle_orders_duplicate_total
- canopy_oracle_orders_archived_total
- canopy_oracle_lock_orders_committed_total
- canopy_oracle_close_orders_committed_total

Validation failure metrics:
- canopy_oracle_validation_failures_total (labeled by reason)

Submission tracking metrics:
- canopy_oracle_orders_held_awaiting_safe_total
- canopy_oracle_orders_held_propose_delay_total
- canopy_oracle_orders_held_resubmit_delay_total
- canopy_oracle_lock_order_resubmissions_total
- canopy_oracle_close_order_resubmissions_total

Store operation metrics:
- canopy_oracle_store_write_errors_total
- canopy_oracle_store_read_errors_total
- canopy_oracle_store_remove_errors_total

Also fixes test to use nil metrics (methods are nil-safe).
Add 23 new Prometheus metrics for better observability of Ethereum
block provider operations:

High Priority - Connection & Sync:
- RPC/WS connection attempts and errors
- Connection state tracking (disconnected/connecting/rpc/fully_connected)
- Sync status (unsynced/syncing/synced)
- Block height lag

High Priority - Block Processing:
- Block fetch errors by type
- Processing timeouts and batch sizes
- Reorg detection

Medium Priority - Transaction Processing:
- Transactions total, parse errors, retry by attempt
- Exhausted retries, success status breakdown
- Receipt fetch errors

Medium Priority - Order Detection & Token Cache:
- ERC20 transfer, lock/close order detection
- Order validation errors by type
- Token info fetch errors and contract call timeouts
Add three new gauge metrics for tracking Ethereum block heights:
- canopy_eth_chain_head_height: latest block from chain head
- canopy_eth_last_processed_height: last block successfully processed
- canopy_eth_safe_height: blocks sent through channel (confirmed)

These complement the existing block_height_lag metric for better
visibility into sync progress and block processing state.
Document all Prometheus metrics for the oracle system including:
- Oracle metrics: block heights, order lifecycle, validation, submissions
- Eth metrics: connection, sync, block processing, transactions, orders

Includes example Prometheus queries, alerting recommendations,
and Grafana dashboard tips for effective monitoring.
Apply gofmt formatting fixes:
- Reorder imports alphabetically
- Align struct field assignments
- Normalize comment list indentation
- Add missing trailing newlines
- Remove extra blank lines
Change lock order safe height check from Debug to Info for better
visibility when orders are being held back due to confirmation
requirements.
- Update shouldSubmit() to log specific hold reasons at Info level
- Log propose delay with blocks needed until eligible
- Log resubmit delay with exact eligibility root height
- Log lock order cooldown with blocks remaining
- Log already submitted cases for both lock and close orders
- Remove redundant logs from WitnessedOrders that assumed wrong reason
Rename the CanopyOrders RPC endpoint and related code to OracleOrders
for better clarity about the endpoint's purpose.

- Route path: /v1/query/canopy-orders -> /v1/query/oracle-orders
- CLI command: canopy-orders -> oracle-orders
- Handler, client method, and response type renamed accordingly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants