Add new prometheus metrics for replications queues and set those instead of logging queue errors #464

neilcook · 2025-11-04T22:58:43Z

When the send and/or recv replication queues were full, the previous behaviour was to log an error message for each message that could not be queued. Given that replication messages can arrive at rates on the order of thousands or tens of thousands per second, this logging could itself cause a doom loop of performance problems.

Instead, this PR adds new metrics that track the per-sibling send queue size and count the errors that occur when a message cannot be added to the send or recv queue. These metrics are updated instead of logging.
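
To illustrate the idea of counting instead of logging, here is a minimal Python sketch. It is not the wforce implementation (which is not shown in this PR description); the class name `ReplicationSendQueues`, its methods, and the `max_size` parameter are all hypothetical and exist only for this example.

```python
import queue
from collections import defaultdict


class ReplicationSendQueues:
    """Hypothetical per-sibling bounded send queues.

    When a queue is full, the message is dropped and a per-sibling
    error counter is incremented; nothing is logged on the hot path.
    """

    def __init__(self, max_size):
        self.max_size = max_size
        self.queues = {}                      # sibling -> bounded queue
        self.queue_errors = defaultdict(int)  # sibling -> error counter

    def enqueue(self, sibling, msg):
        """Try to queue msg for sibling; return False if the queue is full."""
        q = self.queues.get(sibling)
        if q is None:
            q = queue.Queue(maxsize=self.max_size)
            self.queues[sibling] = q
        try:
            q.put_nowait(msg)
            return True
        except queue.Full:
            # Cheap counter increment instead of an error log line.
            self.queue_errors[sibling] += 1
            return False

    def queue_size(self, sibling):
        """Current queue depth, the value a gauge would report."""
        q = self.queues.get(sibling)
        return q.qsize() if q is not None else 0


# Usage: a queue of size 2 accepts two messages and counts three errors.
queues = ReplicationSendQueues(max_size=2)
results = [queues.enqueue("1.2.3.4:1234", f"msg-{i}") for i in range(5)]
```

Incrementing a counter costs roughly the same however often the queue overflows, whereas a log line per dropped message adds work exactly when the system is already overloaded.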

New metrics examples:

# HELP wforce_repl_send_queue_size How full is the replication per-sibling send queue?
# TYPE wforce_repl_send_queue_size gauge
wforce_repl_send_queue_size{sibling="1.2.3.4:1234"} 10
wforce_repl_send_queue_size{sibling="127.0.0.1"} 0
wforce_repl_send_queue_size{sibling="1.2.3.4"} 0
wforce_repl_send_queue_size{sibling="127.0.0.1:1233"} 10

# HELP wforce_replication_send_queue_error_total How many errors trying to add replication messages to the send queue?
# TYPE wforce_replication_send_queue_error_total counter
wforce_replication_send_queue_error_total{sibling="1.2.3.4:1234"} 0
wforce_replication_send_queue_error_total{sibling="127.0.0.1"} 0
wforce_replication_send_queue_error_total{sibling="1.2.3.4"} 0
wforce_replication_send_queue_error_total{sibling="127.0.0.1:1233"} 0

# HELP wforce_replication_rcvd_queue_error_total How many errors trying to add replication msgs to the receive queue?
# TYPE wforce_replication_rcvd_queue_error_total counter
wforce_replication_rcvd_queue_error_total{sibling="1.2.3.4:1234"} 0
wforce_replication_rcvd_queue_error_total{sibling="127.0.0.1"} 0
wforce_replication_rcvd_queue_error_total{sibling="1.2.3.4"} 0
wforce_replication_rcvd_queue_error_total{sibling="127.0.0.1:1233"} 0
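
As a rough sketch of how output like the above could be consumed, the following Python snippet parses a few of the sample lines and lists the siblings whose send queue is non-empty. The helper `parse_samples` and the regular expression are assumptions made for this example; they handle only the single `sibling` label shown here, not the full Prometheus exposition format.

```python
import re

# A subset of the sample output above, copied verbatim.
SAMPLE = '''\
# HELP wforce_repl_send_queue_size How full is the replication per-sibling send queue?
# TYPE wforce_repl_send_queue_size gauge
wforce_repl_send_queue_size{sibling="1.2.3.4:1234"} 10
wforce_repl_send_queue_size{sibling="127.0.0.1"} 0
wforce_replication_send_queue_error_total{sibling="1.2.3.4:1234"} 0
'''

# Matches lines of the form: name{sibling="value"} number
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'\{sibling="(?P<sibling>[^"]*)"\}\s+(?P<value>\S+)$'
)


def parse_samples(text):
    """Return {(metric_name, sibling): value}, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if m:
            samples[(m["name"], m["sibling"])] = float(m["value"])
    return samples


samples = parse_samples(SAMPLE)

# Siblings whose send queue currently holds at least one message.
busy = sorted(
    sibling
    for (name, sibling), value in samples.items()
    if name == "wforce_repl_send_queue_size" and value > 0
)
```

In practice a Prometheus server would scrape and query these metrics directly; the parser is only meant to show how the sample lines are structured.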

github-actions · 2025-11-05T00:22:08Z

Test Results

2 files ±0, 2 suites ±0, 33m 18s ⏱️ -3s
73 tests ±0: 73 ✅ ±0, 0 💤 ±0, 0 ❌ ±0
146 runs ±0: 146 ✅ ±0, 0 💤 ±0, 0 ❌ ±0

Results for commit 7eff70f. ± Comparison against base commit 5d13229.

♻️ This comment has been updated with latest results.

neilcook added 4 commits November 4, 2025 22:52

wforce: Add new metrics to track send and recv queue errors and size

96fe007

wforce: Don't log when recv replication queue is full, just increment a metric

5f6a263

wforce: Setup new prometheus metrics for send queue size and set them instead of logging when queue size too big

8af1653

ci: Upload artifacts in GitHub CI

7eff70f
