Track timing of consensus stages #3930

vicsn · 2025-10-12T20:19:24Z

Motivation

This PR introduces a method to track consensus events for visualization, building on top of @kpandl's earlier work. My understanding of the image below is that under load, block processing is the bottleneck, not certificate signing.

This "ugly" approach mirrors the way we track pub static R1CS_HASHES in snarkVM for certain tests only.

Open TODOs:

Add a few logs so we can observe the timing in production

node/bft/src/helpers/timing.rs

ljedrz · 2025-10-13T08:20:50Z

dumping all of the data into a global lazy static, and only enabling this in dev/test_network mode. Would a pub static also be fine? Or something else?

For dev/test purposes it's perfectly fine; it is only in production where it's an anti-pattern, and should instead be incorporated into the workflow. The alternative would be to store it in the related objects, which would also work, though could be a bit trickier to implement.

exporting the data to a JSON file, also only in dev/test_network mode.

As long as we don't expect very long measurements/outputs, this is fine.

Oh, and naturally, this still needs to be feature-gated.
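To make the JSON export concrete, here is a minimal sketch of serializing per-round stage timings. The PR itself derives serde traits; this version hand-rolls the output with `std` only so it stays dependency-free. `RoundTimings` and `timings_to_json` are hypothetical names, and stage names are assumed to be plain identifiers that need no JSON escaping.

```rust
use std::collections::BTreeMap;
use std::fmt::Write as _;

/// Hypothetical shape: maps a round number to (stage name, elapsed milliseconds) pairs.
type RoundTimings = BTreeMap<u64, Vec<(String, u64)>>;

/// Serializes the timings into a compact JSON object keyed by round.
/// Assumption: stage names contain no characters that require escaping.
fn timings_to_json(timings: &RoundTimings) -> String {
    let mut out = String::from("{");
    for (i, (round, stages)) in timings.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        // JSON object keys must be strings, so the round number is quoted.
        write!(out, "\"{round}\":{{").unwrap();
        for (j, (stage, millis)) in stages.iter().enumerate() {
            if j > 0 {
                out.push(',');
            }
            write!(out, "\"{stage}\":{millis}").unwrap();
        }
        out.push('}');
    }
    out.push('}');
    out
}
```

The output grows linearly with the number of tracked rounds, which is why this is only reasonable for short measurements.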

…se parking_lot lock

kaimast · 2025-10-20T22:39:01Z

Would it not be better to add a separate feature for this? Or do you think the performance overhead for test networks that do not require timing information is minimal?

Using test_network for consensus_tracking could be inefficient for long-running dev networks. Merging it with production metrics would take a lot more work.

vicsn · 2025-10-24T12:45:14Z

Would it not be better to add a separate feature for this? Or do you think the performance overhead for test networks that do not require timing information is minimal?

Agreed, made a new test_consensus_tracking feature. Specifically for long-running networks, these hacky JSON files might get huge.

node/bft/Cargo.toml

node/consensus/src/lib.rs

node/bft/src/helpers/timing.rs

ljedrz · 2025-10-27T08:56:40Z

node/bft/src/helpers/timing.rs

+use serde::{Deserialize, Serialize};
+
+/// Global storage for round-based event data
+static ROUND_EVENTS: Lazy<Arc<RwLock<HashMap<u64, RoundEvents>>>> = Lazy::new(|| Arc::new(RwLock::new(HashMap::new())));


Suggested change

-static ROUND_EVENTS: Lazy<Arc<RwLock<HashMap<u64, RoundEvents>>>> = Lazy::new(|| Arc::new(RwLock::new(HashMap::new())));
+static ROUND_EVENTS: Lazy<Arc<RwLock<HashMap<u64, RoundEvents>>>> = Lazy::new(|| Default::default());

nit

You might also be able to use LazyLock from std instead of relying on the once_cell crate.

Why is that preferred?
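For reference, a minimal sketch of the std-only variant. `std::sync::LazyLock` has been stable since Rust 1.80, so it would avoid the extra dependency for this static. The `RoundEvents` struct and both helper functions below are hypothetical stand-ins, not the PR's actual definitions, and `std::sync::RwLock` is swapped in for the lock the PR uses.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, RwLock};

/// Hypothetical stand-in for the PR's `RoundEvents`: (event name, timestamp in ms) pairs.
#[derive(Clone, Debug, Default, PartialEq)]
struct RoundEvents {
    events: Vec<(String, u64)>,
}

/// Global storage for round-based event data, using only `std`.
/// No `Arc` is needed here because a `static` is already shared by every thread.
static ROUND_EVENTS: LazyLock<RwLock<HashMap<u64, RoundEvents>>> =
    LazyLock::new(|| RwLock::new(HashMap::new()));

/// Appends one event to the given round (hypothetical helper).
fn record_event(round: u64, name: &str, timestamp_ms: u64) {
    let mut map = ROUND_EVENTS.write().expect("lock poisoned");
    map.entry(round).or_default().events.push((name.to_string(), timestamp_ms));
}

/// Returns a copy of the events recorded for a round, if any.
fn events_for(round: u64) -> Option<RoundEvents> {
    ROUND_EVENTS.read().expect("lock poisoned").get(&round).cloned()
}
```

The map is created on first access, so nodes that never record an event pay only for the uninitialized static.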

ljedrz

Left a few comments.

kaimast · 2025-10-31T18:22:58Z

node/consensus/src/lib.rs

+        // Export timing data to JSON after block generation
+        #[cfg(feature = "test_consensus_tracking")]
+        {
+            let dev_index = self.bft().primary().gateway().dev().unwrap_or_default();


Should this be 0 when not in dev mode? We could alternatively append nothing or _nodev.

Good catch: 5da19d5
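A minimal sketch of the `_nodev` option discussed above, so that a node outside dev mode is not confused with dev node 0. The helper name, the file name pattern, and the `Option<u16>` type for the dev index are assumptions for illustration, not taken from the commit.

```rust
/// Builds the output file name for the timing export (hypothetical helper).
/// Assumption: the dev index is an `Option<u16>` that is `None` outside dev mode.
fn timing_file_name(dev: Option<u16>) -> String {
    match dev {
        // In dev mode, tag the file with the node's dev index.
        Some(index) => format!("consensus_timing_{index}.json"),
        // Outside dev mode, use an explicit marker instead of defaulting to 0.
        None => "consensus_timing_nodev.json".to_string(),
    }
}
```

With `unwrap_or_default()`, both `Some(0)` and `None` would map to the same file, which is the collision this avoids.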

kaimast · 2025-10-31T18:29:28Z

Cargo.toml

 test_targets = [ "snarkos-cli/test_targets" ]
 test_consensus_heights = [ "snarkos-cli/test_consensus_heights" ]
 test_network = [ "snarkos-cli/test_network" ]
+test_consensus_tracking = [ "snarkos-node/test_consensus_tracking" ]


It would be good to start documenting these new features. Either here or in the README similar to what I did in this PR.

kaimast · 2025-10-31T18:31:07Z

node/consensus/src/lib.rs

+        let _lowest_round = rounds.iter().min().copied().unwrap_or(0);
+        let _highest_round = rounds.iter().max().copied().unwrap_or(0);
+
+        #[cfg(feature = "test_consensus_tracking")]


It might make sense to add a macro for the start_subdag_stage/end_subdag_stage pair. That macro could also be no-op if the feature flag isn't set, so we do not have to have this many feature gates in the code.

(I can push a commit with this change if it is too much of a hassle for you.)

Agreed it was really ugly, made it more readable: 5da19d5

I don't like the marginal improvement of a new macro, but maybe once your other tracing logic works and we can decorate functions with it, it could replace this current approach entirely.
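For context, the macro idea from this thread could look roughly like the sketch below. `track_stage!` and `sum_round_numbers` are hypothetical names, and the timing is simply printed to stderr here rather than going through the PR's `start_subdag_stage`/`end_subdag_stage` helpers, whose signatures are not shown in this thread.

```rust
// With the feature enabled, time the wrapped block and report the duration.
#[cfg(feature = "test_consensus_tracking")]
macro_rules! track_stage {
    ($name:expr, $body:block) => {{
        let start = std::time::Instant::now();
        let result = $body;
        eprintln!("stage {} took {:?}", $name, start.elapsed());
        result
    }};
}

// Without the feature, the macro expands to just the wrapped block.
#[cfg(not(feature = "test_consensus_tracking"))]
macro_rules! track_stage {
    ($name:expr, $body:block) => {
        $body
    };
}

/// Example call site (hypothetical): no `#[cfg]` attribute is needed here.
fn sum_round_numbers(rounds: &[u64]) -> u64 {
    track_stage!("sum_rounds", { rounds.iter().sum::<u64>() })
}
```

The feature gate lives in one place, and call sites stay identical whether or not tracking is compiled in.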

Track timing of consensus stages

3e17fa2

vicsn requested review from kaimast and ljedrz October 12, 2025 20:19