-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Track timing of consensus stages #3930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: staging
Are you sure you want to change the base?
Conversation
For dev/test purposes it's perfectly fine; it is only in production where it's an anti-pattern, and should instead be incorporated into the workflow. The alternative would be to store it in the related objects, which would also work, though could be a bit trickier to implement.
As long as we don't expect very long measurements/outputs, this is fine. Oh, and naturally, this still needs to be feature-gated. |
…se parking_lot lock
|
Would it not be better to add a separate feature for this? Or do you think the performance overhead for test networks that do not require timing information is minimal? |
Using test_network for consensus_tracking could be inefficient for long-running dev networks. Merging it with production metrics would take a lot more work.
Agreed, made a new |
node/bft/src/helpers/timing.rs
Outdated
| use serde::{Deserialize, Serialize}; | ||
|
|
||
| /// Global storage for round-based event data | ||
| static ROUND_EVENTS: Lazy<Arc<RwLock<HashMap<u64, RoundEvents>>>> = Lazy::new(|| Arc::new(RwLock::new(HashMap::new()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| static ROUND_EVENTS: Lazy<Arc<RwLock<HashMap<u64, RoundEvents>>>> = Lazy::new(|| Arc::new(RwLock::new(HashMap::new()))); | |
| static ROUND_EVENTS: Lazy<Arc<RwLock<HashMap<u64, RoundEvents>>>> = Lazy::new(|| Default::default()); |
nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also be able to use LazyLock in std instead of relying on OnceCell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that preferred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ljedrz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments.
node/consensus/src/lib.rs
Outdated
| // Export timing data to JSON after block generation | ||
| #[cfg(feature = "test_consensus_tracking")] | ||
| { | ||
| let dev_index = self.bft().primary().gateway().dev().unwrap_or_default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 0 when not in dev mode? We could alternatively append nothing or _nodev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch: 5da19d5
| test_targets = [ "snarkos-cli/test_targets" ] | ||
| test_consensus_heights = [ "snarkos-cli/test_consensus_heights" ] | ||
| test_network = [ "snarkos-cli/test_network" ] | ||
| test_consensus_tracking = [ "snarkos-node/test_consensus_tracking" ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to start documenting these new features. Either here or in the README similar to what I did in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let _lowest_round = rounds.iter().min().copied().unwrap_or(0); | ||
| let _highest_round = rounds.iter().max().copied().unwrap_or(0); | ||
|
|
||
| #[cfg(feature = "test_consensus_tracking")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to add a macro for the start_subdag_stage/end_subdag_stage pair. That macro could also be no-op if the feature flag isn't set, so we do not have to have this many feature gates in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I can push a commit for with this change, if it is too much of a hassle for you)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it was really ugly, made it more readable: 5da19d5
I don't like the marginal improvement of a new macro, but maybe once your other tracing logic works and we can decorate functions with it, it could replace this current approach entirely.
Motivation
This PR introduces a method to track consensus events for visualization, building on top of @kpandl 's earlier work. My understanding of the image below is that under load, block processing is the bottleneck, not certificate signing.
This "ugly" approach mirrors the way we track
pub static R1CS_HASHESin snarkVM for certain tests only.Open TODOs: