Skip to content

feat: add jolt-profiling crate#1364

Open
markosg04 wants to merge 3 commits intomainfrom
jolt-v2/jolt-profiling
Open

feat: add jolt-profiling crate#1364
markosg04 wants to merge 3 commits intomainfrom
jolt-v2/jolt-profiling

Conversation

@markosg04
Copy link
Copy Markdown
Collaborator

Summary

Changes

Testing

  • Ran tests for modified crates
  • cargo clippy and cargo fmt pass

Security Considerations

Breaking Changes

None

Copy link
Copy Markdown
Collaborator Author

markosg04 commented Mar 25, 2026

@markosg04 markosg04 marked this pull request as ready for review April 2, 2026 15:33
@0xAndoroid 0xAndoroid force-pushed the jolt-v2/jolt-field branch from 74305c7 to 17e1d5d Compare April 3, 2026 18:38
@0xAndoroid 0xAndoroid requested a review from sagar-a16z as a code owner April 3, 2026 18:38
@0xAndoroid 0xAndoroid force-pushed the jolt-v2/jolt-field branch 3 times, most recently from 15bbf97 to 327de42 Compare April 3, 2026 19:21
Base automatically changed from jolt-v2/jolt-field to main April 3, 2026 20:51
@0xAndoroid 0xAndoroid force-pushed the jolt-v2/jolt-profiling branch from 0a58fef to 240e60c Compare April 3, 2026 20:54
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Warning

This PR has more than 500 changed lines and does not include a spec.

Large features and architectural changes benefit from a spec-driven workflow.
See CONTRIBUTING.md for details on how to create a spec.

If this PR is a bug fix, refactor, or doesn't warrant a spec, feel free to ignore this message.

@github-actions github-actions bot added the no-spec PR has no spec file label Apr 3, 2026
Remove duplicate [workspace.lints.clippy] section introduced by rebase,
add workspace lint inheritance to jolt-profiling, bump sysinfo to
workspace version, and fix all lint violations.
Copy link
Copy Markdown
Collaborator

@0xAndoroid 0xAndoroid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 issues found after deduplication across semantic, bug, tech-debt, and security review passes. Two earlier issues (missing workspace lints, sysinfo version mismatch) were fixed in b003fba.

Comment on lines +44 to +50
let interval = Duration::from_millis((interval_secs * 1000.0) as u64);
let mut system = System::new_all();

thread::sleep(sysinfo::MINIMUM_CPU_UPDATE_INTERVAL);

while !stop.load(Ordering::Relaxed) {
system.refresh_all();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System::new_all() initializes disk, network, component, and process lists — none of which are read. refresh_all() refreshes all of them every tick. For a monitor that only needs CPU and memory this is significant unnecessary overhead per sample.

Suggested change
let interval = Duration::from_millis((interval_secs * 1000.0) as u64);
let mut system = System::new_all();
thread::sleep(sysinfo::MINIMUM_CPU_UPDATE_INTERVAL);
while !stop.load(Ordering::Relaxed) {
system.refresh_all();
let interval = Duration::from_millis((interval_secs * 1000.0) as u64);
let mut system = System::new();
thread::sleep(sysinfo::MINIMUM_CPU_UPDATE_INTERVAL);
while !stop.load(Ordering::Relaxed) {
system.refresh_cpu_all();

///
/// Uses GiB for values >= 1.0, otherwise MiB.
pub fn format_memory_size(gib: f64) -> String {
if gib >= 1.0 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory deltas from memory.rs can be negative (memory freed during a span). A delta of -2.0 GiB fails the >= 1.0 check, falls into the MiB branch, and formats as "-2048.00 MiB" instead of "-2.00 GiB".

Suggested change
if gib >= 1.0 {
if gib.abs() >= 1.0 {

let mut opts = Options::default();
opts.color_diffusion = true;
opts.count_name = String::from("MiB");
opts.factor = 1.0 / BYTES_PER_GIB * 1024.0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is bytes→MiB (matching count_name = "MiB"), but the expression takes a detour through GiB. BYTES_PER_MIB is already defined in units.

Suggested change
opts.factor = 1.0 / BYTES_PER_GIB * 1024.0;
opts.factor = 1.0 / BYTES_PER_MIB;

Also update the import on line 8 to bring in BYTES_PER_MIB instead of (or in addition to) BYTES_PER_GIB.

Comment on lines +60 to +67
let mut memory_usage_map = MEMORY_USAGE_MAP.lock().unwrap();
let memory_gib_start = memory_usage_map
.remove(label)
.unwrap_or_else(|| panic!("no open memory span: {label}"));

let delta = memory_gib_end - memory_gib_start;
let mut memory_delta_map = MEMORY_DELTA_MAP.lock().unwrap();
assert_eq!(memory_delta_map.insert(label, delta), None);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEMORY_USAGE_MAP is held while acquiring MEMORY_DELTA_MAP. If the assert_eq! on line 67 panics, the first mutex is poisoned and every subsequent call to start_memory_tracing_span/end_memory_tracing_span panics with PoisonError. Drop the first lock before acquiring the second:

Suggested change
let mut memory_usage_map = MEMORY_USAGE_MAP.lock().unwrap();
let memory_gib_start = memory_usage_map
.remove(label)
.unwrap_or_else(|| panic!("no open memory span: {label}"));
let delta = memory_gib_end - memory_gib_start;
let mut memory_delta_map = MEMORY_DELTA_MAP.lock().unwrap();
assert_eq!(memory_delta_map.insert(label, delta), None);
let memory_gib_start = {
let mut map = MEMORY_USAGE_MAP.lock().unwrap();
map.remove(label)
.unwrap_or_else(|| panic!("no open memory span: {label}"))
};
let delta = memory_gib_end - memory_gib_start;
let mut memory_delta_map = MEMORY_DELTA_MAP.lock().unwrap();
assert_eq!(memory_delta_map.insert(label, delta), None);

let handle = thread::Builder::new()
.name("metrics-monitor".to_string())
.spawn(move || {
let interval = Duration::from_millis((interval_secs * 1000.0) as u64);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(negative_f64 * 1000.0) as u64 saturates to 0, and NaN also casts to 0. This creates a Duration::ZERO sleep, turning the loop into a CPU spin. Clamp to a sane minimum.

Suggested change
let interval = Duration::from_millis((interval_secs * 1000.0) as u64);
let interval = Duration::from_millis(((interval_secs * 1000.0) as u64).max(50));

Comment on lines +42 to +43
#[cfg(feature = "allocative")]
pub mod flamegraph;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every other module's public functions are re-exported at the crate root (memory::*, monitor::MetricsMonitor, setup::*). flamegraph is the exception — callers must use jolt_profiling::flamegraph::write_flamegraph_svg. Consider adding:

#[cfg(feature = "allocative")]
pub use flamegraph::{print_data_structure_heap_usage, write_flamegraph_svg};

Comment on lines +49 to +95
while !stop.load(Ordering::Relaxed) {
system.refresh_all();

let memory_gib = memory_stats()
.map(|s| s.physical_mem as f64 / BYTES_PER_GIB)
.unwrap_or(0.0);
let cpu_percent = system.global_cpu_usage();
let cores_active_avg = cpu_percent / 100.0 * (system.cpus().len() as f32);
let active_cores = system
.cpus()
.iter()
.filter(|cpu| cpu.cpu_usage() > 0.1)
.count();

#[cfg(target_os = "linux")]
let active_threads = std::fs::read_dir("/proc/self/task")
.map(|entries| entries.count())
.unwrap_or(0);

#[cfg(not(target_os = "linux"))]
let active_threads = 0_usize;

tracing::debug!(
counters.memory_gib = memory_gib,
counters.cpu_percent = cpu_percent,
counters.cores_active_avg = cores_active_avg,
counters.cores_active = active_cores,
counters.thread_count = active_threads,
);

thread::sleep(interval);
}

tracing::info!("MetricsMonitor stopping");
})
.expect("Failed to spawn metrics monitor thread");

MetricsMonitor {
handle: Some(handle),
stop_flag,
}
}
}

impl Drop for MetricsMonitor {
fn drop(&mut self) {
self.stop_flag.store(true, Ordering::Relaxed);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering::Relaxed on both the load (line 49) and store (line 95) is technically safe — join() provides the final synchronization — but on weakly-ordered architectures the thread may run extra iterations after Drop signals stop. Standard pattern for stop flags is Release (store) / Acquire (load). Minor, but it's a one-word change.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Benchmark comparison (crates)

No baseline found — skipping comparison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-spec PR has no spec file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants