It would be useful to profile benchmarks run via pyperformance using perf record, to help focus runtime optimization efforts. This is currently possible, but it requires post-processing to filter out data collected from non-worker processes. Additionally, the data collected from workers may include samples taken while the benchmark itself was not running. Ideally, perf would be enabled only while the benchmark is running.
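As a rough sketch of the post-processing step, the output of perf script can be filtered down to samples from known worker PIDs. This assumes perf script's default layout, where each sample starts with a header line (comm, pid, timestamp, event) followed by indented call-stack lines; the function name and PID set here are illustrative, not part of any existing tool.

```python
import re

# Header lines look like: "python 12345 6789.012: cycles:" (pid may be "pid/tid").
SAMPLE_HEADER = re.compile(r"^\S+\s+(\d+)(?:/\d+)?\s")

def filter_worker_samples(perf_script_output, worker_pids):
    """Keep only samples whose header PID is in worker_pids.

    Stack lines and blank separators following a kept header are
    kept; everything belonging to other processes is dropped.
    """
    keep = False
    out = []
    for line in perf_script_output.splitlines():
        m = SAMPLE_HEADER.match(line)
        if m:
            keep = int(m.group(1)) in worker_pids
        if keep:
            out.append(line)
    return "\n".join(out)
```

This only addresses the filtering half of the problem. Restricting collection to the benchmark window itself could plausibly use perf record's control FIFO (the --control option, which accepts enable/disable commands at runtime), but wiring that into pyperformance's worker lifecycle would be the harder part.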