Conversation

@skoudmar (Contributor) commented Mar 6, 2025

This PR adds support for tracing by importing tracepoints used by rclcpp and creating new ones.

ROS 2 tracing is built on the LTTng tracing framework.

Motivation

Tracing ROS applications can help identify performance bottlenecks or measure and identify causes of latencies.

Various tools are available to analyze the resulting traces.

Overview

Traces of an R2R application without this PR already include tracepoints
from rcl and rmw, but these are not sufficient for extracting
useful information. Therefore, this PR extends R2R in the following
areas:

  • Creates the r2r_tracing module, which
    • Imports tracepoints from ROS package tracetools (a part of ros2_tracing).
    • Defines new R2R specific tracepoints (for Node::spin_once).
    • Uses the tracing feature flag. Without it nothing is imported/linked, and tracepoints are no-ops.
  • Calls tracepoint functions defined in r2r_tracing from r2r code.
  • Adds methods Node::subscribe_traced, Node::create_service_traced,
    and Timer::on_tick. The first two are similar to Node::subscribe
    and Node::create_service but add tracepoints around the
    user-supplied callback; Timer::on_tick adds tracepoints to the
    timer callback.
  • Updates R2R so that ROS entities are created at a stable memory
    address.

Stable memory address

Without this PR, ROS entities are first created on the stack, so initialization events like ros2:rcl_publisher_init record the stack address in the tracepoint event. Besides the address, these initialization events usually contain helpful information such as the topic name or timer period. R2R then moves the entity to the heap using Box or Arc. When a traced function, for example rcl_publish, is later called on the entity, the tracepoint event references the heap address instead. Since the heap address differs from the original stack address, the traced function cannot be matched with the corresponding initialization event.

This PR solves the problem for publishers (cb10d28), subscriptions (3f24514), services (e9dece1), and clients (c926fa6). For timers, the problem was already solved in PR #88.
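The address-stability issue can be illustrated with a minimal sketch (the `Handle` type and function names below are illustrative, not r2r's actual API): if the handle is initialized on the stack and only then boxed, the address recorded at init time differs from the one seen later; boxing first keeps it stable.

```rust
// Illustrative sketch of the stable-address fix; not r2r's actual code.

struct Handle {
    initialized: bool,
}

/// Problematic pattern: initialize on the stack, then move to the heap.
fn init_then_box() -> (usize, usize) {
    let h = Handle { initialized: true }; // an "rcl_*_init" event would record this address
    let init_addr = &h as *const Handle as usize;
    let boxed = Box::new(h); // the move changes the address
    let later_addr = &*boxed as *const Handle as usize;
    (init_addr, later_addr)
}

/// Fixed pattern: box first, then initialize in place.
fn box_then_init() -> (usize, usize) {
    let mut boxed = Box::new(Handle { initialized: false });
    let init_addr = &*boxed as *const Handle as usize;
    boxed.initialized = true; // init records the heap address
    let later_addr = &*boxed as *const Handle as usize;
    (init_addr, later_addr)
}

fn main() {
    let (a, b) = init_then_box();
    assert_ne!(a, b); // later trace events cannot be matched with init
    let (a, b) = box_then_init();
    assert_eq!(a, b); // later trace events match the init event
    println!("stable address: {:#x}", a);
}
```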

Importing tracepoints from ros2_tracing

To add tracepoints compatible with rclcpp to the R2R code, we first need to import them. This is handled by dynamically loading the tracetools library from ros2_tracing. tracetools is a part of ROS and is depended on by, e.g., rcl and rclcpp.

The following tracepoints are imported:

  • ros2:rclcpp_subscription_init
  • ros2:rclcpp_timer_link_node
  • ros2:rclcpp_subscription_callback_added
  • ros2:rclcpp_service_callback_added
  • ros2:rclcpp_timer_callback_added
  • ros2:rclcpp_callback_register
  • ros2:rclcpp_publish
  • ros2:rclcpp_take
  • ros2:callback_start
  • ros2:callback_end

Reference: https://github.com/ros2/ros2_tracing/blob/rolling/doc/design_ros_2.md

Defining new tracepoints

In rclcpp, sampling of ready entities is announced by the tracepoint rclcpp_executor_wait_for_work. However, this tracepoint is tied to the C++ executor lifecycle, and R2R does not include executors. In R2R, sampling is done by Node::spin_once. Therefore, to capture information about sampling, we define new tracepoints:

  • r2r:spin_start
  • r2r:spin_wake
  • r2r:spin_timeout
  • r2r:spin_end

These are executed according to the following table:

Spin result               Tracepoint sequence
Some entities are ready   spin_start -> spin_wake -> spin_end
The wait timed out        spin_start -> spin_timeout
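The two sequences above can be modeled by a small, runnable sketch (the control flow is a simplified model of spin_once, with tracepoint calls replaced by a recording vector so the flow is observable; it is not the PR's exact code):

```rust
// Simplified model of where the new r2r tracepoints fire in Node::spin_once.

#[derive(Debug, PartialEq)]
enum Tp {
    SpinStart,
    SpinWake,
    SpinTimeout,
    SpinEnd,
}

fn spin_once(entities_ready: bool, trace: &mut Vec<Tp>) {
    trace.push(Tp::SpinStart); // r2r:spin_start - waiting begins
    if entities_ready {
        trace.push(Tp::SpinWake); // r2r:spin_wake - the wait returned work
        // ... ready callbacks would execute here ...
        trace.push(Tp::SpinEnd); // r2r:spin_end - processing is done
    } else {
        trace.push(Tp::SpinTimeout); // r2r:spin_timeout - the wait timed out
    }
}

fn main() {
    let mut trace = Vec::new();
    spin_once(true, &mut trace);
    assert_eq!(trace, [Tp::SpinStart, Tp::SpinWake, Tp::SpinEnd]);

    trace.clear();
    spin_once(false, &mut trace);
    assert_eq!(trace, [Tp::SpinStart, Tp::SpinTimeout]);
}
```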

The tracepoints are generated by the crate lttng-ust-generate. Its released version is old and does not support usize, so I created a PR (sapphire-arches/lttng-ust-rs#5) that fixes it; since the maintainer is not active, my fork is used instead.

Additionally, the generated code needs to link against the lttng-ust library, which is only available on Linux. This is the main reason the tracing support is feature-gated.

tracing feature flag

Generation, inclusion, and linking are controlled by the tracing feature flag in r2r_tracing, which is also exported by r2r.

Without the tracing feature, nothing is imported or linked, and all tracepoints are no-ops.
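The no-op behavior can be sketched with cfg-gated stubs (the function name below is hypothetical; r2r_tracing's real code is generated by lttng-ust-generate):

```rust
// Sketch of feature-gated tracepoints. With the `tracing` feature off,
// the stub references no LTTng symbols, so nothing links against lttng-ust.

#[cfg(feature = "tracing")]
fn trace_spin_start(node_addr: usize) {
    // the LTTng-generated tracepoint would fire here
    println!("r2r:spin_start node={:#x}", node_addr);
}

#[cfg(not(feature = "tracing"))]
#[inline(always)]
fn trace_spin_start(_node_addr: usize) {
    // no-op: compiles away entirely
}

fn tracing_enabled() -> bool {
    cfg!(feature = "tracing")
}

fn main() {
    trace_spin_start(0x1000);
    println!("tracing enabled: {}", tracing_enabled());
}
```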

New callback tracing functions

To support tracing of callback execution, the new functions Node::subscribe_traced, Node::create_service_traced, and Timer::on_tick are added.

Example usage is provided in r2r/examples/callback_tracing.rs.
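The core idea of these functions can be modeled in a few lines: the user callback is wrapped so that start/end events surround each invocation. The types and names below are illustrative, not r2r's actual API.

```rust
// Minimal model of callback tracing: events bracket the user callback.

struct TracedCallback<F: FnMut(&str)> {
    user_cb: F,
    events: Vec<&'static str>, // stand-in for emitting real tracepoints
}

impl<F: FnMut(&str)> TracedCallback<F> {
    fn invoke(&mut self, msg: &str) {
        self.events.push("ros2:callback_start");
        (self.user_cb)(msg); // the user-supplied callback runs here
        self.events.push("ros2:callback_end");
    }
}

fn main() {
    let mut cb = TracedCallback {
        user_cb: |m: &str| println!("got {}", m),
        events: Vec::new(),
    };
    cb.invoke("hello");
    assert_eq!(cb.events, ["ros2:callback_start", "ros2:callback_end"]);
}
```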

CC: @wentasah

@m-dahl (Collaborator) commented Apr 1, 2025

Hi @skoudmar. Thanks for the contribution. I just wanted you to know that I have seen this, but I haven't had time to test it (on a Mac right now). I am not familiar with the ROS 2 tracing features, but I think this looks solid and non-intrusive, so I'm definitely open to accepting it.

What do you think about the slight API change when you want to trace (i.e., changing from stream style to callback style)? I realize that you need to call trace_callback_end somewhere, but it becomes a bit of a mismatch and forces you to change more code than simply tacking _traced onto your existing calls when you want to enable tracing for something. I also see that we could end up with a lot of different combinations (e.g., if you wanted to trace a "raw" subscriber). Do you think we can achieve the same thing by wrapping the stream? Upon wrapping we set up the id, and we put the tracing logic in poll_next. Then we could have a Traced trait, and you would just add .traced() to your existing code. Do you think that could work?
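The wrapping idea could be sketched like this, using Iterator as a synchronous stand-in for Stream so the sketch stays self-contained (all names illustrative; a real version would implement futures' Stream and emit tracepoints in poll_next):

```rust
// Sketch: a wrapper records trace events around each next() (poll_next)
// call, and an extension trait lets existing code just append .traced().

struct Traced<I> {
    inner: I,
    events: usize, // stand-in for emitting real tracepoints
}

impl<I: Iterator> Iterator for Traced<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.events += 1; // a "start"-like event would fire here
        let item = self.inner.next();
        self.events += 1; // an "end"-like event would fire here
        item
    }
}

trait TracedExt: Iterator + Sized {
    fn traced(self) -> Traced<Self> {
        Traced { inner: self, events: 0 }
    }
}
impl<I: Iterator> TracedExt for I {}

fn main() {
    let mut s = (0..3).traced();
    let collected: Vec<i32> = s.by_ref().collect();
    assert_eq!(collected, [0, 1, 2]);
    // 4 next() calls (3 items + the final None), 2 events each
    assert_eq!(s.events, 8);
}
```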

I hope to be able to test it a bit soon.

@skoudmar (Contributor, author) commented Apr 1, 2025

Hi @m-dahl, thanks for your feedback!

The callback approach is necessary because ROS's existing tracing API was designed specifically with callbacks in mind. An async function can pause execution via await, and this execution pattern can't be properly represented by the existing tracepoints. Creating new tracepoints would, in this case, break compatibility with existing tracing tools.

While I did consider wrapping the stream as you suggested, there is a fundamental mismatch between stream semantics and callback semantics. Treating poll_next entries and exits as callback boundaries isn't technically accurate: there may be multiple other async operations happening between these calls that aren't part of what we'd want to consider the callback execution.

However, I agree that wrapping the stream is better for maintainability. A combined approach, a wrapped stream that also offers a callback-style API, gives us both maintainability and correctness. This PR already includes something similar for timers.

What about something like this:

// Subscribe as usual
let stream = node.subscribe("/topic", QosProfile::default())?;

// Option 1 (untraced): Use the stream normally like now
let untraced_future = stream.for_each(|msg| {
    println!("Untraced: {msg:?}");
    async {
        // some async code
    }
});

// Option 2 (traced): Convert to traced callback style
// `traced_callback` is a new function on the stream wrapper.
let traced_future = stream.traced_callback(|msg| {
    println!("Traced: {msg:?}");
    // some sync code
});

What do you think? I'd like to hear your feedback before implementing this.

@m-dahl (Collaborator) commented Apr 17, 2025

Hi, again sorry for not being so responsive. Yes, I can see that there is a fundamental difference between the stream and callback styles here. I have wondered when the stream design would come back to bite us, and it seems the time has come :) But it's not so bad, and I think the wrapping approach you present is fine. If you would like to implement it, I will merge this PR.

Another thing: should we also gate the feature on cfg(target_os = "linux")?

This change is necessary because rcl uses the address of the rcl handle in
tracepoints. If the location changes, further publish events will not
be correctly assigned to the publisher.
This change is necessary because rcl uses the address of the rcl handle in
tracepoints. If the location changes, further `take` events will not
be correctly assigned to the subscriber.
This prevents another service from initializing at the same location,
thereby preventing conflicts in a trace.
This prevents another client from initializing at the same location,
thereby preventing conflicts in a trace.
skoudmar and others added 7 commits May 7, 2025 12:47
The crate contains tracepoints, either defined directly or imported
from the ros2_tracing project (tracetools package).

The crate defines a `tracing` feature, which should only be used on Linux
because it relies on LTTng, available only on Linux.
The TracingId type wraps the pointer to allow implementing the
Send and Sync traits.

The pointer is used only as an ID and is never dereferenced; therefore, it is safe.
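A minimal sketch of this idea (illustrative only; r2r_tracing's actual definition may differ):

```rust
// A raw pointer used purely as an opaque ID. Because it is never
// dereferenced, manually implementing Send/Sync is sound.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct TracingId(*const ());

// SAFETY: the pointer is only compared/logged, never dereferenced.
unsafe impl Send for TracingId {}
unsafe impl Sync for TracingId {}

impl TracingId {
    fn new<T>(handle: &T) -> Self {
        TracingId(handle as *const T as *const ())
    }
    fn as_usize(self) -> usize {
        self.0 as usize
    }
}

fn main() {
    let value = 42u64;
    let id = TracingId::new(&value);
    std::thread::spawn(move || {
        // the ID can cross threads because it is never dereferenced
        assert_ne!(id.as_usize(), 0);
    })
    .join()
    .unwrap();
}
```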
The location and arguments of the tracepoints are consistent with rclcpp.
The StreamWithTracingData wrapper implements both Stream and FusedStream, similarly to Receiver.
@skoudmar (Contributor, author) commented May 7, 2025

Hi, I have updated the implementation (and rebased it onto the current master) to implement the Stream wrapper. However, I am not sure about its name, StreamWithTracingData. Is it fine, or do you have any suggestions?

I have also added a compile error in build.rs for when the feature is enabled on non-Linux targets.
