


@roiedanino commented Dec 14, 2025

What

Ignore stale ACKs from previous connections in UD endpoint ACK processing.

Why

After an endpoint reset, stale ACKs from the previous connection can still arrive from the network. These ACKs carry PSNs that are not valid for the current connection, which triggers an assertion failure and crashes the process.
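A minimal sketch of the kind of guard described above, under stated assumptions: PSNs are treated as 16-bit wrapping counters, `acked_psn` is the last PSN the peer acknowledged, and `psn` is the next PSN to be sent, so only ACKs in the open interval (acked_psn, psn) are accepted. The type `ep_tx_state_t` and the functions `psn_lt`, `ack_psn_is_valid` and `process_ack` are hypothetical illustrations, not the UCX implementation.

```c
#include <stdint.h>

/* Assumption: PSNs are 16-bit counters that wrap around. */
typedef uint16_t psn_t;

/* Hypothetical TX state of an endpoint. */
typedef struct {
    psn_t acked_psn; /* last PSN acknowledged by the peer */
    psn_t psn;       /* next PSN to be sent */
} ep_tx_state_t;

/* Serial-number comparison (RFC 1982 style): true if a precedes b modulo
 * 2^16, assuming the two PSNs are less than 2^15 apart. */
static int psn_lt(psn_t a, psn_t b)
{
    uint16_t diff = (uint16_t)(b - a);

    return (diff != 0) && (diff < 0x8000u);
}

/* An ACK is valid only if acked_psn < ack_psn < psn: it must cover a packet
 * that was actually sent and is not yet acknowledged. */
static int ack_psn_is_valid(const ep_tx_state_t *ep, psn_t ack_psn)
{
    return psn_lt(ep->acked_psn, ack_psn) && psn_lt(ack_psn, ep->psn);
}

/* Returns 1 if the ACK was applied, 0 if it was ignored. */
static int process_ack(ep_tx_state_t *ep, psn_t ack_psn)
{
    if (!ack_psn_is_valid(ep, ack_psn)) {
        /* Stale (e.g. from before an endpoint reset) or duplicate ACK:
         * drop it instead of asserting. */
        return 0;
    }

    ep->acked_psn = ack_psn;
    return 1;
}
```

A range check like this can only reject stale PSNs that fall outside the current window; a stale ACK whose PSN happens to land inside (acked_psn, psn) would still be accepted.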

Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino roiedanino self-assigned this Dec 14, 2025
@roiedanino roiedanino marked this pull request as draft December 14, 2025 16:17
Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino roiedanino marked this pull request as ready for review December 15, 2025 08:46
```c
return;
}

/* Ignore ACK for packets we never sent. This can happen if a stale ACK
```

Minor: this could be more concise, e.g.:

```c
/* Ignore stale ACKs for unsent packets (e.g., after endpoint reset).
 * Valid ACK PSN must be in (acked_psn, psn). */
```


```c
#if UCT_UD_EP_DEBUG_HOOKS
/* Stale ACK PSN to inject - simulates ACK from before endpoint reset */
static volatile uct_ud_psn_t stale_ack_psn_to_inject = 0;
```

Minor: would it be a cleaner design to make stale_ack_psn_to_inject and inject_stale_ack_psn class members rather than statics?

A test could then inherit from that class and specialize the behavior in a subclass.
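The reviewer's suggestion is phrased in terms of C++ class members; a rough plain C analogue is to keep the injection state in a per-instance struct and hand it to the TX hook through a context pointer, instead of in file-scope statics. This is a swapped-in sketch, not the patch's code: `stale_ack_injector_t`, `injector_arm` and `tx_hook_rewrite_ack` are hypothetical names, and the hook signature is assumed.

```c
#include <stdint.h>

typedef uint16_t psn_t;

/* Hypothetical per-instance injection state, standing in for the
 * file-scope statics stale_ack_psn_to_inject / inject_stale_ack_psn. */
typedef struct {
    int   armed;     /* nonzero: rewrite the next outgoing ACK PSN */
    psn_t stale_psn; /* PSN to place in the injected ACK */
} stale_ack_injector_t;

/* Arm the injector so the next ACK sent carries stale_psn. */
static void injector_arm(stale_ack_injector_t *inj, psn_t stale_psn)
{
    inj->stale_psn = stale_psn;
    inj->armed     = 1;
}

/* Hypothetical TX hook: receives its own context and the ACK PSN about to
 * be sent. It substitutes the stale PSN once, then disarms itself. */
static psn_t tx_hook_rewrite_ack(void *ctx, psn_t ack_psn)
{
    stale_ack_injector_t *inj = ctx;

    if (!inj->armed) {
        return ack_psn;
    }

    inj->armed = 0;
    return inj->stale_psn;
}
```

Because each test or endpoint owns its own injector, arming one cannot leak into another. If the hook runs on a different thread than the code that arms it, access to the struct would still need proper synchronization.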

Signed-off-by: Roie Danino <rdanino@nvidia.com>
…ed by tx hook

Signed-off-by: Roie Danino <rdanino@nvidia.com>

3 participants