Conversation

@nodece nodece (Member) commented Mar 12, 2025

Fixes #1332

Motivation

While the producer is reconnecting, batched and non-batched messages cannot be added to the pending queue, so the send timeout is never triggered for them. As a result, SendAsync callbacks are never invoked while the producer is in a reconnecting state, potentially causing resource leaks and unresponsive applications.

Modifications

  • Separated internal data writing and network requests from the event loop.
    • Data flow: SendAsync -> dataChan -> writeChan -> broker (a rough sketch of this flow follows this list).
  • Fixed payload size handling in the TestProducerSendWithContext test, which previously failed with:
    time="2025-03-16T09:05:13Z" level=error msg="Single message serialize failed %!s(<nil>)" error="encryptedPayload exceeds MaxMessageSize, size: 1048607, MaxMessageSize: 1048576" producerID=1 producer_name=standalone-0-358 topic="persistent://public/default/my-topic-857833034"
    
  • Added a comprehensive test case TestProducerKeepReconnectingAndThenCallSendAsync that verifies:
    • SendAsync callbacks are properly invoked with appropriate errors during reconnection.
    • The fix works correctly with both batching enabled and disabled configurations.
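
A rough sketch of the decoupled flow described above, for illustration only: the channel names dataChan and writeChan come from this PR's description, while sendRequest, pendingItem, and the two loop functions are simplified stand-ins rather than the actual partitionProducer code.

// Illustrative sketch only, not the real implementation.
type sendRequest struct {
    payload  []byte
    callback func(err error)
}

type pendingItem struct {
    buf []byte
}

type producerSketch struct {
    dataChan  chan *sendRequest // filled by SendAsync, off the event loop
    writeChan chan *pendingItem // drained only by the network writer
}

// runDataLoop serializes/batches requests and hands the resulting frames to the
// writer, so SendAsync callers never block on a stalled broker connection.
func (p *producerSketch) runDataLoop() {
    for req := range p.dataChan {
        p.writeChan <- &pendingItem{buf: req.payload}
        _ = req.callback // invoked later: on broker ack, send timeout, or producer close
    }
}

// runWriteLoop is the only goroutine that writes to the broker connection.
func (p *producerSketch) runWriteLoop(writeToBroker func([]byte) error) {
    for item := range p.writeChan {
        _ = writeToBroker(item.buf)
    }
}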

@nodece nodece marked this pull request as draft March 12, 2025 17:36
@nodece nodece force-pushed the fix-SendAsync-callback branch 4 times, most recently from f38a7ad to 61794b2 on March 16, 2025 07:53
@nodece nodece marked this pull request as ready for review March 16, 2025 15:04
@nodece nodece self-assigned this Mar 16, 2025
@nodece nodece (Member Author) commented Mar 16, 2025

@gunli Would you have a chance to review this?

@BewareMyPower BewareMyPower added this to the v0.15.0 milestone Mar 31, 2025
@RobertIndie RobertIndie (Member) left a comment

This PR introduces two separate message writing paths:

  1. Non-batched messages: SendAsync -> dataChan
  2. Batched messages: SendAsync -> batchChan -> dataChan

Both paths are asynchronous, which breaks the message order guaranteed by SendAsync.

For example, if we send two messages using SendAsync like this:

producer.SendAsync(context.Background(), &ProducerMessage{
    Payload: []byte("A"),
}, func(_ MessageID, _ *ProducerMessage, err error) {
    errChan <- err
})

producer.SendAsync(context.Background(), &ProducerMessage{
    Payload: []byte("B"),
}, func(_ MessageID, _ *ProducerMessage, err error) {
    errChan <- err
})

We need to ensure the message order is A, B.
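
As a rough way to check that expectation (not part of the snippet above): read the messages back and assert they arrive as A then B. The consumer and the testing.T value here are assumed to exist in the test.

// Hypothetical order check; consumer and t are assumed.
for _, want := range []string{"A", "B"} {
    msg, err := consumer.Receive(context.Background())
    if err != nil {
        t.Fatal(err)
    }
    if got := string(msg.Payload()); got != want {
        t.Fatalf("message order broken: got %q, want %q", got, want)
    }
    consumer.Ack(msg)
}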

@nodece nodece marked this pull request as draft March 31, 2025 17:03
@nodece nodece force-pushed the fix-SendAsync-callback branch 5 times, most recently from 926c0f3 to 081e7a5 on April 1, 2025 13:26
@nodece nodece requested a review from RobertIndie April 1, 2025 13:27
@nodece nodece marked this pull request as ready for review April 1, 2025 13:28
@nodece nodece force-pushed the fix-SendAsync-callback branch from 081e7a5 to 687c80a on April 3, 2025 09:51
@nodece nodece (Member Author) commented Apr 7, 2025

ping @RobertIndie

Comment on lines 2672 to 2678

// send again
testProducer.SendAsync(context.Background(), &ProducerMessage{
    Payload: []byte("test"),
}, func(_ MessageID, _ *ProducerMessage, err error) {
    errChan <- err
})
Member

Suggested change
// send again
testProducer.SendAsync(context.Background(), &ProducerMessage{
    Payload: []byte("test"),
}, func(_ MessageID, _ *ProducerMessage, err error) {
    errChan <- err
})

for i := 0; i < 10; i++ {
    testProducer.SendAsync(context.Background(), &ProducerMessage{
        Payload: []byte("test"),
    }, func(id MessageID, producerMessage *ProducerMessage, err error) {
        fmt.Println("send async callback", id, producerMessage, err)
    })
}
// send again
testProducer.SendAsync(context.Background(), &ProducerMessage{
    Payload: []byte("test"),
}, func(_ MessageID, _ *ProducerMessage, err error) {
    errChan <- err
})

You can easily reproduce the issue by sending some messages before this step, and by setting the producer's BatchingMaxMessages to a smaller value such as 5 (a setup sketch follows).
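
For example, a sketch of the producer setup that reproduction could use; client and topicName are assumed to already exist in the test, and the values are illustrative:

// Hypothetical reproduction setup: a small batch size so the batch container fills quickly.
testProducer, err := client.CreateProducer(ProducerOptions{
    Topic:               topicName,
    BatchingMaxMessages: 5,
    SendTimeout:         2 * time.Second,
})
if err != nil {
    t.Fatal(err)
}
defer testProducer.Close()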

Member Author

Done.

@nodece nodece force-pushed the fix-SendAsync-callback branch from edfce2f to 933807b on April 18, 2025 07:18
@nodece nodece requested a review from RobertIndie April 18, 2025 07:18
options: options,
producerID: client.rpcClient.NewProducerID(),
dataChan: make(chan *sendRequest, maxPendingMessages),
writeChan: make(chan *pendingItem, maxPendingMessages),
Member

This doesn't address the root cause. If we send too many messages, the writeChan can still get stuck, and the issue persists. I can reproduce the problem by producing at least maxPendingMessages messages after the cluster is shut down.

Member Author

@RobertIndie Once the internal queue reaches its configured capacity (maxPendingMessages), the producer should either block or reject incoming messages based on the DisableBlockIfQueueFull setting:

  • If DisableBlockIfQueueFull is false, the producer should block until space becomes available.
  • If DisableBlockIfQueueFull is true, the producer should fail fast by rejecting new messages.

Proper backpressure handling is essential to avoid unbounded memory usage and ensure the system remains responsive under load or during broker unavailability.

By default, DisableBlockIfQueueFull is set to false, which means the producer will block when the number of pending messages exceeds maxPendingMessages. If it is explicitly set to true, the producer will fail fast with a "producer send queue is full" error (a minimal sketch follows at the end of this comment).

I think this topic is different from the current PR and might be better addressed separately. Let’s keep this PR focused on the specific issue it aims to resolve.
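
For context, a minimal sketch of the two backpressure modes described above, using the public client API; the broker URL and topic here are placeholders, not taken from this PR:

// Assumed setup: a local broker URL and a placeholder topic.
client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
if err != nil {
    log.Fatal(err)
}
defer client.Close()

producer, err := client.CreateProducer(pulsar.ProducerOptions{
    Topic:                   "persistent://public/default/my-topic",
    MaxPendingMessages:      1000,
    DisableBlockIfQueueFull: true, // fail fast instead of blocking when the queue is full
})
if err != nil {
    log.Fatal(err)
}
defer producer.Close()

producer.SendAsync(context.Background(), &pulsar.ProducerMessage{Payload: []byte("hello")},
    func(_ pulsar.MessageID, _ *pulsar.ProducerMessage, err error) {
        if err != nil {
            // With DisableBlockIfQueueFull=true, the callback can be invoked with a
            // "producer send queue is full" error instead of blocking the caller.
            log.Println("send failed:", err)
        }
    })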

@nodece nodece (Member Author) commented Sep 16, 2025

Closed by #1422

@nodece nodece closed this Sep 16, 2025

Successfully merging this pull request may close these issues.

[Bug][Producer] The callback was not invoked during reconnecting.
