
Add android retry #131

Merged
larkox merged 5 commits into master from addAndroidRetry on Feb 4, 2025

Conversation

@larkox
Contributor

@larkox larkox commented Dec 17, 2024

Summary

Add android retry to push proxy

Ticket Link

Fix https://mattermost.atlassian.net/browse/MM-56827

@larkox larkox added the 1: Dev Review (Requires review by a core committer) label Dec 17, 2024
@larkox larkox requested review from a team and agnivade December 17, 2024 13:25
@lieut-data lieut-data removed the request for review from a team December 17, 2024 16:28
@agnivade
Contributor

Hi @larkox - I've not been feeling well for the past few days. I hope you are not blocked on this review?

Contributor

@agnivade agnivade left a comment

Also wondering if there is a way to test this by setting up a mock HTTP server or something? I was looking at https://github.com/firebase/firebase-admin-go/blob/v3.13.0/internal/http_client_test.go but they have access to the client's internal fields so it doesn't help us.
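
For reference, a minimal sketch (using net/http/httptest and sync/atomic) of the kind of mock server this refers to; it assumes the FCM endpoint could be pointed at the test server, which is exactly what firebase-admin-go does not expose without access to its internal fields:

var calls int32
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	// First request: simulate FCM rate limiting; subsequent requests succeed,
	// so a correct retry loop should eventually see a 200.
	if atomic.AddInt32(&calls, 1) == 1 {
		w.Header().Set("Retry-After", "1")
		w.WriteHeader(http.StatusTooManyRequests)
		return
	}
	w.WriteHeader(http.StatusOK)
}))
defer srv.Close()
// srv.URL would have to replace the real FCM endpoint for this to be usable.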

defer cancelRetryContext()
_, err = me.client.Send(retryContext, fcmMsg)
if me.metrics != nil {
	me.metrics.observerNotificationResponse(PushNotifyApple, time.Since(start).Seconds())
Contributor

Wrong platform key.
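
i.e. something like the following, assuming the Android counterpart of the constant is named PushNotifyAndroid:

if me.metrics != nil {
	me.metrics.observerNotificationResponse(PushNotifyAndroid, time.Since(start).Seconds())
}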

Comment on lines 249 to 255
if err == nil {
	break
}

if !isRetryable(err) {
	break
}
Contributor

Can we combine these in an OR?
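
For example, something along the lines of:

if err == nil || !isRetryable(err) {
	break
}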

Comment on lines 259 to 262
if retries == MAX_RETRIES-1 {
	me.logger.Errorf("Max retries reached did=%v", fcmMsg.Token)
	break
}
Contributor

Doesn't the for loop take care of this? Why do we need an extra check? And if we do need it, can we remove the condition from the for loop?

Contributor Author

On the last iteration, we don't want to wait; that is why the early break.

I would keep the for condition just to be "intention revealing", but completely 0/5 on that.
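
For context, a rough sketch of the loop shape assembled from the snippets in this review (not the exact diff), showing why the inner check exists:

for retries := 0; retries < MAX_RETRIES; retries++ {
	// ... attempt the send; break on success or on a non-retryable error ...

	// Early break on the last attempt: the for condition would also stop the
	// loop, but only after waiting one more waitTime below.
	if retries == MAX_RETRIES-1 {
		me.logger.Errorf("Max retries reached did=%v", fcmMsg.Token)
		break
	}

	select {
	case <-generalContext.Done():
	case <-time.After(waitTime):
	}
}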


select {
case <-generalContext.Done():
case <-time.After(waitTime):
Contributor

I think we need to incorporate some of this logic as well: https://github.com/firebase/firebase-admin-go/blob/54b81142d35e1b98cfd8687a9e56712c78a0f4ec/internal/http_client.go#L427-L438. We need to get the value of the Retry-After header and set that to be the minimum wait time to retry. Otherwise we will get a rate limited error again.
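
Roughly what the linked firebase-admin-go code does, as a sketch (seconds form of Retry-After only; the HTTP-date form is ignored here). It assumes the underlying *http.Response could be extracted from the SDK error, which is the spelunking concern raised below:

func retryAfterWait(resp *http.Response, fallback time.Duration) time.Duration {
	// Use the server-provided Retry-After (in seconds) as a minimum wait,
	// falling back to our own backoff when the header is absent or malformed.
	if resp == nil {
		return fallback
	}
	if secs, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
		if d := time.Duration(secs) * time.Second; d > fallback {
			return d
		}
	}
	return fallback
}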

Contributor Author

Doing this would require even more spelunking than what we already did to extract the error. I feel the effort to dig out this value is not worth it just to avoid the risk of retrying before the expected time. We expect this not to happen often, and we hope the delay we are adding is enough.

But if you think otherwise, I can try to look deeper into it.

@larkox larkox requested a review from agnivade January 21, 2025 14:08
@larkox larkox requested review from a team and lieut-data and removed request for a team January 22, 2025 09:27
Member

@lieut-data lieut-data left a comment

Only two nits to add, and entirely non-blocking. Deferring to Agniva's existing comments on whether more is required or not.

start := time.Now()

retryContext, cancelRetryContext := context.WithTimeout(generalContext, me.retryTimeout)
defer cancelRetryContext()
Member

nit: these cancellations queue up to run when SendNotificationWithRetry returns. I don't think that causes any issues, but a defer in a for loop always gives me pause in case the semantics aren't understood by a future reader. Is it worth considering calling this explicitly at the end of the loop, or wrapping it in a closure?
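
e.g. a sketch of the closure variant, so each cancel runs at the end of its own iteration rather than piling up until the function returns:

err = func() error {
	retryContext, cancelRetryContext := context.WithTimeout(generalContext, me.retryTimeout)
	defer cancelRetryContext() // runs when this closure returns, i.e. once per iteration

	_, sendErr := me.client.Send(retryContext, fcmMsg)
	return sendErr
}()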

Contributor Author

I am 0/5 on this. I feel adding extra boilerplate will just cause more cognitive overload, instead of clarifying things.

You are right that having these defers here is a bit counterintuitive (they are not called at the end of the loop, but when the function finishes), but we would need to cancel them on every break, which is also quite boilerplate-heavy. I feel it is OK like this, but I would be happy to hear other options.

Comment on lines 278 to 286
if generalContext.Err() != nil {
	logger.Info(
		"Not retrying because context error",
		mlog.Int("retry", retries),
		mlog.Err(generalContext.Err()),
	)
	err = generalContext.Err()
	break
}
Member

nit: Might we inline this within the case <-generalContext.Done():? When I first read it, I was thinking about the timer expiring vs. generalContext being done. Err just returns nil if the context isn't done, so this code isn't wrong, but a more idiomatic pattern seems to be:

select {
    case <-ctx.Done():
        return ctx.Err()
    case ...:
}

Contributor Author

I added the change, but I am 0/5 about the result. There is something that feels off about returning from a select inside a loop. But maybe it's just that it's been a long time since I've written proper Go :P

@larkox
Contributor Author

larkox commented Feb 4, 2025

@agnivade @lieut-data I added one extra change to how we decide whether an error should be retried: context errors are now also treated as retryable (we want to retry if the request is taking too long).

I will ask for a re-review just in case.

@larkox larkox requested review from agnivade and lieut-data February 4, 2025 15:52
Comment on lines +296 to +298
if errors.Is(err, context.DeadlineExceeded) {
	return true
}
Member

If it does start happening more often than expected, do we have a way of tracking this?

Contributor Author

Apart from the bug reports about notifications arriving more than once, on line 263 we are logging all errors that imply a retry.

@larkox larkox added the 2: Reviews Complete (All reviewers have approved the pull request) label and removed the 1: Dev Review (Requires review by a core committer) label Feb 4, 2025
@larkox larkox merged commit 620c92f into master Feb 4, 2025
5 checks passed
@larkox larkox deleted the addAndroidRetry branch February 4, 2025 16:49