feat(transport): randomize the first packet number #2885
Conversation
This is mostly the work of @larseggert, I'm just repackaging it. This will target his branch with any changes. This randomizes the starting packet number the client uses for the Initial packet number space. We don't randomize this on the server, since otherwise we'd need even more changes to the tests to account for that. Fixes mozilla#2462. Closes mozilla#2499.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2885      +/-   ##
==========================================
- Coverage   95.50%   95.50%   -0.01%
==========================================
  Files         115      115
  Lines       34432    34465      +33
  Branches    34432    34465      +33
==========================================
+ Hits        32886    32916      +30
- Misses       1539     1540       +1
- Partials        7        9       +2
Failed Interop Tests (QUIC Interop Runner, client vs. server, differences relative to 7ec9d4a):
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server
Client/server transfer results: Performance differences relative to 8ce7f9b. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Never seen it fail before. I thought I was being quite conservative with only 85% bandwidth usage.
I re-triggered the benchmark. In case it is only intermittent and there is no suspicion that this pull request introduces a regression, please ignore the failure. I will take a look tomorrow.
Still failing:
Taking a look. Will try to reproduce locally.
I can reproduce this locally via: … Around every 3rd iteration fails. I cannot reproduce this on … This is surprising.
Is there anything in our CC code that assumes packet numbers start at zero? Just glancing at stuff,
Good question about CC. Worth checking. @mxinden, we need the end of the tail to be >256 with non-negligible probability, because that is the point at which a lot of the bugs I recently found trigger (because that is what gets the packet number encoding to truncate). What you have never does that. What we want is a high probability of being less than 64, so we are typically most efficient, but then something that tends to spread across the range to about 512. How about this? `pn(r[0] & 0x1f) + (pn(r[1].saturating_sub(224)) << 5) + 1` That's simple: most of the time it's even odds between 1 and 32 (inclusive). Then about 1/8 of the time (tune as you like) the value is evenly distributed from 1 to 1024. At 1024, we're well into the second byte for packet numbers and still well within the 2-byte encoding for varints. That way, we're at about 3/32 (10%) over 256 and 15/128 (a little higher) over 64.
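For concreteness, here is a minimal Rust sketch of that distribution, assuming `r` holds two uniformly random bytes; the function name and signature are illustrative, not the code in this PR:

```rust
/// Sketch of the proposed starting packet number from two random bytes.
/// Illustrative only; names and types are assumptions, not neqo's API.
fn random_initial_pn(r: [u8; 2]) -> u64 {
    // Low 5 bits of r[0]: uniform over 0..=31, so the common case stays
    // within the smallest packet number encoding.
    let low = u64::from(r[0] & 0x1f);
    // r[1].saturating_sub(224) is 0 with probability ~225/256; otherwise
    // it is 1..=31, and the shift by 5 spreads the result up toward 1024.
    let high = u64::from(r[1].saturating_sub(224)) << 5;
    low + high + 1
}
```

With that shape, roughly 1/8 of connections start above 32, which matches the "spread toward 1024" goal described above.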
Looking at neqo/neqo-transport/src/cc/classic_cc.rs, lines 379 to 385 at b1c4293:
As long as we avoid being app-limited (which doesn't depend on packet numbers), we should be OK. Edit: the handshake starts out app-limited anyway, because we are often sending less than half the congestion window. That was as true before this change as it is now. We should tolerate the random PN easily, because we just bump the value up when sending a packet. I can't see that being the issue.
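To make the point concrete, here is a simplified sketch of the kind of app-limited check being discussed; it is not the actual classic_cc.rs code, and the struct, field names, and the exact half-window threshold are assumptions:

```rust
// Simplified sketch only; not neqo's classic_cc.rs implementation.
struct CcSketch {
    bytes_in_flight: usize,
    congestion_window: usize,
}

impl CcSketch {
    /// A sender counts as application-limited when it uses well under its
    /// congestion window (here: less than half). Nothing in this
    /// determination depends on the absolute packet number.
    fn app_limited(&self) -> bool {
        self.bytes_in_flight < self.congestion_window / 2
    }
}
```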
Thanks for the explainer @martinthomson. Feel free to ignore the below; I mostly want to challenge my own understanding.
If I understand the … Intuitively I would have said … Both solutions are fine with me, with a preference for the latter. Maybe you can add a sentence to explain …
Benchmark results: Performance differences relative to 8ce7f9b.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold. time: [201.34 ms 201.85 ms 202.51 ms]
thrpt: [493.79 MiB/s 495.41 MiB/s 496.67 MiB/s]
change:
time: [+0.8275% +1.1310% +1.5052%] (p = 0.00 < 0.05)
thrpt: [−1.4828% −1.1184% −0.8207%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold. time: [303.03 ms 304.54 ms 306.06 ms]
thrpt: [32.673 Kelem/s 32.836 Kelem/s 33.000 Kelem/s]
change:
time: [−1.4792% −0.7693% −0.0296%] (p = 0.04 < 0.05)
thrpt: [+0.0296% +0.7753% +1.5014%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [28.048 ms 28.185 ms 28.342 ms]
thrpt: [35.283 B/s 35.479 B/s 35.653 B/s]
change:
time: [−1.0007% −0.3783% +0.2960%] (p = 0.26 > 0.05)
thrpt: [−0.2951% +0.3798% +1.0108%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed. time: [206.79 ms 207.19 ms 207.67 ms]
thrpt: [481.53 MiB/s 482.66 MiB/s 483.58 MiB/s]
change:
time: [+1.0495% +1.3112% +1.5991%] (p = 0.00 < 0.05)
thrpt: [−1.5740% −1.2942% −1.0386%]
decode 4096 bytes, mask ff: No change in performance detected. time: [11.836 µs 11.908 µs 11.998 µs]
change: [−0.5578% +0.0658% +0.7198%] (p = 0.84 > 0.05)
decode 1048576 bytes, mask ff: No change in performance detected. time: [3.0194 ms 3.0286 ms 3.0396 ms]
change: [−0.4484% +0.0287% +0.5151%] (p = 0.89 > 0.05)
decode 4096 bytes, mask 7f: No change in performance detected. time: [19.959 µs 20.014 µs 20.075 µs]
change: [−0.1815% +0.5315% +1.5563%] (p = 0.28 > 0.05)
decode 1048576 bytes, mask 7f: No change in performance detected. time: [5.0434 ms 5.0597 ms 5.0797 ms]
change: [−0.4870% +0.0391% +0.5474%] (p = 0.89 > 0.05)
decode 4096 bytes, mask 3f: No change in performance detected. time: [8.2633 µs 8.2925 µs 8.3284 µs]
change: [−0.3017% +0.1130% +0.5337%] (p = 0.61 > 0.05)
decode 1048576 bytes, mask 3f: No change in performance detected. time: [1.5867 ms 1.5923 ms 1.5993 ms]
change: [−0.4273% +0.1016% +0.6355%] (p = 0.69 > 0.05)
coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [87.920 ns 88.239 ns 88.566 ns]
change: [−4.8602% −1.9714% +0.0789%] (p = 0.13 > 0.05)
coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [105.96 ns 106.23 ns 106.54 ns]
change: [−0.2478% +0.2277% +0.8546%] (p = 0.44 > 0.05)
coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [105.41 ns 106.69 ns 109.21 ns]
change: [−0.5768% +0.4833% +1.8581%] (p = 0.56 > 0.05)
coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.961 ns 89.074 ns 89.200 ns]
change: [−1.4811% −0.4640% +0.5587%] (p = 0.39 > 0.05)
RxStreamOrderer::inbound_frame(): Change within noise threshold. time: [107.96 ms 108.02 ms 108.08 ms]
change: [+0.0445% +0.1248% +0.2142%] (p = 0.00 < 0.05)
sent::Packets::take_ranges: 💚 Performance has improved. time: [5.0216 µs 5.0910 µs 5.1497 µs]
change: [−42.757% −36.585% −20.919%] (p = 0.00 < 0.05)
transfer/pacing-false/varying-seeds: 💚 Performance has improved. time: [28.392 ms 28.437 ms 28.484 ms]
change: [−22.429% −22.175% −21.940%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds: 💚 Performance has improved. time: [30.050 ms 30.130 ms 30.213 ms]
change: [−19.888% −19.575% −19.265%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: 💚 Performance has improved. time: [29.848 ms 29.889 ms 29.929 ms]
change: [−18.059% −17.882% −17.710%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: 💚 Performance has improved. time: [30.638 ms 30.680 ms 30.723 ms]
change: [−19.941% −19.730% −19.532%] (p = 0.00 < 0.05)
OK, I've fixed the fuzzing issue, but the bandwidth benchmark is still stuck. What is most interesting is that disabling randomization doesn't change the outcome. The problem is in other parts of the code. The code on main doesn't get as high a congestion window in my test, but it hits the first loss much sooner. The main code pushes more packets through and loses more of them as a result. I logged the cwnd increases, and there are 4k+ times that it is increased by a little on main. On the branch, it increases fewer than 200 times. So, while the starting point is higher on the branch, it seems like the congestion window isn't increasing anywhere near as much during congestion avoidance on the branch.
CC @omansfeld, see @martinthomson's latest comment above - got any ideas?
OK, I have it. I don't understand it completely, though. It seems like our Cubic implementation is a bit unreliable at dialing in a bandwidth estimate. Stability relies a LOT on the path characteristics being stable, and our test here isn't that. On main, we set a target ACK rate of 25.5 times per RTT. That's an awful lot, but it means that we end up with a lot of high-quality feedback. This patch changed that to 4 times per RTT. That's a big improvement in terms of its ability to operate efficiently. I'm still collecting results, but I'll create a table here:
I'm only logging the cwnd when we leave slow start, which sometimes doesn't happen at all, but when it does, it generally first happens at close to the same value (~9M in this test). Two interesting things:
This is a very simple simulation, so it's not really indicative of anything in particular: we have a fixed delay and a fixed-depth tail-drop queue that simulates a gigabit link. There's no jitter, we aren't doing PMTUD, and there is no baseline drop rate other than what happens if we exceed the link capacity. I used a fixed seed for the simulator, but that seed won't be used in this set; the variation we're getting is from things like the connection ID size, which can vary between runs.
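For readers less familiar with the setup, a fixed-depth tail-drop queue is about as simple as it sounds; a minimal sketch (illustrative only, not the simulator's actual code):

```rust
use std::collections::VecDeque;

/// Minimal fixed-depth tail-drop queue: arriving packets are dropped once
/// the queue is full. Illustrative only; not the neqo simulator's code.
struct TailDropQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> TailDropQueue<T> {
    fn new(capacity: usize) -> Self {
        Self { items: VecDeque::new(), capacity }
    }

    /// Returns `false` when the packet is dropped because the queue is full.
    fn enqueue(&mut self, pkt: T) -> bool {
        if self.items.len() >= self.capacity {
            false // tail drop
        } else {
            self.items.push_back(pkt);
            true
        }
    }

    fn dequeue(&mut self) -> Option<T> {
        self.items.pop_front()
    }
}
```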
An observation here is that @mxinden's estimate of achievable throughput seems pretty aggressive. Though the theoretical throughput of a 1 Gb/s link is high, our overheads are around 48 (IPv6 + UDP) + 5 (short header + 4-byte packet number) + 16 (tag) + 5 (stream frame) = 74 bytes, reducing that to around 940 Mb/s. But the test seems like it settles on 870–880 Mb/s. It seems like the tail-drop queue here is a problem, allowing the bandwidth to spike up. I'm going to take another look at the setup and see if it can't be made a little more sensible.
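As a rough cross-check of those numbers (an assumption on my part: full 1280-byte IPv6 datagrams, the minimum MTU, which seems plausible here given the simulation does no PMTUD):

$$
\frac{1280 - 74}{1280} \times 1\ \text{Gb/s} \approx 0.942 \times 1\ \text{Gb/s} \approx 942\ \text{Mb/s}
$$

which lines up with the "around 940 Mb/s" ceiling above.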
Oh, how does this patch, i.e. the randomization of packet numbers, lead to a reduction in the ACK frequency? |
It was an accident. I changed the test setup to remove the
OK, I'm going to suggest that we take this problem off to another thread and see if we can't work out what to do about the test separately. I've increased the ACK rate for the test to the maximum so that it passes. There's probably something more to do to fix that when operating at these rates, but that's for future study, not this change. |
💡 Never thought about ACKs forcing the sender to smooth over the whole RTT. Thanks, this is helpful.
Agreed. Doesn't need to block this effort. Thank you for investigating!
May I suggest lowering the
Let's look at tweaking the `min_bandwidth` setup in a separate effort. This just restores it to where it was before this change, isolating it, in effect. Then we can separately look at changing the test in light of what we've just learned here.
Where are we with this one? Should we merge this and investigate the CC issues separately?
The CC issues are latent and need attention at some point, but not here. This is good to go.
https://github.com/mozilla/neqo/actions/runs/17235761436?pr=2885 looks pretty sketchy to me, @mxinden. It looks like the new transfer tests are getting mangled. Can you take a look?
Thanks for the ping. I will take a look. In the meantime, I suggest we revert it. |
We parse the criterion benchmark output, alter it, then post it as a GitHub comment. https://github.com/mozilla/neqo/blob/8ce7f9bcdef89fa5a1eb37db6a2375db9a4531ea/.github/workflows/bench.yml#L163-L187 We want to split by benchmark. The easiest way to do so is through a newline. Criterion only inserts a newline for each benchmark group. Thus split the existing group into two, one per benchmark function (i.e. wallclock-time and simulated-time). Issue raised in mozilla#2885 (comment)
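A hedged sketch of what "one group per benchmark function" could look like with criterion; the group names, benchmark IDs, and bodies below are illustrative, not the actual neqo benchmark code:

```rust
// Illustrative sketch only: one criterion group per benchmark function
// (wallclock-time vs. simulated-time), so each produces its own block in
// the criterion output that the workflow can split on.
use criterion::{criterion_group, criterion_main, Criterion};

fn wallclock_time(c: &mut Criterion) {
    let mut group = c.benchmark_group("transfer/wallclock-time");
    group.bench_function("pacing-true/varying-seeds", |b| {
        b.iter(|| {
            // run the transfer under a wallclock timer here
        })
    });
    group.finish();
}

fn simulated_time(c: &mut Criterion) {
    let mut group = c.benchmark_group("transfer/simulated-time");
    group.bench_function("pacing-true/varying-seeds", |b| {
        b.iter(|| {
            // run the transfer under simulated time here
        })
    });
    group.finish();
}

criterion_group!(benches, wallclock_time, simulated_time);
criterion_main!(benches);
```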
…ted time" (#2909)
* Reapply "bench(transport/transfer): measure both wallclock and simulated time …" (#2908). This reverts commit 8ce7f9b.
* fix(transport/bench): use new group for each benchmark. We parse the criterion benchmark output, alter it, then post it as a GitHub comment: https://github.com/mozilla/neqo/blob/8ce7f9bcdef89fa5a1eb37db6a2375db9a4531ea/.github/workflows/bench.yml#L163-L187 We want to split by benchmark. The easiest way to do so is through a newline. Criterion only inserts a newline for each benchmark group. Thus split the existing group into two, one per benchmark function (i.e. wallclock-time and simulated-time). Issue raised in #2885 (comment)
)" This reverts commit 7e58040.