feat(transport): randomize the first packet number #2885
Conversation
This is mostly the work of @larseggert, I'm just repackaging it. This will target his branch with any changes. This randomizes the starting packet number the client uses for the Initial packet number space. We don't randomize this on the server, since otherwise we'd need even more changes to the tests to account for that. Fixes mozilla#2462. Closes mozilla#2499.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2885      +/-   ##
==========================================
- Coverage   95.50%   95.50%   -0.01%
==========================================
  Files         115      115
  Lines       34432    34465      +33
  Branches    34432    34465      +33
==========================================
+ Hits        32886    32916      +30
- Misses       1539     1540       +1
- Partials        7        9       +2
Failed Interop Tests (QUIC Interop Runner, client vs. server, differences relative to 7ec9d4a):
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server
Client/server transfer results: Performance differences relative to 8ce7f9b. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Never seen it fail before. I thought I was being quite conservative with only 85% bandwidth usage.
I re-triggered the benchmark. In case it is only intermittent and there is no suspicion that this pull request introduces a regression, please ignore the failure. I will take a look tomorrow.
Still failing:
Taking a look. Will try to reproduce locally.
I can reproduce this locally via: … Around every 3rd iteration fails. I cannot reproduce this on … This is surprising.
Is there anything in our CC code that assumes packet numbers start at zero? Just glancing at stuff,
Good question about CC. Worth checking. @mxinden, we need the end of the tail to be >256 with non-negligible probability, because that is the point at which a lot of the bugs I recently found trigger (because that is what gets the packet number encoding to truncate). What you have never does that. What we want is a high probability of being less than 64, so we are typically most efficient, but then something that tends to spread across the range to about 512. How about this? `pn(r[0] & 0x1f) + (pn(r[1].saturating_sub(224)) << 5) + 1` That's simple: most of the time it's even odds between 1 and 32 (inclusive). Then about 1/8 of the time (tune as you like) the value is evenly distributed from 1 to 1024. At 1024, we're well into the second byte for packet numbers and still well within the 2-byte encoding for varints. That way, we're at about 3/32 (10%) over 256 and 15/128 (a little higher) over 64.
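For concreteness, here is a minimal Rust sketch of that distribution, assuming `r` holds two uniformly random bytes; the function name and signature are illustrative, not the code in this PR:

```rust
/// Sketch of the proposed starting packet number from two random bytes.
/// Illustrative only; names and types are assumptions, not neqo's API.
fn random_initial_pn(r: [u8; 2]) -> u64 {
    // Low 5 bits of r[0]: uniform over 0..=31, so the common case stays
    // within the smallest packet number encoding.
    let low = u64::from(r[0] & 0x1f);
    // r[1].saturating_sub(224) is 0 with probability ~225/256; otherwise
    // it is 1..=31, and the shift by 5 spreads the result up toward 1024.
    let high = u64::from(r[1].saturating_sub(224)) << 5;
    low + high + 1
}
```

With that shape, roughly 1/8 of connections start above 32, which matches the "spread toward 1024" goal described above.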
Looking at neqo/neqo-transport/src/cc/classic_cc.rs, lines 379 to 385 at b1c4293:
As long as we avoid being app-limited (which doesn't depend on packet numbers), we should be OK. Edit: the handshake starts out app-limited anyway, because we are often sending less than half the congestion window. That was as true before this change as it is now. We should tolerate the random PN easily, because we just bump the value up when sending a packet. I can't see that being the issue.
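To make the point concrete, here is a simplified sketch of the kind of app-limited check being discussed; it is not the actual classic_cc.rs code, and the struct, field names, and the exact half-window threshold are assumptions:

```rust
// Simplified sketch only; not neqo's classic_cc.rs implementation.
struct CcSketch {
    bytes_in_flight: usize,
    congestion_window: usize,
}

impl CcSketch {
    /// A sender counts as application-limited when it uses well under its
    /// congestion window (here: less than half). Nothing in this
    /// determination depends on the absolute packet number.
    fn app_limited(&self) -> bool {
        self.bytes_in_flight < self.congestion_window / 2
    }
}
```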
Thanks for the explainer @martinthomson. Feel free to ignore the below; I mostly want to challenge my own understanding.
If I understand the … Intuitively I would have said … Both solutions are fine with me, with a preference for the latter. Maybe you can add a sentence to explain …
Benchmark results: Performance differences relative to 8ce7f9b.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold. time: [201.34 ms 201.85 ms 202.51 ms]
thrpt: [493.79 MiB/s 495.41 MiB/s 496.67 MiB/s]
change:
time: [+0.8275% +1.1310% +1.5052%] (p = 0.00 < 0.05)
thrpt: [−1.4828% −1.1184% −0.8207%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold. time: [303.03 ms 304.54 ms 306.06 ms]
thrpt: [32.673 Kelem/s 32.836 Kelem/s 33.000 Kelem/s]
change:
time: [−1.4792% −0.7693% −0.0296%] (p = 0.04 < 0.05)
thrpt: [+0.0296% +0.7753% +1.5014%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [28.048 ms 28.185 ms 28.342 ms]
thrpt: [35.283 B/s 35.479 B/s 35.653 B/s]
change:
time: [−1.0007% −0.3783% +0.2960%] (p = 0.26 > 0.05)
thrpt: [−0.2951% +0.3798% +1.0108%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed. time: [206.79 ms 207.19 ms 207.67 ms]
thrpt: [481.53 MiB/s 482.66 MiB/s 483.58 MiB/s]
change:
time: [+1.0495% +1.3112% +1.5991%] (p = 0.00 < 0.05)
thrpt: [−1.5740% −1.2942% −1.0386%]
decode 4096 bytes, mask ff: No change in performance detected. time: [11.836 µs 11.908 µs 11.998 µs]
change: [−0.5578% +0.0658% +0.7198%] (p = 0.84 > 0.05)
decode 1048576 bytes, mask ff: No change in performance detected. time: [3.0194 ms 3.0286 ms 3.0396 ms]
change: [−0.4484% +0.0287% +0.5151%] (p = 0.89 > 0.05)
decode 4096 bytes, mask 7f: No change in performance detected. time: [19.959 µs 20.014 µs 20.075 µs]
change: [−0.1815% +0.5315% +1.5563%] (p = 0.28 > 0.05)
decode 1048576 bytes, mask 7f: No change in performance detected. time: [5.0434 ms 5.0597 ms 5.0797 ms]
change: [−0.4870% +0.0391% +0.5474%] (p = 0.89 > 0.05)
decode 4096 bytes, mask 3f: No change in performance detected. time: [8.2633 µs 8.2925 µs 8.3284 µs]
change: [−0.3017% +0.1130% +0.5337%] (p = 0.61 > 0.05)
decode 1048576 bytes, mask 3f: No change in performance detected. time: [1.5867 ms 1.5923 ms 1.5993 ms]
change: [−0.4273% +0.1016% +0.6355%] (p = 0.69 > 0.05)
coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [87.920 ns 88.239 ns 88.566 ns]
change: [−4.8602% −1.9714% +0.0789%] (p = 0.13 > 0.05)
coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [105.96 ns 106.23 ns 106.54 ns]
change: [−0.2478% +0.2277% +0.8546%] (p = 0.44 > 0.05)
coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [105.41 ns 106.69 ns 109.21 ns]
change: [−0.5768% +0.4833% +1.8581%] (p = 0.56 > 0.05)
coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [88.961 ns 89.074 ns 89.200 ns]
change: [−1.4811% −0.4640% +0.5587%] (p = 0.39 > 0.05)
RxStreamOrderer::inbound_frame(): Change within noise threshold. time: [107.96 ms 108.02 ms 108.08 ms]
change: [+0.0445% +0.1248% +0.2142%] (p = 0.00 < 0.05)
sent::Packets::take_ranges: 💚 Performance has improved. time: [5.0216 µs 5.0910 µs 5.1497 µs]
change: [−42.757% −36.585% −20.919%] (p = 0.00 < 0.05)
transfer/pacing-false/varying-seeds: 💚 Performance has improved. time: [28.392 ms 28.437 ms 28.484 ms]
change: [−22.429% −22.175% −21.940%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds: 💚 Performance has improved. time: [30.050 ms 30.130 ms 30.213 ms]
change: [−19.888% −19.575% −19.265%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: 💚 Performance has improved. time: [29.848 ms 29.889 ms 29.929 ms]
change: [−18.059% −17.882% −17.710%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: 💚 Performance has improved. time: [30.638 ms 30.680 ms 30.723 ms]
change: [−19.941% −19.730% −19.532%] (p = 0.00 < 0.05)
OK, I've fixed the fuzzing issue, but the bandwidth benchmark is still stuck. What is most interesting is that disabling randomization doesn't change the outcome. The problem is in other parts of the code. The code on main doesn't get as high a congestion window in my test, but it hits the first loss much sooner. The main code pushes more packets through and loses more of them as a result. I logged the cwnd increases, and there are 4k+ times that it is increased by a little on main. On the branch, it increases fewer than 200 times. So, while the starting point is higher on the branch, it seems like the congestion window isn't increasing anywhere near as much during congestion avoidance on the branch.
CC @omansfeld, see @martinthomson's latest comment above - got any ideas?
OK, I have it. I don't understand it completely, though. It seems like our Cubic implementation is a bit unreliable at dialing in a bandwidth estimate. Stability relies a LOT on the path characteristics being stable, and our test here isn't that. On main, we set a target ACK rate of 25.5 times per RTT. That's an awful lot, but it means that we end up with a lot of high-quality feedback. This patch changed that to 4 times per RTT. That's a big improvement in terms of its ability to operate efficiently. I'm still collecting results, but I'll create a table here:
I'm only logging the cwnd when we leave slow start, which sometimes doesn't happen at all, but when it does, it generally first happens at close to the same value (~9M in this test). Two interesting things:
This is a very simple simulation, so it's not really indicative of anything in particular: we have a fixed delay and a fixed-depth tail-drop queue that simulates a gigabit link. There's no jitter, we aren't doing PMTUD, and there is no baseline drop rate other than what happens if we exceed the link capacity. I used a fixed seed for the simulator, but that seed won't be used in this set; the variation we're getting is from things like the connection ID size, which can vary between runs.
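For readers less familiar with the setup, a fixed-depth tail-drop queue is about as simple as it sounds; a minimal sketch (illustrative only, not the simulator's actual code):

```rust
use std::collections::VecDeque;

/// Minimal fixed-depth tail-drop queue: arriving packets are dropped once
/// the queue is full. Illustrative only; not the neqo simulator's code.
struct TailDropQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> TailDropQueue<T> {
    fn new(capacity: usize) -> Self {
        Self { items: VecDeque::new(), capacity }
    }

    /// Returns `false` when the packet is dropped because the queue is full.
    fn enqueue(&mut self, pkt: T) -> bool {
        if self.items.len() >= self.capacity {
            false // tail drop
        } else {
            self.items.push_back(pkt);
            true
        }
    }

    fn dequeue(&mut self) -> Option<T> {
        self.items.pop_front()
    }
}
```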
An observation here is that @mxinden's estimate of achievable throughput seems pretty aggressive. Though the theoretical throughput of a 1 Gb/s link is high, our overheads are around 48 (IPv6 + UDP) + 5 (short header + 4-byte packet number) + 16 (tag) + 5 (stream frame) = 74 bytes, reducing that to around 940 Mb/s. But the test seems like it settles on 870–880 Mb/s. It seems like the tail-drop queue here is a problem, allowing the bandwidth to spike up. I'm going to take another look at the setup and see if it can't be made a little more sensible.
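As a rough cross-check of those numbers (an assumption on my part: full 1280-byte IPv6 datagrams, the minimum MTU, which seems plausible here given the simulation does no PMTUD):

$$
\frac{1280 - 74}{1280} \times 1\ \text{Gb/s} \approx 0.942 \times 1\ \text{Gb/s} \approx 942\ \text{Mb/s}
$$

which lines up with the "around 940 Mb/s" ceiling above.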
Oh, how does this patch, i.e. the randomization of packet numbers, lead to a reduction in the ACK frequency? |
It was an accident. I changed the test setup to remove the
OK, I'm going to suggest that we take this problem off to another thread and see if we can't work out what to do about the test separately. I've increased the ACK rate for the test to the maximum so that it passes. There's probably something more to do to fix that when operating at these rates, but that's for future study, not this change. |
💡 Never thought about ACKs forcing the sender to smooth over the whole RTT. Thanks, this is helpful.
Agreed. Doesn't need to block this effort. Thank you for investigating!
May I suggest lowering the
Let's look at tweaking the `min_bandwidth` setup in a separate effort. This just restores it to where it was before this change, isolating it, in effect. Then we can separately look at changing the test in light of what we've just learned here.
Where are we with this one? Should we merge this and investigate the CC issues separately?
The CC issues are latent and need attention at some point, but not here. This is good to go.
https://github.com/mozilla/neqo/actions/runs/17235761436?pr=2885 looks pretty sketchy to me, @mxinden. It looks like the new transfer tests are getting mangled. Can you take a look?
Thanks for the ping. I will take a look. In the meantime, I suggest we revert it. |
We parse the criterion benchmark output, alter it, then post it as a GitHub comment. https://github.com/mozilla/neqo/blob/8ce7f9bcdef89fa5a1eb37db6a2375db9a4531ea/.github/workflows/bench.yml#L163-L187 We want to split by benchmark. The easiest way to do so is through a newline. Criterion only inserts a newline for each benchmark group. Thus split the existing group into two, one per benchmark function (i.e. wallclock-time and simulated-time). Issue raised in mozilla#2885 (comment)
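A hedged sketch of what "one group per benchmark function" could look like with criterion; the group names, benchmark IDs, and bodies below are illustrative, not the actual neqo benchmark code:

```rust
// Illustrative sketch only: one criterion group per benchmark function
// (wallclock-time vs. simulated-time), so each produces its own block in
// the criterion output that the workflow can split on.
use criterion::{criterion_group, criterion_main, Criterion};

fn wallclock_time(c: &mut Criterion) {
    let mut group = c.benchmark_group("transfer/wallclock-time");
    group.bench_function("pacing-true/varying-seeds", |b| {
        b.iter(|| {
            // run the transfer under a wallclock timer here
        })
    });
    group.finish();
}

fn simulated_time(c: &mut Criterion) {
    let mut group = c.benchmark_group("transfer/simulated-time");
    group.bench_function("pacing-true/varying-seeds", |b| {
        b.iter(|| {
            // run the transfer under simulated time here
        })
    });
    group.finish();
}

criterion_group!(benches, wallclock_time, simulated_time);
criterion_main!(benches);
```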
…ted time" (#2909)
* Reapply "bench(transport/transfer): measure both wallclock and simulated time …" (#2908). This reverts commit 8ce7f9b.
* fix(transport/bench): use new group for each benchmark. We parse the criterion benchmark output, alter it, then post it as a GitHub comment: https://github.com/mozilla/neqo/blob/8ce7f9bcdef89fa5a1eb37db6a2375db9a4531ea/.github/workflows/bench.yml#L163-L187 We want to split by benchmark. The easiest way to do so is through a newline. Criterion only inserts a newline for each benchmark group. Thus split the existing group into two, one per benchmark function (i.e. wallclock-time and simulated-time). Issue raised in #2885 (comment)
)" This reverts commit 7e58040.