
Conversation

@awelzel (Contributor) commented May 9, 2025

Buffering too much data into _rxbuf causes dispatch() to run into a performance issue, because _rxbuf.erase() is called once per frame.

Concretely, if _rxbuf grows large because a client sends frames very fast, each frame processed in dispatch() moves the entire remaining buffer to the front.
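
For illustration, here is a minimal sketch (not the actual IXWebSocket code) of the pattern that makes this quadratic:

```cpp
#include <cstddef>
#include <string>

// Illustrative only: draining frames from the front of a contiguous
// buffer with erase(). Each erase() shifts every remaining byte to the
// front, so consuming an N-byte buffer frame by frame moves O(N) bytes
// per frame, i.e. O(N^2) bytes overall when frames are small.
void drainFrames(std::string& rxbuf, std::size_t frameSize)
{
    while (rxbuf.size() >= frameSize)
    {
        std::string frameData = rxbuf.substr(0, frameSize); // copy the frame out
        // ... dispatch frameData ...
        rxbuf.erase(0, frameSize); // shifts all remaining bytes left: O(rxbuf.size())
    }
}
```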

Instead of restructuring the code to avoid .erase(), this patch limits the maximum size of _rxbuf to kChunkSize, which alleviates the O(N^2) erase() overhead. It also has the side effect of leaving received data in the OS's TCP stack, building up back pressure on the client earlier if the server code can't keep up.
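
In code terms, a minimal sketch of the capped read, assuming the _rxbuf and kChunkSize names from this patch; the raw file descriptor and POSIX recv() here stand in for the library's socket layer:

```cpp
#include <sys/socket.h>
#include <algorithm>
#include <string>

static const size_t kChunkSize = 32 * 1024; // default read chunk size

// Sketch: only read from the socket while _rxbuf is below kChunkSize.
// Anything not read stays queued in the kernel's TCP receive buffer,
// so a fast sender eventually sees back pressure via TCP flow control.
void receiveCapped(int fd, std::string& _rxbuf)
{
    if (_rxbuf.size() >= kChunkSize) return; // buffer full: skip this read

    char chunk[4096];
    size_t room = kChunkSize - _rxbuf.size();
    ssize_t n = recv(fd, chunk, std::min(room, sizeof(chunk)), 0);
    if (n > 0) _rxbuf.append(chunk, static_cast<size_t>(n));
}
```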

Fixes #429

@awelzel (Contributor, Author) commented May 9, 2025

With this patch, the reproducer code from #429 completes in 3 to 4 seconds.

```
/usr/bin/time python3 client-sender.py --url ws://localhost:1234 --messages 500000
0 Hello!
1.57user 1.77system 0:03.36elapsed 99%CPU (0avgtext+0avgdata 24832maxresident)k
0inputs+0outputs (0major+3880minor)pagefaults 0swaps
```

Without the patch, I've observed it to take up to 60 seconds locally, though that depends on how the buffer builds up over time.

I do think the default 32k kChunkSize that is read per iteration is a bit large, but I'd rather not change too many things here.

@awelzel (Contributor, Author) commented May 9, 2025

The new server-side flamegraph shows that the .erase() overhead is essentially gone now. There is still some visible overhead from copying _rxbuf into the frameData string. I'm not attempting to fix that here; I mention it because a future improvement could be to read client-side data directly into a frameData buffer, instead of going through an intermediary _rxbuf, to avoid the extra copy.

[flamegraph image: stacks-fix]
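
For reference, a hypothetical sketch of what that future improvement could look like (not part of this patch): once a frame header has been parsed and the payload length is known, receive the payload straight into the frame's own buffer:

```cpp
#include <sys/socket.h>
#include <string>

// Hypothetical sketch, not part of this patch: read the payload
// directly into frameData, skipping the intermediate _rxbuf copy.
std::string readFramePayload(int fd, size_t payloadLength)
{
    std::string frameData(payloadLength, '\0');
    size_t received = 0;
    while (received < payloadLength)
    {
        ssize_t n = recv(fd, &frameData[received], payloadLength - received, 0);
        if (n <= 0) break; // error or peer closed; real code must handle this
        received += static_cast<size_t>(n);
    }
    return frameData;
}
```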

@bsergean merged commit 8f00664 into machinezone:master on May 14, 2025
7 checks passed
@bsergean (Contributor) commented

thanks !


Linked issue: O(N^2) time complexity bug in WebsocketTransport (#429)