# document performance flags for serve #57845
@@ -17,7 +17,7 @@ This section offers some tips and tricks to improve your Ray Serve application's

Ray Serve is built on top of Ray, so its scalability is bounded by Ray’s scalability. See Ray’s [scalability envelope](https://github.com/ray-project/ray/blob/master/release/benchmarks/README.md) to learn more about the maximum number of nodes and other limitations.
## Debugging performance issues in request path
The performance issue you're most likely to encounter is high latency or low throughput for requests.
@@ -75,7 +75,85 @@ Ray Serve allows you to fine-tune the backoff behavior of the request router, wh

- `RAY_SERVE_ROUTER_RETRY_MAX_BACKOFF_S`: The maximum backoff time (in seconds) between retries. Default is `0.5`.
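Environment variables like this one are typically set before the Serve application starts, for example in the shell or in the runtime environment. A minimal sketch in Python, where `0.25` is an arbitrary example value, not a recommendation:

```python
import os

# Must be set before Ray Serve starts so the router reads it at startup.
# 0.25 is an arbitrary example value (the documented default is 0.5 seconds).
os.environ["RAY_SERVE_ROUTER_RETRY_MAX_BACKOFF_S"] = "0.25"

max_backoff_s = float(os.environ["RAY_SERVE_ROUTER_RETRY_MAX_BACKOFF_S"])
print(max_backoff_s)  # → 0.25
```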
### Enable throughput-optimized serving
:::{note}
In Ray v2.54.0, the defaults for `RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD` and `RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP` will change to `0` for improved performance.
:::

> **Contributor:** Typically, I avoid mentioning future state. Since this is a known planned migration, we should announce it. We should also draft customer comms and work with @tg-anyscale to address Anyscale customers. (I understand this is technically opt-in for the breaking change because it's a new Ray version, but it's still nice to encourage customers to start testing now.)
>
> **Contributor (Author):** This is not a breaking change from the customer POV; they don't need to take any action to opt into these optimizations.
>
> **Contributor:** Sorry, I'm confused: won't running the user code in the same loop as the serve code break customer workloads with blocking logic once the default changes (assuming upgrade to Ray 2.54.0)? Or will users not experience a performance degradation relative to now, they just won't see an improvement?
>
> **Contributor (Author):** This ^
This section details how to enable Ray Serve options focused on improving throughput and reducing latency. These configurations focus on the following:

- Reducing overhead associated with frequent logging.
- Disabling behavior that allowed Serve applications to include blocking operations.
If your Ray Serve code includes thread-blocking operations, you must refactor your code to achieve enhanced throughput. The following table shows examples of blocking and non-blocking code:
<table>
<tr>
<th>Blocking operation (❌)</th>
<th>Non-blocking operation (✅)</th>
</tr>
<tr>
<td>

```python
from ray import serve
from fastapi import FastAPI
import time

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class BlockingDeployment:
    @app.get("/process")
    async def process(self):
        # ❌ Blocking operation
        time.sleep(2)
        return {"message": "Processed (blocking)"}

serve.run(BlockingDeployment.bind())
```

</td>
<td>

```python
from ray import serve
from fastapi import FastAPI
import asyncio

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class NonBlockingDeployment:
    @app.get("/process")
    async def process(self):
        # ✅ Non-blocking operation
        await asyncio.sleep(2)
        return {"message": "Processed (non-blocking)"}

serve.run(NonBlockingDeployment.bind())
```

</td>
</tr>
</table>
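If a blocking call can't be rewritten against an async API (for example, a synchronous third-party client), one common workaround is to offload it to a worker thread so the event loop stays responsive. The sketch below uses `asyncio.to_thread` and omits Ray Serve to stay self-contained; `blocking_work` is a hypothetical stand-in for such a call:

```python
import asyncio
import time

def blocking_work(x: int) -> int:
    # Hypothetical stand-in for a synchronous call with no async API.
    time.sleep(0.1)
    return x * 2

async def process() -> int:
    # Offloading to a thread keeps the event loop free to handle
    # other work while the blocking call runs.
    return await asyncio.to_thread(blocking_work, 21)

print(asyncio.run(process()))  # → 42
```

Inside a request handler, the same `await asyncio.to_thread(...)` pattern would wrap the blocking call directly.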
To configure all options to the recommended settings, set the environment variable `RAY_SERVE_THROUGHPUT_OPTIMIZED=1`.
You can also configure each option individually. The following table details the recommended configurations and their impact:
| Configured value | Impact |
| --- | --- |
| `RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD=0` | Your code runs in the same event loop as the replica's main event loop. You must avoid blocking operations in your request path. Set this configuration to `1` to run your code in a separate event loop, which protects the replica's ability to communicate with the Serve Controller if your code has blocking operations. |
| `RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP=0` | The request router runs in the same event loop as your code's event loop. You must avoid blocking operations in your request path. Set this configuration to `1` to run the router in a separate event loop, which protects Ray Serve's request routing ability when your code has blocking operations. |
| `RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE=1000` | Sets the log buffer to batch writes to every `1000` logs, flushing the buffer on write. The system always flushes the buffer and writes logs when it detects a line with level ERROR. Set the buffer size to `1` to disable buffering and write logs immediately. |
| `RAY_SERVE_LOG_TO_STDERR=0` | Only write logs to files under the `logs/serve/` directory. Proxy, Controller, and Replica logs no longer appear in the console, worker files, or the Actor Logs section of the Ray Dashboard. Set this property to `1` to enable additional logging. |
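The flush-on-ERROR buffering described for `RAY_SERVE_REQUEST_PATH_LOG_BUFFER_SIZE` follows the same general pattern as Python's standard `logging.handlers.MemoryHandler`, sketched below purely to illustrate the buffering semantics (this is not Ray Serve's implementation):

```python
import io
import logging
import logging.handlers

stream = io.StringIO()
target = logging.StreamHandler(stream)

# Buffer up to 1000 records; flush early as soon as a record at
# ERROR level or above arrives.
buffered = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.ERROR, target=target
)

logger = logging.getLogger("buffer-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(buffered)

logger.info("routine request log")
assert stream.getvalue() == ""  # still sitting in the buffer

logger.error("something failed")  # triggers an immediate flush
assert "routine request log" in stream.getvalue()

buffered.close()
```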
## Debugging performance issues in controller

The Serve Controller runs on the Ray head node and is responsible for a variety of tasks, including receiving autoscaling metrics from other Ray Serve components.
> **Contributor:** But not the logging ones?
>
> **Contributor (Author):** Logging can be made default after #57850 is implemented.
>
> **Contributor:** Since we're pushing this out in the future 2.54 release, should we just include the logging one as well then?
>
> **Contributor (Author):** I am inclined to leave it out, because when we implement the time-based logger, we can immediately roll it out without warning to the developer, since it has no perceivable impact.