Skip to content

Ray IPv6 support#44252

Closed
samrocketman wants to merge 15 commits intoray-project:masterfrom
samrocketman:ipv6
Closed

Ray IPv6 support#44252
samrocketman wants to merge 15 commits intoray-project:masterfrom
samrocketman:ipv6

Conversation

@samrocketman
Copy link
Copy Markdown

@samrocketman samrocketman commented Mar 23, 2024

Why are these changes needed?

Provides support for IPv6 only and dual-stack IPv4/IPv6.

Inspired by this discussion https://discuss.ray.io/t/lack-of-ipv6-support/2323

IPv6 is much more common than it used to be; in some cases the only available stack is IPv6. Commonly machines come with dual-stack connectivity enabled (IPv4 and IPv6 available at the same time).

I've already reviewed the source code of kuberay. I've determined due to its use of native Go net methods kuberay already supports IPv6.

Design

  • See python/ray/_private/net.py. Centralize networking code to make it easier to support IPv6 alongside IPv4. This is simpler and clarifies where in your network code you need to support this when using low level python sockets.
  • Favor IPv4 since that is what this project is typically used to; code will fall back to IPv6 if IPv4 is not available.
  • It uses dual-stack IPv4 and IPv6 when it is available; for example when listening on a port across all interfaces or determining free ports.
  • All places I could find for string parsing IP or host and port have been substituted. With a central method, updates can be made from which all dependent areas will benefit.

Shell code

I notice the following scripts may or may not work with IPv6. They should probably updated using bash parameter expansion.

  • doc/source/cluster/doc_code/slurm-basic.sh
  • doc/source/cluster/doc_code/slurm-template.sh

If I update them; it will be in a separate PR.

For example, the following bash code properly parses both IPv4 and IPv6 ip:port into two parts.

# IPv6 loopback port 8080
ip_and_port=::1:8080
ip="${ip_and_port%:*}"
port="${ip_and_port##*:}"

# IPv4 loopback port 8080
ip_and_port=127.0.0.1:8080
ip="${ip_and_port%:*}"
port="${ip_and_port##*:}"

Related issue number

I don't know if it will close all of these issues but I did an issues search for ipv6.

Duplicated by

Checks

Initially, this pull request is in-draft. This is to enable reviewing sooner and manual testing to occur. Automated tests to follow.

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@samrocketman samrocketman marked this pull request as draft March 23, 2024 01:29
Why are these changes needed?
-----------------------------

Provides support for IPv6 only and dual-stack IPv4/IPv6.

Inspired by this discussion https://discuss.ray.io/t/lack-of-ipv6-support/2323

IPv6 is much more common than it used to be; in some cases the only available stack is IPv6.  Commonly machines come with dual-stack connectivity enabled (IPv4 and IPv6 available at the same time).

I've already reviewed the source code of [kuberay][kuberay].  I've determined due to its use of native Go net methods kuberay already supports IPv6.

Design
------

- Centralize networking code to make it easier to support IPv6 alongside IPv4.  This is simpler and clarifies where in your network code you need to support this when using low level python sockets.
- Favor IPv4 since that is what this project is typically used to; code will fall back to IPv6 if IPv4 is not available.
- It uses dual-stack IPv4 and IPv6 when it is available; for example when listening on a port across all interfaces.

Shell code
----------

I notice the following scripts may or may not work with IPv6.  They should probably updated using [bash parameter expansion][bash-pe].

* `doc/source/cluster/doc_code/slurm-basic.sh`
* `doc/source/cluster/doc_code/slurm-template.sh`

If I update them; it will be in a separate PR.

For example, the following bash code properly parses both IPv4 and IPv6 `ip:port` into two parts.

    # IPv6 loopback port 8080
    ip_and_port=::1:8080
    ip="${ip_and_port%:*}"
    port="${ip_and_port##*:}"

    # IPv4 loopback port 8080
    ip_and_port=127.0.0.1:8080
    ip="${ip_and_port%:*}"
    port="${ip_and_port##*:}"

Related issue number
--------------------

I don't know if it will close all of these issues but I did an issues search for `ipv6`.

* closes ray-project#6967
* ray-project#43416
* ray-project#41966
* ray-project#43910

Checks
------

Initially, this pull request is in-draft.  This is to enable reviewing sooner and manual testing to occur.  Automated tests to follow.

- [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    - [ ] I've added any new APIs to the API Reference. For example, if I added a
           method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [x] This PR is not tested :(

[bash-pe]: https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html
[kuberay]: https://github.com/ray-project/kuberay

Signed-off-by: Sam Gleske <sam.mxracer@gmail.com>
* Fixed a few function calls missing net import.
* Missed ip parse
* Fix net.py syntax
* Fix tls_utils.py syntax

Signed-off-by: Sam Gleske <sam.mxracer@gmail.com>
Misuse of dualstack.  It should only be used when listening for a port on
all interfaces.

flake8 fixes

Signed-off-by: Sam Gleske <sam.mxracer@gmail.com>
Copy link
Copy Markdown
Author

@samrocketman samrocketman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another review pass fixing issues I learned studying how Python sockets support IPv6.

I thought dualstack works when binding with a host but it does not (e.g. localhost dualstack only binds to IPv6 ::1 but not 127.0.0.1).

Binding dualstack only works when binding a port to all interfaces i.e. sock.bind(('', port)).

Copy link
Copy Markdown
Author

@samrocketman samrocketman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appear to be misusing @staticmethod.

@samrocketman
Copy link
Copy Markdown
Author

I guess my issues with staticmethod are best described here https://stackoverflow.com/questions/1859959/static-methods-how-to-call-a-method-from-another-method

@samrocketman
Copy link
Copy Markdown
Author

After messing around with @staticmethod across different versions of python; the way I used it (on a file of functions without a class definition) worked in Python 3.10 but not Python 3.9. Rather than trying to declare the methods static which would work across multiple versions of python I decided to remove the symbol from the private net module.

@samrocketman samrocketman force-pushed the ipv6 branch 2 times, most recently from b3adfab to 2e841fb Compare March 24, 2024 23:09
- Bugfix python/ray/util/client/server/proxier.py not obtaining local_ip address from localhost.
- Remove use of staticmethod; the way I used it isn't compatible below Python 3.10
- Supprt parsing IPv6 URL format `proto://[ip]:port/` in net.py
- Fixed API linting issues in net.py
- Fix pylint issues

Signed-off-by: Sam Gleske <sam.mxracer@gmail.com>
Signed-off-by: Sam Gleske <875669+samrocketman@users.noreply.github.com>
@samrocketman samrocketman changed the title [Core] Ray IPv6 support Ray IPv6 support Mar 25, 2024
- More pylint changes.
- Exception handling added for existing tests.

Signed-off-by: Sam Gleske <sam.mxracer@gmail.com>
Some methods pass bytes instead of strings when parsing IP and port.  I
added extra type safety instead of assuming.

Fixed some pylint issues.

Trying to use net class in pxi.

Signed-off-by: Sam Gleske <sam.mxracer@gmail.com>
@jjyao jjyao added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label May 13, 2025
@samrocketman
Copy link
Copy Markdown
Author

samrocketman commented May 15, 2025

We took a closer look. Unfortunately, it seems to go much deeper than a localized fix. While your change correctly parses the address as IPv6, it gets rejected shortly after. It turns out the broader Ray codebase assumes IPv4 in many places, and parsing is only the beginning. Changes are probably needed in most services, as well as supporting code (SSL...)

Thanks for taking a look. I will likely continue chasing all of the broader changes in this pull request and continue to iterate until I succeed with an IPv6 supported build.

So, from what I've seen, supporting IPv6 across Ray would require a more comprehensive initiative, which is quite a bit more effort than we can realistically take on at this point.

That's okay I appreciate your review.

It might make sense to raise this with the Ray maintainers, see if IPv6 support could be prioritized or scoped out officially.

This PR is raised. I think, in general, maintainer support is a hard requirement. Since I initially opened this pull request there have been a few merged PRs which added socket code which would have broken IPv6 if it was hypothetically available.

Because of this, there needs to be supporting tests and an knowlege-sharing initiative for maintainers to actively preserve IPv6 support with new contributions. I am happy to produce some sort of written guide or whatever written form makes sense for maintainers. I mentioned this in an earlier comment but I am reiterating it here. I need maintainers to advise on how they want this done.

At some point, I'll come back to this and dig in again (I'm away on vacation at the moment and this change start to finish has been spare time with exception for updating the PR once).

@samrocketman
Copy link
Copy Markdown
Author

@jjyao can you clarify what adding that label means?

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jun 2, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 2, 2025
@samrocketman
Copy link
Copy Markdown
Author

samrocketman commented Jun 2, 2025

Not stale; maintainers please label so stale bot does not mark as stale.

See also my previous questions

@github-actions github-actions bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 3, 2025
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 18, 2025
@samrocketman
Copy link
Copy Markdown
Author

Not stale; maintainers please label so stale bot does not mark as stale.

See also my previous questions

@github-actions github-actions bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 19, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jul 3, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jul 3, 2025
@samrocketman
Copy link
Copy Markdown
Author

Not stale; maintainers please label so stale bot does not mark as stale.

See also my previous questions

@samrocketman
Copy link
Copy Markdown
Author

samrocketman commented Jul 3, 2025

To be clear: I do not plan to resolve conflicts or work on this further until some maintainers come out of the wood work to talk with me about these changes. I've requested multiple times in this pull request for a maintainer to chime in and there appears to be silence.

Because of this, I do not find it is worth my time to keep resolving conflicts with no maintainer conversation; my time is not free.

You're welcome to reach out to me in this pull request or you may advise another method that's preferred.

@robertnishihara
Copy link
Copy Markdown
Collaborator

robertnishihara commented Jul 3, 2025

@samrocketman thanks a ton for starting this work, and sorry about neglecting this!

I think we should do a quick call (or email / Slack) discussion to figure out the path forward. Some folks at ByteDance recently mentioned they have an implementation of this feature as well and are interested in contributing it.

Could you email me at robertnishihara [at] gmail.com or message me on the Ray Slack to have a higher bandwidth conversation? FYI @richardliaw @jjyao

@github-actions github-actions bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jul 3, 2025
@samrocketman
Copy link
Copy Markdown
Author

@robertnishihara emailed

@samrocketman
Copy link
Copy Markdown
Author

samrocketman commented Jul 8, 2025

Update: today at 2pm EDT (18:00 UTC-0) there was an in-person meeting between maintainers and I where we went over the current pull request and general information on how to support IPv6.

Much of what I've already shared in this pull request was shared within the meeting.

@psaab
Copy link
Copy Markdown

psaab commented Jul 12, 2025

You need this diff as well or something similar

diff --git a/ray/python/ray/dashboard/modules/dashboard_sdk.py b/ray/python/ray/dashboard/modules/dashboard_sdk.py
--- a/ray/python/ray/dashboard/modules/dashboard_sdk.py
+++ b/ray/python/ray/dashboard/modules/dashboard_sdk.py
@@ -298,7 +298,13 @@
         Keyword arguments other than "cookies", "headers" are forwarded to the
         `requests.request()`.
         """
-        url = self._address + endpoint
+        url = f"{self._address}{endpoint}"
+        try:
+            if ipaddress.ip_address(self._address).version == 6:
+                url = f"[{self._address}]{endpoint}"
+        except ValueError:
+            pass
+

@samrocketman
Copy link
Copy Markdown
Author

@psaab mind opening a PR to my fork so that you get attributed for that diff?

@jjyao jjyao removed the P1 Issue that should be fixed within a few weeks label Jul 24, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 7, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Aug 7, 2025
@robertnishihara
Copy link
Copy Markdown
Collaborator

@samrocketman FYI @Yicheng-Lu-llll has been doing some work on this in #55129, #55281, #55282.

@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Aug 7, 2025
@richardliaw
Copy link
Copy Markdown
Contributor

Closing this in lieu of the other PRs from @Yicheng-Lu-llll

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[core] ipv6 support