Force use of torch.compile on deterministic roi_align implementation #8436
NicolasHug merged 8 commits into pytorch:main
Conversation
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8436
Note: Links to docs will display an error until the docs builds have been completed.
❌ 12 New Failures, 1 Unrelated Failure as of commit ee25749 with merge base 775dd2d.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @qqaatw, I removed the MPS knob because of how memory-hungry the eager implementation is; I doubt torch.compile works on MPS.
```python
def lazy_compile(**compile_kwargs):
    """Lazily wrap a function with torch.compile on the first call

    This avoids eagerly importing dynamo.
    """
```
Am I understanding this correctly?
Suggested change:
```diff
- This avoids eagerly importing dynamo.
+ This avoids eagerly compiling a function at import time.
```
Nope. Even with torch.compile at the top level, the function isn't compiled until you call it for the first time. But importing dynamo has undesirable side effects for eager-mode-only users, so it's best not to do it.
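To make the distinction concrete, here is a minimal sketch of what a lazy wrapper like this could look like. The name `lazy_compile` comes from the diff above; the body is an illustrative assumption, not the actual torchvision implementation:

```python
import functools


def lazy_compile(**compile_kwargs):
    """Lazily wrap a function with torch.compile on the first call.

    Sketch: calling torch.compile (which pulls in dynamo as a side
    effect) is deferred until the wrapped function is first invoked,
    so merely importing a module that uses this decorator stays cheap
    for eager-mode-only users.
    """

    def decorator(fn):
        compiled_fn = None

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal compiled_fn
            if compiled_fn is None:
                # torch.compile (and the dynamo machinery it triggers)
                # only runs here, on the first real call.
                import torch

                compiled_fn = torch.compile(fn, **compile_kwargs)
            return compiled_fn(*args, **kwargs)

        return wrapper

    return decorator
```

Subsequent calls reuse the cached compiled function, so the one-time compilation cost is paid exactly once.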
```python
if not torch.jit.is_scripting():
    if not _has_ops() or (torch.are_deterministic_algorithms_enabled() and (input.is_cuda or input.is_mps)):
```
Should we just remove the mps part here since you mentioned MPS doesn't even work with torch.compile?
Suggested change:
```diff
- not _has_ops() or (torch.are_deterministic_algorithms_enabled() and (input.is_cuda or input.is_mps))
+ not _has_ops() or (torch.are_deterministic_algorithms_enabled() and input.is_cuda)
```
I opted to keep it around because it was explicitly added by @qqaatw, but I don't really mind either way.
Sorry for the late reply! I'm OK with whichever way is best for development. From the mentioned issue it seems only relevant to CUDA; is MPS similarly memory-hungry with the deterministic algorithm?
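For readers following along, the condition under discussion can be restated as a standalone predicate (an illustrative sketch with hypothetical parameter names, not torchvision code):

```python
def use_python_fallback(has_ops: bool, deterministic: bool, is_cuda: bool, is_mps: bool) -> bool:
    """Mirror of the dispatch condition in the diff above.

    The pure-Python (now torch.compile'd) roi_align implementation is
    used when the C++ ops are unavailable, or when deterministic
    algorithms are requested on a backend (CUDA, and currently MPS)
    whose native kernel is nondeterministic.
    """
    return not has_ops or (deterministic and (is_cuda or is_mps))
```

Dropping `or is_mps`, as the suggestion above proposes, would route deterministic MPS calls back to the native kernel.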
…entation (#8436)
Summary: Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Reviewed By: vmoens
Differential Revision: D58283855
fbshipit-source-id: 914a91877c193b38f29af450a5935dd1ab5b20d7
Co-authored-by: Nicolas Hug <nh.nicolas.hug@gmail.com>
Fixes #8168