Skip to content

Conversation

@FabioLuporini
Copy link
Contributor

besides making things generally more efficient, by generating code that sometimes avoids redundant halo exchanges, we are here achieving:

fixes #2622

@FabioLuporini FabioLuporini added the bug-C bug in the generated code label Jun 11, 2025
@codecov
Copy link

codecov bot commented Jun 11, 2025

Codecov Report

Attention: Patch coverage is 95.17241% with 14 lines in your changes missing coverage. Please review.

Project coverage is 92.01%. Comparing base (2a0076f) to head (b3a3e00).
Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
devito/mpi/halo_scheme.py 69.56% 4 Missing and 3 partials ⚠️
devito/passes/iet/mpi.py 96.22% 2 Missing and 2 partials ⚠️
tests/test_mpi.py 97.47% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2629      +/-   ##
==========================================
+ Coverage   91.99%   92.01%   +0.01%     
==========================================
  Files         245      245              
  Lines       48495    48710     +215     
  Branches     4261     4292      +31     
==========================================
+ Hits        44612    44819     +207     
- Misses       3202     3205       +3     
- Partials      681      686       +5     
Flag Coverage Δ
pytest-gpu-aomp-amdgpuX 72.59% <82.24%> (+0.04%) ⬆️
pytest-gpu-nvc-nvidiaX 73.66% <83.43%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@mloubout mloubout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments but it looks all sane, will test it as much as possible

otherwise.
"""

def rule0(dep):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two don't need to be inside _is_iter_carried

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but they are so local...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep them there purely for neatness

if flag and self.rule(self.match, o):
found.append(o)
for i in o.children:
found, flag = self._visit(i, ret=(found, flag))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably exit if flag switch from true to false. Or it's gonnavisit everything afterward for nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then I'd have to add logic to keep track that a switch from False to True had already happened (that is, that a start-stop has been encountered as opposed to not yet), which I prefer to avoid (less complexity)

Copy link
Contributor

@mloubout mloubout Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somethign like


for i in o.children:
            found, newflag = self._visit(i, ret=(found, flag))
             if flag and not newflag:
                 return found, newflag
             else:
                 flag = newflag

Should do it and avoid quite a lot of visit


# Find candidate for subsequent merging
for hs1 in halo_spots:
if hs0 is hs1 or cond_mapper[hs1]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could check if cond_mapper[hs1] is a "subset" of "cond_mapper[hs0]" , i.e less conditions in which case it could be hoisted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but that really is an impossible situation

return False

# Then, check the data dependences would be satisfied
if not _is_iter_carried(hsf1, scope):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just return _is_iter_carried(hsf1, scope)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

and ends at the HaloSpot `hs1`.
"""
expressions = FindWithin(Expression, hs0, stop=hs1).visit(it)
assert len(expressions) > 0, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even possible? hs0 will always be in it so not sure worth the assert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah I'll drop it

return ret


class FindWithin(FindNodes):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will want rebasing on top of #2621 in due course

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more likely the other way round

if self.function._mem_shared:
# Special case: the distance between two regular, thread-shared
# objects fallbacks to zero, as any other value would be nonsensical
# objects fallbacks to zero, as any other value would be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: "falls back" or "is set to zero as a fallback"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

same value, False otherwise.
"""
# Case 1: ModuloDimensions of size 1
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be:

return d0.is_Modulo and d1.is_Modulo and d0.modulo == d1.modulo == 1

in what case would the attribute error be triggered? Regardless the if can just be a return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right

the try is necessary because not all Dimensions may be ModuloDimensions

anyway, dropping the if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no, now I remember why I made it that way...

as you can see, the function starts with

# Case 1: ModuloDimensions of size 1

as I'm retaining the possibility that at some point in the future this function may be extended. In which case, the if makes more sense. So I'm keeping it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. Possibly leave a comment mentioning this so it doesn't get factored out in the future?

return hash((self.loc_indices, self.loc_dirs, self.halos, self.dims,
self.bundle))

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cached property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

iet = _hoist_invariant(iet)
iet = _hoist_redundant_from_conditionals(iet)
iet = _merge_halospots(iet)
iet = _hoist_invariant(iet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this switch also yield a bit of compiler speedup, since halospots are merged before hoisting, so there are fewer to hoist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe but not sure

not isinstance(i.condition, GuardFactorEq)}
cond_mapper[hs] = conditionals
return {hs: tuple(i for i in v if i.is_Conditional)
for hs, v in MapHaloSpots().visit(iet).items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapHaloSpots().visit(iet).items() <- technically fine, but upsetting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, assigning mapper = MapHaloSpots().visit(iet) first

and ends at the HaloSpot `hs1`.
"""
expressions = FindWithin(Expression, hs0, stop=hs1).visit(it)
return Scope([e.expr for e in expressions])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just use a generator without the list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

otherwise.
"""

def rule0(dep):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep them there purely for neatness


for f, v in hsf.fmapper.items():
for dep in scope.d_flow.project(f):
if not rule0(dep) and not rule1(dep, v.loc_indices):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rule0 has a double not here. Why not fold these nots into the rules themselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I could, but the semantics of rule0 and rule1 hasn't been changed in years, is it really worth it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, just figured it was worth mentioning

assert np.allclose(h.data_ro_domain[0, 5:], [4.4, 4.4, 4.4, 3.4, 3.1], rtol=R)

@pytest.mark.parallel(mode=1)
def test_merge_and_hoist_haloupdate_if_diff_locindices_v2(self, mode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snappy 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who needs docstrings ^^

Copy link
Contributor

@EdCaunt EdCaunt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ready to go

@mloubout mloubout force-pushed the fix-mpi-hoisting-2 branch from 8ca5173 to b3a3e00 Compare June 13, 2025 12:46
@mloubout mloubout merged commit b3408d8 into main Jun 13, 2025
35 checks passed
@mloubout mloubout deleted the fix-mpi-hoisting-2 branch June 13, 2025 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-C bug in the generated code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Haloupdate miss-placed with Buffer(1)

3 participants