-
Notifications
You must be signed in to change notification settings - Fork 248
compiler: Add various performance optimization variants #2446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2446 +/- ##
==========================================
+ Coverage 87.04% 87.09% +0.04%
==========================================
Files 237 238 +1
Lines 44998 45106 +108
Branches 8372 8405 +33
==========================================
+ Hits 39167 39283 +116
+ Misses 5102 5091 -11
- Partials 729 732 +3 ☔ View full report in Codecov by Sentry. |
| * 'cse-algo': str. The CSE algorithm to apply. Accepted: ['basic', | ||
| 'smartsort', 'advanced']. Default is 'basic'. | ||
| """ | ||
| min_cost = options['cse-min-cost'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are defaults set on these somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core/operator.py core/cpu.py etc like all other options
| ----- | ||
| We're not using SymPy's CSE for three reasons: | ||
|
|
||
| * It also captures array index access functions (e.g., i+1 in A[i+1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In future, we will want to capture these for things like x+1-ltkn and INT(abs(x)) etc, but we probably want some kind of filter specifying a minimum verbosity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will also probably want a unique kind of variable name to improve readability. Something like i0 rather than r0
mloubout
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bunch of comments, looks nice.
| return subs | ||
|
|
||
|
|
||
| def catch(exprs, mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retrieve_cse to match or "search" names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't use search because it'd be an alias of symbolics.search, and retrieve_cse is misleading given what "cse" means and in particular given that we're just looking for potential common subexpressions (that is, candidates)
I think catch (for Candidates) is fine
|
|
||
| mapper[Candidate(expr)].append(expr) | ||
|
|
||
| for n in range(2, len(expr.args)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually fairly "limited" because it ignores the fact Add/Mul are commutative. We would catch a lot more cse if we take commutation in consideration and take all sub-args of length at least two.
So you can do
for terms in sub_args(expr.args):
with
def sub_args(args):
nargs = []
for r in range(2, len(args) + 1):
nargs.extend(itertools.combinations(args, r))
return nargs
Wich for example for a*b*c*d would capture all a*b, a*c, a*d, b*d, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is all carefully explained in the docstring... it's an expensive algorithm and deliberately we're implementing the heuristic I chose (relying already on the canonical ordering of the arguments etc etc)
|
|
||
| # Any factorization possible? | ||
| if len(inverse_mapper) == len(expr.args) or \ | ||
| list(inverse_mapper) == [1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid list on (potentially large) set and len(inverse_mapper) == 1 && 1 in inverse_mapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| elif expr.is_Equality: | ||
| args, candidates = zip(*[_collect_nested(expr.lhs), | ||
| _collect_nested(expr.rhs)]) | ||
| args, candidates = zip(*[_collect_nested(expr.lhs, strategy), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to run it on lhs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is legacy, I can probably drop it; will try
| 'in_critical_region'] | ||
|
|
||
|
|
||
| def makeit_ssa(exprs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
8f15886 to
8171523
Compare
0a3a196 to
b1a10ee
Compare
| @@ -196,14 +198,14 @@ def _specialize_clusters(cls, clusters, **kwargs): | |||
|
|
|||
| # Reduce flops | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flops minimization and Reduce flops as comments subsections looks duplicate....
I would keep CSE in the first subtitle, line 49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, it is not applied here now
|
|
||
|
|
||
| @pytest.mark.parametrize('exprs,expected,min_cost', [ | ||
| # Simple cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are all these just moved, or edited/enhanced as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very few modifications
| @@ -307,8 +307,7 @@ | |||
| " {\n", | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @@ -196,14 +198,14 @@ def _specialize_clusters(cls, clusters, **kwargs): | |||
|
|
|||
| # Reduce flops | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
| raise InvalidOperator("Unsupported MPI mode `%s`" % oo['mpi']) | ||
|
|
||
| if oo['cse-algo'] not in ('basic', 'smartsort', 'advanced'): | ||
| raise InvalidArgument("Illegal `cse-algo` value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message should say what the illegal value was, much like the one above
| ----- | ||
| We're not using SymPy's CSE for three reasons: | ||
|
|
||
| * It also captures array index access functions (e.g., i+1 in A[i+1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will also probably want a unique kind of variable name to improve readability. Something like i0 rather than r0
|
|
||
| def _compact(exprs, exclude): | ||
| """ | ||
| Drop useless temporaries: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there also edge cases where it creates expressions like r1 = r0? I'm sure we had some issue like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the roles of _compact is catching them all !
No description provided.