Making int.Parse("") faster#12196
Conversation
|
Have you tried using separate loops instead of a single loop and "state"? |
|
@mikedn Yeah, there's definitely room for optimizations!! I just wanted to initiate a discussion. If it's worth to do it (it's not only |
|
Note that I did some quick experiments on these lines some months ago and my conclusion was that this alternative wasn't worth pursuing (didn't try it too hard though). I think that I mostly focused on I am also in a position to say that further improving the main existing algorithm would be quite difficult, as far as it is already over-optimised (even though, some parts might look like easily optimisable at first sight). I am not trying to demotivate you, but to give some context about the kind of difficulties which you are about to face. As a first step (I learned that the hard way, mostly thanks to @jkotas contributions :)), you should set up a reliable enough testing framework (in principle, many iterations and quite simplistic actions) to get a proper feeling about what you really have (is your table referring to fractions of nanosecond improvements?!). |
|
It could be much better if I also have a question about unsafe: why? Bound checks are elminitated anyway because you are passing whole string, but you even may make it slower by pinning a string. DId you try benchmark very same algorithm without unsafe? |
|
@Pzixel There is an issue for being able to use |
|
@varocarbas your comment was deleted as a violation of the ,NET Foundation Code of Conduct (adopted by CoreCLR repo) as it was insulting/derogatory. You may consider this an official warning. |
|
@varocarbas your comment was deleted as a violation of the .NET Foundation Code of Conduct (adopted by CoreCLR repo) as it was insulting/derogatory. You may consider this an official warning. |
|
@varocarbas we'll be happy to share with you your original comments offline and explain in more details which parts of your comments are insulting/derogatory. Please reach out to us via email at .NET Foundation Code of Conduct (conduct@dotnetfoundation.org). In case of Code of Conduct violations, we delete entire post. No exceptions. We want to create safe and creative environment for our community, where people are respectful to each other. @varocarbas please do not continue discussing this issue here. Please reach out to us offline, so that we can discuss acceptable behavior according to our Code of Conduct. |
|
@varocarbas I have to admit that I really got hurted by your post. But after rereading it several times I found that it prolably have neutral intention. So, I have no beefing toward you, but someone may be less tolerant. About your position above, I have studied this code snippet more carefuly and I have found that the reason why it cannot be converted is Finally, I'm not saying that everythigh is good with vectorisation. But some very common patterns may heavily profit from it. Operation with strings is 90% of time in some cases. I understand, that there is plenty of problems that need solution and it may be not first-priority task, but I propose to assume it at least as worthwhile task. |
src/mscorlib/src/System/Number.cs
Outdated
| { | ||
| state |= StateSign; | ||
| ret = ret * 10 + (c - '0'); | ||
| } |
There was a problem hiding this comment.
Another possible optimization here:
var c = *cptr;
uint cbased = (uint)(c - '0');
if (cbased <= 9 && (state & StateDigits) == 0)
{
state |= StateSign;
ret = ret * 10 + (int)cbased;
}
There was a problem hiding this comment.
applied, thanks. The asm code looks better but it doesn't give much in a benchmark.
src/mscorlib/src/System/Number.cs
Outdated
| } | ||
|
|
||
|
|
||
| if ((state & StateNegative) > 0) |
There was a problem hiding this comment.
This check relies on StateNegative not being the sign bit. Would probably be more appropriate to just use != 0 (at least this is how I usually see flag checks).
@mikedn I pushed the three loop version and its benchmark results to the repo. |
|
@varocarbas don't worry, I have all your posts persistet on my email because I have a notification on. I can resend it for you somehow, If you want so. About your post: I know what is |
Looks like it's slower. Most likely due to the cost of This
|
src/mscorlib/src/System/Number.cs
Outdated
| if ((state & StateNegative) > 0) | ||
| { | ||
| ret = -ret; | ||
| if (ret > 0) |
There was a problem hiding this comment.
I don't think these 2 checks are enough to detect overflow. They're enough only if we limit the number of digits you parse and I don't see such a limit in your code.
There was a problem hiding this comment.
addressed using ret >= 0 check
handleNumber:
var c = (uint)(*cptr - '0');
if (c <= 9 && ret >= 0)
{
ret = ret * 10 + (int)c;
cptr++;
goto handleNumber;
}
dab26ea to
0b79cc8
Compare
|
I rebased and pushed a new version that addresses comments. I extracted code that handles leading and trailing whitespaces into separate NoInlining methods. Benchmark results:
The assembly code for |
|
What is the status of the PR? No update for 3 weeks ... |
|
@karelz Yes, it's waiting for review. Open questions:
|
It's an abstraction that includes strings as well as char[] and other buffers of chars. So as you see in 12d5014 we are exposing new overloads that accept span, and implementing in terms of span (because one can have a span over string but not vice versa) |
|
@joperezr @AlexGhiondea can you please comment? It's been 9 days :( ... |
|
adding @stephentoub as he has been looking at improving double.Parse (https://github.com/dotnet/coreclr/issues/13544) |
FWIW, I wasn't actually looking at improving it, only at making it work in coreclr... hadn't yet gotten around to the perf of it. |
|
@alexandrnikitin thanks for your contribution, first of all I want to apologize it's taken so long for us to look into this. Before we start the review though, do you mind rebasing your changes with the changes in master since it looks like you have merge conflicts? I ask this so that I can re-run CI and to be able to run all of our corefx tests against this change to make sure that it passes them. |
0b79cc8 to
54c827a
Compare
Handles only NumberStyles.Integer (AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign) Leading symbols handling logic wrapped into nonilineable HandleLeadingSymbols method Trailing symbols handling logic wrapped into noninlineable HandleTrailingWhite method
54c827a to
9b8fcd4
Compare
|
I rebased and forcepushed the branch that works with Questions I have:
|
|
@dotnet-bot test Windows_NT x86 corefx_baseline |
usually the tests are located in corefx repo which will consume these changes. |
|
@alexandrnikitin, are you still working on this / still interested in working on this? |
|
(It would be very welcome!) |
|
@stephentoub I want it to be merged 😊 What options do we have from now? And what needs to be done to have it merged?
|
😄
I would say:
|
|
Closing stale PRs. Ideally, this issue should be fixed by porting the fine-tuned UTF8 parser: https://github.com/dotnet/corefx/issues/30612#issuecomment-399731130 |
This is a PoC to to make int.Parse() faster for 99%? cases.
The fast path only handles the
NumberStyles.Integerstyle (sign, leading and trailing whitespaces)This is a draft version. I will polish it if there's interest to have it in CLR. It reuses two functions
MatchCharsandIsWhite. I believe it's possible to make it faster (some magic with the loop and conditions, get rid of unsafe code andMatchCharsfunction). It also possible to improve all cases whereNumberStyles.IntegerusedI pushed initial benchmarks to a separate repo (I didn't manage to do them in the coreclr repos 😕 )
Benchmark results:
Not sure what's wrong with the Whitespaces case 😊
Related issues: #3995 https://github.com/dotnet/corefx/issues/13293 #3163
cc who participated in previous discussions @jkotas @varocarbas @hughbe @danmosemsft @jakobbotsch @mikedn