Add SSE2 version of Vp8Sse4X4#1817
Conversation
# Conflicts: # src/ImageSharp/Formats/Webp/Lossy/LossyUtils.cs
Codecov Report
@@ Coverage Diff @@
## master #1817 +/- ##
==========================================
- Coverage 87.34% 87.19% -0.15%
==========================================
Files 936 936
Lines 48154 48164 +10
Branches 6038 6037 -1
==========================================
- Hits 42058 41998 -60
- Misses 5095 5162 +67
- Partials 1001 1004 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
| Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a)); | ||
| Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps, 8))); | ||
| Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 2, 8))); | ||
| Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 3, 8))); |
There was a problem hiding this comment.
Slicing has some unnecessary extra costs, if input is safe we can do everything with pointer Unsafe arithmetics:
| Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a)); | |
| Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps, 8))); | |
| Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 2, 8))); | |
| Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref MemoryMarshal.GetReference(a.Slice(WebpConstants.Bps * 3, 8))); | |
| ref byte aRef = ref MemoryMarshal.GetReference(a); | |
| Vector128<byte> a0 = Unsafe.As<byte, Vector128<byte>>(ref aRef); | |
| Vector128<byte> a1 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps)); | |
| Vector128<byte> a2 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps * 2)); | |
| Vector128<byte> a3 = Unsafe.As<byte, Vector128<byte>>(ref Unsafe.Add(ref aRef, WebpConstants.Bps * 3)); |
Same for b.
# Conflicts: # tests/ImageSharp.Tests/Formats/WebP/LossyUtilsTests.cs
| } | ||
|
|
||
| [MethodImpl(InliningOptions.ShortMethod)] | ||
| public static int GetSse(Span<byte> a, Span<byte> b, int w, int h) |
There was a problem hiding this comment.
This method name confuses me shouldn't it be Vp84X4?
There was a problem hiding this comment.
Vp84X4 is confusing because of 84. Parent method can be Vp8_4x4 while this one can be Vp8_4X4Scalar. Or maybe Vp8_4x4Scalar could be inlined if it's not used anywhere else?
There was a problem hiding this comment.
I agree that the name is not ideal, but its called VP8SSE4x4 in the original libwebp source and I wanted to keep the method names to be as close to the original as possible to make it easier to follow/commpare the original code.
A better name might have been CalculateDistortion4x4, but again I am still in favor of keeping the names similar to the original.
There was a problem hiding this comment.
This method name confuses me shouldn't it be
Vp84X4?
GetSse is not called Vp84X4 because of the variable parameters w and h.
I tried to rename the methods: 7e20c5d
Is that better (or worse)?
| } | ||
|
|
||
| [MethodImpl(InliningOptions.ShortMethod)] | ||
| public static int GetSse(Span<byte> a, Span<byte> b, int w, int h) |
There was a problem hiding this comment.
Vp84X4 is confusing because of 84. Parent method can be Vp8_4x4 while this one can be Vp8_4X4Scalar. Or maybe Vp8_4x4Scalar could be inlined if it's not used anywhere else?
| Array.Clear(this.YDcLevels, 0, this.YDcLevels.Length); | ||
| Array.Clear(this.YAcLevels, 0, this.YAcLevels.Length); | ||
| Array.Clear(this.UvLevels, 0, this.UvLevels.Length); | ||
| Array.Clear(this.ModesI4, 0, this.ModesI4.Length); | ||
| Array.Clear(this.Derr, 0, this.Derr.Length); |
There was a problem hiding this comment.
Unrelated to the main thing in the PR, but Vp8ModeScore is also a good candidate to be made a value type with fixed buffers.
There was a problem hiding this comment.
yes seems reasonable, Vp8ModeScore makes up the most of the allocations during encoding, but I might try that in a different PR.
There was a problem hiding this comment.
Definitely worth a look after.
Prerequisites
Description
This adds a SSE2 version of the method
Vp8Sse4X4, which is used during lossy encoding of webp images.Related to #1786
Profiling results:
Master Branch:
PR:
P.S. im not sure again why the call count is different, it was the same image