Api brotli changes by Vedin · Pull Request #1621 · dotnet/corefxlab

Vedin · 2017-06-21T19:07:12Z

Changes to the Brotli API. What's new:

Brotli (Primitives) was rewritten and now returns every possible TransformationStatus.
BrotliStream is now based on the Brotli class. It also uses half as much memory as before and no longer uses marshaling.
@KrzysztofCwalina Finally, I removed everything from State and rewrote it so that we need only BrotliNativeState and LastDecoderResult. So how should it be disposed now (there are 2 commented-out methods, one for compression mode and one for decompression mode)? Maybe add a flag to detect which mode the state is in?
BrotliPrimitivesTests needs an update.

dnfclas · 2017-06-21T19:07:16Z

@Vedin,
Thanks for having already signed the Contribution License Agreement. Your agreement was validated by .NET Foundation. We will now review your pull request.
Thanks,
.NET Foundation Pull Request Bot

Vedin · 2017-06-21T19:09:47Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+            public void Dispose()
+            {
+                //    BrotliNative.BrotliDecoderDestroyInstance(BrotliNativeState);
+                //   BrotliNative.BrotliEncoderDestroyInstance(BrotliNativeState);


@KrzysztofCwalina These are the 2 methods I mentioned above.

ahsonkhan · 2017-06-21T20:36:33Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+
+            public void SetQuality(uint quality)
+            {
+                if (quality < MinQuality || quality > MaxQuality)


MinQuality is 0 and quality is an unsigned int. Quality will never be < MinQuality and hence this check is unnecessary.

ahsonkhan · 2017-06-21T20:42:02Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+
+            public void SetWindow(uint window)
+            {
+                if (window < MinWindowBits || window > MaxWindowBits)


if (window - MinWindowBits > MaxWindowBits - MinWindowBits) saves an extra conditional branch.

See: #1616 (comment)

Makes sense, nice optimization.
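The single-comparison range check suggested above can be sketched as follows. This is only an illustration: the class name, the method names, and the bounds 10 and 24 are assumptions, since the PR's actual MinWindowBits and MaxWindowBits values are not shown in this thread. The trick relies on unsigned wraparound, so it only works for unsigned operands in an unchecked context.

```csharp
using System;

public static class RangeCheck
{
    // Assumed bounds for illustration only; not taken from the PR.
    public const uint MinWindowBits = 10;
    public const uint MaxWindowBits = 24;

    // Straightforward form: two comparisons, two conditional branches.
    public static bool IsOutOfRangeTwoBranches(uint window) =>
        window < MinWindowBits || window > MaxWindowBits;

    // Single comparison: when window < MinWindowBits the unsigned subtraction
    // wraps around to a very large value, which is greater than the width of
    // the valid range, so both failure cases are caught by one test.
    public static bool IsOutOfRangeOneBranch(uint window) =>
        unchecked(window - MinWindowBits) > MaxWindowBits - MinWindowBits;
}
```

Both methods should agree for every uint input; the explicit unchecked keeps the wraparound well defined even if the project is compiled with overflow checking enabled.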

ahsonkhan · 2017-06-21T20:42:31Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

-        }
-
-        internal static TransformationStatus Compress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, int quality = DefaultQuality, int windowSize = DefaultWindowSize, BrotliEncoderMode encMode = BrotliEncoderMode.Generic)
+        public static TransformationStatus FlushEncoder(Span<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, ref State state, bool is_finished = true)


nit: argument name doesn't follow coding style

ahsonkhan · 2017-06-21T20:43:34Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

-            bytesConsumed = bytesWritten = 0;
+            BrotliEncoderOperation operation = is_finished ? BrotliEncoderOperation.Finish : BrotliEncoderOperation.Flush;
+            bytesConsumed = source.Length;
+            bytesWritten = destination.Length;


These should not be set to source.Length and destination.Length in the beginning.

Actually they should be, because we pass these variables to the native library, where they mean the array sizes
(see written and consumed).

ahsonkhan · 2017-06-21T20:45:02Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

                    bytesWritten = (int)written;
-                    return TransformationStatus.Done;
                }
+                bytesWritten = destination.Length - bytesWritten;


I am curious, why do we do this?

Because after the native call, bytesWritten holds how many bytes are still free in the destination, and we convert that to how many bytes have been written.
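A minimal sketch of that conversion, assuming a C# 9 or later compiler for the nuint keyword. FakeNativeWrite is a hypothetical stand-in for the native streaming call, not the real Brotli P/Invoke; it only mimics the in/out convention where the argument goes in as the destination capacity and comes back as the space still free.

```csharp
using System;

public static class OutputAccounting
{
    // Hypothetical stand-in for a native streaming call. On entry availableOut
    // is the free space in the destination; on return it is the free space
    // left after 'produced' bytes were written.
    public static void FakeNativeWrite(int produced, ref nuint availableOut)
    {
        availableOut -= (nuint)produced;
    }

    public static int BytesWritten(Span<byte> destination, int produced)
    {
        nuint availableOut = (nuint)destination.Length; // in: capacity
        FakeNativeWrite(produced, ref availableOut);    // out: remaining free space
        return destination.Length - (int)availableOut;  // capacity - free = written
    }
}
```

For example, a 16-byte destination into which the stand-in writes 5 bytes comes back with 11 bytes free, so the method reports 5 bytes written.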

ahsonkhan · 2017-06-21T20:46:20Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+                            return TransformationStatus.InvalidData;
+                        };
+                        bytesConsumed = (int)consumed;
+                        bytesWritten = (int)((nuint)destination.Length - written);


Won't this set bytesWritten to 0 in the success/done case, if written == destination.Length?

Yes, that's true, and it should work like this.

ahsonkhan · 2017-06-21T20:46:57Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+            bool errorDetected = false;
+            bytesConsumed = source.Length;
+            bytesWritten = destination.Length;
+            BrotliDecoderResult LastDecoderResult = BrotliDecoderResult.NeedsMoreInput;


Why is NeedsMoreInput the starting state of LastDecoderResult?

Because at the start we haven't sent anything to decompress, so we need more input.

ahsonkhan · 2017-06-21T20:48:21Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+                }
+                else
+                {
+                    endOfStream = true;


Is this bool necessary? The other two conditions return, so reaching here automatically implies endOfStream is true. No?

endOfStream was deleted

ahsonkhan · 2017-06-21T20:49:52Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+
+                if (endOfStream && !BrotliNative.BrotliDecoderIsFinished(state.BrotliNativeState))
+                {
+                    errorDetected = true;


Similarly, I don't think we need this. You can just use !BrotliNative.BrotliDecoderIsFinished(state.BrotliNativeState)) at line 209/215. Let me know if I am mistaken.

I rewrote the error-detection logic, but you are right.

ahsonkhan · 2017-06-21T20:51:42Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+                    var text = BrotliNative.BrotliDecoderErrorString(error);
+                    throw new System.IO.IOException(text + BrotliEx.unableDecode);
+                }
+                if (endOfStream && !BrotliNative.BrotliDecoderIsFinished(state.BrotliNativeState) && LastDecoderResult == BrotliDecoderResult.NeedsMoreInput)


I don't know if this will ever be true.
if (x && y && LastDecoderResult == BrotliDecoderResult.NeedsMoreInput) is false if LastDecoderResult == BrotliDecoderResult.NeedsMoreInput is false (regardless of what x and y are), and we only get here if that is the case.

yes, fixed it

shiftylogic · 2017-06-22T17:08:57Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+            public void Dispose()
+            {
+                //    BrotliNative.BrotliDecoderDestroyInstance(BrotliNativeState);
+                //   BrotliNative.BrotliEncoderDestroyInstance(BrotliNativeState);


I think you are going to have to hold enough information that allows you to distinguish between whether you are doing compress or decompress, then call the correct de-allocation function.

shiftylogic · 2017-06-22T17:34:11Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+                //   BrotliNative.BrotliEncoderDestroyInstance(BrotliNativeState);
+            }
+
+            public void InitializeDecoder()


I suggest a single initialize function with a flag that says whether they want compress or decompress. Also, I'd make the initialization protect against a second call. If you don't you will leak native memory.

You might also consider making these internal. State is a struct, so have the default object be "uninitialized", then have the state initialize itself on first use. That way the developer has to do less work and can't get it wrong.

The problem is that in encoder mode the developer may also want to set the quality and window size, in which case they need to have a state before the first call. A possible solution is to add another Compress method that takes quality and window size, but it isn't obvious for the user to call one method on the first iteration and another on the following iterations, is it?
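One way to picture the suggestions in this thread (remember the mode, guard against a second initialization, and call the matching destroy function) is the sketch below. It is not the PR's State struct: the type is a class for simplicity, BrotliMode and NativeStateSketch are hypothetical names, and the counters only stand in for the native create and destroy calls.

```csharp
using System;

// Hypothetical mode flag; None means the state is still uninitialized.
public enum BrotliMode { None, Compress, Decompress }

public sealed class NativeStateSketch : IDisposable
{
    // Counters standing in for the native encoder/decoder create and destroy calls.
    public int EncoderCreated, EncoderDestroyed, DecoderCreated, DecoderDestroyed;

    // Set only from inside Initialize, so callers cannot put it out of sync.
    public BrotliMode Mode { get; private set; } = BrotliMode.None;

    public void Initialize(bool compress)
    {
        // A second call is a no-op, so no native instance is leaked.
        if (Mode != BrotliMode.None) return;

        if (compress) { EncoderCreated++; Mode = BrotliMode.Compress; }
        else { DecoderCreated++; Mode = BrotliMode.Decompress; }
    }

    public void Dispose()
    {
        // Pick the destroy call that matches how the state was created.
        switch (Mode)
        {
            case BrotliMode.Compress: EncoderDestroyed++; break;
            case BrotliMode.Decompress: DecoderDestroyed++; break;
        }
        Mode = BrotliMode.None; // makes a repeated Dispose harmless
    }
}
```

Encoder options such as quality and window size are left out here; the open question above about where to set them before the first call is not addressed by this sketch.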

shiftylogic · 2017-06-22T17:40:03Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

-                _encoder = new Encoder();
-                _encoder.SetQuality();
-                _encoder.SetWindow();
+                _state.InitializeEncoder();


In the stream case, I would say that an exception bubbling up is probably a good thing, unless you plan on wrapping and re-throwing.

shiftylogic · 2017-06-22T17:40:45Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

+            if (BrotliNative.BrotliEncoderIsFinished(_state.BrotliNativeState)) return;
+            _nextInput = new byte[0];
+            _availableInput = 0;
+            TransformationStatus ts = TransformationStatus.DestinationTooSmall;


Rename ts to something more meaningful. Even "status" would be better.

renamed to flushStatus

shiftylogic · 2017-06-22T17:43:45Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

+            }
+            else
+            {
+                throw new UnauthorizedAccessException();


That exception is used for security checks. It is not appropriate here.

shiftylogic · 2017-06-22T17:48:18Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

                }
                copyLen = bytesRemain > _bufferSize ? _bufferSize : bytesRemain;
-                Marshal.Copy(buffer, currentOffset, _bufferInput, copyLen);
+                byte[] bufferInput = new byte[copyLen];


Why is this extra buffer allocation and copy necessary? If you are going to need an extra buffer on each write call, you might want to rent a buffer from a pool, have your own buffer that you re-use, or use a stack buffer (if it isn't too large).

It isn't necessary; it is temporary. The problem is that the signature of Compress takes an out parameter (consumed), and the input must be exactly the size of the source. I could change consumed to a ref parameter, but that would be inconsistent with the other Encode/Decode methods, which use Span.

Maybe I can fix it with Span.Slice and just slice the buffer instead of copying it to _bufferInput?

@shiftylogic Is something like this ok?

Span<byte> bufferInput = new Span<byte>(buffer);
bufferInput.Slice(0, copyLen);
transformationResult = Brotli.Compress(bufferInput, _bufferOutput, out _availableInput, out _availableOutput, ref _state);
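A small sketch of the slicing idea, with one caveat about the snippet above: Slice does not modify the span it is called on, it returns a new span, so the result has to be assigned or passed on. SliceSketch, Sum, and SumWindow are hypothetical names; Sum only stands in for a consumer such as Compress that takes a ReadOnlySpan<byte>.

```csharp
using System;

public static class SliceSketch
{
    // Hypothetical consumer standing in for a method that takes a span.
    public static int Sum(ReadOnlySpan<byte> source)
    {
        int total = 0;
        foreach (byte b in source) total += b;
        return total;
    }

    public static int SumWindow(byte[] buffer, int offset, int count)
    {
        // No allocation and no copy: the slice is a view over the caller's array.
        // The returned span must be kept, because Slice leaves the original unchanged.
        ReadOnlySpan<byte> window = new ReadOnlySpan<byte>(buffer).Slice(offset, count);
        return Sum(window);
    }
}
```

In the stream's write path this would presumably mean slicing at the current offset with the copy length, rather than always starting at 0.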

ahsonkhan · 2017-06-22T23:18:18Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

            BrotliEncoderOperation operation = isFinished ? BrotliEncoderOperation.Finish : BrotliEncoderOperation.Flush;
            bytesConsumed = source.Length;
            bytesWritten = destination.Length;
+            bytesConsumed = 0;


Why set bytesConsumed to source.Length first and then to 0?

ahsonkhan · 2017-06-22T23:19:16Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

                TransformationStatus transformationResult = TransformationStatus.DestinationTooSmall;
-                while (transformationResult == TransformationStatus.DestinationTooSmall)
+                /*Span<byte> bufferInput = new Span<byte>(buffer);
+                bufferInput.Slice(0, copyLen);*/


nit: remove commented out code?

It was a question for @shiftylogic: can I use something like this instead of allocating and copying an array?

ahsonkhan · 2017-06-22T23:19:51Z

tests/System.IO.Compression.Tests/BrotliPrimitivesTests.cs

        }

-        private void ValidateCompressedData(Span<byte> data, byte[] expected)
+        private void ValidateCompressedData(byte[] data, byte[] expected)


Why was this change required?

Just to make it clear that I pass a byte array, not a span; in fact it didn't change anything.

ahsonkhan · 2017-06-22T23:27:21Z

tests/System.IO.Compression.Tests/BrotliPerfomanceTests.cs

            byte[] data = File.ReadAllBytes(testFilePath);
            var bytes = new byte[bufferSize];
            foreach (var iteration in Benchmark.Iterations)
+            {


FYI, this filename has a typo:
BrotliPerfomanceTests.cs => BrotliPerformanceTests.cs

wow, really

shiftylogic · 2017-06-23T17:04:28Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+        {
+            internal IntPtr BrotliNativeState { get; private set; }
+            internal BrotliDecoderResult LastDecoderResult;
+            public bool CompressMode { get; set; }


This should probably be readonly and set internally when InitializeDecoder or InitializeEncoder is called.

shiftylogic · 2017-06-23T17:13:24Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

-                _availableInput = (nuint)copyLen;
-                _nextInput = _bufferInput;
-                while ((int)_availableInput > 0)
+                byte[] bufferInput = new byte[copyLen];


Why is this extra array and copy needed? Why can't you use the input directly for the compress call?

shiftylogic · 2017-06-23T17:15:01Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

-            while (true)
+            if (_state.BrotliNativeState == IntPtr.Zero) return;
+            if (BrotliNative.BrotliEncoderIsFinished(_state.BrotliNativeState)) return;
+            _nextInput = new byte[0];


Why allocate an empty array and throw away the one you allocated in the constructor?

shiftylogic · 2017-06-23T18:58:31Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

        public static TransformationStatus Compress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, ref State state)
        {
-            EnsureInitialized(ref state, true);
+            try


This try/catch and throw is not necessary if all you are doing is re-throwing the same exception. Just let it bubble up to the caller.

shiftylogic · 2017-06-23T18:59:17Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

        public static TransformationStatus Decompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, ref State state)
        {
-            EnsureInitialized(ref state, false);
+            try


Same here. Just let this bubble up to the caller.

shiftylogic · 2017-06-26T18:29:18Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+            }
+        }
+
+        public static void EnsureInitialized(ref State state, bool compress)


Probably want this internal, unless you think someone needs this for external use.

ahsonkhan · 2017-06-26T19:49:51Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

        }

-        public static TransformationStatus Compress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, CompressionLevel quality = (CompressionLevel)DefaultQuality, int windowSize = DefaultWindowSize, BrotliEncoderMode encMode = BrotliEncoderMode.Generic)
+        public static TransformationStatus FlushEncoder(Span<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten, ref State state, bool isFinished = true)


Why change the input source to Span instead of ReadOnlySpan?

They are different methods, but yes, it should be ReadOnlySpan there too.

ahsonkhan · 2017-06-26T19:51:53Z

src/System.IO.Compression.Brotli/System/IO/Compression/Brotli.cs

+                    nuint availableOutput = (nuint)bytesWritten;
+                    nuint consumed = (nuint)bytesConsumed;
+                    state.LastDecoderResult = BrotliNative.BrotliDecoderDecompressStream(state.BrotliNativeState, ref consumed, ref bufIn, ref availableOutput, ref bufOut, out nuint totalOut);
+                    bytesWritten = (int)((nuint)destination.Length - availableOutput);


This is simpler. Would it work?
bytesWritten = destination.Length - (int)availableOutput;

Yes, I should change it. I just thought it was better to do the arithmetic in the narrower type than to widen it to int and then subtract.

ahsonkhan · 2017-06-26T19:52:39Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

    public partial class BrotliStream : Stream
    {
-        private const int DefaultBufferSize = (1 << 16) - 1;
+        private const int DefaultBufferSize = (1 << 16) - 16;


Can you add the decimal value of DefaultBufferSize as a comment?

ahsonkhan · 2017-06-26T19:56:21Z

src/System.IO.Compression.Brotli/System/IO/Compression/BrotliStream.cs

-                    throw new System.IO.IOException(BrotliEx.unableEncode);
-                var extraData = (nuint)_availableOutput != (nuint)_bufferSize;
-                if (extraData)
+                flushStatus = Brotli.FlushEncoder(Array.Empty<byte>(), _buffer, out _availableInput, out _availableOutput, ref _state, finished);


Out of curiosity, why do we pass an empty array here?

Because when we start flushing the stream, we shouldn't send any more data to the original native lib.

Why not Span.Empty?

Hmm... Is Span.Empty faster? I thought they were the same.

I am not certain (will have to look at assembly). However, FlushEncoder takes a span, and if Span.Empty works, I would use that. If there is a specific reason to keep Array.Empty instead and use the implicit cast to Span, then use array. I believe you avoid the implicit cast: https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Span.cs#L376

Span.Empty should be faster because it avoids the implicit cast and thus the extra ctor call.

* Optimizing some int32 parsers and clean up
* wip - adding tests and fixing impl bugs
* WIP - new implementation and tests to compare with previous
* WIP - fixing non-invariant parser bugs and adding tests
* Cleaning up functional and performance tests and removing old code.
* Removing unnecessary using directive
* Addressing PR comments and adding more tests.
* Addressing PR comments and adding loop unrolling.
* Fixing issues missed after merge
* Fixing non-invariant int32 parser
* Addressing PR comment, removing unused DangerousGetPinnableReference
* Cleanup and updating test helpers
* Patch to dotnet install scripts for downloading via new blob URL. (#1632)
* Removing text.Length cache and changing to unsigned comparison
* Adding comment and removing unnecessary math operations using 0 (D0)
* Api brotli changes (#1621)
* CaterburyPerf
* flush/compress
* huge changes
* space
* small
* flush/compress
* huge changes
* space
* small
* unsaved chagnes
* test fix depends on APIchanges
* resolve issues
* Less alocation, change State
* issue
* remove unnecessary
* resolve
* clean up changes, change state
* resolve
* guidelines
* fix bug
* resolve issues
* change access
* small issues

Vedin added 5 commits June 16, 2017 11:15

CaterburyPerf

f2da3ba

flush/compress

da92380

huge changes

790191c

space

f3b3e58

small

c87498f

dnfclas added the cla-already-signed label Jun 21, 2017

Vedin requested review from KrzysztofCwalina, ahsonkhan, ianhays and shiftylogic June 21, 2017 19:07

Vedin added 8 commits June 21, 2017 12:11

flush/compress

e152fd1

huge changes

d06e4b4

space

fb317f5

small

8fef201

Merge branch 'APIBrotliChanges2' of https://github.com/Vedin/corefxlab …

d31a1c6

…into APIBrotliChanges2

resolve merge conflicts

78a1a75

unsaved chagnes

d9b7587

test fix depends on APIchanges

5a6f037

ahsonkhan reviewed Jun 21, 2017

shiftylogic reviewed Jun 22, 2017

Vedin added 3 commits June 22, 2017 15:07

Less alocation, change State

1972d8b

issue

6739e69

remove unnecessary

59f3981

ahsonkhan reviewed Jun 22, 2017

resolve

959a119

shiftylogic reviewed Jun 23, 2017

Vedin added 2 commits June 23, 2017 11:41

clean up changes, change state

7f8b3d4

merge conflicts

d4ddd1f

shiftylogic reviewed Jun 23, 2017

Vedin added 4 commits June 23, 2017 13:10

resolve

38ef5ec

guidelines

649f42c

fix bug

516f4c6

resolve issues

e1aa814

shiftylogic approved these changes Jun 26, 2017

change access

1120e4f

ahsonkhan reviewed Jun 26, 2017

ahsonkhan approved these changes Jun 26, 2017

small issues

2f71a9b

Vedin merged commit 1a329f4 into dotnet:master Jun 27, 2017
