Enable 16-bit activations in Cadence Quantizer for fully_connected and linear #15010
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15010
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit 388201c with merge base a12219d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary:
Context
We currently support only 8-bit activations for most operators. We would like to add generic support for 16-bit activations for the following ops:
- quantized_fully_connected
- quantized_linear
- quantized_conv (all flavors)
- quantized_matmul
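For intuition, moving activations from int8 to int16 mainly changes the integer range used when computing scale and zero point. Below is a minimal sketch of an asymmetric affine quantize/dequantize round-trip at 16 bits in plain PyTorch; the helper names and the example tensor are illustrative, not taken from this diff:

```python
import torch

def quantize_per_tensor(x: torch.Tensor, scale: float, zero_point: int,
                        qmin: int, qmax: int) -> torch.Tensor:
    # Affine quantization: clamp(round(x / scale) + zero_point) to [qmin, qmax].
    q = torch.round(x / scale) + zero_point
    return torch.clamp(q, qmin, qmax).to(torch.int16)

def dequantize_per_tensor(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    return (q.to(torch.float32) - zero_point) * scale

# int16 activation range instead of the int8 range [-128, 127].
qmin, qmax = -32768, 32767

x = torch.randn(4, 8)  # illustrative activation tensor
scale = (x.max() - x.min()).item() / (qmax - qmin)
zero_point = int(round(qmin - x.min().item() / scale))

q = quantize_per_tensor(x, scale, zero_point, qmin, qmax)
x_hat = dequantize_per_tensor(q, scale, zero_point)
print((x - x_hat).abs().max())  # reconstruction error is much smaller than with int8
```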
This Diff
Here, we add support for `quantized_linear` and `quantized_fully_connected`. We need to do the following:
1. Allow 16-bit activations in `quantized_fully_connected_out.cpp` and `quantized_linear_out.cpp`.
2. Allow 16-bit activations in `ref_implementations.py`, so tests can run with 16-bit activations to validate that the quantization is correct (see the reference sketch after this list).
3. Add a quantizer (`CadenceWith16BitLinearActivationsQuantizer`) to verify this works, and add a unit test (see the usage sketch after this list).

Differential Revision: D84284794
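As a companion to item 2, here is a minimal sketch of what a reference quantized linear with 16-bit activations could look like in plain PyTorch. The function name and signature are illustrative; they are not the actual `ref_implementations.py` API:

```python
import torch

def quantized_linear_ref(
    x_q: torch.Tensor,       # int16 activations
    x_scale: float,
    x_zero_point: int,
    w_q: torch.Tensor,       # int8 weights, shape (out_features, in_features)
    w_scale: float,
    w_zero_point: int,
    bias: torch.Tensor,      # float bias
    out_scale: float,
    out_zero_point: int,
) -> torch.Tensor:
    # Dequantize inputs, run the float op, then requantize the output to int16.
    x = (x_q.to(torch.float32) - x_zero_point) * x_scale
    w = (w_q.to(torch.float32) - w_zero_point) * w_scale
    y = torch.nn.functional.linear(x, w, bias)
    y_q = torch.round(y / out_scale) + out_zero_point
    return torch.clamp(y_q, -32768, 32767).to(torch.int16)
```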
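And for item 3, a rough sketch of how the new quantizer might be exercised in a unit test via the standard PT2E flow. The import path of `CadenceWith16BitLinearActivationsQuantizer` is an assumption, not taken from the diff:

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Assumed import path for the new quantizer; the actual module may differ.
from executorch.backends.cadence.aot.quantizer.quantizer import (
    CadenceWith16BitLinearActivationsQuantizer,
)

class LinearModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

model = LinearModel().eval()
example_inputs = (torch.randn(1, 8),)

# PT2E quantization flow: export, annotate, calibrate, convert.
exported = torch.export.export(model, example_inputs).module()
quantizer = CadenceWith16BitLinearActivationsQuantizer()
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)          # calibration pass
converted = convert_pt2e(prepared)

# The converted graph should now carry int16 quantize/dequantize ops around the
# linear, which later lowering can turn into quantized_linear /
# quantized_fully_connected with 16-bit activations.
```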