Vulkan Q8 Conv2D: specialize shader on static parameters and tensor sizes #16036
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16036
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure
As of commit 010d843 with merge base 1cb85ef:
NEW FAILURE - The following job has failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @morgolock!
Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with the corresponding label.
If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
This PR needs a
Pull request overview
This pull request migrates Vulkan Q8 Conv2D operations from using Uniform Buffer Objects (UBOs) to Vulkan specialization constants for all fixed Conv2D parameters (kernel shape, stride, padding, dilation, groups) and tensor dimensions. This architectural change enables the Vulkan compiler to perform compile-time optimizations, eliminate generic fallback paths, and reduce dynamic indexing overhead in the shaders.
Key changes:
- Added GenerateSpecConstants function in C++ to create specialization constants from Conv2D parameters and tensor sizes
- Replaced UBO parameter buffers with specialization constants across all Conv2D shader variants
- Updated GLSL shaders to use specialization constants with flattened naming (e.g., conv2d_params_stride_x instead of conv2d_params.stride.x)
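The flattened naming follows from the fact that a GLSL specialization constant must be a scalar, so a struct such as conv2d_params cannot be specialized as a whole. A minimal C++ sketch of the flattening step is below. Conv2DParamsSketch and flatten_conv2d_params are hypothetical names used only for illustration; this is not the PR's GenerateSpecConstants, and the field order is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Conv2D parameter struct (not the real type).
struct Conv2DParamsSketch {
  int32_t kernel_size_x, kernel_size_y;
  int32_t stride_x, stride_y;
  int32_t padding_x, padding_y;
  int32_t dilation_x, dilation_y;
};

// Flatten the struct into an ordered list of scalars. Each entry would map to
// one scalar specialization constant in the shader, for example
// conv2d_params_stride_x. The order used here is an assumption.
std::vector<int32_t> flatten_conv2d_params(
    const Conv2DParamsSketch& p,
    int32_t groups) {
  return {p.kernel_size_x, p.kernel_size_y,
          p.stride_x,      p.stride_y,
          p.padding_x,     p.padding_y,
          p.dilation_x,    p.dilation_y,
          groups};
}
```

The host side and the shader have to agree on the position of every entry, which is why a single helper that owns the ordering is attractive.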
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| backends/vulkan/runtime/graph/ops/impl/QuantizedConvolution.cpp | Implements GenerateSpecConstants function; removes UBO parameter buffers from all conv2d dispatch nodes and replaces with spec constants |
| backends/vulkan/test/custom_ops/utils.cpp | Contains commented-out exception throw (debug code that should be removed) |
| backends/vulkan/test/custom_ops/q8ta_q8csw_q8to_conv2d.cpp | Contains preprocessor-guarded test configuration changes and commented-out validation (debug code) |
| backends/vulkan/runtime/graph/ops/glsl/quantize_and_pack_im2col.glsl | Migrates from UBO to specialization constants; reconstructs size vectors from spec constants |
| backends/vulkan/runtime/graph/ops/glsl/im2col_packed_int8_utils.glslh | Updates to use flattened spec constant names instead of struct member access |
| backends/vulkan/runtime/graph/ops/glsl/im2col_packed_int8.glsl | Migrates from UBO to specialization constants for Conv2D parameters and tensor sizes |
| backends/vulkan/runtime/graph/ops/glsl/im2col.glsl | Migrates from UBO to specialization constants; removes struct member access syntax |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_q8ta_q8csw_q8to_linear_tiled.glsl | Migrates from UBO to specialization constants; contains commented-out input_sizes |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_q8ta_q8csw_q8to.glsl | Migrates from UBO to specialization constants; updates all parameter references |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_q8ta_q8csw_linear_tiled.glsl | Migrates from UBO to specialization constants for linear tiled convolution |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_q8csw_linear_tiled.glsl | Migrates from UBO to specialization constants; updates loop bounds and calculations |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_q8_utils.glslh | Updates utility functions to use flattened spec constant names |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_pw_q8ta_q8csw_q8to_tiled.glsl | Migrates pointwise convolution from UBO to specialization constants |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_fp_im2col_block_load.glslh | Updates block loading logic to use flattened spec constant names |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_fp_im2col_block.glslh | Updates im2col indexing calculations to use flattened spec constant names |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_dw_q8ta_q8csw_q8to.glsl | Migrates depthwise convolution from UBO to specialization constants |
| backends/vulkan/runtime/graph/ops/glsl/conv2d_dw_q8_utils.glslh | Updates depthwise convolution utilities to use flattened spec constant names |
| backends/vulkan/runtime/graph/ops/glsl/col2im.glsl | Migrates col2im operation from UBO to specialization constants |
| #if 0 | ||
| OutInChannels(16, 8), // channels (out, in) | ||
| InputSize2D(21, 17), // input_size (h, w) | ||
| KernelSize(3, 3), // kernel | ||
| Stride(1, 1), // stride | ||
| Padding(1, 1), // padding | ||
| Dilation(1, 1), // dilation | ||
| 2, // groups | ||
| #else | ||
| OutInChannels(128, 128), | ||
| InputSize2D(128, 128), | ||
| KernelSize(5, 5), | ||
| Stride(2, 2), | ||
| Padding(2, 2), | ||
| Dilation(1, 1), | ||
| 1, | ||
| #endif |
Copilot AI, Dec 2, 2025:
The test configuration has been changed using #if 0 preprocessor directives, leaving the old configuration commented out. This appears to be temporary debug code. Either remove the old configuration entirely, or if both configurations are needed, use a more maintainable approach such as creating separate test functions or configuration parameters.
| // throw std::invalid_argument( | ||
| // "One or more dimensions exceed the allowed limit for reference implementation."); | ||
| std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl; |
Copilot AI, Dec 2, 2025:
The exception throw for dimension limit validation has been commented out and replaced with a warning message. This removes an important safety check that prevents extremely slow reference implementation execution. This appears to be debug code that should not be committed. The exception should be restored to prevent unintended long-running tests.
| // throw std::invalid_argument( | |
| // "One or more dimensions exceed the allowed limit for reference implementation."); | |
| std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl; | |
| throw std::invalid_argument( | |
| "One or more dimensions exceed the allowed limit for reference implementation."); |
| // Execute test cases using the new framework with custom FLOP calculator | ||
| auto results = execute_test_cases( | ||
| #if 1 | ||
| generate_quantized_conv2d_test_cases, | ||
| #else | ||
| generate_quantized_conv2d_easy_cases, | ||
| #endif |
Copilot AI, Dec 2, 2025:
The test case generator selection is controlled by #if 1 preprocessor directive with the alternative commented out. This appears to be temporary debug code. Either remove the unused alternative or use a more maintainable approach like a command-line flag or configuration parameter.
| // Execute test cases using the new framework with custom FLOP calculator | |
| auto results = execute_test_cases( | |
| #if 1 | |
| generate_quantized_conv2d_test_cases, | |
| #else | |
| generate_quantized_conv2d_easy_cases, | |
| #endif | |
| // Select test case generator based on command-line argument | |
| // Usage: ./q8ta_q8csw_q8to_conv2d [easy] | |
| auto test_case_generator = generate_quantized_conv2d_test_cases; | |
| if (argc > 1 && std::string(argv[1]) == "easy") { | |
| test_case_generator = generate_quantized_conv2d_easy_cases; | |
| } | |
| // Execute test cases using the new framework with custom FLOP calculator | |
| auto results = execute_test_cases( | |
| test_case_generator, |
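A self-contained version of the suggested command-line selection could look like the sketch below. The two generator functions are hypothetical stand-ins for generate_quantized_conv2d_test_cases and generate_quantized_conv2d_easy_cases, whose real signatures are not shown in this thread, and select_generator is an illustrative helper name.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the two generators named in the suggestion.
std::vector<int> generate_full_cases() { return {1, 2, 3}; }
std::vector<int> generate_easy_cases() { return {1}; }

// Both generators share one function-pointer type, so either can be selected
// at run time without preprocessor switches.
using Generator = std::vector<int> (*)();

// Pick the generator from argv: passing "easy" selects the reduced set,
// anything else (or no argument) keeps the full set.
Generator select_generator(int argc, const char* const* argv) {
  if (argc > 1 && std::string(argv[1]) == "easy") {
    return generate_easy_cases;
  }
  return generate_full_cases;
}
```

Compared with the #if 1 / #else switch, both code paths stay compiled, so neither can silently rot.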
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
| // Sizes of the input image | ||
| ${layout_declare_ubo(B, "ivec4", "input_sizes")} | ||
|
|
||
| ${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} | ||
| //${layout_declare_ubo(B, "ivec4", "input_sizes")} | ||
|
|
||
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
Copilot AI, Dec 2, 2025:
The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
| ${layout_declare_ubo(B, "ivec4", "im2col_sizes")} | ||
| ${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} | ||
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
Copilot AI, Dec 2, 2025:
The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
| * LICENSE file in the root directory of this source tree. | ||
| */ | ||
|
|
||
| #include <iostream> |
Copilot AI, Dec 2, 2025:
The #include <iostream> is only used within #if 0 debug code (lines 642-653). Since that debug code should be removed, this include should also be removed to avoid unnecessary dependencies in production code.
| #include <iostream> |
| print_valuespec_data(output_spec, "ref output", true); | ||
|
|
||
| throw std::runtime_error("Correctness validation failed"); | ||
| // throw std::runtime_error("Correctness validation failed"); |
Copilot AI, Dec 2, 2025:
This exception throw has been commented out, which means correctness validation failures will no longer stop test execution with an error. This appears to be debug/development code that should not be committed. The proper behavior is to throw an exception when correctness validation fails.
| // throw std::runtime_error("Correctness validation failed"); | |
| throw std::runtime_error("Correctness validation failed"); |
Likely needs to be undone
| //${layout_declare_ubo(B, "ivec4", "input_sizes")} | ||
| // Sizes of the output image | ||
| ${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
|
|
||
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
Copilot AI, Dec 2, 2025:
The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
| //${layout_declare_ubo(B, "ivec4", "input_sizes")} | ||
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
Copilot AI, Dec 2, 2025:
The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | |
| //${layout_declare_ubo(B, "ivec4", "input_sizes")} | |
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
| //${layout_declare_ubo(B, "ivec4", "input_sizes")} | ||
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} | ||
|
|
Copilot AI, Dec 2, 2025:
The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | |
| //${layout_declare_ubo(B, "ivec4", "input_sizes")} | |
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
d902b0f to 960ac20
960ac20 to b87778b
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| #if 0 | ||
| OutInChannels(16, 8), // channels (out, in) | ||
| InputSize2D(21, 17), // input_size (h, w) | ||
| KernelSize(3, 3), // kernel | ||
| Stride(1, 1), // stride | ||
| Padding(1, 1), // padding | ||
| Dilation(1, 1), // dilation | ||
| 2, // groups | ||
| #else | ||
| OutInChannels(128, 128), | ||
| InputSize2D(128, 128), | ||
| KernelSize(5, 5), | ||
| Stride(2, 2), | ||
| Padding(2, 2), | ||
| Dilation(1, 1), | ||
| 1, | ||
| #endif |
Copilot AI, Dec 9, 2025:
The #if 0 preprocessor directive disables the original test configuration. This appears to be temporary debugging code that should not be committed. Please remove the #if 0 ... #else ... #endif block and restore the appropriate test configuration.
Removed #if 0
| apply_bias = 0; | ||
| } | ||
|
|
||
| vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, output_image); |
Copilot AI, Dec 9, 2025:
Incorrect parameter passed to GenerateSpecConstants. The fourth parameter should be apply_bias, but output_image (a ValueRef) is being passed instead. This will not compile or will produce incorrect specialization constants. Should be: GenerateSpecConstants(graph, conv_params, groups, apply_bias)
| vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, output_image); | |
| vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, apply_bias); |
Valid comment here as well
Made changes as suggested
| conv2d_params_dilation_x, conv2d_params_dilation_y, | ||
| conv2d_params_kernel_size_x, conv2d_params_kernel_size_y, | ||
| in_channels_per_group, out_channels_per_group, | ||
| K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, groups |
Copilot AI, Dec 9, 2025:
Incorrect type for groups in the specialization constants list. The groups parameter is a ValueRef, but it should be converted to a uint32_t using graph.get_int(groups) before being added to the spec_constants list. This will cause a type mismatch. Should be: K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, graph.get_int(groups)
| K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, groups | |
| K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, graph.get_int(groups) |
valid comment - need to first extract the value of groups i.e.
uint32_t groups_val = graph.get_int(groups);
As suggested using uint32_t groups_val = graph.get_int(groups);
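The bug class discussed in this thread, using a handle where the value it refers to is needed, can be sketched in isolation. GraphSketch and ValueRefSketch below are hypothetical miniatures, assuming a handle is an integer index into graph-owned storage; they are not the ExecuTorch ComputeGraph API.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical handle type: an integer that identifies a value in the graph.
using ValueRefSketch = int32_t;

// Hypothetical miniature of a graph that stores integer values behind handles.
struct GraphSketch {
  std::unordered_map<ValueRefSketch, int64_t> ints;

  // Look up the integer stored behind a handle (throws if the handle is unknown).
  int64_t get_int(ValueRefSketch ref) const { return ints.at(ref); }
};

// Resolve the handle first, then narrow to the 32-bit type used for the
// specialization constant. Passing the handle itself would compile if the
// handle is an integer type, but it would specialize the shader on the
// handle's index instead of the group count.
uint32_t resolve_groups(const GraphSketch& graph, ValueRefSketch groups) {
  return static_cast<uint32_t>(graph.get_int(groups));
}
```

With a graph that stores the value 2 behind handle 7, resolve_groups returns 2, whereas using the handle directly would have produced 7.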
| * LICENSE file in the root directory of this source tree. | ||
| */ | ||
|
|
||
| #include <iostream> |
Copilot AI, Dec 9, 2025:
The #include <iostream> directive is added but doesn't appear to be used anywhere in this file. Consider removing this unused include.
| #include <iostream> |
Removed
b87778b to 78622ab
SS-JIA left a comment
Some valid comments by the AI reviewer
| ${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
| ${layout_declare_ubo(B, "ivec4", "input_sizes")} | ||
| ${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} | ||
| //${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")} |
Remove commented section?
Removed
| conv2d_params_dilation_x, conv2d_params_dilation_y, | ||
| conv2d_params_kernel_size_x, conv2d_params_kernel_size_y, | ||
| in_channels_per_group, out_channels_per_group, | ||
| K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, groups |
valid comment - need to first extract the value of groups i.e.
uint32_t groups_val = graph.get_int(groups);
| apply_bias = 0; | ||
| } | ||
|
|
||
| vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, output_image); |
Valid comment here as well
| print_valuespec_data(output_spec, "ref output", true); | ||
|
|
||
| throw std::runtime_error("Correctness validation failed"); | ||
| // throw std::runtime_error("Correctness validation failed"); |
Likely needs to be undone
78622ab to a9228e7
Pull request overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.
| throw std::invalid_argument( | ||
| "One or more dimensions exceed the allowed limit for reference implementation."); | ||
| std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl; |
Copilot AI, Dec 11, 2025:
This line of code is unreachable because it appears after a throw statement on line 459. The exception will be thrown and execution will not reach this cout statement. Either the cout should be moved before the throw, or the throw should be removed if this was intended to be a warning rather than an error.
| throw std::invalid_argument( | |
| "One or more dimensions exceed the allowed limit for reference implementation."); | |
| std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl; | |
| std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl; | |
| throw std::invalid_argument( | |
| "One or more dimensions exceed the allowed limit for reference implementation."); |
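The ordering issue is easy to reproduce in isolation: any statement placed after an unconditional throw in the same block never runs. A minimal sketch of the log-then-throw order follows, assuming a hypothetical check_reference_dims helper and an illustrative limit parameter; the real check in utils.cpp is not shown in full here.

```cpp
#include <cstdint>
#include <iostream>
#include <stdexcept>

// Hypothetical guard for a slow reference implementation. The function name
// and the limit parameter are illustrative, not taken from the PR.
void check_reference_dims(int64_t dim, int64_t limit) {
  if (dim > limit) {
    // Log first: once the exception is thrown, control leaves this block and
    // nothing written below the throw would execute.
    std::cerr << "Dimension " << dim << " exceeds limit " << limit << '\n';
    throw std::invalid_argument(
        "One or more dimensions exceed the allowed limit for reference implementation.");
  }
}
```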
| ${layout_declare_tensor(B, "r", "t_bias", DTYPE, "buffer", is_scalar_array=False)} | ||
|
|
||
| ${layout_declare_ubo(B, "ivec4", "output_sizes")} | ||
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} |
Copilot AI, Dec 11, 2025:
The output_sizes UBO is commented out but is still being used on line 91 where it's passed to make_block_extents(output_sizes). This will cause a compilation error as output_sizes is not defined. Either this UBO declaration should not be commented out, or the usage should be replaced with an alternative method to obtain output sizes.
| //${layout_declare_ubo(B, "ivec4", "output_sizes")} | |
| ${layout_declare_ubo(B, "ivec4", "output_sizes")} |
a9228e7 to a3f3d21
a3f3d21 to c1067ec
Pull request overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.
| throw std::invalid_argument( | ||
| throw std::invalid_argument( | ||
| "One or more dimensions exceed the allowed limit for reference implementation."); | ||
| std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl; |
Copilot AI, Dec 15, 2025:
The std::cout statement on line 461 is unreachable code because the exception is thrown on lines 459-460 immediately before it. If the dimensions exceed the limit, the function will exit via the exception and never print the message. Either remove this line or move it before the throw statement if you want to log this information before throwing.
| std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl; |
c1067ec to 66d4f1a
SS-JIA left a comment
LGTM. Thank you for this optimization - it will be critical for deploying ET-VK effectively on Arm GPUs!
SS-JIA left a comment
Putting a temporary RC since I discovered that using specialization constants this heavily almost doubles model load time.
Let's discuss this further in tomorrow's meeting!
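One plausible contributor to the load-time cost (an assumption, not something stated in this review) is that specialization constant values are supplied at pipeline creation, so every distinct set of values needs its own pipeline. The sketch below models that with a hypothetical PipelineCacheSketch keyed on shader name plus constant values; it does not call Vulkan and is not the ET-VK pipeline cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache: a pipeline is identified by the shader name plus the
// exact specialization constant values, so each new value set is a cache miss.
class PipelineCacheSketch {
 public:
  // Returns true when a new pipeline had to be created (cache miss) and
  // false when an existing entry with the same key was reused.
  bool get_or_create(const std::string& shader,
                     const std::vector<int32_t>& spec_constants) {
    auto key = std::make_pair(shader, spec_constants);
    return cache_.emplace(std::move(key), next_id_++).second;
  }

  // Number of distinct pipelines created so far.
  std::size_t size() const { return cache_.size(); }

 private:
  std::map<std::pair<std::string, std::vector<int32_t>>, int> cache_;
  int next_id_ = 0;
};
```

Under this model, a network whose convolution layers all differ in tensor size shares no pipelines between layers once the sizes become specialization constants, whereas with UBO parameters one pipeline per shader variant would be enough.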
@morgolock it seems that you will need to run the linter. To do so:
pip install lintrunner
pip install lintrunner_adapters
cd ~/executorch
lintrunner init
lintrunner lint --apply-patches --verbose -m main
…izes

This change moves all fixed Conv2D parameters (kernel shape, stride, padding, dilation, groups) into Vulkan specialization constants. By making these values compile-time constants, the backend can generate more optimized pipelines, eliminate generic fallback paths, and reduce dynamic indexing overhead. This significantly improves performance across large and compute-intensive convolution workloads.

Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: I3efe3de80dece91341ae4111bef1254c6779a1db
b624b7c to 010d843
This change moves all fixed Conv2D parameters (kernel shape, stride, padding, dilation, groups) and the input/output tensor dimensions into Vulkan specialization constants. By making these values compile-time constants, the backend can generate more optimized pipelines, eliminate generic fallback paths, and reduce dynamic indexing overhead. This significantly improves performance across large and compute-intensive convolution workloads.
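As a rough C++ analogy for the compile-time benefit described above, compare a function that receives stride, padding, and dilation as run-time arguments with one that receives them as template parameters. This is only an analogy: the function names are hypothetical and the actual change lives in GLSL shaders, not in C++ templates. The index formula is the standard mapping from an output coordinate and kernel tap to an input coordinate.

```cpp
#include <cstdint>

// Run-time version: stride, padding, and dilation are ordinary arguments, so
// the compiler must emit general multiply and subtract instructions.
int32_t input_index_runtime(int32_t out_x, int32_t k_x, int32_t stride_x,
                            int32_t padding_x, int32_t dilation_x) {
  return out_x * stride_x - padding_x + k_x * dilation_x;
}

// Compile-time version: the same arithmetic with the parameters fixed as
// template arguments, which is roughly the role specialization constants play
// for a shader. With DilationX == 1, for example, the last multiply can be
// folded away by the compiler.
template <int32_t StrideX, int32_t PaddingX, int32_t DilationX>
constexpr int32_t input_index_static(int32_t out_x, int32_t k_x) {
  return out_x * StrideX - PaddingX + k_x * DilationX;
}
```

Both versions compute the same index; only the point at which the parameters become known differs.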
cc @SS-JIA @manuelcandales @digantdesai @cbilgin