
Conversation

@morgolock (Contributor) commented Dec 2, 2025

This change moves all fixed Conv2D parameters (kernel shape, stride, padding, dilation, groups) and the input/output tensor dimensions into Vulkan specialization constants. By making these values compile-time constants, the backend can generate more optimized pipelines, eliminate generic fallback paths, and reduce dynamic indexing overhead. This significantly improves performance across large and compute-intensive convolution workloads.
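For readers unfamiliar with the mechanism: specialization constants are values baked into a pipeline at pipeline-creation time, so the shader compiler can constant-fold them, whereas UBO values are only read at dispatch time. The sketch below shows only the underlying raw Vulkan plumbing; the ConvSpecData struct, the constant IDs, and the make_conv_spec_info helper are hypothetical illustrations and are not the ExecuTorch code added in this PR.

#include <vulkan/vulkan.h>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical POD holding the values to specialize; the PR derives the real
// values from Conv2DParams and the tensor sizes in the compute graph.
struct ConvSpecData {
  int32_t stride_x, stride_y;
  int32_t kernel_w, kernel_h;
};

// Builds the VkSpecializationInfo that gets attached to
// VkPipelineShaderStageCreateInfo::pSpecializationInfo at pipeline creation.
// `data` must stay alive until the pipeline has been created.
VkSpecializationInfo make_conv_spec_info(const ConvSpecData& data) {
  // Each entry maps a constant_id in the shader to an offset/size in `data`.
  static const std::array<VkSpecializationMapEntry, 4> entries{{
      {0, offsetof(ConvSpecData, stride_x), sizeof(int32_t)},
      {1, offsetof(ConvSpecData, stride_y), sizeof(int32_t)},
      {2, offsetof(ConvSpecData, kernel_w), sizeof(int32_t)},
      {3, offsetof(ConvSpecData, kernel_h), sizeof(int32_t)},
  }};
  VkSpecializationInfo info{};
  info.mapEntryCount = static_cast<uint32_t>(entries.size());
  info.pMapEntries = entries.data();
  info.dataSize = sizeof(ConvSpecData);
  info.pData = &data;
  return info;
}

On the shader side, each entry corresponds to a declaration such as layout(constant_id = 0) const int conv2d_params_stride_x = 1;, which is what lets the compiler unroll kernel loops and fold indexing arithmetic, at the cost of compiling a distinct pipeline per configuration (the load-time trade-off discussed later in this thread).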

cc @SS-JIA @manuelcandales @digantdesai @cbilgin

Copilot AI review requested due to automatic review settings December 2, 2025 13:16
@morgolock requested a review from SS-JIA as a code owner December 2, 2025 13:16
pytorch-bot added the "module: vulkan" label (Issues related to the Vulkan delegate and code under backends/vulkan/) Dec 2, 2025
pytorch-bot bot commented Dec 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16036

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 010d843 with merge base 1cb85ef:

NEW FAILURE - The following job has failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot commented Dec 2, 2025

Hi @morgolock!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

github-actions bot commented Dec 2, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track of your work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI (Contributor) left a comment

Pull request overview

This pull request migrates Vulkan Q8 Conv2D operations from using Uniform Buffer Objects (UBOs) to Vulkan specialization constants for all fixed Conv2D parameters (kernel shape, stride, padding, dilation, groups) and tensor dimensions. This architectural change enables the Vulkan compiler to perform compile-time optimizations, eliminate generic fallback paths, and reduce dynamic indexing overhead in the shaders.

Key changes:

  • Added GenerateSpecConstants function in C++ to create specialization constants from Conv2D parameters and tensor sizes (a hedged sketch follows this list)
  • Replaced UBO parameter buffers with specialization constants across all Conv2D shader variants
  • Updated GLSL shaders to use specialization constants with flattened naming (e.g., conv2d_params_stride_x instead of conv2d_params.stride.x)
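To make the GenerateSpecConstants bullet concrete, here is a rough sketch of what such a helper could look like. The call signature mirrors the call sites quoted later in this review (graph, conv_params, groups, apply_bias), but the Conv2DParams field access and the exact vkapi::SpecVarList construction are assumptions for illustration, not the code in this PR:

// Hypothetical sketch only; field names, parameter types, and SpecVarList
// initializer-list construction are assumed, not taken from the PR.
vkapi::SpecVarList GenerateSpecConstants(
    ComputeGraph& graph,
    const Conv2DParams& conv_params,
    const ValueRef groups,
    const int32_t apply_bias) {
  // `groups` is a ValueRef into the graph, so its integer value has to be
  // read out before it can be baked into a specialization constant
  // (the same point the reviewers raise below).
  const uint32_t groups_val = static_cast<uint32_t>(graph.get_int(groups));
  // Order matters: each value binds, in order, to a flattened constant in the
  // shaders, e.g. conv2d_params_stride_x, conv2d_params_stride_y, ...
  // Field access below is assumed to mirror the GLSL Conv2DParams struct.
  return {
      conv_params.stride.x,      conv_params.stride.y,
      conv_params.padding.x,     conv_params.padding.y,
      conv_params.dilation.x,    conv_params.dilation.y,
      conv_params.kernel_size.x, conv_params.kernel_size.y,
      groups_val,
      static_cast<uint32_t>(apply_bias),
  };
}

Whether tensor sizes are folded in as additional constants or reconstructed inside the shader varies per shader variant; see the per-file summary below.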

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 17 comments.

Summary per file:
  • backends/vulkan/runtime/graph/ops/impl/QuantizedConvolution.cpp: Implements the GenerateSpecConstants function; removes UBO parameter buffers from all conv2d dispatch nodes and replaces them with spec constants
  • backends/vulkan/test/custom_ops/utils.cpp: Contains a commented-out exception throw (debug code that should be removed)
  • backends/vulkan/test/custom_ops/q8ta_q8csw_q8to_conv2d.cpp: Contains preprocessor-guarded test configuration changes and commented-out validation (debug code)
  • backends/vulkan/runtime/graph/ops/glsl/quantize_and_pack_im2col.glsl: Migrates from UBO to specialization constants; reconstructs size vectors from spec constants
  • backends/vulkan/runtime/graph/ops/glsl/im2col_packed_int8_utils.glslh: Updates to use flattened spec constant names instead of struct member access
  • backends/vulkan/runtime/graph/ops/glsl/im2col_packed_int8.glsl: Migrates from UBO to specialization constants for Conv2D parameters and tensor sizes
  • backends/vulkan/runtime/graph/ops/glsl/im2col.glsl: Migrates from UBO to specialization constants; removes struct member access syntax
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_q8ta_q8csw_q8to_linear_tiled.glsl: Migrates from UBO to specialization constants; contains commented-out input_sizes
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_q8ta_q8csw_q8to.glsl: Migrates from UBO to specialization constants; updates all parameter references
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_q8ta_q8csw_linear_tiled.glsl: Migrates from UBO to specialization constants for linear tiled convolution
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_q8csw_linear_tiled.glsl: Migrates from UBO to specialization constants; updates loop bounds and calculations
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_q8_utils.glslh: Updates utility functions to use flattened spec constant names
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_pw_q8ta_q8csw_q8to_tiled.glsl: Migrates pointwise convolution from UBO to specialization constants
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_fp_im2col_block_load.glslh: Updates block loading logic to use flattened spec constant names
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_fp_im2col_block.glslh: Updates im2col indexing calculations to use flattened spec constant names
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_dw_q8ta_q8csw_q8to.glsl: Migrates depthwise convolution from UBO to specialization constants
  • backends/vulkan/runtime/graph/ops/glsl/conv2d_dw_q8_utils.glslh: Updates depthwise convolution utilities to use flattened spec constant names
  • backends/vulkan/runtime/graph/ops/glsl/col2im.glsl: Migrates the col2im operation from UBO to specialization constants


Comment on lines 174 to 209
#if 0
OutInChannels(16, 8), // channels (out, in)
InputSize2D(21, 17), // input_size (h, w)
KernelSize(3, 3), // kernel
Stride(1, 1), // stride
Padding(1, 1), // padding
Dilation(1, 1), // dilation
2, // groups
#else
OutInChannels(128, 128),
InputSize2D(128, 128),
KernelSize(5, 5),
Stride(2, 2),
Padding(2, 2),
Dilation(1, 1),
1,
#endif
Copilot AI commented Dec 2, 2025

The test configuration has been changed using #if 0 preprocessor directives, leaving the old configuration commented out. This appears to be temporary debug code. Either remove the old configuration entirely, or if both configurations are needed, use a more maintainable approach such as creating separate test functions or configuration parameters.

Comment on lines 448 to 475
// throw std::invalid_argument(
// "One or more dimensions exceed the allowed limit for reference implementation.");
std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl;
Copilot AI commented Dec 2, 2025

The exception throw for dimension limit validation has been commented out and replaced with a warning message. This removes an important safety check that prevents extremely slow reference implementation execution. This appears to be debug code that should not be committed. The exception should be restored to prevent unintended long-running tests.

Suggested change
// throw std::invalid_argument(
// "One or more dimensions exceed the allowed limit for reference implementation.");
std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl;
throw std::invalid_argument(
"One or more dimensions exceed the allowed limit for reference implementation.");

Comment on lines 618 to 652
// Execute test cases using the new framework with custom FLOP calculator
auto results = execute_test_cases(
#if 1
generate_quantized_conv2d_test_cases,
#else
generate_quantized_conv2d_easy_cases,
#endif
Copilot AI commented Dec 2, 2025

The test case generator selection is controlled by #if 1 preprocessor directive with the alternative commented out. This appears to be temporary debug code. Either remove the unused alternative or use a more maintainable approach like a command-line flag or configuration parameter.

Suggested change
// Execute test cases using the new framework with custom FLOP calculator
auto results = execute_test_cases(
#if 1
generate_quantized_conv2d_test_cases,
#else
generate_quantized_conv2d_easy_cases,
#endif
// Select test case generator based on command-line argument
// Usage: ./q8ta_q8csw_q8to_conv2d [easy]
auto test_case_generator = generate_quantized_conv2d_test_cases;
if (argc > 1 && std::string(argv[1]) == "easy") {
test_case_generator = generate_quantized_conv2d_easy_cases;
}
// Execute test cases using the new framework with custom FLOP calculator
auto results = execute_test_cases(
test_case_generator,

Comment on lines 34 to 38
//${layout_declare_ubo(B, "ivec4", "output_sizes")}
// Sizes of the input image
${layout_declare_ubo(B, "ivec4", "input_sizes")}

${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
//${layout_declare_ubo(B, "ivec4", "input_sizes")}

//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
Copilot AI commented Dec 2, 2025

The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.

//${layout_declare_ubo(B, "ivec4", "output_sizes")}
${layout_declare_ubo(B, "ivec4", "im2col_sizes")}
${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
Copilot AI commented Dec 2, 2025

The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.

Suggested change
//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}

* LICENSE file in the root directory of this source tree.
*/

#include <iostream>
Copilot AI commented Dec 2, 2025

The #include <iostream> is only used within #if 0 debug code (lines 642-653). Since that debug code should be removed, this include should also be removed to avoid unnecessary dependencies in production code.

Suggested change
#include <iostream>

print_valuespec_data(output_spec, "ref output", true);

throw std::runtime_error("Correctness validation failed");
// throw std::runtime_error("Correctness validation failed");
Copilot AI commented Dec 2, 2025

This exception throw has been commented out, which means correctness validation failures will no longer stop test execution with an error. This appears to be debug/development code that should not be committed. The proper behavior is to throw an exception when correctness validation fails.

Suggested change
// throw std::runtime_error("Correctness validation failed");
throw std::runtime_error("Correctness validation failed");

Contributor commented:

Likely needs to be undone

Comment on lines 40 to 44
//${layout_declare_ubo(B, "ivec4", "input_sizes")}
// Sizes of the output image
${layout_declare_ubo(B, "ivec4", "output_sizes")}
//${layout_declare_ubo(B, "ivec4", "output_sizes")}

//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
Copilot AI commented Dec 2, 2025

The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.

Comment on lines 47 to 49
//${layout_declare_ubo(B, "ivec4", "output_sizes")}
//${layout_declare_ubo(B, "ivec4", "input_sizes")}
//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
Copilot AI commented Dec 2, 2025

The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.

Suggested change
//${layout_declare_ubo(B, "ivec4", "output_sizes")}
//${layout_declare_ubo(B, "ivec4", "input_sizes")}
//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}

Comment on lines 45 to 48
//${layout_declare_ubo(B, "ivec4", "output_sizes")}
//${layout_declare_ubo(B, "ivec4", "input_sizes")}
//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}

Copilot AI commented Dec 2, 2025

The old UBO declarations are commented out but not removed. Since these are being replaced by specialization constants, the commented-out code should be completely removed to keep the codebase clean and avoid confusion.

Suggested change
//${layout_declare_ubo(B, "ivec4", "output_sizes")}
//${layout_declare_ubo(B, "ivec4", "input_sizes")}
//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}

@robell requested a review from perheld December 4, 2025 10:03
@morgolock force-pushed the vk_spec_consts branch 2 times, most recently from d902b0f to 960ac20, December 8, 2025 10:43
Copilot AI review requested due to automatic review settings December 9, 2025 08:37
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.



Comment on lines 193 to 209
#if 0
OutInChannels(16, 8), // channels (out, in)
InputSize2D(21, 17), // input_size (h, w)
KernelSize(3, 3), // kernel
Stride(1, 1), // stride
Padding(1, 1), // padding
Dilation(1, 1), // dilation
2, // groups
#else
OutInChannels(128, 128),
InputSize2D(128, 128),
KernelSize(5, 5),
Stride(2, 2),
Padding(2, 2),
Dilation(1, 1),
1,
#endif
Copilot AI commented Dec 9, 2025

The #if 0 preprocessor directive disables the original test configuration. This appears to be temporary debugging code that should not be committed. Please remove the #if 0 ... #else ... #endif block and restore the appropriate test configuration.

morgolock (Author) replied:

Removed #if 0

apply_bias = 0;
}

vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, output_image);
Copilot AI commented Dec 9, 2025

Incorrect parameter passed to GenerateSpecConstants. The fourth parameter should be apply_bias, but output_image (a ValueRef) is being passed instead. This will not compile or will produce incorrect specialization constants. Should be: GenerateSpecConstants(graph, conv_params, groups, apply_bias)

Suggested change
vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, output_image);
vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, apply_bias);

Contributor commented:

Valid comment here as well

morgolock (Author) replied:

Made changes as suggested

conv2d_params_dilation_x, conv2d_params_dilation_y,
conv2d_params_kernel_size_x, conv2d_params_kernel_size_y,
in_channels_per_group, out_channels_per_group,
K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, groups
Copilot AI commented Dec 9, 2025

Incorrect type for groups in the specialization constants list. The groups parameter is a ValueRef, but it should be converted to a uint32_t using graph.get_int(groups) before being added to the spec_constants list. This will cause a type mismatch. Should be: K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, graph.get_int(groups)

Suggested change
K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, groups
K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, graph.get_int(groups)

Contributor commented:

valid comment - need to first extract the value of groups i.e.

uint32_t groups_val = graph.get_int(groups);

morgolock (Author) replied:

As suggested using uint32_t groups_val = graph.get_int(groups);

* LICENSE file in the root directory of this source tree.
*/

#include <iostream>
Copilot AI commented Dec 9, 2025

The #include <iostream> directive is added but doesn't appear to be used anywhere in this file. Consider removing this unused include.

Suggested change
#include <iostream>

morgolock (Author) replied:

Removed

@SS-JIA (Contributor) left a comment

Some valid comments by the AI reviewer

${layout_declare_ubo(B, "ivec4", "output_sizes")}
${layout_declare_ubo(B, "ivec4", "input_sizes")}
${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
//${layout_declare_ubo(B, "Conv2DParams", "conv2d_params")}
Contributor commented:

Remove commented section?

morgolock (Author) replied:

Removed

conv2d_params_dilation_x, conv2d_params_dilation_y,
conv2d_params_kernel_size_x, conv2d_params_kernel_size_y,
in_channels_per_group, out_channels_per_group,
K4_per_group, K4, K_per_group, logical_K, logical_K_per_group, groups
Contributor commented:

valid comment - need to first extract the value of groups i.e.

uint32_t groups_val = graph.get_int(groups);

apply_bias = 0;
}

vkapi::SpecVarList spec_constants = GenerateSpecConstants(graph, conv_params, groups, output_image);
Contributor commented:

Valid comment here as well

print_valuespec_data(output_spec, "ref output", true);

throw std::runtime_error("Correctness validation failed");
// throw std::runtime_error("Correctness validation failed");
Contributor commented:

Likely needs to be undone

Copilot AI review requested due to automatic review settings December 11, 2025 18:25
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.



Comment on lines 459 to 461
throw std::invalid_argument(
"One or more dimensions exceed the allowed limit for reference implementation.");
std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl;
Copilot AI commented Dec 11, 2025

This line of code is unreachable because it appears after a throw statement on line 459. The exception will be thrown and execution will not reach this cout statement. Either the cout should be moved before the throw, or the throw should be removed if this was intended to be a warning rather than an error.

Suggested change
throw std::invalid_argument(
"One or more dimensions exceed the allowed limit for reference implementation.");
std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl;
std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl;
throw std::invalid_argument(
"One or more dimensions exceed the allowed limit for reference implementation.");

${layout_declare_tensor(B, "r", "t_bias", DTYPE, "buffer", is_scalar_array=False)}

${layout_declare_ubo(B, "ivec4", "output_sizes")}
//${layout_declare_ubo(B, "ivec4", "output_sizes")}
Copilot AI commented Dec 11, 2025

The output_sizes UBO is commented out but is still being used on line 91 where it's passed to make_block_extents(output_sizes). This will cause a compilation error as output_sizes is not defined. Either this UBO declaration should not be commented out, or the usage should be replaced with an alternative method to obtain output sizes.

Suggested change
//${layout_declare_ubo(B, "ivec4", "output_sizes")}
${layout_declare_ubo(B, "ivec4", "output_sizes")}

Copilot AI review requested due to automatic review settings December 15, 2025 10:10
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.



throw std::invalid_argument(
"One or more dimensions exceed the allowed limit for reference implementation.");
std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl;
Copilot AI commented Dec 15, 2025

The std::cout statement on line 461 is unreachable code because the exception is thrown on lines 459-460 immediately before it. If the dimensions exceed the limit, the function will exit via the exception and never print the message. Either remove this line or move it before the throw statement if you want to log this information before throwing.

Suggested change
std::cout << "Reference implementation: computation may take some time for large tensors..." << std::endl;

@SS-JIA (Contributor) left a comment

LGTM. Thank you for this optimization - it will be critical for deploying ET-VK effectively on Arm GPUs!

meta-cla bot added the "CLA Signed" label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Dec 16, 2025
@SS-JIA (Contributor) left a comment

Putting a temporary RC (request changes) since I discovered that using specialization constants this heavily almost doubles model load time.

Let's discuss this further in tomorrow's meeting!

Copilot AI review requested due to automatic review settings December 17, 2025 18:00
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@SS-JIA (Contributor) commented Dec 17, 2025

@morgolock it seems that you will need to run lintrunner before this PR can be merged to ensure that lint is clean 😅

To do so:

pip install lintrunner
pip install lintrunner_adapters

cd ~/executorch
lintrunner init
lintrunner lint --apply-patches --verbose -m main

…izes

This change moves all fixed Conv2D parameters (kernel shape, stride,
padding, dilation, groups) into Vulkan specialization constants. By making
these values compile-time constants, the backend can generate more optimized
pipelines, eliminate generic fallback paths, and reduce dynamic indexing overhead.
This significantly improves performance across large and compute-intensive
convolution workloads.

Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: I3efe3de80dece91341ae4111bef1254c6779a1db
meta-codesync bot commented Dec 18, 2025

@SS-JIA has imported this pull request. If you are a Meta employee, you can view this in D89421309.

@SS-JIA merged commit 73ae813 into pytorch:main Dec 22, 2025
138 of 140 checks passed

Labels: CLA Signed, module: vulkan