fix: proxy-webhook selector matches operator pods#3228

Open
bowling233 wants to merge 1 commit into tektoncd:main from ZJUSCT:main

Conversation

@bowling233

Changes

Fixes #3227

Both the tekton-operator and tekton-operator-proxy-webhook Deployments
label their Pods with name: tekton-operator. The
tekton-operator-proxy-webhook Service uses this same label as its only
selector, so it inadvertently load-balances traffic across both Deployments.
Because tekton-operator pods do not serve on port 8443, ~50% of admission
webhook requests fail with connection refused. Since the
MutatingWebhookConfiguration has failurePolicy: Fail, each failure
immediately rejects TaskRun Pod creation.
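The collision can be sketched with abbreviated manifests. Only the resource names, the `name: tekton-operator` label, and ports 443 and 8443 come from this PR and its error log; the omitted fields, the operator Deployment's selector, and the exact port mapping are assumptions for illustration, not a copy of webhook.yaml.

```yaml
# Abbreviated sketch of the state before this PR; not the full manifests.
# Namespaces, containers, and most other fields are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tekton-operator
spec:
  selector:
    matchLabels:
      name: tekton-operator          # assumed; only the pod label is stated in the PR
  template:
    metadata:
      labels:
        name: tekton-operator        # these pods do not serve on port 8443
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tekton-operator-proxy-webhook
spec:
  selector:
    matchLabels:
      name: tekton-operator
  template:
    metadata:
      labels:
        name: tekton-operator        # same label as the operator pods
---
apiVersion: v1
kind: Service
metadata:
  name: tekton-operator-proxy-webhook
spec:
  selector:
    name: tekton-operator            # matches pods of BOTH Deployments
  ports:
    - port: 443                      # port seen in the error log
      targetPort: 8443               # assumed mapping to the webhook port
```

With one ready pod behind each Deployment, about half of the Service's endpoints would be operator pods, which is consistent with the ~50% failure rate described above.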

Changes:

  • cmd/kubernetes/operator/kodata/webhook/webhook.yaml: rename the
    proxy-webhook Deployment's matchLabels selector and pod template label
    from name: tekton-operator to name: tekton-operator-proxy-webhook;
    update the Service selector to match.
  • cmd/openshift/operator/kodata/webhook/webhook.yaml: same change for the
    OpenShift manifest.

The existing app: tekton-operator label is preserved on both Deployments.
No other resources are affected.
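As a rough illustration of the resulting state, the relevant fields would look something like the following. This is an abbreviated sketch rather than the actual diff; the placement of the `app` label on the pod template is an assumption, and all other fields are omitted.

```yaml
# Abbreviated sketch of the proxy-webhook resources after this PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tekton-operator-proxy-webhook
spec:
  selector:
    matchLabels:
      name: tekton-operator-proxy-webhook    # was: tekton-operator
  template:
    metadata:
      labels:
        name: tekton-operator-proxy-webhook  # was: tekton-operator
        app: tekton-operator                 # unchanged; exact placement assumed
---
apiVersion: v1
kind: Service
metadata:
  name: tekton-operator-proxy-webhook
spec:
  selector:
    name: tekton-operator-proxy-webhook      # now matches only proxy-webhook pods
```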

Alternative considered: adding a set-based (NotIn) expression to the
Service selector to exclude tekton-operator pods. This was not viable
because a Kubernetes Service's spec.selector is a plain key/value map
that supports only equality-based matching; set-based expressions
(matchExpressions) are not available there.
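The difference between the two selector forms can be sketched as follows. Both documents are fragments for comparison only, and the label key and value in the first one are hypothetical.

```yaml
# Fragment of a Deployment spec: spec.selector is a LabelSelector, so
# set-based expressions such as NotIn are accepted here.
selector:
  matchExpressions:
    - key: example-label             # hypothetical label key
      operator: NotIn
      values:
        - example-value              # hypothetical label value
---
# Fragment of a Service spec: spec.selector is a plain map of label keys
# to values. Every entry is an exact-match requirement, so there is no
# field in which an operator such as NotIn could be expressed.
selector:
  name: tekton-operator-proxy-webhook
```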

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Note on tests: This bug only manifests at the Service routing layer
(i.e., ~50% of requests land on a pod with no server). There is no
in-tree unit or integration test that exercises which pods a Service
selects. A targeted e2e test verifying that the proxy-webhook Service
endpoints do not include tekton-operator pods would be a good addition,
but is left for a follow-up.

Release Notes

Fix: the tekton-operator-proxy-webhook Service selector incorrectly matched
tekton-operator pods in addition to proxy-webhook pods, causing ~50% of
admission webhook requests to fail with "connection refused" and TaskRun Pod
creation to be rejected. Users on v0.78.1 can work around this until upgrading
by adding `pod-template-hash: <webhook-pod-hash>` to the Service selector.
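A minimal sketch of the workaround, assuming the Service otherwise keeps its existing `name: tekton-operator` selector. The hash value is deliberately left as a placeholder; it has to be read from the labels of the running proxy-webhook pods.

```yaml
# Abbreviated Service showing only the selector; other fields omitted.
apiVersion: v1
kind: Service
metadata:
  name: tekton-operator-proxy-webhook
spec:
  selector:
    name: tekton-operator
    # Placeholder: substitute the pod-template-hash label value carried by
    # the running proxy-webhook pods.
    pod-template-hash: "<webhook-pod-hash>"
```

The `pod-template-hash` label is set by the Deployment controller and changes whenever the pod template changes, so the selector would need to be updated again after any rollout of the proxy-webhook Deployment.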

Both the `tekton-operator` and `tekton-operator-proxy-webhook`
Deployments label their Pods with `name: tekton-operator`. The
`tekton-operator-proxy-webhook` Service uses this same label as
its only selector, so it inadvertently load-balances traffic
across both Deployments. Because `tekton-operator` pods do not
serve on port 8443, ~50% of admission webhook requests fail:

  failed calling webhook "proxy.operator.tekton.dev":
  Post ".../tekton-operator-proxy-webhook.../defaulting":
  dial tcp <ClusterIP>:443: connect: connection refused

Because MutatingWebhookConfiguration has `failurePolicy: Fail`,
each such failure immediately rejects TaskRun Pod creation.

Rename the proxy-webhook Deployment's selector matchLabels and
pod template label from `name: tekton-operator` to
`name: tekton-operator-proxy-webhook`, and update the Service
selector to match. The `app: tekton-operator` label is left
unchanged. Applies to both Kubernetes and OpenShift manifests.

Adding a set-based (NotIn) expression to the Service selector
instead was not viable, as a Kubernetes Service's spec.selector
is a plain key/value map that supports only equality-based matching.

Copilot AI review requested due to automatic review settings February 19, 2026 10:19
@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Feb 19, 2026
@linux-foundation-easycla

linux-foundation-easycla bot commented Feb 19, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: bowling233 / name: Baolin Zhu (49cacf1)

@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign anithapriyanatarajan after the PR has been reviewed.
You can assign the PR to them by writing /assign @anithapriyanatarajan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 19, 2026

Copilot AI left a comment

Pull request overview

Fixes a production routing bug where the tekton-operator-proxy-webhook Service selector unintentionally matched both proxy-webhook and main operator pods, causing intermittent admission webhook failures and rejected TaskRun Pod creation.

Changes:

  • Update the proxy-webhook Deployment selector + pod template label to use name: tekton-operator-proxy-webhook.
  • Update the proxy-webhook Service selector to match the new pod label (Kubernetes + OpenShift manifests).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| cmd/kubernetes/operator/kodata/webhook/webhook.yaml | Aligns proxy-webhook Deployment/Service selectors to target only proxy-webhook pods on Kubernetes. |
| cmd/openshift/operator/kodata/webhook/webhook.yaml | Same selector/label fix for the OpenShift manifest. |

@jkhelil
Member

jkhelil commented Feb 22, 2026

@bowling233, thanks for your PR.

  • Can you check what happens to existing clusters during an upgrade? (Install 0.78.1 and then apply your change.)
    Please describe the result and post proof that the upgrade works and is not broken.

Development

Successfully merging this pull request may close these issues.

tekton-operator-proxy-webhook Service selector matches operator pods, causing ~50% webhook admission failures

4 participants