
Pod scheduling priority: Helm release could be stuck in a restart loop due to Mongo/Rabbit clusters stuck in a "Pending" state #11

@arm4b

Moved from https://github.com/StackStorm/st2enterprise-dockerfiles/issues/80

Sometimes when deploying a StackStorm cluster in K8s (especially under resource pressure), the entire Helm deployment gets stuck in a circular dependency loop: st2 pods keep restarting, rescheduling, and being recreated because they have no MQ/DB connection, while the Mongo and Rabbit clusters can't start because of the st2 pod reschedule spam.

Note that our HA deployment, with 3 nodes each for the DB and MQ and 2 for each st2 service, creates a cluster of at least 30 Pods.

One possible solution is K8s Pod priority and preemption (https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/), which has been beta since K8s v1.11.

This way we can prioritize scheduling for MongoDB, RabbitMQ and etcd clusters before st2 services.
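
As a rough sketch of what this could look like (not the chart's actual configuration): a `PriorityClass` is defined once, then referenced from the pod templates of the backend clusters. The class name `st2-backend-high`, the value `1000000`, and the container entry are hypothetical placeholders chosen for illustration.

```yaml
# Hypothetical PriorityClass for the stateful backends.
# The name and value are illustrative assumptions, not taken from the chart.
apiVersion: scheduling.k8s.io/v1   # scheduling.k8s.io/v1beta1 on K8s v1.11 to v1.13
kind: PriorityClass
metadata:
  name: st2-backend-high
value: 1000000                     # higher value = scheduled earlier
globalDefault: false               # only pods that reference the class get this priority
description: "Schedule MongoDB, RabbitMQ and etcd pods before st2 services."
---
# Excerpt of a pod template (for example under a StatefulSet's spec.template),
# showing only the field that opts the pod into the class above.
spec:
  priorityClassName: st2-backend-high
  containers:
    - name: mongodb                # placeholder container entry
      image: mongo
```

Pods that do not set `priorityClassName` (here, the st2 services) keep the default priority of 0 unless a global default class exists, so the scheduler would place pending backend pods first. When preemption is enabled, it may also evict lower-priority pods to make room for them.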

Labels: bug, enhancement, good first issue
