Pod scheduling priority: Helm release can get stuck in a restart loop because Mongo/Rabbit clusters remain in a "Pending" state

> Moved from https://github.com/StackStorm/st2enterprise-dockerfiles/issues/80

Sometimes when deploying a StackStorm cluster in K8s (especially under resource pressure), the entire Helm deployment gets stuck in a dependency deadlock: st2 pods keep restarting, rescheduling, and being recreated because they have no MQ/DB connection, while the Mongo and Rabbit clusters can't start because of the st2 pod rescheduling churn.

> Note that our HA deployment, with 3 nodes each for DB and MQ and 2 for each st2 service, creates a cluster of at least `30` Pods

A possible solution is K8s Pod priority and preemption (https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/), which has been beta since K8s `v1.11`.

This way we can prioritize scheduling for MongoDB, RabbitMQ and etcd clusters before st2 services.
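
As a rough sketch of the idea, assuming a cluster where the `scheduling.k8s.io/v1` API is available (clusters on `v1.11` to `v1.13` would use `scheduling.k8s.io/v1beta1` instead): a `PriorityClass` is defined once, and the backing-service Pods reference it via `priorityClassName`. The class name `st2-backing-services`, the value `1000000`, and the example Pod are hypothetical placeholders, not what the chart currently ships.

```yaml
# Hypothetical PriorityClass for the backing services (name and value are illustrative).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: st2-backing-services
value: 1000000          # higher value = scheduled ahead of lower-priority Pods
globalDefault: false    # st2 service Pods keep the cluster default priority
description: "Schedule MongoDB, RabbitMQ and etcd ahead of st2 services."
---
# Minimal example Pod referencing the class; in the chart this field would
# go into the Pod template of the MongoDB/RabbitMQ/etcd workloads.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-example   # hypothetical name
spec:
  priorityClassName: st2-backing-services
  containers:
    - name: mongodb
      image: mongo
```

With preemption enabled, the scheduler may also evict lower-priority st2 Pods to make room for a pending DB/MQ Pod, so whether that behavior is desirable here would need to be checked against the chart's deployment.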
