Allow partitioning sensors with hash ranges across multiple deployments instead of one-sensor-per-pod #218
OK. I think this is ready for review. I renamed …
Hmm. I think I can break this up into two PRs. I'll submit that in a few minutes.
#221 and #222 should be merged before this one is ready for review. Then, this PR can focus on the sensor partitioning methods. It turns out that, by adding hash ranges, there are now 3 sensor modes (yes, there were 2 modes before, it just wasn't obvious, which is why I added #222). Quoting from the docs I just updated in this PR:
There was a default sensor container defined in … So, this improves the user experience by …
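For illustration, here is a rough sketch of what those modes might look like in values.yaml. Only `st2.packs.sensors` and `st2sensorcontainer.deployments` are taken from this PR; the per-sensor fields and values below are made up:

```yaml
# Mode 1 (default): a single st2sensorcontainer pod runs all sensors.
st2sensorcontainer:
  deployments: 1
---
# Mode 2: one-sensor-per-pod -- each entry under st2.packs.sensors becomes its own deployment.
# (the fields inside each entry are illustrative)
st2:
  packs:
    sensors:
      - name: my-sensor
        ref: my_pack.MySensor
---
# Mode 3 (this PR): several deployments, with sensors spread across them
# via StackStorm's hash range partitioning.
st2sensorcontainer:
  deployments: 3
```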
arm4b left a comment:
Thanks for the PR!
I definitely like the idea that sensor partitioning is auto-detected just based on the number of deployments.
Though, it's definitely a complex feature: the implementation, configuration flow, adding even more moving pieces to an already complicated chart.
Historically, we didn't plan to implement something like this in K8s for prod use, considering how K8s manages Pod health, availability, self-healing, and failover. So one-sensor-per-container was considered the production-recommended way, as it's compatible with the K8s model.
Such a significant addition to the chart's functionality needs good testing and feedback.
It would be nice if someone from the community could try this PR and share their experience here, especially around the different sensor modes and how switching affects the environment.
Internally, we've had a script auto-calculating hash ranges on sensor pod startup since at least November 2017 with StackStorm 2.4.1 (I just looked through the git history). I hope this is useful for others as well, so I re-implemented that Docker entrypoint script using Helm's template primitives.
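To give an idea of the calculation, here is a simplified sketch, not the actual template from this PR; the 2^32 upper bound of the hash space is an assumption based on StackStorm's hash partitioner:

```yaml
{{- /* Sketch: split an assumed 0..2^32 sensor hash space evenly across N deployments. */}}
{{- $deployments := int $.Values.st2sensorcontainer.deployments }}
{{- $hash_max := 4294967296 }}
{{- $step := div $hash_max $deployments }}
{{- range $i := until $deployments }}
{{- $lower := mul $i $step }}
{{- /* The last deployment absorbs any remainder so the whole space is covered. */}}
{{- $upper := ternary $hash_max (mul (add1 $i) $step) (eq $i (sub $deployments 1)) }}
# deployment {{ $i }} -> hash_ranges: {{ $lower }}..{{ $upper }}
{{- end }}
```

Each pod then only loads the sensors whose refs hash into its assigned sub-range.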
In our environment, we provide the StackStorm platform, and various team members can add packs that may or may not have sensors. They only need to learn the StackStorm layer, not the Kubernetes layer, since deploying StackStorm follows a completely different deployment model than everything else we deploy in Kubernetes. So, defining the sensors is out of scope when installing StackStorm via the Helm chart: all I can do is configure how many pods should be available for sensors, and other devs or devops people (who do not have access to deploy directly to Kubernetes) will add their packs and sensors to StackStorm.

Hash ranges are the only method StackStorm supports for dynamic sensor partitioning, so that's the method I need in order to install StackStorm in Kubernetes.

As for "how K8s manages the Pods health, availability, self-healing, failover": I chose to use deployments instead of replicas precisely because that takes advantage of all of these Kubernetes-native features. The `hash_range` is node-specific configuration, so configuring all sensor nodes within the same deployment would have caused headaches if any particular sensor node crashes or needs to be moved to a different host for some reason (e.g. K8s follows a LIFO approach to replicas, so you can't easily restart individual replicas). So, one-sensor-per-pod is not the only method that takes advantage of these K8s health-management features; the hash-range-based some-sensors-per-pod mode also maximizes use of them, given the constraint that I can't know how many sensors need to run at StackStorm installation time.
Hash range partitioning has been a complex feature to configure on its own; this method simplifies that significantly when people are doing HA in K8s. So, maybe people will start preferring the hash-range strategy over the all-sensors-in-one-pod mode once this is merged. Do you know anyone who is providing StackStorm as a platform-style service? Could you ping them to see if they can test this? Very few people seem to be working with stackstorm-ha, so I'm not sure who we can get to provide the feedback you want.
OK - I'm not sure how to add tests for this. Does anyone have any suggestions? Is there a way to query StackStorm for which partitions are active? I can use … Maybe we can query Kubernetes for the pods and check the annotations, but would such a test have any value? And how do I do Kubernetes queries from within the bats tests? We don't have kubectl installed on the …
cognifloyd left a comment:
I just added unit tests for this feature.
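For reference, a minimal sketch of what one of those tests could look like, assuming the helm-unittest plugin is what runs them; the template path, values, and assertion below are illustrative rather than copied from this PR:

```yaml
suite: st2sensorcontainer hash range partitioning
templates:
  - deployments.yaml              # illustrative template path
tests:
  - it: renders a sensorcontainer deployment when deployments > 1
    set:
      st2sensorcontainer:
        deployments: 2
    asserts:
      - matchRegex:
          path: metadata.name
          pattern: sensorcontainer
        documentIndex: 0          # illustrative; depends on document ordering
```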
{{- $one_sensor_per_pod := not ($.Values.st2.packs.sensors | empty) }}
{{- range ($one_sensor_per_pod | ternary ($.Values.st2.packs.sensors) (until 1)) }}
{{- $sensor := omit $.Values.st2sensorcontainer "name" "ref" "postStartScript" }}
{{- $some_sensors_per_pod := gt $sensor_deployments_count 1 }}
{{- if and $one_sensor_per_pod $some_sensors_per_pod }}
{{- fail "The sensor values are ambiguous. To use one-sensor-per-pod, use `st2.packs.sensors`. To use multiple-sensors-per-pod, use `st2sensorcontainer.deployments > 1`. Do not use both!" }}
I just added a safety check to help users avoid ambiguous sensor configuration. If `st2.packs.sensors` is not empty, then `st2sensorcontainer.deployments` must be the default (1 or less).
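For example, a combination like the following (values invented for illustration) would trip that check and make the render fail fast instead of silently picking one mode:

```yaml
st2:
  packs:
    sensors:                # asks for one-sensor-per-pod mode
      - name: my-sensor
        ref: my_pack.MySensor
st2sensorcontainer:
  deployments: 3            # asks for hash-range mode -- ambiguous, so the template fails
```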
👍
arm4b left a comment:
Thanks for all the effort making this possible 💯
You can increase the number of `st2sensorcontainer` pods by increasing the number of deployments.
The replicas count is still only `1` per deployment, but the sensors are distributed between the deployments using
[Sensor Hash Range Partitioning](https://docs.stackstorm.com/reference/sensor_partitioning.html#hash).
The hash ranges are calculated automatically based on the number of deployments.
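For an existing release, that should be as simple as something like `helm upgrade <release> <chart> --reuse-values --set st2sensorcontainer.deployments=3` (the release and chart references here are placeholders).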
The docs look really clear 👍
Hope more users will find it helpful to configure the sensor pods automatically.
Woohoo! Thanks for the review!
This allows us to specify `.Values.st2sensorcontainer.deployments` instead of hard-coding each sensor pod in `.Values.st2.packs.sensors`. It uses the ST2 hash range partitioning method to enable this behavior. See: https://docs.stackstorm.com/reference/sensor_partitioning.html#hash
This may be helpful in private k8s clusters where hardware resources are constrained, so that only a few sensor containers can be running.
Props to @emptywee for figuring out how to calculate the hash ranges automatically. I translated what he did in a bash script into the helm chart.
TODO: