MachineMetadata constraints with %i interpolation don't work. #1446
Description
We're using the %i interpolation for dynamic scheduling. Our instance metadata looks as follows:
fleetctl list-machines | grep es-
94356eb9... 10.200.0.6 image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-0,instancetype=elasticsearch,region=europe-west1-d
c8648469... 10.200.0.12 image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-2,instancetype=elasticsearch,region=europe-west1-d
d00b1eae... 10.200.0.8 image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-1,instancetype=elasticsearch,region=europe-west1-d
When we take a unit like:
[Unit]
Description=echo Service
Description=echo Service
After=etcd-cluster-ready.service docker.service dns-resolv-update.service
Requires=etcd-cluster-ready.service docker.service dns-resolv-update.service
[Service]
Restart=always
RestartSec=60
TimeoutSec=0
ExecStartPre=-/opt/bin/echo "whatever %i"
ExecStart=/opt/bin/echo "whatever %i"
ExecStop=-/opt/bin/echo "whatever %i"
[X-Fleet]
MachineMetadata="instancetype=elasticsearch" "instancename=%i"
And we submit two copies of it:
fleetctl start echo@eu1-staging-es-1.service echo@eu1-staging-es-2.service
We'd expect the echo@eu1-staging-es-1.service to be scheduled on d00b1eae and echo@eu1-staging-es-2.service to be scheduled on c8648469.
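The expectation above can be sketched in a few lines of Go. This is only an illustration of the intended matching and not fleet's actual code: `instanceName`, `requiredMetadata`, and `eligible` are hypothetical helpers, and the machine table is a reduced copy of the `fleetctl list-machines` output above (short IDs, two metadata keys).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// instanceName returns the text between '@' and the final '.' of a unit
// name, e.g. "eu1-staging-es-1" for "echo@eu1-staging-es-1.service".
// It returns "" when the name is not a template instance.
func instanceName(unit string) string {
	at := strings.Index(unit, "@")
	dot := strings.LastIndex(unit, ".")
	if at < 0 || dot < at {
		return ""
	}
	return unit[at+1 : dot]
}

// requiredMetadata expands %i in every "key=value" constraint using the
// instance name of the given unit. Malformed constraints are skipped.
func requiredMetadata(unit string, constraints []string) map[string]string {
	inst := instanceName(unit)
	req := make(map[string]string)
	for _, c := range constraints {
		kv := strings.SplitN(c, "=", 2)
		if len(kv) != 2 {
			continue
		}
		req[kv[0]] = strings.ReplaceAll(kv[1], "%i", inst)
	}
	return req
}

// eligible returns the sorted IDs of the machines whose metadata matches
// every required key/value pair.
func eligible(req map[string]string, machines map[string]map[string]string) []string {
	var ids []string
	for id, meta := range machines {
		ok := true
		for k, v := range req {
			if meta[k] != v {
				ok = false
				break
			}
		}
		if ok {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Reduced copy of the machine list from this report (assumption:
	// only the two metadata keys used by the constraint matter here).
	machines := map[string]map[string]string{
		"94356eb9": {"instancetype": "elasticsearch", "instancename": "eu1-staging-es-0"},
		"c8648469": {"instancetype": "elasticsearch", "instancename": "eu1-staging-es-2"},
		"d00b1eae": {"instancetype": "elasticsearch", "instancename": "eu1-staging-es-1"},
	}
	constraints := []string{"instancetype=elasticsearch", "instancename=%i"}

	for _, unit := range []string{"echo@eu1-staging-es-1.service", "echo@eu1-staging-es-2.service"} {
		fmt.Println(unit, "->", eligible(requiredMetadata(unit, constraints), machines))
	}
}
```

Under these assumptions each unit has exactly one eligible machine: d00b1eae for echo@eu1-staging-es-1.service and c8648469 for echo@eu1-staging-es-2.service.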
However, what we're seeing is:
Unit echo@eu1-staging-es-1.service inactive
Unit echo@eu1-staging-es-2.service inactive
Unit echo@eu1-staging-es-1.service launched on d00b1eae.../10.200.0.8
Unit echo@eu1-staging-es-2.service launched on d00b1eae.../10.200.0.8
This happens rarely, but it does happen. We couldn't reproduce it when the constraint was written out explicitly, e.g. "instancename=eu1-staging-es-2" instead of "instancename=%i".
It's pretty puzzling, since as far as we know the only place that deals with the Unit of a Job is job.go, which is used by both the engine and the agents. The interpolation is done there: https://github.com/coreos/fleet/blob/master/job/job.go#L179
Any ideas what could be causing this?