This repository was archived by the owner on Jan 30, 2020. It is now read-only.

MachineMetadata constraints with %i interpolation don't work. #1446

@daniellowtw

We're using the %i interpolation for dynamic scheduling. Our instance metadata looks as follows:

fleetctl list-machines | grep es-                                         
94356eb9... 10.200.0.6  image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-0,instancetype=elasticsearch,region=europe-west1-d
c8648469... 10.200.0.12 image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-2,instancetype=elasticsearch,region=europe-west1-d
d00b1eae... 10.200.0.8  image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-1,instancetype=elasticsearch,region=europe-west1-d

When we take a unit like:

[Unit]
Description=echo Service
After=etcd-cluster-ready.service docker.service dns-resolv-update.service
Requires=etcd-cluster-ready.service docker.service dns-resolv-update.service

[Service]
Restart=always
RestartSec=60
TimeoutSec=0
ExecStartPre=-/opt/bin/echo "whatever %i"
ExecStart=/opt/bin/echo "whatever %i"
ExecStop=-/opt/bin/echo "whatever %i"

[X-Fleet]
MachineMetadata="instancetype=elasticsearch" "instancename=%i"

And we submit two copies of it:
fleetctl start echo@eu1-staging-es-1.service echo@eu1-staging-es-2.service

We'd expect the echo@eu1-staging-es-1.service to be scheduled on d00b1eae and echo@eu1-staging-es-2.service to be scheduled on c8648469.
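To make that expectation concrete, here is a minimal Go sketch of the matching we assume should happen. It is not fleet's actual code: the helpers instanceName, requirements, and eligible are hypothetical names used only for illustration, and the machine IDs and metadata are copied from the list-machines output above.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceName returns the part between "@" and the final "." of a template
// unit name. It returns "" when the name has no instance part.
// Hypothetical helper, not part of fleet.
func instanceName(unit string) string {
	at := strings.Index(unit, "@")
	dot := strings.LastIndex(unit, ".")
	if at < 0 || dot < at {
		return ""
	}
	return unit[at+1 : dot]
}

// requirements expands %i in each "key=value" constraint for the given unit
// and returns the result as a map. Malformed constraints are skipped.
func requirements(unit string, constraints []string) map[string]string {
	req := make(map[string]string)
	inst := instanceName(unit)
	for _, c := range constraints {
		kv := strings.SplitN(c, "=", 2)
		if len(kv) != 2 {
			continue
		}
		req[kv[0]] = strings.ReplaceAll(kv[1], "%i", inst)
	}
	return req
}

// eligible reports whether a machine's metadata satisfies every requirement.
func eligible(meta, req map[string]string) bool {
	for k, v := range req {
		if meta[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Metadata taken from the fleetctl list-machines output in this issue.
	machines := map[string]map[string]string{
		"94356eb9": {"instancetype": "elasticsearch", "instancename": "eu1-staging-es-0"},
		"c8648469": {"instancetype": "elasticsearch", "instancename": "eu1-staging-es-2"},
		"d00b1eae": {"instancetype": "elasticsearch", "instancename": "eu1-staging-es-1"},
	}
	constraints := []string{"instancetype=elasticsearch", "instancename=%i"}

	for _, unit := range []string{"echo@eu1-staging-es-1.service", "echo@eu1-staging-es-2.service"} {
		req := requirements(unit, constraints)
		for id, meta := range machines {
			if eligible(meta, req) {
				fmt.Printf("%s -> %s\n", unit, id)
			}
		}
	}
}
```

Under these assumptions each unit has exactly one eligible machine, so both units landing on d00b1eae would mean the second unit was matched against a constraint expanded for the first one (or not expanded at all).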

However, what we're seeing is:

Unit echo@eu1-staging-es-1.service inactive
Unit echo@eu1-staging-es-2.service inactive
Unit echo@eu1-staging-es-1.service launched on d00b1eae.../10.200.0.8
Unit echo@eu1-staging-es-2.service launched on d00b1eae.../10.200.0.8

This happens rarely, but it does happen. We couldn't reproduce it when the constraint was written out explicitly (for example "instancename=eu1-staging-es-2") instead of using "instancename=%i".

It's pretty puzzling, since as far as we know the only place that deals with the Units of Jobs is job.go, which is used by both the engine and the agents. The interpolation is done there: https://github.com/coreos/fleet/blob/master/job/job.go#L179

Any ideas what this could be caused by?
