Allow Agent Reload with API - /v1/agent/reload#27106
Allow Agent Reload with API - /v1/agent/reload#27106benjamin-lykins wants to merge 21 commits intohashicorp:mainfrom
Conversation
tgross
left a comment
There was a problem hiding this comment.
@benjamin-lykins there's a failing test not touched here but clearly related to this work. Let's get that resolved before we review:
=== FAIL: command/agent TestServer_Reload_TLS_Certificate (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x30fefa3]
goroutine 87962 [running]:
testing.tRunner.func1.2({0x3898300, 0x69a5e70})
/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1872 +0x237
testing.tRunner.func1()
/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1875 +0x35b
panic({0x3898300?, 0x69a5e70?})
/opt/hostedtoolcache/go/1.25.3/x64/src/runtime/panic.go:783 +0x132
github.com/hashicorp/nomad/command/agent.(*Agent).Reload(0xc00c83c700, 0xc00d476d08)
/home/runner/work/nomad/nomad/command/agent/agent.go:1533 +0x63
github.com/hashicorp/nomad/command/agent.TestServer_Reload_TLS_Certificate(0xc00a9f8540)
/home/runner/work/nomad/nomad/command/agent/agent_test.go:1197 +0x1d7
testing.tRunner(0xc00a9f8540, 0x42c8720)
/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1934 +0xea
created by testing.(*T).Run in goroutine 1
/opt/hostedtoolcache/go/1.25.3/x64/src/testing/testing.go:1997 +0x465
DONE 5542 tests, 12 skipped, 4 failures in 182.433s
I think I know what is causing this. I'll get it fixed. |
|
@tgross fixed the test, which then uncovered another broken test after another I changed the return response. both of those were cleaned up and looks happier now here. |
tgross
left a comment
There was a problem hiding this comment.
This is a good start @benjamin-lykins!
In addition to the comments I've left, make sure you've added a changelog entry file. You can do that via make cl. Once this is getting closer to being ready, we'll need a docs PR to go with this, which is now going to be over in the web-unified-docs repo.
command/agent/agent_endpoint.go
Outdated
|
|
||
| newConf := DefaultConfig() | ||
|
|
||
| for _, path := range currConf.Files { |
There was a problem hiding this comment.
If the -config flag in the command line arguments for this agent points to a directory, this Files field will contain only the files that were in that directory at the time we started the agent. If you add a new file, it won't be present on reload. We may need to figure out a way to get the original -config flag attached to the config for this to work.
There was a problem hiding this comment.
Thanks for catching this one, wasn't thinking of it or realized when I originally ran through it. Made some modifications to save the -config flag values and tested locally without issue. Tested if I passed over two -config as well and worked as expected. I was also able to reuse the logic in LoadConfig to check each path, which seems to work out well. Lastly, I tested adding files with bad configuration and got a decent response back, but I could also just send a canned message back
curl -k -X PUT \
https://localhost:4646/v1/agent/reload
Error loading /Users/benjamin.lykins/git/nomad/nomad/bin/config1/badfile.hcl: failed to decode HCL file /Users/benjamin.lykins/git/nomad/nomad/bin/config1/badfile.hcl: At 1:6: key 'asdf' expected start of object ('{') or assignment ('=')
command/agent/agent_endpoint.go
Outdated
|
|
||
| newConf.Files = append([]string(nil), currConf.Files...) | ||
|
|
||
| if err := s.agent.Reload(newConf); err != nil { |
There was a problem hiding this comment.
If you take a look at the signal handler for reloading, you'll see that this step only reloads the agent configuration and won't push any changes to the server, client, or HTTP server configuration. We should at least have parity with the signal handler here, which calls (*Command) handleReload. We may need to figure out a way to extract that function so that it's reachable here.
There was a problem hiding this comment.
This still needs to be done, and I'll start looking ahead at how to get this going. I saw this when I was mucking around with TLS rotations with server/client, and it was only updating the more general agent config.
…w config values/files
|
There was a problem hiding this comment.
Storing the config paths, if a path to a directory was used (not just the file names). If a new file is added, it gets picked up.
| // FullReload reloads the agent configuration including server, client, and HTTP server. | ||
| func (a *Agent) FullReload(newConfig *Config) error { | ||
| shouldReloadAgent, shouldReloadHTTP := a.ShouldReload(newConfig) | ||
| if shouldReloadAgent { | ||
| a.logger.Debug("starting reload of agent config") | ||
| if err := a.Reload(newConfig); err != nil { | ||
| a.logger.Error("failed to reload the config", "error", err) | ||
| return err | ||
| } | ||
| } | ||
|
|
||
| if s := a.Server(); s != nil { | ||
| if newConfig.Server == nil { | ||
| a.logger.Debug("skipping server reload - no server config in new configuration") | ||
| } else { | ||
| a.logger.Debug("starting reload of server config") | ||
| sconf, err := convertServerConfig(newConfig) | ||
| if err != nil { | ||
| a.logger.Error("failed to convert server config", "error", err) | ||
| return err | ||
| } | ||
|
|
||
| // Finalize the config to get the agent objects injected in | ||
| a.finalizeServerConfig(sconf) | ||
|
|
||
| // Reload the config | ||
| if err := s.Reload(sconf); err != nil { | ||
| a.logger.Error("reloading server config failed", "error", err) | ||
| return fmt.Errorf("reloading server config failed: %w", err) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if client := a.Client(); client != nil { | ||
| if newConfig.Client == nil { | ||
| a.logger.Debug("skipping client reload - no client config in new configuration") | ||
| } else { | ||
| a.logger.Debug("starting reload of client config") | ||
| clientConfig, err := convertClientConfig(newConfig) | ||
| if err != nil { | ||
| a.logger.Error("failed to convert client config", "error", err) | ||
| return err | ||
| } | ||
|
|
||
| // Finalize the config to get the agent objects injected in | ||
| if err := a.finalizeClientConfig(clientConfig); err != nil { | ||
| a.logger.Error("failed to finalize client config", "error", err) | ||
| return err | ||
| } | ||
|
|
||
| if err := client.Reload(clientConfig); err != nil { | ||
| a.logger.Error("reloading client config failed", "error", err) | ||
| return fmt.Errorf("reloading client config failed: %w", err) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // reload HTTP server after we have reloaded both client and server, in case | ||
| // we error in either of the above cases. For example, reloading the http | ||
| // server to a TLS connection could succeed, while reloading the server's rpc | ||
| // connections could fail. | ||
| if shouldReloadHTTP { | ||
| if err := a.reloadHTTPServers(); err != nil { | ||
| a.logger.Error("failed to reload HTTP servers", "error", err) | ||
| } | ||
| } | ||
|
|
||
| return nil | ||
| } | ||
|
|
There was a problem hiding this comment.
New method, which I pretty much just grabbing from what was in command.go. I was able to test and get it to work - atleast I haven't stumbled into any issues yet. Created a new test for this method as well, although I could expand it out to cover more specific scenarios (only client, only server, etc).
tgross
left a comment
There was a problem hiding this comment.
I think I see where you're going with this, but there's an architectural issue. See my comment for more details.
| @@ -1189,64 +1195,12 @@ func (c *Command) handleReload() error { | |||
| newConf.LogLevel = c.agent.GetConfig().LogLevel | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
Extracting all this below into its own function makes sense, but we've skipped over everything above this point. As it turns out, the readConfig function in particular is doing a lot of work around reading in the correct files and merging the configuration with the command line arguments. If we extract this into the agent FullReload without that work, we'll end up dropping any command line args that are used to set configuration.
My hunch is that it's just hard to extract all this because the section above is tied to Command and not Agent. What if instead of trying to remove the work below from command and sticking it in agent, we pass a closure from Command into Agent when it's constructed? That is, we keep all the logic in Command but pass Agent a function with a signature func() error that's called from both.
Ex.
func (c *Command) reloaderFunc() func() error {
return func() error {
return c.handleReload()
}
}func NewAgent(config *Config,
logger log.InterceptLogger, logOutput io.Writer,
inmem *metrics.InmemSink,
configReloader func() error,
) (*Agent, error) {
// ...
}
func (a *Agent) Reload() error {
return a.configReloader()
}You'd probably need to change handleReload so that it returns all its errors for the caller to run c.Ui with as needed (not all errors are fatal though, so that's a quirk). But that'd let you avoid all the extra validation code you're writing here -- it's already being done in the handleReload and if it's not we can add it there and get it for both code paths.
Description
Provide an API endpoint to reload agent config without elevated permissions on the infrastructure running Nomad.
Testing
Reloading log level without ACL
curl -k -X PUT \ https://localhost:4646/v1/agent/reload {"message":"agent configuration reloaded"}Attempting without token (ACL Enabled):
curl -k -X PUT \ https://localhost:4646/v1/agent/reload Permission deniedAttempting with token (ACL Enabled;
Agent = Write):Attempting with token (ACL Enabled;
Agent = Read):curl -k -X PUT \ --header "X-Nomad-Token: 2cde0074-d0ac-53e1-a6dc-3f416a8e1737" \ https://localhost:4646/v1/agent/reload Permission deniedInvalid Config File:
curl -k -X PUT \ --header "X-Nomad-Token: 79d77f6a-e168-3bbb-8fa7-e668b2b09346" \ https://localhost:4646/v1/agent/reload Error loading /Users/benjamin.lykins/git/nomad/nomad/bin/config/general.hcl: failed to decode HCL file /Users/benjamin.lykins/git/nomad/nomad/bin/config/general.hcl: At 3:16: literal not terminatedLinks
Fixes: #24048
Contributor Checklist
changelog entry using the
make clcommand.ensure regressions will be caught.
and job configuration, please update the Nomad website documentation to reflect this. Refer to
the website README for docs guidelines. Please also consider whether the
change requires notes within the upgrade guide.
Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.