Fix Squid proxy crash loop on container restart #3022

carlos-gn · 2025-12-12T19:51:42Z

Summary

When -egress or -ingress proxy containers are restarted via Docker (e.g., docker restart <workload>-egress), Squid enters an infinite crash loop because it finds a stale PID file from the previous instance.

Root Cause

Squid writes a PID file to /tmp/squid.pid on startup
On docker restart, Docker sends SIGTERM and Squid begins a 30-second graceful shutdown
Docker's default stop timeout is 10 seconds, so it sends SIGKILL before Squid can finish shutting down and remove the PID file
Container restarts with the stale PID file still present
New Squid instance refuses to start, thinking another instance is running

This doesn't affect thv restart because ToolHive uses a 30-second stop timeout, which gives Squid enough time to shut down gracefully and clean up the PID file.
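
The timing argument above can be sketched as a small worst-case model. The three durations come from this description; the constant and function names are hypothetical and exist only for illustration, and the model assumes Squid needs its whole shutdown window.

```python
# Durations taken from the PR description. All names here are hypothetical
# and are not part of ToolHive, Docker, or Squid.
SQUID_GRACEFUL_SHUTDOWN_S = 30      # Squid's graceful shutdown window
DOCKER_DEFAULT_STOP_TIMEOUT_S = 10  # docker restart default before SIGKILL
TOOLHIVE_STOP_TIMEOUT_S = 30        # stop timeout used by thv restart


def leaves_stale_pid_file(stop_timeout_s: float,
                          shutdown_s: float = SQUID_GRACEFUL_SHUTDOWN_S) -> bool:
    """Return True when SIGKILL arrives before the shutdown can finish.

    Worst-case model: Squid is assumed to use its full shutdown window, and a
    process that receives SIGKILL never gets the chance to remove its PID file.
    """
    return stop_timeout_s < shutdown_s
```

Under this model, leaves_stale_pid_file(DOCKER_DEFAULT_STOP_TIMEOUT_S) is true and leaves_stale_pid_file(TOOLHIVE_STOP_TIMEOUT_S) is false, which matches the behaviour reported for docker restart and thv restart.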

Fix

Changed pid_filename /tmp/squid.pid to pid_filename none in the generated Squid config. Without a PID file, there's nothing to go stale.

Note: The PID file was only used by Squid internally to prevent multiple instances from running. Since Docker already ensures only one container process runs, the PID file is redundant. Nothing in ToolHive depends on it - the container healthcheck uses ps aux | grep squid, not the PID file.
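
As a rough way to check a generated config for this change, the sketch below scans squid.conf text for the pid_filename directive. The helper name is hypothetical and this is not how ToolHive validates its config; it also assumes that a config without the directive still gets a PID file, because Squid then falls back to its built-in default path.

```python
def pid_file_disabled(squid_conf: str) -> bool:
    """Return True only if every pid_filename directive is set to 'none'.

    Hypothetical helper for illustration. A config with no pid_filename
    directive is reported as not disabled, on the assumption that Squid
    would then use its default PID file path.
    """
    values = []
    for raw in squid_conf.splitlines():
        # Drop comments and surrounding whitespace before tokenising.
        tokens = raw.split("#", 1)[0].split()
        if len(tokens) >= 2 and tokens[0] == "pid_filename":
            values.append(tokens[1])
    return bool(values) and all(value == "none" for value in values)
```

A config containing the old line pid_filename /tmp/squid.pid fails this check, while one containing pid_filename none passes.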

References

Squid pid_filename documentation - confirms none is a valid value to disable PID file generation

Test plan

Run thv run <server> --isolate-network
Run docker restart <server>-egress
Verify docker logs <server>-egress shows clean startup (no FATAL errors)
Verify container survives multiple restarts
Run task test to ensure no regressions
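
The log verification step could be automated along these lines. This is a minimal sketch with hypothetical helper names: it only looks for the literal marker FATAL mentioned in the test plan and makes no assumption about the exact wording of Squid's messages.

```python
from typing import List


def fatal_lines(log_text: str) -> List[str]:
    """Return every log line that contains the marker 'FATAL'."""
    return [line for line in log_text.splitlines() if "FATAL" in line]


def startup_is_clean(log_text: str) -> bool:
    """Return True when the captured log contains no FATAL lines."""
    return not fatal_lines(log_text)
```

To use it, capture the output of docker logs <server>-egress after each docker restart and pass the text to startup_is_clean; repeating this over several restarts covers the "survives multiple restarts" step.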

codecov · 2025-12-12T19:54:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.82%. Comparing base (7bee0c8) to head (a90f25f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3022   +/-   ##
=======================================
  Coverage   56.82%   56.82%           
=======================================
  Files         335      335           
  Lines       33474    33474           
=======================================
  Hits        19022    19022           
  Misses      12868    12868           
  Partials     1584     1584


Squid proxy containers enter an infinite crash loop when restarted via Docker directly because the PID file persists across restarts.

Change pid_filename from /tmp/squid.pid to none so Squid doesn't create a PID file that can become stale on abrupt container restarts.

The PID file was only used by Squid internally to prevent multiple instances. Since Docker ensures only one container runs, it's redundant. The container healthcheck uses ps aux | grep squid, not the PID file.

Fixes stacklok#2440

Signed-off-by: carlos <21148423+carlos-gn@users.noreply.github.com>

JAORMX · 2025-12-16T17:40:48Z

@claude can you review this against squid security best practices?

claude · 2025-12-16T17:41:29Z

Claude encountered an error

Failed with exit code 128

I'll analyze this and get back to you.

github-actions bot added the size/XS Extra small PR: < 100 lines changed label Dec 12, 2025

carlos-gn mentioned this pull request Dec 12, 2025

Ingress and egress containers go into error loop if restarted #2440

Closed

carlos-gn force-pushed the fix/squid-error-loop branch from 5f23ebf to 377828a Compare December 12, 2025 19:59

github-actions bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Dec 12, 2025

carlos-gn marked this pull request as ready for review December 12, 2025 20:03

Merge branch 'main' into fix/squid-error-loop

243221c

github-actions bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Dec 12, 2025

Merge branch 'main' into fix/squid-error-loop

2ff6e4d

github-actions bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Dec 13, 2025

Merge branch 'main' into fix/squid-error-loop

8400964

github-actions bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Dec 16, 2025

Merge branch 'main' into fix/squid-error-loop

a90f25f

github-actions bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Dec 17, 2025

JAORMX approved these changes Dec 17, 2025


JAORMX merged commit 0a7bacf into stacklok:main Dec 17, 2025
30 checks passed

carlos-gn deleted the fix/squid-error-loop branch December 17, 2025 14:39
