build_runner: fix race condition in dispatch_deps capacity reservation #30561

Merged
andrewrk merged 1 commit from erg/zig:fix-30014 into master 2025-12-22 03:54:57 +01:00
Contributor

Fix for #30014

The bug is that we reserve capacity based on a stale length read outside the mutex, then iterate over the current larger list inside the mutex.

To reproduce the crash:

  • check out https://codeberg.org/erg/zig/src/branch/fix-30014-repro, or view the changes at erg/zig@d279e47b4a and apply them locally
  • rebuild zig, or just stage3: rm -rf build/stage3 && cmake --build build --target stage3
  • run your stage3 zig on race_test/build.zig, which is the actual repro: cd race_test && ../build/stage3/bin/zig build -j16 --maxrss 500000000

It will crash if you haven't applied the patch in this PR. To confirm the fix, apply the patch, rebuild your stage3, and rerun it on race_test/build.zig.

Race scenario:

  • ensureUnusedCapacity(run.memory_blocked_steps.items.len) is called outside the mutex (bug), so it reserves room for a stale length N
  • meanwhile, line 1346 can add items: run.memory_blocked_steps.append(gpa, s) catch @panic("OOM");
  • lock the mutex and iterate: for (memory_blocked_steps.items) |dep| { (bug: memory_blocked_steps could have grown past N)
  • appendAssumeCapacity(dep) is called for each item, but only N slots were guaranteed, so once more than N items are appended we crash

ArrayList can grow here:

run.memory_blocked_steps.append(gpa, s) catch @panic("OOM");
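
The corrected ordering can be sketched as below. This is a minimal illustration rather than the code from build_runner.zig: Shared, blocked, and copyBlocked are hypothetical names, and it assumes the unmanaged-style std.ArrayList API (Zig 0.15 or later) together with std.Thread.Mutex, which may differ on other versions.

```zig
const std = @import("std");

// Hypothetical stand-in for the runner's shared state. The names are
// illustrative and do not match the real fields in build_runner.zig.
const Shared = struct {
    mutex: std.Thread.Mutex = .{},
    blocked: std.ArrayList(u32) = .empty,
};

/// Copies every item of `shared.blocked` into `out`.
/// The capacity is reserved while the mutex is held, so the length used for
/// the reservation is the same length the loop iterates over.
fn copyBlocked(gpa: std.mem.Allocator, shared: *Shared, out: *std.ArrayList(u32)) !void {
    // Buggy ordering, for contrast: calling
    //     try out.ensureUnusedCapacity(gpa, shared.blocked.items.len);
    // here, before the lock, reads a length that other threads may have
    // outgrown by the time the loop below runs.
    shared.mutex.lock();
    defer shared.mutex.unlock();

    try out.ensureUnusedCapacity(gpa, shared.blocked.items.len);
    for (shared.blocked.items) |item| {
        // Safe: no other thread can append to `blocked` while we hold the lock.
        out.appendAssumeCapacity(item);
    }
}

test "capacity is reserved under the lock" {
    const gpa = std.testing.allocator;
    var shared: Shared = .{};
    defer shared.blocked.deinit(gpa);
    var out: std.ArrayList(u32) = .empty;
    defer out.deinit(gpa);

    try shared.blocked.appendSlice(gpa, &.{ 1, 2, 3 });
    try copyBlocked(gpa, &shared, &out);
    try std.testing.expectEqualSlices(u32, &.{ 1, 2, 3 }, out.items);
}
```

The sketch only holds if every writer to the shared list takes the same mutex before appending. Under that assumption, the length read by ensureUnusedCapacity cannot change before the loop finishes.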

I am liking zig so far. Thanks yall!

build_runner: fix race condition in dispatch_deps capacity reservation
All checks were successful
6b9125cbe6
Move ensureUnusedCapacity inside the mutex call to prevent
a race condition where other worker threads could append to
memory_blocked_steps between checking the length and iterating.

repro branch: https://codeberg.org/erg/zig/src/branch/fix-30014-repro
Check out that branch; depending on whether the fix-30014 patch is applied,
the build will either hit the race condition and crash, or succeed.

Fixes #30014
andrewrk approved these changes 2025-12-22 03:54:25 +01:00
andrewrk left a comment
Owner

Nice catch, thank you!

andrewrk merged commit 6b9125cbe6 into master 2025-12-22 03:54:57 +01:00
Owner

Oops, I expected that to mark the PR to be merged once CI checks succeed. Will revert if it fails on master.
