

@robtaylor
Owner

  • Fix order-dependency in small-signal analysis from IndexSet::remove deprecation
  • CI: Enable integration tests
  • Fix non-deterministic compilation causing random CI test failures
  • Update snapshots
  • Fix LLVM 18 compatibility on macOS ARM64
  • CI: Update for LLVM 18 and enable integration tests
  • Remove architecture-specific values from OSDI integration test snapshots
  • Fix ld64.lld linking on macOS for LLVM 18
  • Fix ARM64 SIGSEGV: correct snprintf varargs calling convention

robtaylor and others added 9 commits November 25, 2025 13:00
…eprecation

The deprecated IndexSet::remove() method was replaced with swap_remove() in
the small-signal network analysis. However, this exposed a latent order-
dependency bug where the analysis results differed based on iteration order.
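swap_remove() fills the removed slot with the last element, so the surviving elements change order, whereas a shifting removal keeps their relative order. A minimal sketch of the two behaviours using std's Vec (Vec::swap_remove and Vec::remove), as an analogy rather than indexmap code; the helper names and node names are hypothetical:

```rust
/// Removes `target` with Vec::swap_remove: the last element is moved into
/// the hole, so the order of the remaining elements can change.
fn after_swap_remove(mut items: Vec<&'static str>, target: &str) -> Vec<&'static str> {
    if let Some(pos) = items.iter().position(|x| *x == target) {
        items.swap_remove(pos);
    }
    items
}

/// Removes `target` with Vec::remove: later elements shift left, so the
/// relative order of the survivors is preserved.
fn after_shift_remove(mut items: Vec<&'static str>, target: &str) -> Vec<&'static str> {
    if let Some(pos) = items.iter().position(|x| *x == target) {
        items.remove(pos);
    }
    items
}
```

Removing "b" from ["a", "b", "c", "d"] leaves ["a", "d", "c"] with swap_remove but ["a", "c", "d"] with the shifting removal, which is how a change of removal method can perturb any code that depends on iteration order.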

Root Cause:
The algorithm had a circular dependency: analyze_value() checks whether
values are in small_signal_vals, but membership in that set depends on the
analysis results. Because iteration order followed ahash::RandomState hash
ordering, which varies by platform, this caused the reactive and resistive
contribution counts to swap on Windows MSYS2.

Solution:
Implemented an order-independent fixed-point algorithm with four phases:

1. Speculatively add ALL candidate nodes to small_signal_vals
2. Evaluate all candidates against this consistent set state
3. Remove speculative nodes that weren't confirmed
4. Add confirmed flows and remove resolved candidates

This ensures all candidates see the same set state during evaluation,
making the analysis deterministic regardless of iteration order while
still supporting circular dependencies (e.g., noise nodes).
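A minimal Rust sketch of the four phases, under simplifying assumptions: nodes are plain u32 ids, nodes and flows are collapsed into a single set, and the hypothetical `is_small_signal` callback stands in for analyze_value(). Because every verdict in phase 2 is computed against the same frozen set, candidates can live in a HashSet with random iteration order and the result is still the same:

```rust
use std::collections::HashSet;

/// One round of the four-phase update (hypothetical model, not the real
/// sim_back code). Returns true if any candidate was confirmed.
fn resolve_round<F>(
    candidates: &mut HashSet<u32>,
    small_signal_vals: &mut HashSet<u32>,
    is_small_signal: F,
) -> bool
where
    F: Fn(u32, &HashSet<u32>) -> bool,
{
    // Phase 1: speculatively add every candidate, remembering which were new.
    let mut speculative = Vec::new();
    for &node in candidates.iter() {
        if small_signal_vals.insert(node) {
            speculative.push(node);
        }
    }

    // Phase 2: evaluate every candidate against the same frozen set, so the
    // verdicts cannot depend on the order in which candidates are visited.
    let frozen: &HashSet<u32> = &*small_signal_vals;
    let confirmed: HashSet<u32> = candidates
        .iter()
        .copied()
        .filter(|&node| is_small_signal(node, frozen))
        .collect();

    // Phase 3: remove speculative entries that were not confirmed.
    for node in speculative {
        if !confirmed.contains(&node) {
            small_signal_vals.remove(&node);
        }
    }

    // Phase 4: commit confirmed nodes and retire them as candidates.
    for node in &confirmed {
        small_signal_vals.insert(*node);
        candidates.remove(node);
    }
    !confirmed.is_empty()
}

/// Repeats rounds until one confirms nothing new (the fixed point). Each
/// productive round removes at least one candidate, so the loop terminates.
fn resolve<F>(initial: impl IntoIterator<Item = u32>, is_small_signal: F) -> HashSet<u32>
where
    F: Fn(u32, &HashSet<u32>) -> bool,
{
    let mut candidates: HashSet<u32> = initial.into_iter().collect();
    let mut small_signal_vals = HashSet::new();
    while resolve_round(&mut candidates, &mut small_signal_vals, &is_small_signal) {}
    small_signal_vals
}
```

With a two-node cycle (node 1 is confirmed only if node 2 is in the set, and vice versa) plus a node 3 that never qualifies, resolve() returns {1, 2} regardless of the order in which the candidates are supplied; the speculative add in phase 1 is what lets the cycle confirm itself.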

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The compiler was producing non-deterministic output because it used
ahash's RandomState, which seeds its hash function randomly at runtime.
This caused iteration order of hash-based collections to vary between
runs, leading to different compilation results and sporadic CI failures.

Root cause: GVN (Global Value Numbering) used ahash::RandomState for
hashing expressions. Different hash values across runs caused different
equivalence class assignments and leader selections, which cascaded:
1. GVN picks different instruction leaders non-deterministically
2. Values get replaced differently via replace_uses()
3. op_dependent_insts BitSet gets populated differently
4. determine_evaluation() makes different linearization decisions
5. Noise sources/implicit equations are created vs. linearized differently

This manifested as entire noise sources or implicit equations appearing
or disappearing between runs, not just as ordering differences.

Changes:
- Replace ahash::RandomState with BuildHasherDefault<FxHasher> in GVN
- Replace AHashMap/AHashSet with IndexMap/IndexSet (with FxHasher) or
  HashMap/HashSet (with FxHasher) throughout the codebase
- Use FxHasher consistently for deterministic hashing
- Remove unused ahash dependencies from several crates
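The property that matters in BuildHasherDefault<FxHasher> is that it carries no per-instance seed: every map hashes keys identically, so iteration order is reproducible across runs. To stay free of the external rustc-hash crate, the sketch below defines a simplified, byte-at-a-time Fx-style hasher (the same rotate/xor/multiply step and 64-bit multiplier as rustc-hash 1.x, but not its actual implementation); `FxStyleHasher` and the helper names are hypothetical:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Multiplier used by rustc-hash 1.x's 64-bit FxHasher.
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Simplified Fx-style hasher: starts from zero (no random seed) and mixes
/// one byte at a time with rotate, xor, and wrapping multiply.
#[derive(Default)]
struct FxStyleHasher {
    hash: u64,
}

impl Hasher for FxStyleHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.hash = (self.hash.rotate_left(5) ^ u64::from(b)).wrapping_mul(SEED);
        }
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// Zero-sized, seedless builder: every map gets an identical hasher.
type FxStyleBuild = BuildHasherDefault<FxStyleHasher>;
type FxStyleHashMap<K, V> = HashMap<K, V, FxStyleBuild>;

/// Inserts the keys into a fresh map and reports its iteration order.
fn iteration_order(keys: &[&'static str]) -> Vec<&'static str> {
    let mut map: FxStyleHashMap<&'static str, usize> = FxStyleHashMap::default();
    for (i, &key) in keys.iter().enumerate() {
        map.insert(key, i);
    }
    map.keys().copied().collect()
}
```

Calling iteration_order twice with the same keys yields the same ordering every time; with std's RandomState, each new map generally gets different keys, which is the kind of variation that fed GVN's non-deterministic leader selection.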

Affected crates: mir_opt, mir_autodiff, sim_back, osdi, hir_def,
hir_lower, basedb, mir, vfs, typed_indexmap, sourcegen

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
robtaylor force-pushed the macos-fixes branch 2 times, most recently from 72f9320 to a3d2ada, November 26, 2025 22:40
arpadbuermen and others added 7 commits November 27, 2025 08:31
This commit fixes multiple issues related to building and linking with
LLVM 18 on macOS ARM64:

1. Use LLVM 18's clang for bitcode generation (build.rs)
   - Bitcode generated by Apple's clang is incompatible with LLVM 18
   - Now uses clang from LLVM_SYS_181_PREFIX when available

2. Configure ld64.lld linker with proper flags (linker/src/lib.rs)
   - Detect and use ld64.lld from LLVM 18 when available
   - Automatically add -syslibroot flag using xcrun --show-sdk-path

3. Simplify macOS target configuration (aarch64_apple_darwin.rs)
   - Removed -lSystem from post_link_args
   - With -undefined dynamic_lookup, symbols are resolved at runtime
   - Added -platform_version flags required by ld64.lld

4. Update README with macOS build instructions
   - Add macOS dependency setup section with Homebrew instructions
   - Document build-macos.sh and test-macos.sh convenience scripts
   - Add section explaining integration tests
   - Clarify how to enable and run integration tests
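Items 1 and 2 can be sketched as two small helpers. This is a hypothetical illustration, not the project's build.rs or linker code: it assumes clang lives at `$LLVM_SYS_181_PREFIX/bin/clang`, and it takes the SDK path as a string so that the argument construction can be checked without macOS:

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Chooses the clang used for bitcode generation: LLVM 18's clang when an
/// LLVM prefix is known (assumed layout: <prefix>/bin/clang), otherwise
/// whatever `clang` is on PATH.
fn bitcode_clang(llvm_prefix: Option<&str>) -> PathBuf {
    match llvm_prefix {
        Some(prefix) => Path::new(prefix).join("bin").join("clang"),
        None => PathBuf::from("clang"),
    }
}

/// Builds the `-syslibroot <sdk>` arguments for ld64.lld.
fn syslibroot_args(sdk_path: &str) -> Vec<String> {
    vec!["-syslibroot".to_string(), sdk_path.trim().to_string()]
}

/// Asks `xcrun --show-sdk-path` for the SDK path (macOS only); returns
/// None if xcrun is missing or fails.
fn macos_sdk_path() -> Option<String> {
    let out = Command::new("xcrun").arg("--show-sdk-path").output().ok()?;
    if !out.status.success() {
        return None;
    }
    String::from_utf8(out.stdout).ok().map(|s| s.trim().to_string())
}
```

In a build script the helpers would be combined as `bitcode_clang(std::env::var("LLVM_SYS_181_PREFIX").ok().as_deref())` and `macos_sdk_path().map(|p| syslibroot_args(&p))`.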

These changes ensure the compiler works correctly when built against
LLVM 18, fixing "Unsupported stack probing method" and "library not
found for -lSystem" errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Updates:
- Configure macOS CI to use llvm@18 instead of latest llvm
- Add an xtask for macOS setup. Unfortunately, plain `cargo build`/`cargo test`
  cannot be fixed without patching llvm-sys

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This change makes integration test snapshots platform-agnostic, allowing
cross-platform CI testing without spurious failures due to different struct
layouts on ARM64 vs x86-64.

Removed values:
- residual offset numbers (4 values per node)
- react_ptr offset values in jacobian entries
- instance_size and model_size values

These values are calculated by LLVM using platform-specific ABI alignment
rules (LLVMABISizeOfType and LLVMOffsetOfElement), causing differences
between architectures. All semantic information (parameter names, types,
node names, jacobian flags, etc.) is preserved.

All integration test snapshots have been regenerated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
LLVM 18's ld64.lld requires -platform_version flags on both x86_64 and ARM64.
Also fixes CI to use correct GitHub runners (macos-14 for ARM64, macos-13 for x86_64).

Changes:
- Add -platform_version flags to x86_64_apple_darwin target spec
- Update CI matrix to use macos-14 for ARM64 builds (macos-13 is x86_64 only)

This fixes the linker error:
  ld64.lld: error: must specify -platform_version
  ld64.lld: error: missing or unsupported -arch x86_64
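A hypothetical helper showing the argument shape ld64.lld expects on both architectures (`-platform_version <platform> <min_version> <sdk_version>`, together with `-arch`); the version numbers are placeholders, not the values used in the target specs:

```rust
/// Builds pre-link arguments for ld64.lld on a macOS target.
/// `arch` uses Rust's spelling; the versions are caller-supplied placeholders.
fn apple_pre_link_args(arch: &str, min_version: &str, sdk_version: &str) -> Vec<String> {
    // ld64 spells 64-bit ARM as "arm64", not Rust's "aarch64".
    let ld_arch = match arch {
        "aarch64" => "arm64",
        other => other,
    };
    [
        "-arch",
        ld_arch,
        "-platform_version",
        "macos",
        min_version,
        sdk_version,
    ]
    .iter()
    .map(|&s| s.to_string())
    .collect()
}
```

For example, `apple_pre_link_args("x86_64", "11.0", "14.0")` yields `-arch x86_64 -platform_version macos 11.0 14.0`, which supplies both pieces the two linker errors above complain about.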

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Root cause: snprintf was declared with ALL parameters as variadic `(...)`
instead of having its first 3 parameters fixed: `(ptr, i64, ptr, ...)`.

On ARM64, fixed and variadic parameters use different calling conventions:
- Fixed parameters: passed in registers (x0-x7 for integers and pointers,
  v0-v7 for floating-point values)
- Variadic parameters: on Apple's ARM64 ABI, always passed on the stack

When snprintf was declared as `i32 (...)`, LLVM passed every argument,
including the buffer, size, and format string, using the variadic
convention, while the real snprintf reads its first three arguments from
registers. This caused:
1. Arguments passed in wrong registers/memory locations
2. Double values (like 0x3ff0000000000000 = 1.0) interpreted as pointers
3. SIGSEGV when snprintf tried to strlen() these bogus pointer values

Fix: Changed intrinsics.rs line 100 to pass `args` instead of `&[]` to
`ty_variadic_func()`. Now snprintf is correctly declared as:
  declare i32 @snprintf(ptr, i64, ptr, ...)

This makes the first 3 parameters use the standard calling convention
while only format arguments use varargs convention.
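The same fixed-then-variadic split can be expressed from Rust through FFI. This sketch is an analogy, not the project's LLVM code generation: it declares snprintf with its three fixed parameters followed by `...`, so rustc passes `s`, `n`, and `format` with the normal convention and only the trailing double as a vararg. It assumes a C library that provides snprintf and Rust 1.82 or later (for `unsafe extern` blocks); `format_g` is a hypothetical helper:

```rust
use std::ffi::{c_char, c_int, CStr};

// Rust counterpart of `declare i32 @snprintf(ptr, i64, ptr, ...)`:
// three fixed parameters, then varargs.
unsafe extern "C" {
    fn snprintf(s: *mut c_char, n: usize, format: *const c_char, ...) -> c_int;
}

/// Formats a double with C's "%g" through the correctly declared snprintf.
/// Returns None on an encoding error or if the output would be truncated.
fn format_g(value: f64) -> Option<String> {
    let mut buf = [0 as c_char; 64];
    // SAFETY: buf is writable for buf.len() bytes, the format string is
    // NUL-terminated, and the single vararg is a double, matching "%g".
    let written = unsafe {
        snprintf(buf.as_mut_ptr(), buf.len(), b"%g\0".as_ptr().cast(), value)
    };
    if written < 0 || (written as usize) >= buf.len() {
        return None;
    }
    // SAFETY: without truncation, snprintf NUL-terminates inside buf.
    let text = unsafe { CStr::from_ptr(buf.as_ptr()) };
    Some(text.to_string_lossy().into_owned())
}
```

Here `format_g(1.0)` returns `Some("1")`: the bit pattern 0x3ff0000000000000 arrives as a double, not as a pointer.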

Also removed duplicate parameter-addition code in compilation_unit.rs
(lines 452-455), which was likely a failed attempt to work around the
incorrect function signature.

Fixes: Integration test crashes on ARM64 macOS with LLVM 18
Affects: BSIM3, BSIM4, and other models with large instance structs
Testing: All 28 integration tests now pass on ARM64 macOS

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
arpadbuermen and others added 8 commits December 4, 2025 13:17
Signed-off-by: Árpád Bűrmen <arpad.buermen@fe.uni-lj.si>