Skip to content

Conversation

@jameysharp
Copy link
Contributor

The implementation of PRegSet::from(self.vcode.machine_env()) does a surprising amount of work: it lazily initializes a OnceLock in the backend, then loops over six vectors of registers and adds them all to the PRegSet.

We could likely implement that better, but in the meantime at least we can avoid repeating that work for every single machine instruction. The PRegSet itself only takes a few words to store so it's cheap to just keep it around.

I discovered this because when the call to self.vcode.machine_env() is in the middle of the loop, that prevents holding a mutable borrow on parts of self.vcode, which I want to be able to do in another PR.

The implementation of `PRegSet::from(self.vcode.machine_env())` does a
surprising amount of work: it lazily initializes a `OnceLock` in the
backend, then loops over six vectors of registers and adds them all to
the `PRegSet`.

We could likely implement that better, but in the meantime at least we
can avoid repeating that work for every single machine instruction. The
`PRegSet` itself only takes a few words to store so it's cheap to just
keep it around.

I discovered this because when the call to `self.vcode.machine_env()` is
in the middle of the loop, that prevents holding a mutable borrow on
parts of `self.vcode`, which I want to be able to do in another PR.
@jameysharp jameysharp requested a review from a team as a code owner April 24, 2024 06:30
@jameysharp jameysharp requested review from elliottt and removed request for a team April 24, 2024 06:30
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. labels Apr 24, 2024
@cfallin
Copy link
Member

cfallin commented Apr 24, 2024

This one really surprised me so I went looking at history to figure out if we ever measured it: it looks like first appeared in #6957; previously the PRegSet was computed once when the lowering backend was created; and we didn't benchmark that. The PR did note caching the MachineEnv but we missed caching the derived information.

Out of curiosity have you seen a perf improvement with this PR? (We should do it even if not, but it seems to me it should improve compile time)

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I'll go ahead and r+ since I dug into this (sorry to preempt reviewers!).

@jameysharp jameysharp added this pull request to the merge queue Apr 26, 2024
@jameysharp
Copy link
Contributor Author

According to hyperfine, this commit is 1.02 ± 0.01 times faster when running wasmtime compile -C cache=n ../sightglass/benchmarks/spidermonkey/benchmark.wasm. That's actually a bigger effect than I expected since I thought operand collection is a pretty small part of the overall time.

Merged via the queue into bytecodealliance:main with commit 4822904 Apr 26, 2024
@jameysharp jameysharp deleted the cache-allocatable branch April 26, 2024 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants