Skip to content

Refactor graph traversal in GlobalEffects#8593

Open
stevenfontanella wants to merge 1 commit intomainfrom
refactor-global-effects-with-traversal
Open

Refactor graph traversal in GlobalEffects#8593
stevenfontanella wants to merge 1 commit intomainfrom
refactor-global-effects-with-traversal

Conversation

@stevenfontanella
Copy link
Copy Markdown
Member

@stevenfontanella stevenfontanella commented Apr 10, 2026

Refactor GlobalEffects to not compute the transitive call graph explicitly but instead aggregate effects as we go. This improves the runtime of the pass by 4.3% on calcworker (1.21792 s -> 1.16586 s averaged over 20 compilations). It also helps prepare the code for future changes to support effects for indirect calls.

Another potential future improvement here is to use SCC, which would let us stop processing children early in cases where there are no effects to update. Currently we can't do this because we add trap effects to potentially-recursive call loops, so even if no effects were updated, we need to keep going to find potential cycles.

@stevenfontanella stevenfontanella force-pushed the refactor-global-effects-with-traversal branch from 113c475 to a0ec0cf Compare April 10, 2026 22:50
@stevenfontanella stevenfontanella changed the title Refactor to use graph traversal Refactor graph traversal in GlobalEffects Apr 10, 2026
@stevenfontanella stevenfontanella force-pushed the refactor-global-effects-with-traversal branch from a0ec0cf to e7e6e33 Compare April 10, 2026 23:14
@stevenfontanella stevenfontanella marked this pull request as ready for review April 10, 2026 23:17
@stevenfontanella stevenfontanella requested a review from a team as a code owner April 10, 2026 23:17
@stevenfontanella stevenfontanella requested review from tlively and removed request for a team April 10, 2026 23:17
@tlively
Copy link
Copy Markdown
Member

tlively commented Apr 10, 2026

Interesting idea about using the SCCs to further optimize the propagation. We already have a SCC utility that would hopefully make this fairly easy to experiment with. And if we need to detect cycles to add an extra trapping effect, then we could just look for self-edges and nontrivial SCCs. But let's not do that here!

Copy link
Copy Markdown
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Glad to see that this approach speeds things up.

std::map<Function*, FuncInfo>& funcInfos) {

std::unordered_set<std::pair<Name, Name>> processed;
std::deque<std::pair<Name, Name>> work;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you compare using a deque with using a UniqueDeferredQueue or a stack vector?

In DAE2 I found that using a stack vector was significantly faster than using UniqueDeferredQueue, but I don't think I tried a deque.

Comment on lines +133 to +135
if (!calleeEffects) {
callerEffects.reset();
return;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this correct? The caller might have effects from its own body or from other callees.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullopt here actually means "don't know" or "all effects" rather than "no effects". If we don't know the effects of the callee then we also don't know the effects of the caller.

I agree that this is misleading though. I think I can introduce a static method in EffectAnalyzer that gives back an EffectAnalyzer that contains all effects, that way we don't need an optional at all here and the logic stays more uniform, we just always merge unconditionally. Does that sound good?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds much simpler 👍

Comment on lines +145 to +148
if (callee == caller) {
auto& callerEffects = funcInfos.at(module.getFunction(caller)).effects;
if (callerEffects) {
callerEffects->trap = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this does not handle mutual recursion the same way the previous implementation did. The previous implementation could find arbitrary recursion cycles by just looking at self edges in the transitive closure of the reverse call graph. But we don't have that transitive closure of the graph anymore, so looking for self edges is no longer sufficient to find nontrivial recursion cycles.

(So actually we may want to use the SCC utility to find cycles here after all 🫣)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is basically the same as the way we computed the transitive closure before, the difference is that we just don't materialize the graph of the transitive closure. I think the case you're saying is tested here (or let me know if you have a different case in mind).

In this case we have 1 -> 2 and 2 -> 1 in the work list. Then we pop off 1 -> 2 and push 1 -> 1 which ends up hitting the cycle detection.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see... but isn't this inefficient? If we have call graph A -> B -> C, after we merge C's effects into B and then merge B's effects into A, we know that merging C's effects into A would not change anything.

But landing this and then iterating with smaller changes sounds fine, especially since this already improves on the status quo.

}

for (const Name& callerCaller : callerCallers->second) {
if (processed.contains({callee, callerCaller})) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the bool returned from emplace to avoid this separate call to contains.

Comment on lines +152 to +154
// Even if nothing changed, we still need to keep traversing the callers
// to look for a potential cycle which adds a trap affect on the above
// lines.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would also be possible to look at whether anything was changed by either the cycle-handling code above or the effect merge here. If not, then we don't need to propagate further. (This will be even simpler if we find cycles and apply the trap effect as a separate step before computing this fixed point.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants