Skip to content

[3.14]: gh-148284: Move stackref buffer to per-eval loop, and don't override PGO data.#148310

Open
Fidget-Spinner wants to merge 7 commits intopython:3.14from
Fidget-Spinner:stackref_scratch
Open

[3.14]: gh-148284: Move stackref buffer to per-eval loop, and don't override PGO data.#148310
Fidget-Spinner wants to merge 7 commits intopython:3.14from
Fidget-Spinner:stackref_scratch

Conversation

@Fidget-Spinner
Copy link
Copy Markdown
Member

@Fidget-Spinner Fidget-Spinner commented Apr 9, 2026

@Fidget-Spinner
Copy link
Copy Markdown
Member Author

See analysis here #138115 (comment). This reduces the interpreter loop's C stack consumption by roughly 1kb on recent versions of Clang.

@Fidget-Spinner Fidget-Spinner requested a review from colesbury April 9, 2026 20:04
Copy link
Copy Markdown
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't entirely understand the addition of the Py_EnterRecursiveCall() in _testcapi.pyobject_vectorcall. Is this needed to prevent a test from crashing?

In practice, we don't do Py_EnterRecursiveCall() checks before most PyObject_Vectorcall() calls and I wouldn't expect extensions to do so either. It seems more common that the called function does the check (i.e., possibly recursive reprs or when entering the interpreter)

@Fidget-Spinner
Copy link
Copy Markdown
Member Author

I don't entirely understand the addition of the Py_EnterRecursiveCall() in _testcapi.pyobject_vectorcall. Is this needed to prevent a test from crashing?

You're right. I removed it.

@Fidget-Spinner Fidget-Spinner changed the title [3.14]: gh-148284: Move stackref buffer to per-eval loop. Check for C recursion limit in testcapi [3.14]: gh-148284: Move stackref buffer to per-eval loop, and don't override PGO data. Apr 9, 2026
@Fidget-Spinner
Copy link
Copy Markdown
Member Author

Fidget-Spinner commented Apr 9, 2026

For those wondering why I removed _Py_HOT_FUNCTION for eval loop. I'm almost 100% certain it's a bug to include it for the eval loop.

On GCC, PGO overrides it https://gcc.gnu.org/onlinedocs/gcc/Common-Attributes.html#index-cold
So it's not useful there.

On Clang, it overrides PGO https://clang.llvm.org/docs/AttributeReference.html#hot
So it's actually a pessimization, as PGO is usually better data.
The bug in modern Clang now seems to be that if we override the hotness, it propagates to all called functions as well, causing everything to be unconditionally inlined all the way to leaf calls. Blowing up stack usage.

The only time it matters is when there's no PGO. In which case, I'm not sure why people are using that CPython for performance. @vstinner added it back in 2016 and said that to the effect of PGO should be preferred https://bugs.python.org/issue28618

Back in 2016, PGO was newer, now it's a decade later, and practically all perf-critical Python use PGO. We should remove to unbreak modern compilers.

@colesbury
Copy link
Copy Markdown
Contributor

That makes sense to me. One question: do you want to do the _Py_HOT_FUNCTION removal in a separate PR to main that you backport to 3.14? The rest of these changes are 3.14 specific, but that seems relevant to both 3.14 and main.

@Fidget-Spinner
Copy link
Copy Markdown
Member Author

That makes sense to me. One question: do you want to do the _Py_HOT_FUNCTION removal in a separate PR to main that you backport to 3.14? The rest of these changes are 3.14 specific, but that seems relevant to both 3.14 and main.

Good point. I will break that out into another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants