Skip to content

[Mono.Android] reduce GC bridge and reference logging overhead#11119

Open
simonrozsival wants to merge 14 commits intomainfrom
dev/simonrozsival/11104-reduce-gref-log-overhead
Open

[Mono.Android] reduce GC bridge and reference logging overhead#11119
simonrozsival wants to merge 14 commits intomainfrom
dev/simonrozsival/11104-reduce-gref-log-overhead

Conversation

@simonrozsival
Copy link
Copy Markdown
Member

@simonrozsival simonrozsival commented Apr 15, 2026

Summary

Reduce GC bridge and GREF startup overhead that was causing significant CPU waste during app startup on CoreCLR.

Changes

  • Remove WaitForGCBridgeProcessing on CoreCLR: The wait pattern is fundamentally flawed — even the Mono runtime acknowledges this with a FIXME in sgen-bridge.c: "The current usage pattern for this function is unsafe. Bridge processing could start immediately after unlock." A thread that passes the check can still race with bridge processing starting on another thread. The wait cannot prevent the race, only reduce its window. On CoreCLR, JNI wrapper threads hold their own handle copies via JniObjectReference and are not affected by the bridge swapping control_block handles. Verified with a .NET MAUI app under stress (rotation cycles, UI interaction, GC pressure) with zero crashes.

  • Decouple GREF/weak-GREF counters from logging: Counter increments (_monodroid_gref_inc/_dec) and log calls (_monodroid_gref_log_new/_delete) are now independent — counters always run, logging is conditional. No more if (logging) { log_which_also_counts(); } else { count_only(); } branching.

  • Trim-aware feature switch: Add Microsoft.Android.Runtime.RuntimeFeature.ObjectReferenceLogging (controlled by MSBuild property AndroidEnableObjectReferenceLogging, defaulted to false) so the linker can eliminate the entire logging code path including stack trace capture.

  • Lightweight counter P/Invokes: The non-logging GREF/weak-GREF path uses cheap atomic counter entrypoints instead of the heavy logging-oriented native exports.

Startup trace comparison

Traces collected with dotnet-trace on a .NET MAUI app (TestDummy) running on an Android emulator, compared with dotnet-trace report topN --number 20.

Exclusive time — Before

# Method Inclusive Exclusive
1 Interop+Sys.LowLevelMonitor_TimedWait 44.81% 44.81%
2 LowLevelMonitor.WaitCore() 26.24% 26.24%
3 IL_STUB_PInvoke (interop stub) 14.25% 14.25%
4 GC.RunFinalizers() 6.54% 6.54%
5 RuntimeNativeMethods._monodroid_gref_log_delete 5.30% 5.30%

The top 5 methods account for 97.14% of all exclusive CPU time. All five are GC bridge wait / GREF logging overhead.

Exclusive time — After

# Method Inclusive Exclusive
1 JniEnvironment.CallNonvirtualBooleanMethod 5.60% 5.60%
2 JniEnvironment.CallStaticObjectMethod 6.32% 5.42%
3 GenericsHelpers.ClassWithSlotAndModule 4.56% 4.56%
4 JniEnvironment.CallObjectMethod 4.45% 4.45%
5 ContextExtensions.GetWindow 3.63% 3.63%

After the fix, the profile reflects actual app work (JNI calls, type resolution, view inflation). None of the original top-5 offenders appear anywhere in the top 100.

Testing

  • dotnet build src/Mono.Android/Mono.Android.csproj -v minimal -p:AndroidApiLevel=35
  • dotnet build src/native/native-clr.csproj -v minimal -p:AndroidSupportedTargetJitAbi=arm64-v8a
  • dotnet build src/native/native-mono.csproj -v minimal -p:AndroidSupportedTargetJitAbi=arm64-v8a
  • dotnet test tests/Microsoft.Android.Sdk.TrimmableTypeMap.Tests (323 passed)
  • dotnet test tests/Microsoft.Android.Sdk.TrimmableTypeMap.IntegrationTests (10 passed)
  • MAUI app stress test on emulator (10 rotation cycles, UI interaction) — zero crashes

Fixes #11104

simonrozsival and others added 2 commits April 15, 2026 19:40
Reduce the startup cost of JNI wrapper transitions and reference bookkeeping.

- add an idle fast path for CoreCLR bridge waits
- use lightweight native ref counter entrypoints when logging is off
- default object-reference logging off for trimmed apps
- route JNI wrappers through a direct static bridge fast path

Fixes #11104

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@simonrozsival simonrozsival marked this pull request as ready for review April 15, 2026 21:40
Copilot AI review requested due to automatic review settings April 15, 2026 21:40
@simonrozsival simonrozsival marked this pull request as draft April 15, 2026 21:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces .NET for Android startup overhead by avoiding expensive synchronization and logging work on hot JNI/GC-bridge paths, and by routing non-logging reference tracking through lightweight native counter entrypoints. It also introduces a trim-aware feature switch to disable object-reference logging by default for trimmed apps.

Changes:

  • Added an idle fast-path for GC bridge waits on CoreCLR (avoids semaphore waits on common JNI transitions).
  • Introduced lightweight native _monodroid_{gref,weak_gref}_{inc,dec} entrypoints and updated runtime/native code paths to use them when reference logging is disabled.
  • Added RuntimeFeature.ObjectReferenceLogging and MSBuild/runtimeconfig wiring, defaulting it off for PublishTrimmed=true, plus test coverage for the new behavior.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/Mono.Android-Tests/Mono.Android-Tests/System/AppContextTests.cs Verifies the new RuntimeFeature.ObjectReferenceLogging switch value via reflection.
tests/Mono.Android-Tests/Mono.Android-Tests/Mono.Android.NET-Tests.csproj Forces AndroidEnableObjectReferenceLogging=false for the test app to validate runtimeconfig behavior.
tests/Mono.Android-Tests/Mono.Android-Tests/Java.Interop/JnienvTest.cs Adds CoreCLR GC-bridge wait fast-path tests (idle and active cases).
src/Xamarin.Android.Build.Tasks/Microsoft.Android.Sdk/targets/Microsoft.Android.Sdk.RuntimeConfig.targets Adds trim-aware default and runtimeconfig option for ObjectReferenceLogging.
src/Mono.Android/Microsoft.Android.Runtime/RuntimeFeature.cs Introduces the ObjectReferenceLogging feature switch.
src/Mono.Android/Microsoft.Android.Runtime/JavaMarshalValueManager.cs Adds bridgeProcessingActive + WaitIfBridgeProcessing() idle fast-path and minor queue fast-path.
src/Mono.Android/Android.Runtime/AndroidRuntimeInternal.cs Centralizes bridge-processing wait logic with runtime-specific fast-paths.
src/Mono.Android/Android.Runtime/JNIEnv.cs / JNINativeWrapper.g.* Routes bridge wait calls through AndroidRuntimeInternal.WaitForBridgeProcessing().
src/Mono.Android/Android.Runtime/Logger.cs Gates gref/lref logging on the new RuntimeFeature.ObjectReferenceLogging switch.
src/Mono.Android/Android.Runtime/AndroidRuntime.cs Uses new inc/dec native counters when logging is disabled; isolates expensive stacktrace capture behind non-inlined helpers.
src/Mono.Android/Android.Runtime/RuntimeNativeMethods.cs Adds P/Invoke declarations for _monodroid_{gref,weak_gref}_{inc,dec}.
src/native/* (mono + clr) Adds weak-gref inc/dec implementations and updates internal pinvoke tables/generators to export the new entrypoints.

simonrozsival and others added 2 commits April 16, 2026 00:02
…switch guards at call sites, add stress test

- Replace SemaphoreSlim + bridgeProcessingActive with ManualResetEventSlim
  for simpler, race-free bridge synchronization
- Guard GREF/LREF log branches with RuntimeFeature.ObjectReferenceLogging
  at call sites so the linker can eliminate dead logging code
- Add concurrent stress test for bridge-wait under contention

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace ManualResetEventSlim with a plain volatile int field and
SpinWait for the bridge processing gate. This avoids relying on
ManualResetEventSlim.Reset() being thread-safe under concurrent
access (which is not guaranteed by its contract). The volatile int
provides the cheapest possible fast path (single volatile read) and
SpinWait handles the rare blocking case efficiently.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@simonrozsival simonrozsival changed the title [Mono.Android] reduce GC bridge and GREF startup overhead [Mono.Android] reduce GC bridge and GREF logging overhead Apr 16, 2026
@simonrozsival simonrozsival changed the title [Mono.Android] reduce GC bridge and GREF logging overhead [Mono.Android] reduce GC bridge and reference logging overhead Apr 16, 2026
simonrozsival and others added 5 commits April 16, 2026 09:26
Counters and logging are now independent concerns:
- _monodroid_gref_inc/_dec always runs (cheap atomic counter)
- _monodroid_gref_log_new/_delete is pure logging (no counter side effect)
- _monodroid_gref_log_new return type changed from int to void

This simplifies both managed and native call sites — no more
branching between "log which also counts" vs "count only".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove [NoInlining] method extraction — inline logging code at call sites
- Separate RuntimeFeature.ObjectReferenceLogging and Logger.LogGlobalRef into
  nested if statements for reliable trimmer recognition
- Remove RuntimeFeature check from Logger.LogGlobalRef/LogLocalRef — keep
  feature switch and runtime config as separate concerns
- Default ObjectReferenceLoggingEnabledByDefault to false
- Revert AndroidRuntimeInternal.WaitForBridgeProcessing — go back to the
  virtual call through JniEnvironment.Runtime.ValueManager
- Collapse SpinWait slow path inline — no measurable difference vs separate
  fast/slow methods
- Fix bridge-processing.cc counter/logging decoupling (was missing counter
  increment when logging was enabled)
- Fix auto -> void style for _monodroid_gref_log_new in CLR os-bridge

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore original JNINativeWrapper/JNIEnv/AndroidRuntimeInternal call
  patterns — these files now have zero diff from main
- Remove unnecessary WriteLocalReferenceLine/WriteGlobalReferenceLine
  guards (callers already check)
- Inline WaitIfBridgeProcessing directly into WaitForGCBridgeProcessing
- Remove unnecessary CollectedContexts.IsEmpty check (TryDequeue handles it)
- Move counter inc/dec out of bridge-processing.cc log_ methods into
  take_global_ref/take_weak_global_ref call sites

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace volatile int + SpinWait with ManualResetEventSlim. MRES provides
a cheap volatile-read fast path via IsSet, spins briefly, then falls back
to Monitor.Wait for efficient blocking without CPU waste.

The documented thread-safety limitation of Reset() is about concurrent
Reset()/Set() calls racing. In our case Reset() and Set() are only ever
called from the serialized GC bridge thread (BridgeProcessingStarted and
BridgeProcessingFinished), so this race cannot occur.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The wait pattern is fundamentally flawed — even Mono runtime acknowledges
this with a FIXME in sgen-bridge.c: "The current usage pattern for this
function is unsafe. Bridge processing could start immediately after
unlock."

A thread that checks the gate and finds it open can still race with
bridge processing that starts on another thread immediately after. The
wait cannot prevent the race, only reduce its window.

On CoreCLR, JNI wrapper threads hold their own handle copies via
JniObjectReference and are not affected by the bridge swapping
control_block handles. Verified by running a .NET MAUI app with an
empty WaitForGCBridgeProcessing under stress (10 rotation cycles,
UI interaction, GC pressure) with zero crashes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@simonrozsival simonrozsival marked this pull request as ready for review April 16, 2026 14:08
@simonrozsival simonrozsival added Area: CoreCLR Issues that only occur when using CoreCLR. copilot `copilot-cli` or other AIs were used to author this labels Apr 16, 2026
Comment thread tests/Mono.Android-Tests/Mono.Android-Tests/Mono.Android.NET-Tests.csproj Outdated
simonrozsival and others added 2 commits April 17, 2026 14:00
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jonathanpeppers
Copy link
Copy Markdown
Member

/review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 20, 2026

Android PR Reviewer completed successfully!

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Review Summary

Verdict: ✅ LGTM

Found 0 errors, 0 warnings, 4 suggestions

Analysis

This is a well-structured performance optimization that cleanly separates GREF/WREF counter management from diagnostic logging:

  • Counter decoupling — The new _monodroid_{gref,weak_gref}_{inc,dec} P/Invokes run unconditionally (cheap atomic ops), while the logging code path is gated behind RuntimeFeature.ObjectReferenceLogging. This ensures counters stay accurate regardless of logging state, while eliminating the heavy logging overhead in the common case.

  • Trim-aware feature switch — The [FeatureSwitchDefinition] + RuntimeHostConfigurationOption with Trim="true" pattern is the correct approach for enabling the trimmer to dead-code-eliminate the entire logging code path (including StackTrace capture). The switch name, MSBuild key, and AppContext key are all consistent.

  • WaitForGCBridgeProcessing removal (CoreCLR) — The detailed comment explaining why this is safe is excellent. The semaphore-based wait pattern was indeed a window-narrowing optimization at best, not a correctness guarantee.

  • Mono/CLR parity — Both runtimes are updated consistently — same counter APIs, same log-guarding pattern, same function signature changes.

Suggestions

  • 💡 Unused using System.Reflection; in JnienvTest.cs (likely leftover from an earlier iteration).
  • 💡 Minor MSBuild XML indentation inconsistency ( /> vs />).
  • 💡 ReleaseLocalReference count fix (localReferenceCount-1localReferenceCount) — this is actually a bug fix since base.ReleaseLocalReference already decremented the counter.
  • 💡 Native log functions now read gc_gref_count directly instead of from the atomic increment return value, which is a benign diagnostic logging trade-off.

👍 The startup trace data is compelling — eliminating WaitForGCBridgeProcessing and GREF logging from the top-5 CPU consumers is a significant win. The native changes are clean, symmetric across both runtimes, and the new test coverage for the feature switch is appropriate.

CI: Both public checks (license/cla, dotnet-android) are ✅ passing.


Review generated by android-reviewer from review guidelines.

Generated by Android PR Reviewer for issue #11119 · ● 7.6M

var tname = Thread.CurrentThread.Name;
var tid = Thread.CurrentThread.ManagedThreadId;
var from = new StackTrace (true).ToString ();
RuntimeNativeMethods._monodroid_lref_log_delete (localReferenceCount, value.Handle, (byte) 'L', tname, tid, from, 1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 💡 Bug fix (positive) — Nice catch: the old code logged localReferenceCount - 1 here, but base.ReleaseLocalReference (called on line 173) already decrements localReferenceCount via the ref parameter. The old code was therefore logging a count that was one too low. Using localReferenceCount directly (already decremented by the base) is correct.

This is consistent with DeleteLocalReference above, which correctly uses localReferenceCount - 1 because the base hasn't been called yet at that point.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Copilot] Thanks for the positive note! Correct — base.ReleaseLocalReference already decrements localReferenceCount via the ref parameter, so the old localReferenceCount - 1 was double-counting the decrement. The current code correctly logs localReferenceCount directly after the base call.

@@ -1,5 +1,6 @@
using System;
using System.Linq;
using System.Reflection;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 💡 Code organization — This using System.Reflection; directive appears unused — no types from System.Reflection (BindingFlags, MethodInfo, PropertyInfo, etc.) are referenced anywhere in this file. Likely a leftover from an earlier iteration. Consider removing it.

Rule: Remove unused code (Postmortem #58)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Copilot] Good catch — removed the unused using System.Reflection; directive.

Condition="'$(_AndroidIsAssignableFromCheck)' != ''"
Value="$(_AndroidIsAssignableFromCheck)"
Trim="true"
/>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 💡 Formatting — This closing /> has 5 spaces of indentation instead of 4, which is inconsistent with the other self-closing tags in this file (e.g., line 46, line 65).

Suggested change
/>
/>

Rule: XML indentation consistency

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Copilot] Fixed — aligned the closing /> to 4 spaces to match the rest of the file.

void OSBridge::_monodroid_gref_log_new (jobject curHandle, char curType, jobject newHandle, char newType, const char *threadName, int threadId, const char *from) noexcept
{
int c = _monodroid_gref_inc ();
if ((log_categories & LOG_GREF) == 0) [[likely]] {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 💡 Performance — Minor observation: gc_gref_count is now read directly (plain load) rather than using the atomic return value from _monodroid_gref_inc(). Between the caller's _monodroid_gref_inc() and this read, another thread could modify the counter, so the logged value may differ from the actual count at the time of this specific increment.

This is perfectly fine for diagnostic logging — the intent is showing approximate state, not an exact snapshot. Just noting this is a deliberate trade-off of the decoupling, in case someone sees log output that doesn't look sequential.

Copy link
Copy Markdown
Member

@jonathanpeppers jonathanpeppers Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, we lost some synchronization here between the increment/decrement.

I don't actually know, but is there some C/C++ equivalent of Interlocked we could use here that would just make the number that gets logged slightly more reliable?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Copilot] Good point! The code already uses __sync_add_and_fetch / __sync_sub_and_fetch (GCC atomic builtins, equivalent to Interlocked.Increment/Decrement in C#) for the counter operations. The issue was that we were discarding the return value and reading gc_gref_count directly in the log functions.

Fixed by capturing the atomic return values from _monodroid_gref_inc()/_monodroid_gref_dec() etc., and passing them as grefc/gwrefc parameters to the log functions. This ensures the logged counts exactly match the state at the time of the atomic operation, with no window for another thread to modify the counter in between.

simonrozsival and others added 3 commits April 20, 2026 19:03
… globals

The _monodroid_gref_inc/dec functions return the count atomically, but
the log functions were reading gc_gref_count/gc_weak_gref_count directly.
Between the atomic op and the read, another thread could modify the
counter, making the logged value inaccurate.

Now the return values from the atomic increment/decrement operations are
captured and threaded through to the log functions as parameters, ensuring
the logged counts match the actual state at the time of the operation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused 'using System.Reflection' from JnienvTest.cs
- Fix closing '/>' indentation (5 spaces -> 4) in RuntimeConfig.targets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bundle the atomic counter increment/decrement with the corresponding
log call into single helper methods (inc_gref_and_log, dec_gref_and_log,
inc_weak_gref_and_log, dec_weak_gref_and_log). This keeps take_global_ref
and take_weak_global_ref clean while still using the atomic return values
for accurate log output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: CoreCLR Issues that only occur when using CoreCLR. copilot `copilot-cli` or other AIs were used to author this

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Perf] Investigate the impact of RuntimeNativeMethods._monodroid_*_log on startup performance

3 participants