RPi5: Pipe IPC & GPU Performance Regression – Kernel 6.6 → 6.12 #7308

@Kletternaut

Description

Platform

  • Hardware: Raspberry Pi 5 (8 GB RAM), NVMe boot, active chassis cooling
  • Kernels compared: 6.6.74+rpt-rpi-2712 vs 6.12.62+rpt-rpi-2712
  • Both kernels tested on the same physical system — kernel 6.6.74 was booted via tryboot (sudo reboot "0 tryboot") with matching DTB and overlays extracted from the original kernel package. No reinstall, no hardware change.
  • Date: April 2026

Background

Over the past 1.5 years I have been developing rpicam-gui, a dual-camera preview and capture application in C for the Raspberry Pi 5, using two IMX477 sensors at 2028×1520 @ 40 fps, alongside an active RDP session (two monitors) and a PiSP-accelerated AI inference pipeline (Hailo AIKit).

When I started this project on kernel 6.6.x the system handled this workload at roughly 80–83% CPU load — challenging but workable. Over time, as the kernel progressed through the 6.12.x series, the same workload became increasingly difficult to run: frame drops, sluggish UI response over RDP, and eventually the application hitting its practical limit on hardware it previously ran fine on.

The benchmarks below document a reproducible regression, most prominently in pipe IPC throughput (−26%), with additional degradation in GPU rendering and scheduler latency. These metrics compound in a multi-threaded camera pipeline and collectively explain the observed real-world performance loss.


Test Conditions

  • Same Raspberry Pi 5 (8 GB), same NVMe drive, active chassis cooling
  • No thermal throttling occurred on either kernel (max 64.2 °C / 63.1 °C)
  • Overclocking identical on both: arm_freq=2800, over_voltage=4, gpu_freq=1000
  • RDP session active (xrdp 0.9.21.1) — matching real-world workload
  • All non-essential services stopped before each run

Note: Tests were performed on kernel 6.12.62, not the latest 6.12.x release.
I rely on a proprietary kernel module driver not yet recompiled for newer kernels,
so 6.12.62 was the highest testable version. Based on the consistent trend across
the 6.12.x series, the regression on the latest kernel is likely as large or larger.


Results

Benchmark                          Kernel 6.6.74       Kernel 6.12.62      Change
------------------------------------------------------------------------------------------
CPU single-thread (sysbench)       3191 ev/s           3189 ev/s           ≈ 0%
CPU multi-thread, 4 cores          12426 ev/s          12435 ev/s          ≈ 0%
RAM sequential read                64334 MiB/s         64569 MiB/s         ≈ 0%
RAM sequential write               14586 MiB/s         17102 MiB/s         +17%  (6.12 faster)
RAM random access                  959 MiB/s           874 MiB/s           -9%   (6.12 slower)
GPU glmark2-es2 total score        229                 222                 -3%
GPU bump scene                     318 FPS             299 FPS             -6%
GPU pulsar scene                   214 FPS             203 FPS             -5%
Context switching (stress-ng)      8648 ops/s          8581 ops/s          -0.8%
sched_yield latency                1905 ns/call        2054 ns/call        +7.8% (6.12 slower)
>>> Pipe IPC throughput <<<        2,353,000 ops/s     1,750,000 ops/s     -26%  <<<
>>> Pipe IPC throughput <<<        287 MB/s            214 MB/s            -26%  <<<
NVMe sequential write              355 MB/s            379 MB/s            +7%   (6.12 faster)
Max temperature (active cooling)   64.2 °C             63.1 °C             —
Thermal throttle events            0                   0                   —

Key Findings

  • CPU raw compute is identical — this is not a clock speed or compiler regression.
  • Pipe IPC throughput dropped 26% on 6.12.62. Linux pipes are the primary IPC mechanism between camera capture, ISP, preview and display threads in libcamera / rpicam-apps. A 26% drop here has a direct, measurable impact on any camera-intensive workload.
  • sched_yield latency increased 7.8% — relevant for tight producer/consumer loops such as frame delivery between threads.
  • GPU performance dropped 3–6% across glmark2-es2 scenes.
  • RAM random access dropped 9% — relevant for ISP frame buffer access patterns.

The individual numbers may seem small in isolation, but they compound in a multi-threaded camera pipeline and all point in the same direction: 6.12.x is slower for IPC-heavy, display-intensive workloads on the RPi5.


Reproduction

Dependencies:

sudo apt install sysbench stress-ng glmark2-es2

The benchmark script used for both runs is available at:
https://forums.raspberrypi.com/viewtopic.php?t=397426

Key commands for the critical metric:

# Pipe IPC throughput
stress-ng --pipe 4 --timeout 20s --metrics-brief

# Scheduler yield latency
stress-ng --yield 4 --timeout 30s --metrics-brief

Request

Could someone with kernel access investigate the pipe IPC throughput regression between 6.6.x and 6.12.x on the RPi5 platform?

A git bisect between the last 6.6.x and first 6.12.x rpi tag on the stress-ng --pipe metric would likely narrow this down quickly.

Full raw benchmark output files are available on request.

Thank you for the continued work on the Raspberry Pi kernel.

Steps to reproduce the behaviour

https://forums.raspberrypi.com/viewtopic.php?t=397426

Device(s)

Raspberry Pi 5

System

$ cat /etc/rpi-issue
Raspberry Pi reference 2024-07-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 48efb5fc5485fafdc9de8ad481eb5c09e1182656, stage4

$ vcgencmd version
2026/02/06 14:31:40
Copyright (c) 2012 Broadcom
version 8124798b (release) (embedded)

$ uname -a
Linux raspi5 6.12.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.62-1+rpt1~bookworm (2026-01-19) aarch64 GNU/Linux

$ raspinfo | pastebinit
/usr/local/bin/raspinfo: line 103: tvservice: command not found
https://pastebin.com/DP60BvRq

Logs

No response

Additional context

No response
