Add SystemPerformanceInfo for compute platform telemetry #251

Merged
jp-pino merged 15 commits into master from system-performance-telemetry
Apr 22, 2026

Member

follesoe commented Mar 5, 2026

Summary

Adds comprehensive system performance telemetry to support monitoring across both iMX6 (X3) and Jetson Orin NX 16 GB (X3 Ultra / X7) platforms — similar to what jtop, htop, and tegrastats provide.

New messages in message_formats.proto

| Message | Purpose |
| --- | --- |
| CpuCoreLoad | Per-core CPU index, load (0..1), and clock frequency (MHz) |
| GpuInfo | GPU load and frequency |
| DlaInfo | Per-engine Deep Learning Accelerator load, frequency, and enabled state |
| MemoryInfo | RAM total/used/cached (uint64 to handle 16 GB+) |
| ThermalZone | Typed thermal zone reading using ThermalZoneId enum (TJ, CANISTER) |
| VideoCodecInfo | NVENC/NVDEC encoder and decoder active status and frequency |
| SystemPerformanceInfo | Composite message combining all of the above plus queue loads |

PowerRailInfo was removed: the INA3221 driver is loaded, but no devices are bound on the current carrier board.

New in telemetry.proto

  • SystemPerformanceInfoTel — telemetry wrapper for SystemPerformanceInfo

Deprecations

  • CPUInfo — superseded by SystemPerformanceInfo
  • CPUTemperature — superseded by SystemPerformanceInfo.thermal_zones

Both are kept intact for backward compatibility.

Wire size estimates

Jetson Orin NX (all fields populated: 8 cores, GPU, 2 DLA engines, 2 thermal zones):

| Field | Bytes |
| --- | --- |
| cpu_cores (x8) | ~112 |
| cpu_load_average | 5 |
| gpu | ~17 |
| dla_engines (x2) | ~28 |
| memory | ~57 |
| thermal_zones (x2) | ~18 |
| video_codec | ~12 |
| Queue loads (x6) | 30 |
| Total | ~279 B |

iMX6 (4 cores, 1 thermal zone, no GPU/DLA/codec): ~133 B

At publish rates of 1-10 Hz this is at most about 2.8 KB/s of payload on the Orin NX (279 B × 10 Hz), well under 5 KB/s on the wire.
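
As a rough cross-check of the estimates above, the sketch below sums the per-field byte counts and multiplies the result by the publish rate. The helper names `sum_bytes` and `bytes_per_second` are hypothetical and not part of the repository; the inputs are the estimates from this description rather than measured sizes, and transport framing overhead is ignored.

```shell
# Per-field estimates for the Orin NX, copied from the table above (bytes).
FIELD_BYTES="112 5 17 28 57 18 12 30"

# Sum a whitespace-separated list of byte counts (hypothetical helper).
sum_bytes() {
  total=0
  for b in $1; do
    total=$(( total + b ))
  done
  echo "$total"
}

# Payload bandwidth in bytes per second: message size times publish rate.
bytes_per_second() {
  echo $(( $1 * $2 ))
}

orin_total=$(sum_bytes "$FIELD_BYTES")
echo "Orin NX message: ${orin_total} B"
echo "Orin NX at 10 Hz: $(bytes_per_second "$orin_total" 10) B/s"
echo "iMX6 at 10 Hz: $(bytes_per_second 133 10) B/s"
```

Even at the upper end of the stated rate range, both platforms stay below the 5 KB/s figure quoted above.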

Design decisions

  • ThermalZoneId enum instead of string for thermal zones — saves ~25 bytes and provides type safety. Currently defines TJ (junction temp) and CANISTER; more zones can be added as needed.
  • uint64 for memory fields — uint32 would overflow at 4 GB, insufficient for the 16 GB Orin NX.
  • Platform-agnostic composite message — unpopulated fields are zero/empty by default in protobuf, so iMX6 simply omits GPU/DLA/codec fields with no overhead.
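
The uint64 decision can be illustrated with a small range check. This is only a sketch: `fits_in_uint32` is a hypothetical helper, the memory fields are assumed to be byte counts, 16 GB is treated as 16 GiB, and the shell is assumed to use 64-bit arithmetic (as bash and dash do on current 64-bit systems).

```shell
# Largest value a uint32 field can carry.
UINT32_MAX=$(( (1 << 32) - 1 ))

# Succeeds when the given byte count is representable as uint32
# (hypothetical helper, for illustration only).
fits_in_uint32() {
  [ "$1" -le "$UINT32_MAX" ]
}

# 16 GiB expressed in bytes (assumption: 16 GB on the Orin NX means 16 GiB).
ram_16_gib=$(( 16 * 1024 * 1024 * 1024 ))

if fits_in_uint32 "$ram_16_gib"; then
  echo "16 GiB fits in uint32"
else
  echo "16 GiB needs uint64"
fi
```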

Add new protocol messages to support detailed performance monitoring
across both iMX6 (X3) and Jetson Orin NX (X3 Ultra/X7) platforms.

New messages: CpuCoreLoad, GpuInfo, DlaInfo, MemoryInfo, ThermalZone,
VideoCodecInfo, PowerRailInfo, and a composite SystemPerformanceInfo
with corresponding SystemPerformanceInfoTel telemetry wrapper.

Deprecates CPUInfo, CPUTemperature in favor of the new messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
follesoe requested review from Copilot and jp-pino and removed request for jp-pino March 5, 2026 09:59
follesoe self-assigned this Mar 5, 2026
follesoe added the enhancement label (New feature or request) Mar 5, 2026
follesoe added this to the Blunux v4.7 milestone Mar 5, 2026
follesoe marked this pull request as ready for review March 5, 2026 10:00
follesoe requested a review from jp-pino March 5, 2026 10:00
Contributor

Copilot AI left a comment

Pull request overview

Adds a new composite telemetry schema for system performance monitoring across compute platforms (iMX6 and Jetson Orin NX), introducing a richer replacement for legacy CPU-only telemetry while keeping backward compatibility.

Changes:

  • Added SystemPerformanceInfo and supporting messages/enums to message_formats.proto (CPU cores, GPU/DLA, memory, thermals, power rails, video codec, queue loads).
  • Added SystemPerformanceInfoTel wrapper to telemetry.proto.
  • Marked CPUInfo / CPUTemperature and CPUInfoTel as deprecated via comments pointing to the new message(s).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| protobuf_definitions/telemetry.proto | Adds SystemPerformanceInfoTel and marks CPUInfoTel as deprecated in comments. |
| protobuf_definitions/message_formats.proto | Introduces SystemPerformanceInfo and related component messages; annotates legacy CPU messages as deprecated. |

Comment thread protobuf_definitions/telemetry.proto Outdated
Comment thread protobuf_definitions/message_formats.proto Outdated
- Capitalize "cpu" to "CPU" in telemetry deprecation comment for
  consistency with existing naming conventions.
- Rename cpu_load_average to cpu_utilization to avoid confusion with
  Linux load average (which is unbounded and reported as 1/5/15 min).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

jp-pino left a comment

For reference this is the info we can probably get:

[Three images attached; not reproduced here]

Probably also good to report the current power mode

Comment thread protobuf_definitions/message_formats.proto Outdated
tegrastats and jtop only expose active/inactive status and clock
frequency for NVENC/NVDEC, not utilization percentages. Updated
fields from load floats to bool active + frequency pairs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
follesoe requested a review from jp-pino March 5, 2026 12:19
Contributor

jp-pino commented Apr 20, 2026

Latest changes summary

DlaInfo

  • Added bool enabled = 4; — reflects clk_enable_count >= 1 from debugfs clock tree (same approach as rbonghi/jetson_stats)
  • Frequency now documented as "0 when disabled"

PowerRailInfo removal

  • Removed PowerRailInfo message entirely
  • Used reserved 7; in SystemPerformanceInfo to prevent field number reuse
  • Rationale: Investigated all available sysfs power paths on Orin NX:
    • /sys/class/nvidia-gpu-power/ — no power metrics, just device class
    • /sys/class/nvidia-gpu-v2-power/ — same GPU device (load/freq only)
    • /sys/class/power_supply/ — empty
    • /sys/class/regulator/ — only configured voltage set-points, requested_microamps always 0
    • No INA current-sense hardware bound on carrier board

New queue load fields

  • Added camera_queue_load = 12, overlay_queue_load = 13, position_observer_queue_load = 14
  • All 6 EventsQueueWithStats instances now mapped to telemetry

jp-pino and others added 7 commits April 20, 2026 11:23
- Remove PowerRailInfo message (no INA sensors on current hardware)
- Use reserved 7 to prevent field reuse in SystemPerformanceInfo
- Add bool enabled to DlaInfo (clk_enable_count based)
- Add camera/overlay/position_observer queue load fields (12-14)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PowerRailInfo was never in a public release, so no need to reserve
the field number. Renumber remaining fields sequentially from 7.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add nvjpg_active/nvjpg_frequency_mhz (field 5-6) for JPEG engine
- Add vic_active/vic_frequency_mhz (field 7-8) for Video Image Compositor
- Update DlaInfo.enabled comment to reflect runtime_status approach

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Reports the DLA Falcon microcontroller clock frequency alongside
the core clock.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Use inline shell to sanitize branch names instead of the removed
third-party action.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

GPU temperature is not available as a separate reading on Jetson;
thermal zone data already covers this via ThermalZone messages.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Only junction temperature is currently available. Remove unused
zone IDs (CPU, GPU, SOC, BOARD) and assign TJ = 1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (1)

protobuf_definitions/message_formats.proto:1294

  • The CPUInfo header says it includes "memory usage", but the message only reports memory_bus_load (plus CPU + queue loads). Consider adjusting the description to "memory bus load" (or add actual RAM/swap usage fields if that’s what consumers expect) so the deprecation guidance is clear.
// CPU information (deprecated, use SystemPerformanceInfo instead).
//
// Contains information about the CPU load and memory usage of the drone.
message CPUInfo {
  float cpu_load = 1; // CPU load (0..1).
  float memory_bus_load = 2; // Memory bus load (0..1).

Comment thread protobuf_definitions/message_formats.proto
Comment thread .github/workflows/ci-dotnet.yaml
Comment thread protobuf_definitions/message_formats.proto
Comment thread protobuf_definitions/message_formats.proto Outdated
Comment thread protobuf_definitions/message_formats.proto
Comment thread protobuf_definitions/message_formats.proto Outdated
Comment thread protobuf_definitions/telemetry.proto
Comment thread protobuf_definitions/telemetry.proto
jp-pino and others added 3 commits April 21, 2026 19:57
Remove swap_total_bytes and swap_used_bytes from MemoryInfo since the
system does not use swap. Add THERMAL_ZONE_ID_CANISTER to ThermalZoneId.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- DlaInfo.enabled: remove implementation detail (runtime_status ref)
- VideoCodecInfo field comment: 'load' -> 'status' (has flags, not load)
- SystemPerformanceInfoTel: remove stale 'power rails' reference

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add vpu_active, vpu_frequency_mhz, vpu_codec_irq_count, and
vpu_jpg_irq_count fields for CODA VPU monitoring on i.MX platforms.
Existing Jetson fields unchanged (field numbers 1-8 preserved).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

protobuf_definitions/message_formats.proto:1294

  • CPUInfo is documented as containing "CPU load and memory usage", but the message only has cpu_load, memory_bus_load, and queue loads (no RAM usage fields). Since this comment was touched in this PR (deprecation note), please update the remaining text to match the actual fields (e.g., "CPU load, memory bus load, and queue loads").
// CPU information (deprecated, use SystemPerformanceInfo instead).
//
// Contains information about the CPU load and memory usage of the drone.
message CPUInfo {
  float cpu_load = 1; // CPU load (0..1).
  float memory_bus_load = 2; // Memory bus load (0..1).

.github/workflows/ci-dotnet.yaml:30

  • The sanitization can produce an empty $SANITIZED (e.g., a branch name like _ becomes - then gets trimmed). That would make --version-suffix start with a dot (".123"), which is not a valid SemVer/NuGet pre-release label and will break dotnet pack. Add a fallback when $SANITIZED is empty (e.g., default to branch/ci) before writing the output.
        run: |
          BRANCH="${GITHUB_REF_NAME}"
          SANITIZED=$(echo "$BRANCH" | sed 's/[^a-zA-Z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | tr '[:upper:]' '[:lower:]' | cut -c1-63)
          echo "sanitized-branch-name=$SANITIZED" >> "$GITHUB_OUTPUT"

      - name: Build project and generate NuGet package
        run: |
          dotnet pack Blueye.Protocol.Protobuf.csproj --version-suffix "${{ steps.branches.outputs.sanitized-branch-name }}.$GITHUB_RUN_NUMBER" -c Release -o out
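
A minimal sketch of the fallback suggested in this comment, assuming a default label of `ci` (a hypothetical choice; any non-empty pre-release label would do). The `sed`/`tr`/`cut` pipeline is copied from the workflow step above and wrapped in a hypothetical `sanitize_branch` function so it can be tried outside GitHub Actions.

```shell
# Sanitize a branch name for use as a version suffix.
# sanitize_branch is a hypothetical wrapper around the workflow's pipeline.
sanitize_branch() {
  sanitized=$(printf '%s\n' "$1" \
    | sed 's/[^a-zA-Z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' \
    | tr '[:upper:]' '[:lower:]' | cut -c1-63)
  # Fallback: an empty result would make the version suffix start with a dot.
  echo "${sanitized:-ci}"
}

sanitize_branch "system-performance-telemetry"
sanitize_branch "Feature/Add_Telemetry"
sanitize_branch "_"
```

In the workflow itself, the same `${SANITIZED:-ci}` expansion could be applied just before the value is written to `$GITHUB_OUTPUT`.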

jp-pino merged commit a57cd21 into master Apr 22, 2026
8 checks passed
jp-pino deleted the system-performance-telemetry branch April 22, 2026 18:47