benchdnn: add GatedMLP driver #4951
Draft
kwieloch-intel wants to merge 3 commits into uxlfoundation:main
- Introduced Gated MLP driver with functionality for correctness checking and input verification.
- Implemented configuration handling for Gated MLP, including data type and activation function settings.
- Developed reference implementation for Gated MLP using existing oneDNN primitives (matmul, eltwise).
- Added auxiliary functions for activation string conversion and memory management.
- Created test cases for various input shapes, including basic, LLM-scale, and CI configurations.
🚧 WIP — not ready for review
This PR adds a new benchdnn driver for the Gated MLP primitive, enabling correctness validation and performance benchmarking of the fused GatedMLP GPU kernel directly from the benchdnn application.
JIRA: MFDNN-14716
🔍 Problem description
oneDNN includes a fused GPU GatedMLP kernel (`ocl:ref:any`) but has no dedicated benchdnn driver to test it. Other primitives (matmul, softmax, eltwise, etc.) all have benchdnn drivers. This means:
- `--mode=F/P` infrastructure.

💡 Proposed Solution
Implement a new benchdnn driver `--gated_mlp` that calls the `dnnl_gated_mlp_primitive_desc_create()` API and validates GPU output against a CPU reference. The driver implements the full GatedMLP operation:
- Data types: `f32`, `f16`, `bf16` with per-tensor broadcast (`--dt=f16` or `--dt=f16:f16:f16:f16:f32` for SRC, W_GATE, W_UP, W_DOWN, DST).
- Activations: `swish` (default), `gelu_erf`, `gelu_tanh`.
- Format tags: `--stag`, `--wtag` (shared for all 3 weight tensors), `--dtag`.
- Problem descriptor: `MBxICxOC`, with all tensor shapes derived from these 3 dimensions.
- The reference (`ref_gated_mlp.cpp`) generates gold data by composing existing oneDNN primitives (matmul, eltwise) on the CPU.
- `tests/benchdnn/gated_mlp`, zero changes to `src/`.
- `tests/benchdnn`:

The GPU `ocl:ref:any` gated_mlp implementation has a bug: executing multiple primitives with different shapes in the same process produces incorrect results (NaN, garbage) or `CL_INVALID_KERNEL_ARGS` (errcode -52). Each shape works correctly in isolation. This is a bug in `src/gpu/intel/gated_mlp/ref.hpp`, not in this driver.

The CI test files are limited to one shape per file as a workaround. Full multi-shape test configurations are included as comments with a TODO to uncomment once the GPU implementation is fixed.
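For reviewers who want a mental model of what the CPU reference computes, here is a minimal standalone sketch in plain C++. It is not the code in `ref_gated_mlp.cpp`, which composes oneDNN primitives. It assumes the common gated-MLP formulation `dst = (act(src x W_gate) * (src x W_up)) x W_down` with row-major f32 tensors shaped SRC `[MB x IC]`, W_GATE and W_UP `[IC x OC]`, W_DOWN `[OC x IC]`, and DST `[MB x IC]`. The helper names (`str2activation`, `apply_activation`, `matmul`, `ref_gated_mlp`) are hypothetical, and swish is assumed to use alpha = 1.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names for illustration only; not the driver's actual API.
enum class activation_t { swish, gelu_erf, gelu_tanh };

// Convert an activation name from the command line into an enum value.
activation_t str2activation(const std::string &s) {
    if (s == "swish") return activation_t::swish;
    if (s == "gelu_erf") return activation_t::gelu_erf;
    if (s == "gelu_tanh") return activation_t::gelu_tanh;
    throw std::invalid_argument("unknown activation: " + s);
}

// Scalar activation. Swish is x * sigmoid(x), i.e. alpha = 1 (assumption).
float apply_activation(activation_t a, float x) {
    switch (a) {
        case activation_t::swish: return x / (1.0f + std::exp(-x));
        case activation_t::gelu_erf:
            return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
        case activation_t::gelu_tanh: {
            const float k = std::sqrt(2.0f / 3.14159265358979f);
            return 0.5f * x
                    * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
        }
    }
    return x;
}

// Naive row-major matmul: C[M x N] = A[M x K] * B[K x N].
std::vector<float> matmul(const std::vector<float> &a,
        const std::vector<float> &b, std::size_t M, std::size_t K,
        std::size_t N) {
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t n = 0; n < N; ++n)
                c[m * N + n] += a[m * K + k] * b[k * N + n];
    return c;
}

// Assumed shapes: src [MB x IC], w_gate and w_up [IC x OC],
// w_down [OC x IC]; the result is dst [MB x IC].
std::vector<float> ref_gated_mlp(const std::vector<float> &src,
        const std::vector<float> &w_gate, const std::vector<float> &w_up,
        const std::vector<float> &w_down, std::size_t MB, std::size_t IC,
        std::size_t OC, activation_t act) {
    std::vector<float> gate = matmul(src, w_gate, MB, IC, OC);
    const std::vector<float> up = matmul(src, w_up, MB, IC, OC);
    // Gate the up-projection element-wise with the activated gate branch.
    for (std::size_t i = 0; i < gate.size(); ++i)
        gate[i] = apply_activation(act, gate[i]) * up[i];
    return matmul(gate, w_down, MB, OC, IC);
}
```

All three tensor shapes follow from `MB`, `IC`, and `OC`, which matches the `MBxICxOC` descriptor. A real comparison against GPU output would also need per-data-type tolerances, which this sketch leaves out.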
📈 Results
DG2 (Intel Arc A770)
Each test suite is run as a separate process (one shape per invocation) to avoid the GPU sequential execution bug.
- `test_gated_mlp_smoke` (f16 32x32x32)
- `test_gated_mlp_ci` (f32 128x256x512)
- `test_gated_mlp_gpu` (f16 64x896x4864)
Pull Request Checklist
General
- Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?

New features
- Have you published an RFC for the new feature? (N/A: benchdnn driver only)
- Was the RFC approved? (N/A)
- (`doc/driver_gated_mlp.md`)