
graph: sdpa: support dropout seed/offset/prob in fused sdpa #4961

Open
TaoLv wants to merge 4 commits into main from lvtao/main/sdpa-dropout

Conversation

@TaoLv
Contributor

@TaoLv TaoLv commented Apr 7, 2026

This PR supports SDPA forward with dropout seed/offset/prob.

SDPA backward will be fixed later.

Update: SDPA backward is also fixed.
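For context, a minimal sketch of the general counter-based technique behind passing seed/offset/prob into a fused kernel: the dropout mask is derived deterministically from the (seed, offset) pair, so forward and backward can regenerate the identical mask instead of materializing and storing a mask tensor. This is an illustration only; the hash-based generator below is hypothetical and not oneDNN's actual kernel.

```python
import hashlib

def dropout_mask(seed: int, offset: int, prob: float, n: int):
    """Derive a deterministic keep/drop mask for n elements from
    (seed, offset, prob). Same inputs -> same mask, so forward and
    backward can regenerate it instead of storing it."""
    keep = []
    for i in range(n):
        # Counter-based draw: hash (seed, offset, element index) into a
        # uniform value in [0, 1). Hypothetical generator for illustration.
        h = hashlib.sha256(f"{seed}:{offset}:{i}".encode()).digest()
        u = int.from_bytes(h[:8], "big") / 2**64
        keep.append(u >= prob)
    return keep

def apply_dropout(x, seed, offset, prob):
    # Inverted dropout: kept elements are scaled by 1/(1-prob) so the
    # expected value of the output matches the input.
    mask = dropout_mask(seed, offset, prob, len(x))
    scale = 1.0 / (1.0 - prob)
    return [xi * scale if m else 0.0 for xi, m in zip(x, mask)]
```

With this scheme the backward pass only needs the same (seed, offset, prob) triple to rebuild the mask, which is why the fused kernel can take them as inputs rather than a mask tensor.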

@TaoLv TaoLv requested a review from a team as a code owner April 7, 2026 03:42
@github-actions github-actions bot added the component:graph-api Codeowner: @oneapi-src/onednn-graph label Apr 7, 2026
@TaoLv
Contributor Author

TaoLv commented Apr 7, 2026

Noticed a correctness issue via benchdnn. Debugging...

# with fused sdpa kernel
$ _ONEDNN_GRAPH_SDPA_FORCE_PRIMITIVE=0 ./tests/benchdnn/benchdnn --graph --engine=gpu  --case=complex_fusion/mha/gqa-plain-training-fwd-w-dropout-bf16-f32.json
[COMPARE_STATS][DST]: trh=0 err_max_diff: 2.01562 err_max_rdiff:8.37618e+37 all_max_diff: 2.01562 all_max_rdiff:8.37618e+37
[COMPARE_STATS] Norm check is prohibited; error_to_total_ratio: 233469/262144; allowed_ratio: 256/262144;
Error: Function 'doit' at (/nfs/pdx/disks/hal9000/lvtao/oneDNN/tests/benchdnn/graph/graph.cpp:787) returned '1'
0:FAILED (errors:233469 total:262144) (3079 ms) __REPRO: --graph --engine=gpu --case=complex_fusion/mha/gqa-plain-training-fwd-w-dropout-bf16-f32.json
===========================================================
= Failed cases summary (--summary=no-failures to disable) =
===========================================================
0:FAILED (errors:233469 total:262144) (3079 ms) __REPRO: --graph --engine=gpu --case=complex_fusion/mha/gqa-plain-training-fwd-w-dropout-bf16-f32.json
============================
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 3.09s; create_pd: 0.09s (3%); create_prim: 0.96s (31%); fill: 0.00s (0%); execute: 0.01s (0%); compute_ref: 0.00s (0%); compare: 0.00s (0%);

# with primitive based kernel
$ _ONEDNN_GRAPH_SDPA_FORCE_PRIMITIVE=1 ./tests/benchdnn/benchdnn --graph --engine=gpu  --case=complex_fusion/mha/gqa-plain-training-fwd-w-dropout-bf16-f32.json
0:PASSED (10122 ms) __REPRO: --graph --engine=gpu --case=complex_fusion/mha/gqa-plain-training-fwd-w-dropout-bf16-f32.json
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 10.12s; create_pd: 0.07s (1%); create_prim: 0.67s (7%); fill: 0.00s (0%); execute: 0.00s (0%); compute_ref: 0.00s (0%); compare: 0.00s (0%);
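Reading the failing run's numbers: benchdnn allows at most 256 mismatched points out of 262144 (about 0.1%), while the fused kernel produced 233469 mismatches (about 89%), i.e. nearly every point disagrees with the reference, which would be consistent with the fused kernel generating a different dropout mask than the reference path. A quick check of the ratios:

```python
errors, total, allowed = 233469, 262144, 256

error_ratio = errors / total      # observed mismatch ratio
allowed_ratio = allowed / total   # benchdnn's allowed ratio

print(f"observed: {error_ratio:.2%}, allowed: {allowed_ratio:.4%}")
# -> observed: 89.06%, allowed: 0.0977%
```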

@TaoLv
Contributor Author

TaoLv commented Apr 7, 2026

make test
set test_scope=NIGHTLY
disable benchdnn_all
enable benchdnn_graph

@TaoLv TaoLv force-pushed the lvtao/main/sdpa-dropout branch from c2806db to 3e7b8cd Compare April 7, 2026 06:38
@TaoLv
Contributor Author

TaoLv commented Apr 7, 2026

make test
set test_scope=NIGHTLY
disable benchdnn_all
enable benchdnn_graph

@h-sadia
Contributor

h-sadia commented Apr 7, 2026

We are not enabling mask here? Also, we will need a backport to v3.12 branch as well.

@TaoLv
Contributor Author

TaoLv commented Apr 8, 2026

We are not enabling mask here? Also, we will need a backport to v3.12 branch as well.

Dropout mask output is not required for SDPA training in PyTorch.
Sure, I will backport these to rls-v3.12 once the correctness failures are addressed.

@TaoLv TaoLv force-pushed the lvtao/main/sdpa-dropout branch from 2376fbd to 851ad8e Compare April 9, 2026 01:14
@TaoLv TaoLv requested a review from a team as a code owner April 9, 2026 01:14
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Apr 9, 2026
@TaoLv
Contributor Author

TaoLv commented Apr 9, 2026

make test
disable benchdnn_all
enable benchdnn_graph

@TaoLv
Contributor Author

TaoLv commented Apr 9, 2026

Adding commits from #4969 for validation. Will rebase the PR once #4969 is landed.

@TaoLv TaoLv requested a review from a team as a code owner April 9, 2026 09:01
@github-actions github-actions bot added the component:tests Codeowner: @oneapi-src/onednn-arch label Apr 9, 2026
@TaoLv TaoLv force-pushed the lvtao/main/sdpa-dropout branch from 78e1fb8 to c28941c Compare April 9, 2026 09:07
Contributor

@dzarukin dzarukin left a comment


(Minor) It looks to me that if the output_mask from dropout is requested, the pattern won't be picked up. If that's the case, it would probably be good to reflect that in the documentation and/or a code comment. If this is a false impression, then OK.
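The reviewer's point can be sketched as a pattern-matcher check: if the dropout op's mask output has any consumers (i.e. the user requested it), the fused pattern is rejected and execution falls back to the primitive-based path. All names below are hypothetical illustrations, not oneDNN's actual matcher code.

```python
class Op:
    """Minimal stand-in for a graph op: a kind plus per-output consumer lists."""
    def __init__(self, kind, num_outputs):
        self.kind = kind
        self.consumers = [[] for _ in range(num_outputs)]

def sdpa_dropout_pattern_matches(dropout_op):
    # Output 0: dropped-out tensor, output 1: mask (hypothetical layout).
    # If anything consumes the mask, the fused SDPA kernel cannot
    # supply it, so the pattern must not be picked up.
    MASK_OUTPUT = 1
    return len(dropout_op.consumers[MASK_OUTPUT]) == 0
```

Under this check, a graph whose dropout mask output is unused still fuses, while requesting the mask silently disables fusion, which is the behavior worth documenting.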

@TaoLv TaoLv force-pushed the lvtao/main/sdpa-dropout branch from c28941c to aa04588 Compare April 10, 2026 01:19
@github-actions github-actions bot removed the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Apr 10, 2026
@TaoLv
Contributor Author

TaoLv commented Apr 10, 2026

make test
disable benchdnn_all
enable benchdnn_graph

