Add CogVideoX diffusers-to-original format conversion script by Ricardo-M-L · Pull Request #13435 · huggingface/diffusers

Ricardo-M-L · 2026-04-08T17:38:45Z

Summary

- Adds `scripts/convert_cogvideox_to_original.py`, a reverse conversion script that converts CogVideoX models (transformer and VAE) from diffusers format back to the original CogVideo checkpoint format.
- Reverses the weight name mappings, the q/k/v split (by concatenating `to_q`, `to_k`, `to_v` back together), and the adaLN norm split (by re-interleaving) performed by the existing `convert_cogvideox_to_diffusers.py`.
- Supports both the transformer and VAE components, with fp16/bf16 precision options.
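
The q/k/v merge mentioned above can be pictured without any tensor library. The following is a minimal sketch, not the script's code: plain Python lists stand in for weight rows, list concatenation stands in for `torch.cat` along dim 0, and `split_qkv` / `merge_qkv` are hypothetical helper names introduced only for illustration.

```python
from typing import List, Tuple

# Stand-in for a 2-D weight tensor: one inner list per output row.
Rows = List[List[float]]


def split_qkv(fused: Rows) -> Tuple[Rows, Rows, Rows]:
    """Forward direction: chunk a fused matrix into three equal row blocks."""
    if len(fused) % 3 != 0:
        raise ValueError("fused q/k/v needs a row count divisible by 3")
    n = len(fused) // 3
    return fused[:n], fused[n:2 * n], fused[2 * n:]


def merge_qkv(q: Rows, k: Rows, v: Rows) -> Rows:
    """Reverse direction: stack the q, k, v rows back together, in that order."""
    if not (len(q) == len(k) == len(v)):
        raise ValueError("q, k and v must have the same number of rows")
    return q + k + v


# Toy 6x2 "weight": splitting and merging should give back the same rows.
fused = [[float(i), float(i) + 0.5] for i in range(6)]
q, k, v = split_qkv(fused)
restored = merge_qkv(q, k, v)
print("round trip ok:", restored == fused)
```

The order of the merge matters: concatenating in any order other than q, k, v would produce a tensor of the right shape but with the wrong contents, which a key-only comparison would not catch.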

How to use

```bash
# Convert transformer
python scripts/convert_cogvideox_to_original.py \
    --diffusers_model_path THUDM/CogVideoX-2b \
    --output_path ./cogvideox_original/transformer.pt \
    --component transformer

# Convert VAE
python scripts/convert_cogvideox_to_original.py \
    --diffusers_model_path THUDM/CogVideoX-2b \
    --output_path ./cogvideox_original/vae.pt \
    --component vae

# With fp16 precision
python scripts/convert_cogvideox_to_original.py \
    --diffusers_model_path THUDM/CogVideoX-5b \
    --output_path ./cogvideox_original/transformer.pt \
    --component transformer --fp16
```
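
For readers who want to see how the flags used above fit together, here is a rough sketch of an argument parser that would accept these commands. It is an assumption about the interface, not the script's actual parser: the flag names `--diffusers_model_path`, `--output_path`, `--component` and `--fp16` come from the commands above, while `--bf16` as a flag name, the `required` settings, the restricted `choices`, and the mutual exclusion of the precision flags are guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage commands."""
    parser = argparse.ArgumentParser(
        description="Convert a CogVideoX diffusers checkpoint to the original format."
    )
    parser.add_argument("--diffusers_model_path", type=str, required=True)
    parser.add_argument("--output_path", type=str, required=True)
    # Assumption: only these two components are accepted.
    parser.add_argument("--component", choices=["transformer", "vae"], required=True)
    # Assumption: the precision flags are mutually exclusive, and the
    # bf16 option is spelled --bf16.
    precision = parser.add_mutually_exclusive_group()
    precision.add_argument("--fp16", action="store_true")
    precision.add_argument("--bf16", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    [
        "--diffusers_model_path", "THUDM/CogVideoX-5b",
        "--output_path", "./cogvideox_original/transformer.pt",
        "--component", "transformer",
        "--fp16",
    ]
)
print(args.component, args.fp16, args.bf16)
```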

Implementation details

The script reverses every transformation in convert_cogvideox_to_diffusers.py:

| Forward (to diffusers) | Reverse (to original) |
| --- | --- |
| `query_key_value` chunked into `to_q`, `to_k`, `to_v` | `to_q/k/v` concatenated back into `query_key_value` |
| `adaln_layer.adaLN_modulations` split into `norm1.linear` + `norm2.linear` | `norm1/norm2.linear` interleaved back into `adaLN_modulations` |
| `query/key_layernorm_list` renamed to `attn1.norm_q/k` | `attn1.norm_q/k` renamed back to `query/key_layernorm_list` |
| All simple key renames via `TRANSFORMER_KEYS_RENAME_DICT` | Reversed via `TRANSFORMER_KEYS_RENAME_DICT_REVERSE` |
| VAE `up_blocks` index inversion | Re-inverted back to original indices |
| `model.diffusion_model.` prefix stripped | Prefix re-added |
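
The adaLN row is the least obvious of these, so a small sketch may help. It rests on an assumption that should be checked against `convert_cogvideox_to_diffusers.py`: that the forward script cuts `adaLN_modulations` into 12 equal chunks and sends chunks 0-2 and 6-8 to `norm1.linear` and chunks 3-5 and 9-11 to `norm2.linear`. Under that assumption, the reverse step cuts each norm into 6 chunks and interleaves them in groups of three. Plain lists stand in for tensors, and all function names here are hypothetical.

```python
from typing import List, Sequence, Tuple


def chunk(rows: Sequence, n: int) -> List[list]:
    """Cut rows into n equal, contiguous pieces (like tensor.chunk(n, dim=0))."""
    if len(rows) % n != 0:
        raise ValueError(f"cannot cut {len(rows)} rows into {n} equal chunks")
    size = len(rows) // n
    return [list(rows[i * size:(i + 1) * size]) for i in range(n)]


def flatten(chunks: Sequence[Sequence]) -> list:
    """Concatenate chunks back into one list of rows (like torch.cat)."""
    return [row for c in chunks for row in c]


def split_adaln(fused: Sequence) -> Tuple[list, list]:
    """Assumed forward split: chunks 0-2 + 6-8 -> norm1, chunks 3-5 + 9-11 -> norm2."""
    c = chunk(fused, 12)
    return flatten(c[0:3] + c[6:9]), flatten(c[3:6] + c[9:12])


def interleave_adaln(norm1: Sequence, norm2: Sequence) -> list:
    """Reverse of split_adaln: put the twelve chunks back in their original order."""
    a, b = chunk(norm1, 6), chunk(norm2, 6)
    return flatten(a[0:3] + b[0:3] + a[3:6] + b[3:6])


# Toy 24-element "bias": each of the 12 chunks holds two consecutive numbers.
fused = list(range(24))
norm1, norm2 = split_adaln(fused)
print("round trip ok:", interleave_adaln(norm1, norm2) == fused)
```

A plain `norm1 + norm2` concatenation would have the right length but the wrong order, which is why the table describes this step as interleaving rather than concatenation.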

Test plan

- Verify round-trip: convert original -> diffusers -> original and compare state dict keys
- Test with CogVideoX-2B (30 layers, 30 heads)
- Test with CogVideoX-5B (42 layers, 48 heads)
- Test VAE conversion independently
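
The key comparison in the round-trip check could look roughly like the sketch below. It is only an illustration: `diff_state_dict_keys` is a hypothetical helper, and the key names in the example are made up to show the shape of the report, not copied from a real checkpoint. Comparing keys alone does not prove the tensor values match, so a fuller check would also compare shapes and values.

```python
from typing import Dict, Iterable, List


def diff_state_dict_keys(
    original: Iterable[str], round_tripped: Iterable[str]
) -> Dict[str, List[str]]:
    """Report keys lost or gained between two state dicts' key sets."""
    a, b = set(original), set(round_tripped)
    return {
        "missing": sorted(a - b),     # in the original, absent after the round trip
        "unexpected": sorted(b - a),  # produced by the round trip, not in the original
    }


# Illustrative key names only (assumed, not taken from a real checkpoint).
original_keys = [
    "model.diffusion_model.transformer.layers.0.attention.query_key_value.weight",
    "model.diffusion_model.transformer.layers.0.attention.dense.weight",
]
round_tripped_keys = [
    "model.diffusion_model.transformer.layers.0.attention.query_key_value.weight",
    "transformer.layers.0.attention.dense.weight",  # simulates a prefix that was not re-added
]

report = diff_state_dict_keys(original_keys, round_tripped_keys)
for kind, keys in report.items():
    for key in keys:
        print(f"{kind}: {key}")
```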

🤖 Generated with Claude Code

Add a reverse conversion script that converts CogVideoX models from diffusers format back to the original CogVideo checkpoint format. This complements the existing convert_cogvideox_to_diffusers.py script.

Closes huggingface#10076

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot added the size/L label (PR with diff > 200 LOC) on Apr 8, 2026

Ricardo-M-L wants to merge 1 commit into huggingface:main from Ricardo-M-L:feat/cogvideox-diffusers-to-original-conversion
