Add CogVideoX diffusers-to-original format conversion script #13435

Open
Ricardo-M-L wants to merge 1 commit into huggingface:main from Ricardo-M-L:feat/cogvideox-diffusers-to-original-conversion

@Ricardo-M-L

Summary

  • Adds scripts/convert_cogvideox_to_original.py, a reverse conversion script that converts CogVideoX models (transformer and VAE) from diffusers format back to the original CogVideo checkpoint format
  • Reverses every weight-name mapping and tensor transformation in the existing convert_cogvideox_to_diffusers.py: merges the split q/k/v projections back into query_key_value and interleaves the norm1/norm2 linear weights back into adaLN_modulations
  • Supports both transformer and VAE components with fp16/bf16 precision options
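
Reversing a rename table is only safe when the forward mapping is one-to-one. The following is a minimal sketch of that inversion, not the code in this PR: the table entries and the helper names `invert_renames` and `rename_key` are hypothetical placeholders, and the real TRANSFORMER_KEYS_RENAME_DICT contents are not reproduced here.

```python
# Hypothetical rename table: these entries are placeholders, not the
# real contents of TRANSFORMER_KEYS_RENAME_DICT.
FORWARD_RENAMES = {
    "orig_block": "new_block",
    "orig_proj": "new_proj",
}


def invert_renames(forward: dict[str, str]) -> dict[str, str]:
    """Swap keys and values, refusing mappings that cannot be undone."""
    reverse: dict[str, str] = {}
    for old, new in forward.items():
        if new in reverse:
            raise ValueError(
                f"{new!r} is produced by both {reverse[new]!r} and {old!r}"
            )
        reverse[new] = old
    return reverse


def rename_key(key: str, renames: dict[str, str]) -> str:
    """Apply substring replacements, longest pattern first."""
    for pattern in sorted(renames, key=len, reverse=True):
        key = key.replace(pattern, renames[pattern])
    return key


REVERSE_RENAMES = invert_renames(FORWARD_RENAMES)

original_key = "orig_block.0.orig_proj.weight"
diffusers_key = rename_key(original_key, FORWARD_RENAMES)
restored_key = rename_key(diffusers_key, REVERSE_RENAMES)
```

If two original names map to the same diffusers name, the inversion raises instead of silently picking one, which is the failure mode most likely to go unnoticed in a reverse conversion.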

Closes #10076

How to use

# Convert transformer
python scripts/convert_cogvideox_to_original.py \
    --diffusers_model_path THUDM/CogVideoX-2b \
    --output_path ./cogvideox_original/transformer.pt \
    --component transformer

# Convert VAE
python scripts/convert_cogvideox_to_original.py \
    --diffusers_model_path THUDM/CogVideoX-2b \
    --output_path ./cogvideox_original/vae.pt \
    --component vae

# With fp16 precision
python scripts/convert_cogvideox_to_original.py \
    --diffusers_model_path THUDM/CogVideoX-5b \
    --output_path ./cogvideox_original/transformer.pt \
    --component transformer --fp16

Implementation details

The script reverses every transformation in convert_cogvideox_to_diffusers.py:

| Forward (to diffusers) | Reverse (to original) |
| --- | --- |
| query_key_value chunked into to_q, to_k, to_v | to_q/k/v concatenated back into query_key_value |
| adaln_layer.adaLN_modulations split into norm1.linear + norm2.linear | norm1/norm2.linear interleaved back into adaLN_modulations |
| query/key_layernorm_list renamed to attn1.norm_q/k | attn1.norm_q/k renamed back to query/key_layernorm_list |
| All simple key renames via TRANSFORMER_KEYS_RENAME_DICT | Reversed via TRANSFORMER_KEYS_RENAME_DICT_REVERSE |
| VAE up_blocks index inversion | Re-inverted back to original indices |
| model.diffusion_model. prefix stripped | Prefix re-added |
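
The two tensor-level rows above can be illustrated with a small sketch. It is not the PR's code: NumPy stands in for torch (`np.split` and `np.concatenate` play the roles of `torch.chunk` and `torch.cat`), the function names are hypothetical, and the adaLN layout is an assumption that the forward script cuts the tensor into 12 chunks, giving chunks 0-2 and 6-8 to norm1 and chunks 3-5 and 9-11 to norm2. That layout should be checked against convert_cogvideox_to_diffusers.py before relying on it.

```python
import numpy as np


def merge_qkv(to_q, to_k, to_v):
    """Concatenate q, k, v along dim 0, undoing a 3-way chunk."""
    return np.concatenate([to_q, to_k, to_v], axis=0)


def split_adaln(modulations):
    """Assumed forward split: norm1 gets chunks 0-2 and 6-8, norm2 the rest."""
    c = np.split(modulations, 12, axis=0)
    norm1 = np.concatenate(c[0:3] + c[6:9], axis=0)
    norm2 = np.concatenate(c[3:6] + c[9:12], axis=0)
    return norm1, norm2


def merge_adaln(norm1, norm2):
    """Interleave the norm1/norm2 halves back into the 12-chunk layout."""
    n1 = np.split(norm1, 6, axis=0)
    n2 = np.split(norm2, 6, axis=0)
    return np.concatenate(n1[0:3] + n2[0:3] + n1[3:6] + n2[3:6], axis=0)


hidden = 4  # toy size, far smaller than a real checkpoint

# q/k/v round trip: split into three, then merge back.
qkv = np.arange(3 * hidden * hidden, dtype=np.float32).reshape(3 * hidden, hidden)
to_q, to_k, to_v = np.split(qkv, 3, axis=0)
qkv_restored = merge_qkv(to_q, to_k, to_v)

# adaLN round trip under the assumed 12-chunk layout.
adaln = np.arange(12 * hidden * 2, dtype=np.float32).reshape(12 * hidden, 2)
norm1, norm2 = split_adaln(adaln)
adaln_restored = merge_adaln(norm1, norm2)
```

Both round trips are pure reorderings, so the restored arrays should match the inputs exactly rather than approximately. Any precision cast such as fp16 is a separate step that would break exact equality.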

Test plan

  • Verify round-trip: convert original -> diffusers -> original and compare state dict keys
  • Test with CogVideoX-2B (30 layers, 30 heads)
  • Test with CogVideoX-5B (42 layers, 48 heads)
  • Test VAE conversion independently
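
The round-trip check in the first item could be sketched as a key comparison between two state dicts. This is an illustrative helper, not part of the PR: `compare_state_dict_keys` and the toy key names are hypothetical, and plain dicts stand in for loaded checkpoints.

```python
def compare_state_dict_keys(original: dict, round_tripped: dict) -> dict:
    """Report keys lost or gained by original -> diffusers -> original."""
    original_keys = set(original)
    restored_keys = set(round_tripped)
    return {
        "missing": sorted(original_keys - restored_keys),
        "unexpected": sorted(restored_keys - original_keys),
    }


# Toy dicts with hypothetical key names; values would be tensors in practice.
original = {"a.weight": 1, "a.bias": 2, "b.weight": 3}
round_tripped = {"a.weight": 1, "a.bias": 2, "c.weight": 3}

report = compare_state_dict_keys(original, round_tripped)
```

Matching keys do not prove the conversion is correct, since tensors could still be concatenated in the wrong order. Comparing tensor values as well, for example with `torch.equal` on each pair, would be a stronger test.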

🤖 Generated with Claude Code

Add a reverse conversion script that converts CogVideoX models from
diffusers format back to the original CogVideo checkpoint format.
This complements the existing convert_cogvideox_to_diffusers.py script.

Closes huggingface#10076

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/L PR with diff > 200 LOC label Apr 8, 2026
Development

Successfully merging this pull request may close these issues.

Do we have any script covert from hf format to orginal format?
