Summary
TranslateGemma 4B (`google/translategemma-4b-it`) is Google's translation-specific model built on the Gemma 3 4B architecture. It is a text-only model (no vision encoder is needed for inference), but gemma.cpp currently supports only the VLM variant of Gemma 3 4B.
Problem
When running TranslateGemma 4B with gemma.cpp:
- SBS conversion: `convert_from_safetensors.py` assumes the PaliGemma VLM format and requires vision-tower tensors that TranslateGemma doesn't have
- Loading: `ConfigGemma3_4B()` returns a VLM config with `vit_config.image_size=896`, causing a `Tensor enc_norm_bias is required but not found in file` error
- No LM-only dispatch: `ConfigGemma3_4B_LM()` exists in the code but is never used as the primary config
What We Did (Workarounds)
We successfully ran TranslateGemma on gemma.cpp with these changes:
1. Convert script modifications
- Skip `vision_tower.*` and `multi_modal_projector.*` tensors during loading
- Fix `vocab_size` (262144 instead of PaliGemma's 257152+64 trim)
- Add QK norm tensors (`query_norm`, `key_norm`) to the layer config as BF16
- Zero out `vit_config` in the SBS metadata before writing
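The script changes above can be sketched roughly as follows. This is a minimal, illustrative Python sketch, not the actual `convert_from_safetensors.py` code; the Hugging Face tensor names (`model.layers.N.self_attn.q_norm.weight`, etc.) and the helper names are assumptions for illustration:

```python
# Sketch of the text-only conversion workarounds (hypothetical helpers,
# not the real convert_from_safetensors.py implementation).

# Full Gemma 3 vocab; no PaliGemma-style 257152+64 trim.
TRANSLATEGEMMA_VOCAB_SIZE = 262144

# VLM-only tensor prefixes that TranslateGemma checkpoints do not ship.
SKIP_PREFIXES = ("vision_tower.", "multi_modal_projector.")


def filter_text_only_tensors(tensors):
    """Drop vision tensors so a text-only checkpoint converts cleanly."""
    return {
        name: t for name, t in tensors.items()
        if not name.startswith(SKIP_PREFIXES)
    }


def qk_norm_names(num_layers=34):
    """Per-layer QK norm tensor names to register as BF16 (assumed HF layout)."""
    names = []
    for i in range(num_layers):
        names.append(f"model.layers.{i}.self_attn.q_norm.weight")
        names.append(f"model.layers.{i}.self_attn.k_norm.weight")
    return names
```

With 34 layers this yields 68 QK norm tensors (two per layer), which matches the count we verified after conversion.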
2. C++ changes needed
- `configs.cc`: Dispatch `GEMMA3_4B` to `ConfigGemma3_4B_LM()` when no VIT tensors are present
- `tensor_info.cc`: Guard VIT tensor registration with `if (config.vit_config.image_size > 0)`
- `weights.h`: Conditional VIT `MatPtr` initialization
- `python/configs.cc`: Add the missing Gemma 3 model enums (`GEMMA3_1B`, `GEMMA3_4B`, `GEMMA3_12B`, `GEMMA3_27B`)
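The dispatch change in `configs.cc` boils down to the following logic, shown here as a language-neutral Python sketch; the string return values stand in for the actual C++ config functions, and the `vision_tower.` prefix check stands in for however the loader detects VIT tensors:

```python
def choose_gemma3_4b_config(tensor_names):
    """Pick the LM-only config when the checkpoint carries no VIT tensors.

    Mirrors the configs.cc workaround: dispatch GEMMA3_4B to
    ConfigGemma3_4B_LM() when vision-tower tensors are absent.
    The returned strings stand in for the real C++ config functions.
    """
    has_vit = any(n.startswith("vision_tower.") for n in tensor_names)
    return "ConfigGemma3_4B" if has_vit else "ConfigGemma3_4B_LM"
```

The same check generalizes to the other Gemma 3 sizes, which is what the feature request below asks for.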
3. Result
- TranslateGemma 4B runs successfully on CPU in the SFP 8-bit format
- 4.3 GB SBS file; translation works across 55+ languages
- All 34 layers + QK norms load correctly
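A quick way to sanity-check a converted file is to confirm that every layer's QK norm tensors made it in. This is a hypothetical Python sketch; the `layer_N.query_norm` naming here is illustrative, not gemma.cpp's actual SBS tensor scheme:

```python
def verify_qk_norms(tensor_names, num_layers=34):
    """Return (layer, kind) pairs whose QK norm tensor is missing.

    tensor_names: set of tensor names present in the converted file
    (illustrative naming, not gemma.cpp's real SBS scheme).
    """
    missing = []
    for i in range(num_layers):
        for kind in ("query_norm", "key_norm"):
            if f"layer_{i}.{kind}" not in tensor_names:
                missing.append((i, kind))
    return missing
```

An empty result means all 34 layers carry both norms, matching what we observed after the workarounds above.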
Feature Request
- Auto-detect LM-only vs. VLM: when VIT tensors are absent from the SBS file, use `ConfigGemma3_*_LM()` instead of the VLM config
- Update `convert_from_safetensors.py` to support text-only Gemma 3 models (not just PaliGemma)
- Add Gemma 3 4B/12B/27B to the Python enum in `python/configs.cc`
Environment
- gemma.cpp: latest main (April 2026)
- CPU: AMD EPYC (AVX-512 VNNI)
- OS: Ubuntu 24.04
- Model: `google/translategemma-4b-it`