Follow-up to #250 / PR #251 (image input shipped). Gemma 4 is natively multimodal, including audio, but audio input is not yet implemented.
Weight-availability findings (from dumped mmproj headers, 2026-06-15)
- 12B QAT mmproj (`mmproj-gemma-4-12b-it-qat-q4_0.gguf`): `clip.audio.projector_type=gemma4ua` and `has_audio_encoder=True` are declared, but the file ships zero `a.*` audio tensors (only `v.*` vision + `mm.*`). So 12B audio is not possible with this QAT mmproj; it would need an audio-carrying 12B mmproj (verify one exists upstream).
- E4B mmproj (`gemma-4-E4B-it-mmproj.gguf`): ships a real audio encoder. `clip.audio.projector_type=gemma4a`, 751 `a.*` tensors (~12-block conformer-style: `num_mel_bins=128`, embedding 1024, ffn 4096, 8 heads), `mm.a.input_projection` [1536→2560].

Scope
- `gemma4a` path.
- `gemma4a` audio encoder forward pass (conformer blocks incl. `norm_conv`), part of the E4B encoder-full effort.
- `ForwardEmbedding` seam at `<|audio|>` placeholders (`<|audio>` / `<audio|>` wrap).
- `--audio` + (later) server audio content blocks.

Relationship
- The `gemma4a` path is part of the E4B encoder-full work; see #126, "Add Gemma 4 E4B multimodal (vision / image input) support" (its deferred V6). This issue can either subsume that or stay the cross-cutting audio tracker.
- The `gemma4ua` path (12B, encoder-free like its vision path) is blocked on weight availability (above).
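
The weight-availability check described in the findings (a header that declares an audio encoder versus `a.*` tensors actually being present) could be sketched roughly as below. This is a minimal illustration and not this project's loader: `check_audio_availability` and `AudioAvailability` are hypothetical names, the metadata key is spelled as it appears in this issue (the real key in the file may carry a prefix), and the tensor names in the usage lines are made-up stand-ins. How the metadata and tensor names are read from the GGUF file is left out and assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class AudioAvailability:
    declared: bool      # header says an audio encoder is present
    audio_tensors: int  # number of tensors whose name starts with "a."
    usable: bool        # declared and at least one a.* tensor shipped


def check_audio_availability(
    metadata: Mapping[str, object],
    tensor_names: Iterable[str],
) -> AudioAvailability:
    """Compare the declared audio support against the tensors actually shipped.

    Assumption: `metadata` is a plain dict of header key/value pairs and
    `tensor_names` is the list of tensor names, both already extracted
    from the mmproj file by some other code.
    """
    # Key spelling follows the issue text; adjust if the file uses a prefix.
    declared = bool(metadata.get("has_audio_encoder", False))
    # Only names under the a.* namespace count; mm.a.* projections do not.
    audio_tensors = sum(1 for name in tensor_names if name.startswith("a."))
    return AudioAvailability(declared, audio_tensors, declared and audio_tensors > 0)


# Usage with stand-in data shaped like the 12B QAT finding:
# audio is declared in the header, but no a.* tensors are listed.
qat_12b = check_audio_availability(
    {"clip.audio.projector_type": "gemma4ua", "has_audio_encoder": True},
    ["v.example.weight", "mm.example.weight"],  # hypothetical tensor names
)
print(qat_12b.declared, qat_12b.audio_tensors, qat_12b.usable)
```

Failing early on `declared and not usable` would let the `--audio` path report a missing-weights error instead of a later shape or lookup failure, though where that check belongs in the loader is a design choice for the implementation.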