feat: add ltx2.3 support by leejet · Pull Request #1463 · leejet/stable-diffusion.cpp

leejet · 2026-04-27T14:38:57Z

LTX-2.3 dev T2V

.\bin\Release\sd-cli.exe -M vid_gen --diffusion-model  ..\..\ComfyUI\models\diffusion_models\ltx-2.3-22b-dev-UD-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_video_vae.safetensors --audio-vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_audio_vae.safetensors --llm ..\..\ComfyUI\models\text_encoders\gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --embeddings-connectors ..\..\ComfyUI\models\text_encoders\ltx-2.3-22b-dev_embeddings_connectors.safetensors  -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "worst quality, low quality, blurry, distorted, artifacts" -W 1280 -H 720 --diffusion-fa --offload-to-cpu --video-frames 33 --fps 24 -o t2v.webm

t2v.webm

LTX-2.3 dev I2V

.\bin\Release\sd-cli.exe -M vid_gen --diffusion-model  ..\..\ComfyUI\models\diffusion_models\ltx-2.3-22b-dev-UD-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_video_vae.safetensors --audio-vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_audio_vae.safetensors --llm ..\..\ComfyUI\models\text_encoders\gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --embeddings-connectors ..\..\ComfyUI\models\text_encoders\ltx-2.3-22b-dev_embeddings_connectors.safetensors  -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v  -W 1280 -H 720 --diffusion-fa --offload-to-cpu --video-frames 33 -i ..\assets\ernie_image\turbo_example.png -o i2v.webm

i2v.webm

Green-Sky · 2026-04-29T14:32:12Z

Finally, temporal tiling 🥳

Green-Sky · 2026-05-08T09:49:52Z

Looks like @stduhpf 's https://huggingface.co/stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small would be ideal here.

@leejet your example command contains both --diffusion-fa and --fa (:

juntaosun · 2026-05-11T05:18:50Z

Does it support LoRA loading?

stduhpf · 2026-05-11T23:46:30Z

> Does it support LoRA loading?

Of course it does.

stduhpf · 2026-05-11T23:48:27Z

Audio decoding is messed up (seems to be sped up with high pitch and repeats twice). Other than that it works nicely so far. Great job as always @leejet !

edit: I believe the audio issue might just be because of the arrangement of stereo samples in the webm file (planar vs interleaved)
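To make the planar vs interleaved hypothesis above concrete, here is a minimal Python sketch. It is not the actual fix that landed in the project; the function names and the tiny sample values are illustrative assumptions. It shows that reading a planar stereo buffer as if it were interleaved gives each channel every second sample of the left block followed by every second sample of the right block, which would sound like the clip played at double speed, twice.

```python
def interleave(planar, channels=2):
    """Convert planar samples [L0..Ln, R0..Rn] to interleaved [L0, R0, L1, R1, ...]."""
    if len(planar) % channels != 0:
        raise ValueError("buffer length must be a multiple of the channel count")
    frames = len(planar) // channels
    return [planar[c * frames + i] for i in range(frames) for c in range(channels)]


def split_interleaved(buf, channels=2):
    """Model a consumer that expects interleaved data: channel c takes every
    `channels`-th sample starting at offset c."""
    return [buf[c::channels] for c in range(channels)]


# Illustrative data: four left samples followed by four right samples (planar).
left = [0, 1, 2, 3]
right = [10, 11, 12, 13]
planar = left + right

# Misreading planar data as interleaved: each channel skips every other
# sample of the left block, then does the same with the right block.
wrong = split_interleaved(planar)

# Converting to interleaved first restores the original channels.
fixed = split_interleaved(interleave(planar))

print(wrong)
print(fixed)
```

If the left and right channels carry similar content, the `wrong` layout matches the reported symptom: higher pitch from the skipped samples, and an apparent repeat when the right block starts.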

LostRuins · 2026-05-17T06:23:13Z

Very nice to see this is being worked on, LTX 2.3 is highly anticipated!

Tangentially, not sure if you've seen this (likely vibecoded) ltx.cpp implementation https://github.com/audiohacking/ltx.cpp which is also ggml based, probably not too helpful but might be a useful reference, seeing as they also use the same comfyui ggufs.

leejet · 2026-05-17T08:31:30Z

> Very nice to see this is being worked on, LTX 2.3 is highly anticipated!
>
> Tangentially, not sure if you've seen this (likely vibecoded) ltx.cpp implementation https://github.com/audiohacking/ltx.cpp which is also ggml based, probably not too helpful but might be a useful reference, seeing as they also use the same comfyui ggufs.

I hadn’t looked at this project before, but it seems that this is not a complete implementation. The documentation says that ltx2.3 uses T5-XXL, which I don’t quite understand.

13 new upstream commits since previous sync at 0b82969. The big one is leejet#1500 (module backend assignment): ~1.5k LOC churn that splits backend code into a new ggml_extend_backend.{h,cpp} pair and replaces every runner's (backend_t backend, bool offload_params_to_cpu) constructor arg with (backend_t runtime, backend_t params). New CLI flags --backend te=cpu,vae=cuda0,... and --params-backend te=cpu,vae=cpu,...

Other notable upstream changes folded in:

    3633072 module backend assignment (leejet#1500)
    38b14ad --max-vram -1 auto-detect (leejet#1498)
    67dda3f LTX 2.3 architecture (leejet#1463)
    06accf2 LTXAV latent2rgb projection
    9d68341 Euler/DDIM unification (leejet#1474)
    cde20d5 stereo handling in sd_audio
    d7ecbe1 T5 EOS dedup in Anima
    bd17f53 / 0c1ca17 / 839f6a9 / 3b4d26f ROCm/docs/CI
    db08b84 GCC 16 build fix
    686856e fake-VAE log demotion
    0b82969 / 381e0df PR template + CONTRIBUTING.md

Conflicts:

- examples/common/common.cpp, include/stable-diffusion.h: kept our offload_config alongside upstream's new backend/params_backend strings. sd_ctx_params_t now carries both axes.
- src/lora.hpp: dropped our enable_offload bool. The new params_backend argument expresses the same intent (CPU = offload).
- src/hidream_o1.hpp: kept params_prefix member, switched constructor to upstream's (backend, params_backend) signature.
- src/stable-diffusion.cpp: every runner-construction site took upstream's backend_for(MODULE) / params_backend_for(MODULE) lookups. Removed the dead cond_stage/diffusion/vae_offload_to_cpu local-bool derivation; replaced with calls to a new SDBackendManager::force_module_params_backend(MODULE, "cpu") helper that mutates params_assignment_ after init_backend() runs. The offload_config-driven escalations now land in the same data structure upstream's --params-backend writes to.

Post-merge fixups surfaced by retesting HiDream O1 streaming:

- src/llm.hpp: TextModel.forward_final_norm now casts to LLMRMSNorm, not RMSNorm. Upstream changed the "norm" block's concrete type; our pre-merge cast returned nullptr and crashed on first forward().
- src/hidream_o1.hpp: Stage 1 of compute_streaming_true scales inputs_embeds by sqrt(hidden_size) when params.llm.normalize_input, matching what forward_embeds does. No-op for HiDream O1 today but keeps the streaming path drift-free if a future arch flips it.

Smoke-tested on 12 GB GPU:

    Z-Image-Turbo Q8 layer_streaming -> 4.32 s
    HiDream O1 bf16 dev layer_streaming -> 17.44 s (4 steps, 1024x1024)
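The per-module backend strings described above (`--backend te=cpu,vae=cuda0,...`) can be pictured as a simple module-to-backend mapping. The Python sketch below is only an illustration of that idea, not the upstream parser. `parse_backend_spec` is a hypothetical name, the `force_module_params_backend` function is a stand-in for the helper named in the commit message, and the `diffusion` module key is an assumption.

```python
def parse_backend_spec(spec):
    """Parse a string such as 'te=cpu,vae=cuda0' into {'te': 'cpu', 'vae': 'cuda0'}."""
    assignment = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        module, sep, backend = item.partition("=")
        module, backend = module.strip(), backend.strip()
        if not sep or not module or not backend:
            raise ValueError(f"expected module=backend, got {item!r}")
        assignment[module] = backend
    return assignment


def force_module_params_backend(params_assignment, module, backend):
    """Hypothetical stand-in for the override step: return a copy of the
    parameter-backend mapping with one module forced to `backend`."""
    updated = dict(params_assignment)
    updated[module] = backend
    return updated


# Runtime and parameter placement are two independent mappings.
runtime = parse_backend_spec("te=cpu,vae=cuda0")
params = parse_backend_spec("te=cpu,vae=cpu")

# An offload-style escalation writes into the same mapping the CLI flag fills.
# The "diffusion" key is an assumed module name, used here only for illustration.
params = force_module_params_backend(params, "diffusion", "cpu")

print(runtime)
print(params)
```

Keeping the override as a write into the same mapping mirrors the point made in the conflict notes: both the CLI flag and the fork's offload configuration end up in one data structure.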

Brings in the upstream files that LTX-2.3 (leejet/stable-diffusion.cpp leejet#1463) depends on, without merging unrelated commits from upstream master:

- src/tokenizers/ directory restructure (replaces src/tokenize_util.* and src/vocab/), adds Gemma 3 tokenizer (gemma_tokenizer.*, gemma_merges.hpp, gemma_vocab.hpp)
- src/llm.hpp updated to include the GEMMA3_12B architecture path used by LTX-2.3's text encoder
- src/conditioner.hpp adds LTXAVEmbedder and LTXAVTextProjectionRunner
- src/common_dit.hpp adds patchify3d / unpatchify3d
- src/denoiser.hpp adds LTX2Scheduler and the LTX2_SCHEDULER enum value
- src/diffusion_model.hpp adds LTXAVDiffusionExtra + video_positions / frame_rate fields to DiffusionExtraParams
- src/ggml_extend.hpp adds force_prec_f32 to Conv3d for the VAE encoder block that overflows in F16
- src/ltx_vae.hpp (NEW) spatiotemporal Video-VAE encoder/decoder
- src/ltxv.hpp replaces the 72-line LTX-Video v1 stub with full LTX-2.3 DiT
- src/ltx_audio_vae.h (present for completeness; audio remains unwired and out of scope for this fork's video-only target)
- src/vae.hpp adds VERSION_LTXAV scale factor branch
- src/wan.hpp patchify/unpatchify helpers exposed as static for reuse
- src/model.* adds LTX-2 architecture detection

This is the staging layer. The next commit cherry-picks the LTX-2.3 wiring (stable-diffusion.cpp, CLI, headers) on top.
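For readers unfamiliar with the patchify3d / unpatchify3d pair mentioned in the list above, the general idea is to cut a [C, T, H, W] video latent into non-overlapping spatiotemporal patches, flatten each patch into one token, and invert that mapping afterwards. The Python sketch below shows this on nested lists. It is not the code in src/common_dit.hpp: the signatures, the patch sizes, and the element order inside a token are all assumptions made for illustration.

```python
from itertools import product


def patchify3d(x, pt, ph, pw):
    """Split a nested-list tensor x[C][T][H][W] into tokens, one per
    (pt, ph, pw) patch. Each token holds C * pt * ph * pw values."""
    C, T, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("T, H and W must be divisible by the patch size")
    tokens = []
    for t0, h0, w0 in product(range(0, T, pt), range(0, H, ph), range(0, W, pw)):
        tokens.append([
            x[c][t0 + dt][h0 + dh][w0 + dw]
            for c, dt, dh, dw in product(range(C), range(pt), range(ph), range(pw))
        ])
    return tokens


def unpatchify3d(tokens, shape, pt, ph, pw):
    """Inverse of patchify3d: scatter token values back into x[C][T][H][W]."""
    C, T, H, W = shape
    x = [[[[None] * W for _ in range(H)] for _ in range(T)] for _ in range(C)]
    starts = product(range(0, T, pt), range(0, H, ph), range(0, W, pw))
    for token, (t0, h0, w0) in zip(tokens, starts):
        offsets = product(range(C), range(pt), range(ph), range(pw))
        for value, (c, dt, dh, dw) in zip(token, offsets):
            x[c][t0 + dt][h0 + dh][w0 + dw] = value
    return x


# Illustrative 2-channel, 2-frame, 4x4 "latent" whose values encode their index.
C, T, H, W = 2, 2, 4, 4
video = [[[[c * 1000 + t * 100 + h * 10 + w for w in range(W)]
           for h in range(H)] for t in range(T)] for c in range(C)]

tokens = patchify3d(video, 1, 2, 2)
restored = unpatchify3d(tokens, (C, T, H, W), 1, 2, 2)
print(len(tokens), len(tokens[0]))
```

The round trip is the property that matters: whatever ordering an implementation picks inside a token, unpatchify must apply the exact inverse so the DiT output maps back onto the latent grid.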

…t#1463)

Cherry-picks upstream leejet/stable-diffusion.cpp commit 67dda3f ("feat: add ltx2.3 support") on top of the staged infrastructure.

Conflicts resolved:

* .gitmodules: kept only the ggml submodule; the LTX-2.3 video build does not need the upstream sdcpp-webui frontend, libwebp or libwebm submodules.
* ggml: bumped to leejet/ggml 7f4ab364 (re-init, depth-1 clone) which carries the IM2COL_3D / PAD ops the Video-VAE requires.
* stable-diffusion.cpp: taken from upstream wholesale. The fork had no fork-specific changes here; all 27 hunks were upstream's own refactor between the fork's master and the LTX-2.3 commit. The file now contains the LTXAVModel and LTXVideoVAE construction paths, the LTXAVEmbedder conditioner branch, process_ltxav_video_timesteps(), the LTX I2V latent preparation, and the new bool generate_video() signature with sd_image_t** and sd_audio_t** out-parameters.
* include/stable-diffusion.h: taken wholesale (new generate_video signature, embeddings_connectors_path, audio_vae_path, LTX2_SCHEDULER, sd_audio_t).
* examples/cli/main.cpp, CMakeLists.txt, examples/cli/CMakeLists.txt, examples/cli/README.md, examples/server/README.md: upstream wholesale (LTX-2 CLI flags, WebP/WebM detection, tokenizers CMake).
* examples/common/: brought in the upstream split (common.cpp, common.h, media_io.cpp, media_io.h, log.cpp, log.h, resource_owners.hpp) plus examples/server/async_jobs.cpp and src/tensor_ggml.hpp.
* docs/ltx2.md: merged my long-form guide with the upstream real CLI examples and weight URLs so a reviewer can build-to-first-video from the README.
* src/ltx2_api.cpp: updated to the new generate_video() signature, dropped the HAVE_LTX2_SYNC guards now that LTX2_SCHEDULER is in scheduler_t. Calls the real LTX2_SCHEDULER + Euler sampler + flow_shift 2.37. Audio output is freed and discarded (video-only build).

clang -std=c++17 -fsyntax-only on src/ltx2_api.cpp passes.

The cherry-pick of leejet#1463 missed three files in examples/common/ (log.cpp, log.h, resource_owners.hpp) and two in examples/cli/ (image_metadata.cpp, image_metadata.h). The new examples/cli/CMakeLists.txt explicitly references these sources, so cmake configuration failed without them.

Also removes examples/common/common.hpp which was replaced upstream by the common.cpp + common.h split.

CMake now configures cleanly with the Metal backend on Apple Silicon (ggml 7f4ab36, BLAS via Accelerate). Build verified with:

    cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
        -DSD_METAL=ON -DSD_WEBP=OFF -DSD_WEBM=OFF \
        -DBUILD_SHARED_LIBS=OFF

…on M3 Metal

The cherry-pick of leejet#1463 left the fork in a mixed-version src/ state: some files at 67dda3f, others at the older fork master (c8fb3d2). That broke the include graph (clip.hpp included src/tokenize_util.h, ggml_extend.hpp included ggml_extend_backend.h, model.cpp expected model_io/, denoiser.hpp referenced ER_SDE / EULER_CFG_PP sample-method enums missing from the public header, etc.).

This commit fully aligns src/, examples/, and include/ to 67dda3f:

* src/: bulk-checkout every src/*.cpp/hpp/h from 67dda3f, then keep src/ltx2_api.cpp (fork-specific). Adds model_io/, ggml_graph_cut.{cpp,h}, ggml_extend_backend.{cpp,h}, sample-cache.{cpp,h}, condition_cache_utils.hpp, auto_encoder_kl.hpp, convert.cpp, ernie_image.hpp, hidream_o1.hpp, rng.hpp (relocated), spectrum.hpp, tensor.hpp, upscaler.h. Removes stale src/gguf_reader.hpp (replaced by model_io/gguf_io.*).
* include/stable-diffusion.h: bumped to 67dda3f, adds the new sample methods (ER_SDE_SAMPLE_METHOD, EULER_CFG_PP_SAMPLE_METHOD, EULER_A_CFG_PP_SAMPLE_METHOD) the LTXAV denoiser depends on.
* src/ltx2_api.cpp: untouched. The post-sync edits committed earlier already match the new generate_video() signature.

Verified locally on Apple M3 (Metal 4 + CPU NEON + BLAS Accelerate):

    cmake -G Ninja -DSD_METAL=ON -DSD_WEBP=OFF -DSD_WEBM=OFF \
        -DBUILD_SHARED_LIBS=OFF ..
    ninja sd-cli

Produces a 35 MB sd-cli binary that recognises every LTX-2 flag: --mode vid_gen, --diffusion-model, --vae, --llm, --embeddings-connectors, --audio-vae, --init-img, --end-img, --video-frames, --fps, --temporal-tiling, --sampling-method euler --scheduler ltx2

Net diff vs fork master: 95 files, well under the 174 files of 64johnlee's unmergeable mega-merge. "Minor modifications before merge" acceptance criterion stays intact.

leejet added 2 commits April 12, 2026 23:44

add GemmaTokenizer

51d681e

Merge branch 'master' into ltx2.3

274ecd5

leejet mentioned this pull request Apr 27, 2026

feat: LTX-2 support #1458

Closed

leejet force-pushed the ltx2.3 branch from ca7e008 to 99d78d0 Compare April 28, 2026 17:09

add basic ltx2.3 support

831b321

leejet force-pushed the ltx2.3 branch from 99d78d0 to 831b321 Compare April 28, 2026 17:12

leejet added 4 commits April 29, 2026 01:17

change vocab file encoding

0b65927

fix ci

d51f35b

fix ubuntu build

2ca782a

add temporal tiling support

e744e1e

pwilkin added a commit to pwilkin/stable-diffusion.cpp that referenced this pull request May 1, 2026

merge: ltx2.3 (PR leejet#1463) on top of backend-fit

cab0d82

mudler mentioned this pull request May 1, 2026

feat: add LTX-2 video generation support #1459

Closed

Merge branch 'master' into ltx2.3

bb63d5c

This was referenced May 6, 2026

[Feature] LTX-2 model support #1189

Open

LTX video support #480

Closed

GreenShadows mentioned this pull request May 7, 2026

[Feature] ltx2.3 support? #1479

Closed

leejet added 3 commits May 10, 2026 15:00

add ltx audio support

8b03d9b

update ggml submodule url

4fdf43a

fix generate_video

7738073

This was referenced May 12, 2026

ltx2.3: fix sd_audio stereo format #1489

Merged

Temporal tile size + overlap #1490

Closed

merge branch 'master' into ltx2.3

22d9a83

stduhpf mentioned this pull request May 16, 2026

Add ltxav latent2rgb projection matrix #1502

Merged

1 task

leejet added 2 commits May 16, 2026 23:15

Merge branch 'master' into ltx2.3

f8a0330

add i2v support

18fbb4c

This comment was marked as resolved.

minify bundled Gemma tokenizer vocab sources

6e71338

LostRuins mentioned this pull request May 17, 2026

//LTX 2.3 support #1342

Closed

leejet added 5 commits May 17, 2026 14:25

pass video fps into temporal rope embeddings

836d152

fix av_ca_timestep_scale_multiplier

cf0b8e0

add LTX2Scheduler support

b56f5ac

update docs

78a6afa

fix ci

1122df4

leejet changed the title ~~wip: add ltx2.3 support~~ feat: add ltx2.3 support May 17, 2026

leejet merged commit 67dda3f into master May 17, 2026
14 checks passed

stduhpf mentioned this pull request May 20, 2026

Fix: always load runtime lora params on runtime backend #1532

Merged

1 task

leejet merged 20 commits into master from ltx2.3


5 participants
