feat: add ltx2.3 support#1463

Merged: leejet merged 20 commits into master from ltx2.3 on May 17, 2026

leejet (Owner) commented Apr 27, 2026

LTX-2.3 dev T2V

.\bin\Release\sd-cli.exe -M vid_gen --diffusion-model  ..\..\ComfyUI\models\diffusion_models\ltx-2.3-22b-dev-UD-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_video_vae.safetensors --audio-vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_audio_vae.safetensors --llm ..\..\ComfyUI\models\text_encoders\gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --embeddings-connectors ..\..\ComfyUI\models\text_encoders\ltx-2.3-22b-dev_embeddings_connectors.safetensors  -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "worst quality, low quality, blurry, distorted, artifacts" -W 1280 -H 720 --diffusion-fa --offload-to-cpu --video-frames 33 --fps 24 -o t2v.webm
t2v.webm

LTX-2.3 dev I2V

.\bin\Release\sd-cli.exe -M vid_gen --diffusion-model  ..\..\ComfyUI\models\diffusion_models\ltx-2.3-22b-dev-UD-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_video_vae.safetensors --audio-vae ..\..\ComfyUI\models\vae\ltx-2.3-22b-dev_audio_vae.safetensors --llm ..\..\ComfyUI\models\text_encoders\gemma-3-12b-it-qat-UD-Q4_K_XL.gguf --embeddings-connectors ..\..\ComfyUI\models\text_encoders\ltx-2.3-22b-dev_embeddings_connectors.safetensors  -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v  -W 1280 -H 720 --diffusion-fa --offload-to-cpu --video-frames 33 -i ..\assets\ernie_image\turbo_example.png -o i2v.webm
i2v.webm

Green-Sky (Contributor) commented

Finally, temporal tiling 🥳

pwilkin added a commit to pwilkin/stable-diffusion.cpp that referenced this pull request May 1, 2026

Green-Sky (Contributor) commented May 8, 2026

Looks like @stduhpf's https://huggingface.co/stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small would be ideal here.

@leejet your example command contains both --diffusion-fa and --fa (:

juntaosun commented

Does it support LoRA loading?

stduhpf (Contributor) commented May 11, 2026

> Does it support LoRA loading?

Of course it does.

stduhpf (Contributor) commented May 11, 2026

Audio decoding is messed up (seems to be sped up with high pitch and repeats twice). Other than that it works nicely so far. Great job as always @leejet !

Edit: I believe the audio issue might just be caused by the arrangement of stereo samples in the WebM file (planar vs. interleaved).
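
To illustrate the planar vs. interleaved distinction raised above: if a two-channel planar buffer (all left samples, then all right samples) is read as interleaved, each output channel walks the whole buffer at stride 2, which would sound roughly twice as fast and play the content twice. The sketch below shows the conversion in isolation. The function name `planar_to_interleaved` is hypothetical and is not taken from the PR; the actual fix in sd_audio may look quite different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): convert planar audio
// (channel 0 samples, then channel 1 samples, ...) into interleaved
// audio (one sample per channel for frame 0, then frame 1, ...).
std::vector<float> planar_to_interleaved(const std::vector<float>& planar,
                                         std::size_t channels) {
    // Reject layouts that cannot be split evenly across channels.
    if (channels == 0 || planar.size() % channels != 0) {
        return {};
    }
    const std::size_t frames = planar.size() / channels;
    std::vector<float> interleaved(planar.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t f = 0; f < frames; ++f) {
            // Planar index: channel-major. Interleaved index: frame-major.
            interleaved[f * channels + c] = planar[c * frames + f];
        }
    }
    return interleaved;
}
```

With stereo input `L0 L1 L2 R0 R1 R2`, this yields `L0 R0 L1 R1 L2 R2`.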

A comment by stduhpf was marked as resolved.

LostRuins mentioned this pull request May 17, 2026

LostRuins (Contributor) commented May 17, 2026

Very nice to see this is being worked on, LTX 2.3 is highly anticipated!

Tangentially, not sure if you've seen this (likely vibecoded) ltx.cpp implementation https://github.com/audiohacking/ltx.cpp which is also ggml based, probably not too helpful but might be a useful reference, seeing as they also use the same comfyui ggufs.

leejet changed the title from "wip: add ltx2.3 support" to "feat: add ltx2.3 support" May 17, 2026

leejet (Owner, Author) commented May 17, 2026

> Very nice to see this is being worked on, LTX 2.3 is highly anticipated!
>
> Tangentially, not sure if you've seen this (likely vibecoded) ltx.cpp implementation https://github.com/audiohacking/ltx.cpp which is also ggml based, probably not too helpful but might be a useful reference, seeing as they also use the same comfyui ggufs.

I hadn’t looked at this project before, but it seems that this is not a complete implementation. The documentation says that ltx2.3 uses T5-XXL, which I don’t quite understand.

leejet merged commit 67dda3f into master May 17, 2026 (14 checks passed)
fszontagh added a commit to fszontagh/stable-diffusion.cpp that referenced this pull request May 22, 2026
13 new upstream commits since previous sync at 0b82969. The big one is
leejet#1500 (module backend assignment): ~1.5k LOC churn that splits backend
code into a new ggml_extend_backend.{h,cpp} pair and replaces every
runner's (backend_t backend, bool offload_params_to_cpu) constructor
arg with (backend_t runtime, backend_t params). New CLI flags
--backend te=cpu,vae=cuda0,... and --params-backend te=cpu,vae=cpu,...
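
As a rough illustration of the `module=backend` list syntax shown in those flags, the sketch below parses a string such as `te=cpu,vae=cuda0` into a map. The function name `parse_backend_spec` and the choice to silently skip malformed entries are assumptions made for this example; upstream's actual parser in leejet#1500 is not shown in this thread and may behave differently.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not upstream code): parse "te=cpu,vae=cuda0"
// into a {module name -> backend name} map.
std::map<std::string, std::string> parse_backend_spec(const std::string& spec) {
    std::map<std::string, std::string> assignment;
    std::stringstream stream(spec);
    std::string item;
    while (std::getline(stream, item, ',')) {
        const std::size_t eq = item.find('=');
        // Assumption: entries without a key or a value are skipped.
        if (eq == std::string::npos || eq == 0 || eq + 1 == item.size()) {
            continue;
        }
        assignment[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return assignment;
}
```

Two such maps, one for the runtime backend and one for the parameter backend, would correspond to the two constructor arguments described above.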

Other notable upstream changes folded in:
  3633072 module backend assignment (leejet#1500)
  38b14ad --max-vram -1 auto-detect (leejet#1498)
  67dda3f LTX 2.3 architecture (leejet#1463)
  06accf2 LTXAV latent2rgb projection
  9d68341 Euler/DDIM unification (leejet#1474)
  cde20d5 stereo handling in sd_audio
  d7ecbe1 T5 EOS dedup in Anima
  bd17f53 / 0c1ca17 / 839f6a9 / 3b4d26f ROCm/docs/CI
  db08b84 GCC 16 build fix
  686856e fake-VAE log demotion
  0b82969 / 381e0df PR template + CONTRIBUTING.md

Conflicts:

- examples/common/common.cpp, include/stable-diffusion.h: kept our
  offload_config alongside upstream's new backend/params_backend
  strings. sd_ctx_params_t now carries both axes.

- src/lora.hpp: dropped our enable_offload bool. The new params_backend
  argument expresses the same intent (CPU = offload).

- src/hidream_o1.hpp: kept params_prefix member, switched constructor
  to upstream's (backend, params_backend) signature.

- src/stable-diffusion.cpp: every runner-construction site took
  upstream's backend_for(MODULE) / params_backend_for(MODULE) lookups.
  Removed the dead cond_stage/diffusion/vae_offload_to_cpu local-bool
  derivation; replaced with calls to a new
  SDBackendManager::force_module_params_backend(MODULE, "cpu") helper
  that mutates params_assignment_ after init_backend() runs. The
  offload_config-driven escalations now land in the same data
  structure upstream's --params-backend writes to.

Post-merge fixups surfaced by retesting HiDream O1 streaming:

- src/llm.hpp: TextModel.forward_final_norm now casts to LLMRMSNorm,
  not RMSNorm. Upstream changed the "norm" block's concrete type;
  our pre-merge cast returned nullptr and crashed on first forward().

- src/hidream_o1.hpp: Stage 1 of compute_streaming_true scales
  inputs_embeds by sqrt(hidden_size) when params.llm.normalize_input,
  matching what forward_embeds does. No-op for HiDream O1 today but
  keeps the streaming path drift-free if a future arch flips it.
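
The embedding scaling described in the second fixup can be sketched as follows. This is a simplified stand-in operating on a flat `std::vector<float>`; the name `scale_embeds` is hypothetical, and the real code works on ggml tensors inside compute_streaming_true.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in (not the fork's code): scale input embeddings by
// sqrt(hidden_size) only when the architecture requests input normalization.
void scale_embeds(std::vector<float>& embeds, int hidden_size, bool normalize_input) {
    if (!normalize_input) {
        return;  // no-op for architectures that leave the flag off
    }
    const float scale = std::sqrt(static_cast<float>(hidden_size));
    for (float& value : embeds) {
        value *= scale;
    }
}
```

Keeping the condition identical in both the streaming and non-streaming paths is what prevents the two from drifting apart if a future architecture enables the flag.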

Smoke-tested on 12 GB GPU:
  Z-Image-Turbo Q8 layer_streaming     -> 4.32 s
  HiDream O1 bf16 dev layer_streaming  -> 17.44 s (4 steps, 1024x1024)
kamalbuilds added a commit to kamalbuilds/qvac-ext-stable-diffusion.cpp that referenced this pull request Jun 2, 2026
Brings in the upstream files that LTX-2.3 (leejet/stable-diffusion.cpp leejet#1463)
depends on, without merging unrelated commits from upstream master:

- src/tokenizers/ directory restructure (replaces src/tokenize_util.* and
  src/vocab/), adds Gemma 3 tokenizer (gemma_tokenizer.*, gemma_merges.hpp,
  gemma_vocab.hpp)
- src/llm.hpp updated to include the GEMMA3_12B architecture path used by
  LTX-2.3's text encoder
- src/conditioner.hpp adds LTXAVEmbedder and LTXAVTextProjectionRunner
- src/common_dit.hpp adds patchify3d / unpatchify3d
- src/denoiser.hpp adds LTX2Scheduler and the LTX2_SCHEDULER enum value
- src/diffusion_model.hpp adds LTXAVDiffusionExtra + video_positions /
  frame_rate fields to DiffusionExtraParams
- src/ggml_extend.hpp adds force_prec_f32 to Conv3d for the VAE encoder
  block that overflows in F16
- src/ltx_vae.hpp (NEW) spatiotemporal Video-VAE encoder/decoder
- src/ltxv.hpp replaces the 72-line LTX-Video v1 stub with full LTX-2.3 DiT
- src/ltx_audio_vae.h (present for completeness; audio remains unwired and
  out of scope for this fork's video-only target)
- src/vae.hpp adds VERSION_LTXAV scale factor branch
- src/wan.hpp patchify/unpatchify helpers exposed as static for reuse
- src/model.* adds LTX-2 architecture detection

This is the staging layer. The next commit cherry-picks the LTX-2.3 wiring
(stable-diffusion.cpp, CLI, headers) on top.
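
For readers unfamiliar with the patchify3d / unpatchify3d helpers listed above, here is a rough sketch of what 3D patchification generally does: it cuts a (channels, frames, height, width) volume into non-overlapping blocks and flattens each block into one token. The memory layout (row-major, width fastest), the per-token feature order, and the names `Shape3d` and `patchify3d_sketch` are all assumptions for illustration; the upstream implementation in src/common_dit.hpp operates on ggml tensors and may order things differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor: channels, frames, height, width.
struct Shape3d {
    std::size_t c, t, h, w;
};

// Illustrative sketch (not upstream code): split a [C][T][H][W] volume into
// (pt x ph x pw) patches and emit one flattened token per patch.
std::vector<float> patchify3d_sketch(const std::vector<float>& x, Shape3d s,
                                     std::size_t pt, std::size_t ph, std::size_t pw) {
    if (pt == 0 || ph == 0 || pw == 0 ||
        s.t % pt != 0 || s.h % ph != 0 || s.w % pw != 0 ||
        x.size() != s.c * s.t * s.h * s.w) {
        return {};
    }
    const std::size_t nt = s.t / pt, nh = s.h / ph, nw = s.w / pw;
    std::vector<float> tokens;
    tokens.reserve(x.size());
    // Outer three loops pick the patch (token); inner four walk its contents.
    for (std::size_t ti = 0; ti < nt; ++ti)
        for (std::size_t hi = 0; hi < nh; ++hi)
            for (std::size_t wi = 0; wi < nw; ++wi)
                for (std::size_t c = 0; c < s.c; ++c)
                    for (std::size_t dt = 0; dt < pt; ++dt)
                        for (std::size_t dh = 0; dh < ph; ++dh)
                            for (std::size_t dw = 0; dw < pw; ++dw) {
                                const std::size_t t = ti * pt + dt;
                                const std::size_t h = hi * ph + dh;
                                const std::size_t w = wi * pw + dw;
                                tokens.push_back(x[((c * s.t + t) * s.h + h) * s.w + w]);
                            }
    return tokens;
}
```

The inverse operation would write each token's values back to the same indices.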
kamalbuilds added a commit to kamalbuilds/qvac-ext-stable-diffusion.cpp that referenced this pull request Jun 2, 2026
…t#1463)

Cherry-picks upstream leejet/stable-diffusion.cpp commit 67dda3f ("feat: add
ltx2.3 support") on top of the staged infrastructure. Conflicts resolved:

  * .gitmodules: kept only the ggml submodule; the LTX-2.3 video build does not
    need the upstream sdcpp-webui frontend, libwebp or libwebm submodules.
  * ggml: bumped to leejet/ggml 7f4ab364 (re-init, depth-1 clone) which carries
    the IM2COL_3D / PAD ops the Video-VAE requires.
  * stable-diffusion.cpp: taken from upstream wholesale. The fork had no
    fork-specific changes here; all 27 hunks were upstream's own refactor
    between the fork's master and the LTX-2.3 commit. The file now contains
    the LTXAVModel and LTXVideoVAE construction paths, the LTXAVEmbedder
    conditioner branch, process_ltxav_video_timesteps(), the LTX I2V latent
    preparation, and the new bool generate_video() signature with sd_image_t**
    and sd_audio_t** out-parameters.
  * include/stable-diffusion.h: taken wholesale (new generate_video signature,
    embeddings_connectors_path, audio_vae_path, LTX2_SCHEDULER, sd_audio_t).
  * examples/cli/main.cpp, CMakeLists.txt, examples/cli/CMakeLists.txt,
    examples/cli/README.md, examples/server/README.md: upstream wholesale
    (LTX-2 CLI flags, WebP/WebM detection, tokenizers CMake).
  * examples/common/: brought in the upstream split (common.cpp, common.h,
    media_io.cpp, media_io.h, log.cpp, log.h, resource_owners.hpp) plus
    examples/server/async_jobs.cpp and src/tensor_ggml.hpp.
  * docs/ltx2.md: merged my long-form guide with the upstream real CLI
    examples and weight URLs so a reviewer can build-to-first-video from
    the README.
  * src/ltx2_api.cpp: updated to the new generate_video() signature, dropped
    the HAVE_LTX2_SYNC guards now that LTX2_SCHEDULER is in scheduler_t.
    Calls the real LTX2_SCHEDULER + Euler sampler + flow_shift 2.37. Audio
    output is freed and discarded (video-only build).

clang -std=c++17 -fsyntax-only on src/ltx2_api.cpp passes.
kamalbuilds added a commit to kamalbuilds/qvac-ext-stable-diffusion.cpp that referenced this pull request Jun 2, 2026
The cherry-pick of leejet#1463 missed three files in examples/common/ (log.cpp,
log.h, resource_owners.hpp) and two in examples/cli/ (image_metadata.cpp,
image_metadata.h). The new examples/cli/CMakeLists.txt explicitly references
these sources, so cmake configuration failed without them. Also removes
examples/common/common.hpp which was replaced upstream by the
common.cpp + common.h split.

CMake now configures cleanly with the Metal backend on Apple Silicon
(ggml 7f4ab36, BLAS via Accelerate). Build verified with:

    cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
        -DSD_METAL=ON -DSD_WEBP=OFF -DSD_WEBM=OFF \
        -DBUILD_SHARED_LIBS=OFF
kamalbuilds added a commit to kamalbuilds/qvac-ext-stable-diffusion.cpp that referenced this pull request Jun 2, 2026
…on M3 Metal

The cherry-pick of leejet#1463 left the fork in a mixed-version src/ state: some
files at 67dda3f, others at the older fork master (c8fb3d2). That broke
the include graph (clip.hpp included src/tokenize_util.h, ggml_extend.hpp
included ggml_extend_backend.h, model.cpp expected model_io/, denoiser.hpp
referenced ER_SDE / EULER_CFG_PP sample-method enums missing from the
public header, etc.).

This commit fully aligns src/, examples/, and include/ to 67dda3f:

  * src/: bulk-checkout every src/*.cpp/hpp/h from 67dda3f, then keep
    src/ltx2_api.cpp (fork-specific). Adds model_io/, ggml_graph_cut.{cpp,h},
    ggml_extend_backend.{cpp,h}, sample-cache.{cpp,h}, condition_cache_utils.hpp,
    auto_encoder_kl.hpp, convert.cpp, ernie_image.hpp, hidream_o1.hpp,
    rng.hpp (relocated), spectrum.hpp, tensor.hpp, upscaler.h.
    Removes stale src/gguf_reader.hpp (replaced by model_io/gguf_io.*).

  * include/stable-diffusion.h: bumped to 67dda3f, adds the new sample
    methods (ER_SDE_SAMPLE_METHOD, EULER_CFG_PP_SAMPLE_METHOD,
    EULER_A_CFG_PP_SAMPLE_METHOD) the LTXAV denoiser depends on.

  * src/ltx2_api.cpp: untouched. The post-sync edits committed earlier
    already match the new generate_video() signature.

Verified locally on Apple M3 (Metal 4 + CPU NEON + BLAS Accelerate):
  cmake -G Ninja -DSD_METAL=ON -DSD_WEBP=OFF -DSD_WEBM=OFF \
        -DBUILD_SHARED_LIBS=OFF ..
  ninja sd-cli

Produces a 35 MB sd-cli binary that recognises every LTX-2 flag:
  --mode vid_gen, --diffusion-model, --vae, --llm,
  --embeddings-connectors, --audio-vae, --init-img, --end-img,
  --video-frames, --fps, --temporal-tiling,
  --sampling-method euler --scheduler ltx2

Net diff vs fork master: 95 files, well under the 174 files of 64johnlee's
unmergeable mega-merge. "Minor modifications before merge" acceptance
criterion stays intact.