feat: add runtime cache API for TensorRT-RTX#4180
Merged
lanluo-nvidia merged 3 commits into pytorch:main on Apr 18, 2026
Conversation
tp5uiuc
commented
Apr 10, 2026
tp5uiuc
commented
Apr 13, 2026
    dryrun: bool = _defaults.DRYRUN,
    hardware_compatible: bool = _defaults.HARDWARE_COMPATIBLE,
    timing_cache_path: str = _defaults.TIMING_CACHE_PATH,
    runtime_cache_path: str = _defaults.RUNTIME_CACHE_PATH,
Contributor
Author
Runtime cache is a JIT-time API: it may not make much sense for cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine. I added it to the interface as a common API for the entry points into Torch-TRT, but I can add it to unsupported_settings.
Collaborator
Agreed, a JIT-time cache doesn't make sense for those entry points.
Let's add it to unsupported_settings for now; if we want this feature in the future, we can add it back.
Contributor
Author
Great, thanks for the feedback Lan 🙏
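For context, the new option slots into the settings dataclass roughly as follows. This is a minimal sketch: the default paths are hypothetical stand-ins for the project's `_defaults` module, and the real `CompilationSettings` has many more fields.

```python
from dataclasses import dataclass

# Hypothetical defaults standing in for torch_tensorrt's _defaults module
TIMING_CACHE_PATH = "/tmp/torch_tensorrt/timing_cache.bin"
RUNTIME_CACHE_PATH = "/tmp/torch_tensorrt/runtime_cache.bin"


@dataclass
class CompilationSettingsSketch:
    """Sketch of the settings dataclass, showing only the fields from the diff."""

    dryrun: bool = False
    hardware_compatible: bool = False
    timing_cache_path: str = TIMING_CACHE_PATH
    # New JIT-time option: where TensorRT-RTX persists its runtime cache
    runtime_cache_path: str = RUNTIME_CACHE_PATH
```

Because the field has a default, AOT entry points that drop the option can simply let the dataclass fill it in.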
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).

Changes:
- Skip timing cache creation/saving for TensorRT-RTX in _TRTInterpreter
- Add RUNTIME_CACHE_PATH default and runtime_cache_path setting
- Wire up IRuntimeCache in PythonTorchTensorRTModule (setup, load, save)
- Persist runtime cache to disk with filelock for concurrent access safety
- Thread runtime_cache_path through all compile functions
- Add unit tests (12 tests) and E2E model tests (6 tests)
- Update docstrings and RST documentation

Fixes pytorch#3817

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
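The setup/load/save lifecycle this commit describes can be sketched in plain Python. This is a toy stand-in, not the real `PythonTorchTensorRTModule`; `weakref.finalize` plays the role of saving on module destruction, and all names here are illustrative.

```python
import os
import tempfile
import weakref


def _save_cache(path: str, state: dict) -> None:
    """Write the cache atomically: tmp file in the same dir, then rename."""
    cache_dir = os.path.dirname(path) or "."
    os.makedirs(cache_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(state["blob"])
    os.replace(tmp, path)  # atomic on POSIX


class RuntimeCacheStub:
    """Toy module mimicking the lifecycle: load on setup, save on destruction."""

    def __init__(self, cache_path: str):
        self._state = {"blob": b""}
        if os.path.exists(cache_path):  # load from disk if available
            with open(cache_path, "rb") as f:
                self._state["blob"] = f.read()
        # The finalizer runs when the object is garbage-collected,
        # like __del__ but without its pitfalls.
        self.finalizer = weakref.finalize(self, _save_cache, cache_path, self._state)

    def record(self, data: bytes) -> None:
        """Stand-in for the engine adding newly JIT-compiled kernels."""
        self._state["blob"] += data
```

Calling `finalizer()` explicitly (or letting the object be collected) flushes the cache, and a second instance then warm-starts from disk.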
Version provided by upstream torch; no pin needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
runtime_cache_path is a JIT-time API for TensorRT-RTX that only applies at inference time via PythonTorchTensorRTModule. Remove it from compilation_options in cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine (with a warning), letting the dataclass default fill in harmlessly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
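The AOT-path fix this commit describes amounts to stripping the JIT-only key with a warning before the settings object is built. A hedged sketch follows: the function and set names are made up here for illustration, not the actual torch_tensorrt internals.

```python
import warnings

# Hypothetical: JIT-only options rejected by AOT entry points such as
# cross_compile_for_windows / convert_exported_program_to_serialized_trt_engine
UNSUPPORTED_AOT_SETTINGS = frozenset({"runtime_cache_path"})


def drop_unsupported_settings(compilation_options: dict) -> dict:
    """Remove JIT-only options with a warning; the dataclass default
    then fills in harmlessly when the settings object is constructed."""
    for key in UNSUPPORTED_AOT_SETTINGS & compilation_options.keys():
        warnings.warn(
            f"{key} only applies to the Python runtime at inference time "
            "and is ignored for AOT compilation"
        )
        del compilation_options[key]
    return compilation_options
```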
Force-pushed from 3893fa4 to 3be6032 (Compare)
tp5uiuc added a commit to tp5uiuc/TensorRT that referenced this pull request on Apr 22, 2026
Implement TensorRT-RTX runtime cache persistence in the C++ runtime path (TorchTensorRTModule / TRTEngine). Mirrors the Python-runtime feature landed in pytorch#4180.

What
- apply_runtime_cache() (a no-op stub from the prior commit) now creates an IRuntimeCache from the IRuntimeConfig, loads any existing cache file from the configured path, and attaches the cache to the config via IRuntimeConfig::setRuntimeCache (taken by const reference).
- load_runtime_cache() reads the cache under an advisory shared lock (flock LOCK_SH) on POSIX. Concurrent readers coexist; transient failures downgrade to warnings so inference never blocks on cache IO.
- save_runtime_cache() writes the serialized cache atomically via tmp-file + rename under an exclusive lock (flock LOCK_EX). The write path creates intermediate directories as needed. On Windows the save falls back to a best-effort write without advisory locking and emits a warning; LockFileEx support is a follow-up.
- ~TRTEngine() now invokes save_runtime_cache() before tearing down the cache, config, and execution context, so JIT compilation results survive process exits.

Why
- TensorRT-RTX JIT-compiles specialized kernels at inference time. The runtime cache lets those compilations persist across runs and across processes, which was measured at ~8x warm-vs-cold speedup in the Python-runtime implementation.
- Without this commit, users relying on the C++ runtime (TorchScript deployments, use_python_runtime=False) would have no way to retain JIT work and would pay the cold-start cost on every process start.

Tests
- tests/py/dynamo/runtime/test_000_runtime_cache_cpp.py exercises the C++ runtime path (use_python_runtime=False) with cache save on destructor, directory creation, warm-cache roundtrip correctness via cosine similarity, and ABI/index registration.
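The locking scheme this commit describes can be illustrated with the POSIX flock primitives available in Python's stdlib. This is a sketch of the approach, not the actual TRTEngine code, and it is POSIX-only, matching the commit's Windows caveat.

```python
import fcntl
import os
import tempfile


def load_runtime_cache(path: str) -> bytes:
    """Read under a shared advisory lock (LOCK_SH): concurrent readers coexist."""
    try:
        with open(path, "rb") as f:
            fcntl.flock(f.fileno(), fcntl.LOCK_SH)
            try:
                return f.read()
            finally:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    except OSError:
        # Transient cache IO failure: warn-and-continue, never block inference.
        return b""


def save_runtime_cache(path: str, blob: bytes) -> None:
    """Write atomically: tmp file + rename, under an exclusive lock (LOCK_EX)."""
    cache_dir = os.path.dirname(path) or "."
    os.makedirs(cache_dir, exist_ok=True)  # create intermediate directories
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # single writer at a time
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    os.replace(tmp, path)  # atomic rename: readers never see a torn file
```

Writing to a temp file in the same directory keeps the rename on one filesystem, which is what makes the final `os.replace` atomic.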
Description
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache which is not used by RTX (no autotuning).
TensorRT-RTX uses JIT compilation at inference time. The runtime cache (IRuntimeCache) stores these compilation results so that kernels and execution graphs are not recompiled on subsequent runs. This is analogous to the timing cache but operates at inference time rather than build time.

Fixes #3817
Changes
- Skip `_create_timing_cache()` and `_save_timing_cache()` when `ENABLED_FEATURES.tensorrt_rtx` is True (timing cache is a no-op in TRT-RTX)
- `runtime_cache_path` setting: new `RUNTIME_CACHE_PATH` default and `runtime_cache_path` field in `CompilationSettings`, threaded through all compile functions
- `IRuntimeCache` in `PythonTorchTensorRTModule`: create `RuntimeConfig` with runtime cache on engine setup, load from disk if available, save on module destruction
- `filelock` for concurrent access safety when multiple processes share the same cache file

Type of change
Checklist: