feat: add runtime cache API for TensorRT-RTX#4180
Merged
lanluo-nvidia merged 3 commits into pytorch:main on Apr 18, 2026
Conversation
tp5uiuc
commented
Apr 10, 2026
tp5uiuc
commented
Apr 13, 2026
    dryrun: bool = _defaults.DRYRUN,
    hardware_compatible: bool = _defaults.HARDWARE_COMPATIBLE,
    timing_cache_path: str = _defaults.TIMING_CACHE_PATH,
    runtime_cache_path: str = _defaults.RUNTIME_CACHE_PATH,
Contributor
Author
Runtime cache is a JIT-time API: it may not make much sense for cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine. I added it to the interface as a common API for the entry points into Torch-TRT, but I can add it to unsupported_settings.
Collaborator
Agreed, a JIT-time cache doesn't make sense for those entry points.
Let's add it to unsupported_settings for now; if we want this feature in the future, we can add it back.
Contributor
Author
Great, thanks for the feedback Lan 🙏
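For context, the new option slots into the settings dataclass roughly as follows. This is a minimal sketch: the default paths are hypothetical stand-ins for the project's `_defaults` module, and the real `CompilationSettings` has many more fields.

```python
from dataclasses import dataclass

# Hypothetical defaults standing in for torch_tensorrt's _defaults module
TIMING_CACHE_PATH = "/tmp/torch_tensorrt/timing_cache.bin"
RUNTIME_CACHE_PATH = "/tmp/torch_tensorrt/runtime_cache.bin"


@dataclass
class CompilationSettingsSketch:
    """Sketch of the settings dataclass, showing only the fields from the diff."""

    dryrun: bool = False
    hardware_compatible: bool = False
    timing_cache_path: str = TIMING_CACHE_PATH
    # New JIT-time option: where TensorRT-RTX persists its runtime cache
    runtime_cache_path: str = RUNTIME_CACHE_PATH
```

Because the field has a default, AOT entry points that drop the option can simply let the dataclass fill it in.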
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).

Changes:
- Skip timing cache creation/saving for TensorRT-RTX in _TRTInterpreter
- Add RUNTIME_CACHE_PATH default and runtime_cache_path setting
- Wire up IRuntimeCache in PythonTorchTensorRTModule (setup, load, save)
- Persist runtime cache to disk with filelock for concurrent access safety
- Thread runtime_cache_path through all compile functions
- Add unit tests (12 tests) and E2E model tests (6 tests)
- Update docstrings and RST documentation

Fixes pytorch#3817

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
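The setup/load/save lifecycle this commit describes can be sketched in plain Python. This is a toy stand-in, not the real `PythonTorchTensorRTModule`; `weakref.finalize` plays the role of saving on module destruction, and all names here are illustrative.

```python
import os
import tempfile
import weakref


def _save_cache(path: str, state: dict) -> None:
    """Write the cache atomically: tmp file in the same dir, then rename."""
    cache_dir = os.path.dirname(path) or "."
    os.makedirs(cache_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(state["blob"])
    os.replace(tmp, path)  # atomic on POSIX


class RuntimeCacheStub:
    """Toy module mimicking the lifecycle: load on setup, save on destruction."""

    def __init__(self, cache_path: str):
        self._state = {"blob": b""}
        if os.path.exists(cache_path):  # load from disk if available
            with open(cache_path, "rb") as f:
                self._state["blob"] = f.read()
        # The finalizer runs when the object is garbage-collected,
        # like __del__ but without its pitfalls.
        self.finalizer = weakref.finalize(self, _save_cache, cache_path, self._state)

    def record(self, data: bytes) -> None:
        """Stand-in for the engine adding newly JIT-compiled kernels."""
        self._state["blob"] += data
```

Calling `finalizer()` explicitly (or letting the object be collected) flushes the cache, and a second instance then warm-starts from disk.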
Version provided by upstream torch; no pin needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
runtime_cache_path is a JIT-time API for TensorRT-RTX that only applies at inference time via PythonTorchTensorRTModule. Remove it from compilation_options in cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine (with a warning), letting the dataclass default fill in harmlessly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
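The AOT-path fix this commit describes amounts to stripping the JIT-only key with a warning before the settings object is built. A hedged sketch follows: the function and set names are made up here for illustration, not the actual torch_tensorrt internals.

```python
import warnings

# Hypothetical: JIT-only options rejected by AOT entry points such as
# cross_compile_for_windows / convert_exported_program_to_serialized_trt_engine
UNSUPPORTED_AOT_SETTINGS = frozenset({"runtime_cache_path"})


def drop_unsupported_settings(compilation_options: dict) -> dict:
    """Remove JIT-only options with a warning; the dataclass default
    then fills in harmlessly when the settings object is constructed."""
    for key in UNSUPPORTED_AOT_SETTINGS & compilation_options.keys():
        warnings.warn(
            f"{key} only applies to the Python runtime at inference time "
            "and is ignored for AOT compilation"
        )
        del compilation_options[key]
    return compilation_options
```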
Force-pushed from 3893fa4 to 3be6032 (Compare)
tp5uiuc added a commit to tp5uiuc/TensorRT that referenced this pull request on Apr 22, 2026
Implement TensorRT-RTX runtime cache persistence in the C++ runtime path (TorchTensorRTModule / TRTEngine). Mirrors the Python-runtime feature landed in pytorch#4180.

What
- apply_runtime_cache() (a no-op stub from the prior commit) now creates an IRuntimeCache from the IRuntimeConfig, loads any existing cache file from the configured path, and attaches the cache to the config via IRuntimeConfig::setRuntimeCache (taken by const reference).
- load_runtime_cache() reads the cache under an advisory shared lock (flock LOCK_SH) on POSIX. Concurrent readers coexist; transient failures downgrade to warnings so inference never blocks on cache IO.
- save_runtime_cache() writes the serialized cache atomically via tmp-file + rename under an exclusive lock (flock LOCK_EX). The write path creates intermediate directories as needed. On Windows the save falls back to a best-effort write without advisory locking and emits a warning; LockFileEx support is a follow-up.
- ~TRTEngine() now invokes save_runtime_cache() before tearing down the cache, config, and execution context, so JIT compilation results survive process exits.

Why
- TensorRT-RTX JIT-compiles specialized kernels at inference time. The runtime cache lets those compilations persist across runs and across processes, which was measured at ~8x warm-vs-cold speedup in the Python-runtime implementation.
- Without this commit, users relying on the C++ runtime (TorchScript deployments, use_python_runtime=False) would have no way to retain JIT work and would pay the cold-start cost on every process start.

Tests
- tests/py/dynamo/runtime/test_000_runtime_cache_cpp.py exercises the C++ runtime path (use_python_runtime=False) with cache save on destructor, directory creation, warm-cache roundtrip correctness via cosine similarity, and ABI/index registration.
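The locking scheme this commit describes can be illustrated with the POSIX flock primitives available in Python's stdlib. This is a sketch of the approach, not the actual TRTEngine code, and it is POSIX-only, matching the commit's Windows caveat.

```python
import fcntl
import os
import tempfile


def load_runtime_cache(path: str) -> bytes:
    """Read under a shared advisory lock (LOCK_SH): concurrent readers coexist."""
    try:
        with open(path, "rb") as f:
            fcntl.flock(f.fileno(), fcntl.LOCK_SH)
            try:
                return f.read()
            finally:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    except OSError:
        # Transient cache IO failure: warn-and-continue, never block inference.
        return b""


def save_runtime_cache(path: str, blob: bytes) -> None:
    """Write atomically: tmp file + rename, under an exclusive lock (LOCK_EX)."""
    cache_dir = os.path.dirname(path) or "."
    os.makedirs(cache_dir, exist_ok=True)  # create intermediate directories
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # single writer at a time
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    os.replace(tmp, path)  # atomic rename: readers never see a torn file
```

Writing to a temp file in the same directory keeps the rename on one filesystem, which is what makes the final `os.replace` atomic.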
Description
Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache which is not used by RTX (no autotuning).
TensorRT-RTX uses JIT compilation at inference time. The runtime cache (IRuntimeCache) stores these compilation results so that kernels and execution graphs are not recompiled on subsequent runs. This is analogous to the timing cache but operates at inference time rather than build time.

Fixes #3817
Changes
- Skip `_create_timing_cache()` and `_save_timing_cache()` when `ENABLED_FEATURES.tensorrt_rtx` is True (timing cache is a no-op in TRT-RTX)
- `runtime_cache_path` setting: new `RUNTIME_CACHE_PATH` default and `runtime_cache_path` field in `CompilationSettings`, threaded through all compile functions
- `IRuntimeCache` in `PythonTorchTensorRTModule`: create `RuntimeConfig` with runtime cache on engine setup, load from disk if available, save on module destruction
- `filelock` for concurrent access safety when multiple processes share the same cache file

Type of change
Checklist: