Conversation
@tongyuantongyu tongyuantongyu commented Nov 26, 2025

Description

This PR fixes a bug where the virtual address reservation / virtual memory allocation size might not satisfy the GPU alignment requirement.

Test Coverage

VirtualMemoryManagerTest.TestCudaVirtualMemoryAllocatorUnalignedSize

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

coderabbitai bot commented Nov 26, 2025

📝 Walkthrough

Walkthrough

The changes enhance CUDA memory allocation error reporting and refactor alignment computation in virtual memory management. A new optional info parameter extends error checking utilities with contextual information, while alignment calculation shifts from construction-time initialization to lazy, device-aware computation using GPU-specific granularity discovery.

Changes

Cohort / File(s) Summary
Error Reporting Enhancement
cpp/tensorrt_llm/common/cudaDriverWrapper.h
Added optional info parameter to checkDriver() function template; introduced TLLM_CU_CHECK_WITH_INFO(stat, info, ...) macro for formatted error context.
Virtual Memory Header Refactoring
cpp/include/tensorrt_llm/runtime/virtualMemory.h
Replaced mPageSize member with atomic mAlignment in Configuration; updated pageAligned() to accept device parameter and implement lazy alignment discovery via cuMemGetAllocationGranularity; added <numeric> include; enhanced error logging via TLLM_CU_CHECK_WITH_INFO.
Virtual Memory Implementation Updates
cpp/tensorrt_llm/runtime/virtualMemory.cpp
Updated allocation path to use device-aware pageAligned(n, device) computation; wrapped cuMemAddressReserve with TLLM_CU_CHECK_WITH_INFO for improved diagnostics; ensured UnicastConfigurator and LocalCreator use page-aligned size instead of raw size.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant VirtualMemory as VirtualMemory:<br/>allocate()
    participant Config as Configuration:<br/>pageAligned()
    participant Atomic as mAlignment<br/>(atomic)
    participant CUDA as CUDA Runtime

    Caller->>VirtualMemory: allocate(n, device)
    VirtualMemory->>Config: pageAligned(n, device)
    Config->>Atomic: load mAlignment
    alt mAlignment == 0 (uninitialized)
        Config->>CUDA: cuMemGetAllocationGranularity()
        CUDA-->>Config: gpu_alignment
        Config->>Config: lcm(system_page_size, gpu_alignment)
        Config->>Atomic: store alignment (spin-wait)
    end
    Config->>Config: round n to alignment
    Config-->>VirtualMemory: page_aligned_size
    VirtualMemory->>CUDA: cuMemAddressReserve(page_aligned_size)
    alt Success
        CUDA-->>VirtualMemory: reserved address
    else Error
        VirtualMemory->>VirtualMemory: checkDriver with info log
    end
    VirtualMemory-->>Caller: address
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Atomic lazy-loading mechanism: The spin-wait loop with memory barriers in pageAligned() requires careful review to ensure thread-safety and lack of deadlocks during concurrent alignment initialization.
  • Device-aware granularity computation: Verify that cuMemGetAllocationGranularity() is called with correct device context and that the LCM calculation correctly combines system and GPU alignment requirements.
  • Error propagation changes: Confirm that TLLM_CU_CHECK_WITH_INFO macro correctly formats diagnostic messages and that all allocation paths pass appropriate context info.
  • Page-aligned size propagation: Ensure UnicastConfigurator and LocalCreator receive consistent page-aligned sizes throughout the allocation flow.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Description check ⚠️ Warning — The PR description lacks a proper title format with a ticket/issue reference and type designation. The core bug fix is explained, but the title template is not followed. Resolution: add a proper PR title following the template: [ticket/issue/None][type] Summary (e.g., [TRTLLM-XXXX][fix] Correct virtual memory allocation alignment).

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title clearly summarizes the main change: correcting virtual memory allocation alignment, which aligns with the primary purpose of the PR.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
cpp/include/tensorrt_llm/runtime/virtualMemory.h (2)

465-495: Atomic alignment field is a good fit; ensure required headers are included

Using std::atomic<std::size_t> mAlignment lazily initialized to 0 matches the new pageAligned design and avoids doing CUDA driver queries in the constructor. However, this header now directly depends on std::atomic and std::numeric_limits (used below), but only <numeric> is explicitly included.

To keep the header self-contained and robust against transitive include changes, consider explicitly adding:

```diff
 #include <map>
 #include <mutex>
 #include <numeric>
+#include <atomic>
+#include <limits>
 #include <unistd.h>
```

496-529: Lazy, device-aware pageAligned implementation is sound; minor edge considerations

The lazy initialization using a sentinel (loading = max_size_t), CAS, and a spin-wait with pause/yield is correct for typical usage:

  • Alignment is computed once via cuMemGetAllocationGranularity for the first device that calls the function, then cached.
  • std::lcm(getpagesize(), gpuAlignment) ensures the size is valid for both OS page size and GPU allocation granularity.
  • Concurrent callers either compute once or spin briefly until the value is stored; no data race on mAlignment.

Two observations:

  1. Device-specific alignment: If heterogeneous multi-GPU scenarios are used (different GPU types with different granularities), this single cached alignment ignores the device parameter after first initialization. This is acceptable for homogeneous setups but worth documenting.

  2. Memory ordering: Current memory_order_relaxed semantics are correct for the scalar alignment value. Consider acquire/release only if this function evolves to coordinate additional state beyond the alignment value itself.

The implementation is correct as-is for its current usage.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b10137f and f51758c.

📒 Files selected for processing (3)
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h (4 hunks)
  • cpp/tensorrt_llm/common/cudaDriverWrapper.h (2 hunks)
  • cpp/tensorrt_llm/runtime/virtualMemory.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,cu}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{cpp,h,cu}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #define whenever possible, as the latter are not visible to the compiler
A variable that is not modified after its initialization should be declared as const
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization and should be replaced with named constants
Use Allman indentation style for braces in C++
Put the semicolon for an empty for or while loop in a new line
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements)
If and else should always be followed by brace-delimited statements, even if empty or a single statement
C++ filenames should use camel case with first letter lowercase (e.g., thisIsASubDir and thisIsAFilename.cpp)
All filenames involved in compilation of a compilation target must have case-insensitive unique filenames
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass)
Local variables, methods and namespaces should use camel case with first letter lowercase (e.g., localFooBar)
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos)
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal)
Locally visible static variables should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;)
Public, private and protected class member variables should use camel case prefixed with 'm' (e.g., mNbFooValues), though the 'm' pre...

Files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
  • cpp/tensorrt_llm/runtime/virtualMemory.cpp
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
**/*.h

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.h: Use a preprocessor guard in C++ header files with the guard name format TRTLLM_ followed by the filename in all caps (e.g., TRTLLM_FOO_BAR_HELLO_H for file FooBarHello.h); do not include directory names in the symbol
Do not use underscore prefix or suffix in C++ preprocessor guard symbols; they are reserved in C++ standard for compilers or implementation

Files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
  • cpp/tensorrt_llm/runtime/virtualMemory.cpp
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
🧠 Learnings (9)
📓 Common learnings
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 7104
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:1185-1189
Timestamp: 2025-08-25T00:03:39.294Z
Learning: TLLM_CHECK_WITH_INFO is a host-side utility function and cannot be called from CUDA device functions (those marked with __device__ or __global__). In device code, assert() is the primary mechanism for handling "should never happen" conditions, and like standard C++ assert, CUDA's assert only works in debug builds and is compiled out in release builds.
📚 Learning: 2025-08-25T00:03:39.294Z
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 7104
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:1185-1189
Timestamp: 2025-08-25T00:03:39.294Z
Learning: TLLM_CHECK_WITH_INFO is a host-side utility function and cannot be called from CUDA device functions (those marked with __device__ or __global__). In device code, assert() is the primary mechanism for handling "should never happen" conditions, and like standard C++ assert, CUDA's assert only works in debug builds and is compiled out in release builds.

Applied to files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
📚 Learning: 2025-09-23T15:13:48.819Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/multimem.h:20-30
Timestamp: 2025-09-23T15:13:48.819Z
Learning: TRT-LLM targets modern CUDA toolkits that support FP8 datatypes, so cuda_fp8.h can be included unconditionally without version guards in TRT-LLM code.

Applied to files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
📚 Learning: 2025-11-14T11:22:03.729Z
Learnt from: nzmora-nvidia
Repo: NVIDIA/TensorRT-LLM PR: 9163
File: tensorrt_llm/_torch/auto_deploy/custom_ops/quant.py:107-113
Timestamp: 2025-11-14T11:22:03.729Z
Learning: In TensorRT-LLM AutoDeploy custom ops, when adding hardware capability checks to select between kernel implementations (e.g., cuBLAS vs. CUDA kernel), use descriptive variable names that identify the specific GPU architectures or families being targeted (e.g., `is_blackwell_geforce_or_ada`) rather than generic names like `enable_cuda_core`. This makes it clear that the code is selecting an implementation path based on hardware capabilities, not enabling/disabling hardware features.

Applied to files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
📚 Learning: 2025-09-23T15:01:00.070Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:15-17
Timestamp: 2025-09-23T15:01:00.070Z
Learning: In TensorRT-LLM NCCL device kernels, the <sstream> header is not needed as an explicit include in config.cu because it's provided transitively through other headers. Local compilation testing confirms this works without the explicit include.

Applied to files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
  • cpp/tensorrt_llm/runtime/virtualMemory.cpp
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
📚 Learning: 2025-09-23T15:01:00.070Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:15-17
Timestamp: 2025-09-23T15:01:00.070Z
Learning: In TensorRT-LLM NCCL device kernels (cpp/tensorrt_llm/kernels/nccl_device/config.cu), std::ostringstream is used but <sstream> doesn't need to be explicitly included because it's provided transitively through other headers like tensorrt_llm/common/cudaUtils.h or config.h. Local compilation testing confirms this works without the explicit include.

Applied to files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
📚 Learning: 2025-11-24T17:09:17.870Z
Learnt from: CR
Repo: NVIDIA/TensorRT-LLM PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:09:17.870Z
Learning: Applies to **/*.h : Use a preprocessor guard in C++ header files with the guard name format `TRTLLM_` followed by the filename in all caps (e.g., `TRTLLM_FOO_BAR_HELLO_H` for file `FooBarHello.h`); do not include directory names in the symbol

Applied to files:

  • cpp/tensorrt_llm/common/cudaDriverWrapper.h
  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
📚 Learning: 2025-08-19T03:35:20.866Z
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 6915
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:4616-4626
Timestamp: 2025-08-19T03:35:20.866Z
Learning: In the MOE profiler TMA workspace preparation (cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu), the overlapping of TMA WS regions for NONE and FINALIZE variants is deliberate design to save memory space, as confirmed by djns99. The comment "reuse the same pointers to save space" reflects this intentional behavior.

Applied to files:

  • cpp/tensorrt_llm/runtime/virtualMemory.cpp
📚 Learning: 2025-11-24T17:09:17.870Z
Learnt from: CR
Repo: NVIDIA/TensorRT-LLM PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:09:17.870Z
Learning: Applies to **/*.cu : CUDA code must be compiled with a CUDA compiler and includes declarations/definitions with CUDA keywords (`__device__`, `__managed__`, `__constant__`, `__global__`), device functions, and kernel launching with <<<...>>> syntax

Applied to files:

  • cpp/include/tensorrt_llm/runtime/virtualMemory.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (9)
cpp/tensorrt_llm/common/cudaDriverWrapper.h (2)

143-162: Extended checkDriver info parameter looks correct and preserves existing behavior

The added info parameter cleanly augments the error message when non-null and defaults to nullptr so existing call sites remain source-compatible. The conditional formatting branch is straightforward and still falls back to the previous message structure when info is absent. No functional issues spotted here.


187-192: TLLM_CU_CHECK_WITH_INFO is well-integrated; consider usage constraints

The new macro correctly preformats the info string and passes it into checkDriver, giving richer context on CUDA driver failures. Lifetime of the fmtstr(...).c_str() argument is safe for the duration of the call. Just keep in mind this remains a host-side utility (like other TLLM_*CHECK macros) and should not be invoked from device code.

cpp/include/tensorrt_llm/runtime/virtualMemory.h (1)

203-239: Improved cuMemCreate error reporting in LocalCreator::create

Switching to TLLM_CU_CHECK_WITH_INFO for cuMemCreate with "allocating %zu bytes of memory" and mSize gives exactly the extra context you want when allocations fail, without changing semantics. The memory accounting that follows still uses the same mSize, so behavior remains consistent.

cpp/tensorrt_llm/runtime/virtualMemory.cpp (6)

49-60: More informative status checks in materialize

Using TLLM_CHECK_WITH_INFO with the current status value before materializing a chunk is an improvement over a plain assert-style check. It keeps the control flow unchanged but gives clear diagnostics if materialize is called in an invalid state.


88-115: Release-path status validation is clearer and safer

The new TLLM_CHECK_WITH_INFO in _release precisely documents valid states (MATERIALIZED or ERRORED with non-INVALID_STATE) and reports the actual status on misuse. This doesn’t change behavior when called correctly but improves failure-mode debuggability.


153-172: Manager add now surfaces detailed misuse information

Switching these internal checks in CudaVirtualMemoryManager::add to TLLM_CHECK_WITH_INFO (status validation and duplicate-handle detection) is a straight upgrade in error reporting and doesn’t alter the success path. The ScopeGuard rollback logic remains intact.


339-380: Alignment fix in allocate correctly ties VA reservation, mapping, and allocation to GPU granularity

The revised allocation path looks correct and addresses the original bug:

  • pageAlignedSize = mConfig->pageAligned(n, device) ensures the size passed to:
    • cuMemAddressReserve,
    • UnicastConfigurator (for cuMemMap/cuMemUnmap),
    • and LocalCreator / cuMemCreate
      is a common multiple of both system page size and the GPU’s allocation granularity.
  • TLLM_CU_CHECK_WITH_INFO around cuMemAddressReserve with "allocating %zu bytes of address space" provides precise context on failures.
  • The configurators that operate on logical payload (MemsetConfigurator, OffloadConfigurator) still use n, so they only touch/backup the intended region inside the larger aligned allocation, which is fine.

Taken together, the VA reservation, mapping size, and physical allocation size are now consistent and satisfy the GPU alignment requirements.

You may want to run a quick smoke test on devices with non-trivial cuMemGetAllocationGranularity (e.g., large-bar configurations) and verify that cuMemCreate, cuMemMap, and cuMemAddressReserve/Free all succeed for a variety of non-aligned n values.


382-389: Deallocate remains consistent with new alignment strategy

deallocate now relies on mConfig->pageAligned(n) for cuMemAddressFree, which uses the same cached alignment value computed on first use in pageAligned. For a given n, the aligned size used in cuMemAddressReserve and in cuMemAddressFree will match, so freeing the VA range remains correct under the new device-aware alignment scheme.


417-426: Allocator configuration guard now emits helpful context on misuse

The TLLM_CHECK_WITH_INFO in setVirtualMemoryAllocator that prints the tag, mode, and stream of an already-active allocator is a useful diagnostic improvement when someone attempts to reconfigure virtual memory while it is already in use. No behavior change for the valid path, and the message should make misconfiguration much easier to track down.

@tongyuantongyu
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #25929 [ run ] triggered by Bot. Commit: a35aac3

@tensorrt-cicd
Collaborator

PR_Github #25929 [ run ] completed with state SUCCESS. Commit: a35aac3
/LLM/main/L0_MergeRequest_PR pipeline #19661 completed with status: 'FAILURE'

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
@tongyuantongyu
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #26120 [ run ] triggered by Bot. Commit: 30d67da

@tensorrt-cicd
Collaborator

PR_Github #26120 [ run ] completed with state SUCCESS. Commit: 30d67da
/LLM/main/L0_MergeRequest_PR pipeline #19836 completed with status: 'FAILURE'

@tongyuantongyu
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #26244 [ run ] triggered by Bot. Commit: 30d67da

@tensorrt-cicd
Collaborator

PR_Github #26244 [ run ] completed with state SUCCESS. Commit: 30d67da
/LLM/main/L0_MergeRequest_PR pipeline #19941 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.
