🐛 [Bug] run_llm.py fails with offload_module_to_cpu=True #3888

@peri044

Bug Description

After the KV cache transformation, exported_program.module() fails with an "input not found" error; something has likely changed in the exported_program.module() API. A workaround is to set offload_module_to_cpu=False.

To Reproduce

Steps to reproduce the behavior:
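
The report does not give explicit steps, so here is only a minimal sketch of the shape of the failing call. The tiny model and the exact compile invocation are assumptions for illustration; the actual failure is reported against tools/llm/run_llm.py, which this does not reproduce exactly.

```python
import torch
import torch_tensorrt

# Stand-in model; the real report involves an LLM with KV caching.
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

model = TinyModel().eval().cuda()
inputs = (torch.randn(1, 16, device="cuda"),)

# Export first, then compile the ExportedProgram with Torch-TensorRT.
exported_program = torch.export.export(model, inputs)

# offload_module_to_cpu=True offloads the source module's weights to CPU
# to save GPU memory during the engine build. Per the report, calling
# exported_program.module() after the KV-cache transformation then fails
# with an "input not found" error when this flag is enabled.
trt_module = torch_tensorrt.dynamo.compile(
    exported_program,
    inputs=inputs,
    offload_module_to_cpu=True,  # workaround: set this to False
)

print(trt_module(*inputs))
```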

Expected behavior

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0):
  • PyTorch Version (e.g. 1.0):
  • CPU Architecture:
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, libtorch, source):
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

Metadata
Labels

bug (Something isn't working)
