Commit a88a404
[BUG] Fix memory leak in BaseModel by detaching tensors (#1924)
#### Reference Issues/PRs
#1369
#1461
#### What does this implement/fix? Explain your changes.
1. Detached tensors in the log dictionary before appending them to the
training/validation/testing_step_outputs lists. This fixes a memory leak
caused by retaining the computation graph for every batch throughout an
entire epoch.
2. Detached the loss tensor within the step() method before logging.
3. Moved prediction results to the CPU to prevent VRAM growth.
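
The three changes above can be illustrated with a minimal sketch. This is not the code from this commit: the helper `detach_log`, the list `step_outputs`, and the toy `nn.Linear` model are hypothetical stand-ins, and only standard PyTorch calls (`Tensor.detach()`, `Tensor.to()`) are assumed.

```python
import torch
from torch import nn


def detach_log(log, device="cpu"):
    """Return a copy of `log` whose tensors are cut from the autograd graph.

    Hypothetical helper for illustration; non-tensor values pass through unchanged.
    """
    return {
        key: value.detach().to(device) if isinstance(value, torch.Tensor) else value
        for key, value in log.items()
    }


model = nn.Linear(4, 1)
step_outputs = []  # stands in for training_step_outputs (assumption)

for _ in range(3):  # three toy batches
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    prediction = model(x)
    loss = nn.functional.mse_loss(prediction, y)
    loss.backward()  # the graph is needed here, so detach only for storage

    log = {"loss": loss, "prediction": prediction, "batch_size": x.shape[0]}
    # Storing `log` directly would keep each batch's graph alive until the
    # list is cleared; the detached copy holds only the tensor values.
    step_outputs.append(detach_log(log))

# Aggregation at the end of the epoch works on the detached values.
epoch_loss = torch.stack([out["loss"] for out in step_outputs]).mean()
```

Detaching only the stored copy leaves the original `loss` usable for the backward pass, while the accumulated list no longer references any computation graph or GPU memory.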
#### Did you add any tests for the change?
I ran my training code for 5 epochs using a memory profiler. Here are
two comparison plots:
before
<img width="1156" height="472" alt="before"
src="https://github.com/user-attachments/assets/45a6696a-efe6-4f06-897e-d80daee79977"
/>
after
<img width="1132" height="470" alt="after"
src="https://github.com/user-attachments/assets/c3ba5187-97ee-4b6b-b4cb-2d641b2d0d88"
/>

1 parent 81b5303 commit a88a404
1 file changed, +8 -4 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 57 | + |
| | 60 | + |
| | 313 | + |
| | 314 | + |
| 723 | | - |
| | 727 | + |
| 742 | | - |
| | 746 | + |
| 753 | | - |
| | 757 | + |
| 937 | | - |
| | 941 | + |