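`MSELoss` uses `reduction='mean'` by default, so `loss` is already averaged over the elements of the batch. I think `epoch_loss += (loss.detach().item() / batchsize)` is therefore incorrect: it divides by the batch size a second time.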
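
To make the arithmetic concrete, here is a minimal plain-Python sketch. It does not call PyTorch: `mse_mean` is a hypothetical stand-in that mirrors what a mean-reduced MSE returns for scalar targets, and the data, the function names, and the final per-epoch normalization are all assumptions on my part, since the rest of the training loop isn't shown.

```python
def mse_mean(preds, targets):
    """Hypothetical stand-in for a mean-reduced MSE: average of squared errors."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


# Made-up data: two batches of 4 scalar predictions and targets each.
batches = [
    ([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 3.0, 6.0]),  # batch mean loss = 1.25
    ([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]),  # batch mean loss = 1.0
]


def epoch_loss_double_divided(batches):
    """Accumulates loss / batch_size, as in the line under review."""
    total = 0.0
    for preds, targets in batches:
        loss = mse_mean(preds, targets)  # already a per-sample average
        total += loss / len(preds)       # second division by the batch size
    # Assumption: the epoch value is then averaged over the number of batches.
    return total / len(batches)


def epoch_loss_fixed(batches):
    """Undo the batch mean, then average over all samples seen in the epoch."""
    total, n_samples = 0.0, 0
    for preds, targets in batches:
        loss = mse_mean(preds, targets)
        total += loss * len(preds)  # back to a sum over the batch
        n_samples += len(preds)
    return total / n_samples


print(epoch_loss_double_divided(batches))  # 0.28125
print(epoch_loss_fixed(batches))           # 1.125
```

With equal-sized batches, accumulating `loss.item()` unchanged and dividing by the number of batches at the end gives the same result as `epoch_loss_fixed`; multiplying by the batch size first only matters when the last batch is smaller.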