The `dpo_beta` factor in the margin computation here is redundant: https://github.com/thinking-machines-lab/tinker-cookbook/blob/main/tinker_cookbook/preference/train_dpo.py#L142. The rewards are already multiplied by `dpo_beta` in the two lines above (https://github.com/thinking-machines-lab/tinker-cookbook/blob/main/tinker_cookbook/preference/train_dpo.py#L140-L141), so applying it again scales the margin by `dpo_beta` twice.
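
To illustrate the double scaling, here is a minimal sketch. The function name, variable names, and log-prob values are hypothetical and not copied from `train_dpo.py`; it only assumes the rewards have the usual DPO form of `dpo_beta` times the policy/reference log-ratio.

```python
# Hypothetical sketch: names and numbers are illustrative, not taken from train_dpo.py.

def scaled_reward(logprob: float, ref_logprob: float, dpo_beta: float) -> float:
    """Implicit DPO reward: dpo_beta times the policy/reference log-ratio."""
    return dpo_beta * (logprob - ref_logprob)


dpo_beta = 0.1

# Both rewards already carry one factor of dpo_beta.
chosen_reward = scaled_reward(-1.0, -2.0, dpo_beta)
rejected_reward = scaled_reward(-3.0, -2.5, dpo_beta)

# Margin with the redundant factor: effectively dpo_beta**2 times the log-ratio gap.
margin_double = dpo_beta * (chosen_reward - rejected_reward)

# Margin without it: dpo_beta times the log-ratio gap, on the same scale as the rewards.
margin_single = chosen_reward - rejected_reward

print(f"with extra dpo_beta:    {margin_double:.4f}")
print(f"without extra dpo_beta: {margin_single:.4f}")
```

With `dpo_beta = 0.1`, the first value comes out ten times smaller than the second, so the reported margin would not be on the same scale as the chosen and rejected rewards.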