Thank you very much for this excellent work.
I would like to ask whether it is possible to expose the hyper-parameters of the PPO loss, such as clip_ratio_high and clip_ratio_low, which are very important for controlling the exploration-exploitation trade-off in RL.
The Tinker documentation shows that we can implement a custom loss ourselves. However, the training time then becomes 3x longer, which is too much.
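
To make the request concrete, here is a minimal sketch of the clipped PPO objective with separate lower and upper clip ratios. It is plain Python and does not use the Tinker API. The function name `ppo_clip_loss` and the default values are my own assumptions, and I am assuming the two parameters define the clipping interval [1 - clip_ratio_low, 1 + clip_ratio_high].

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages,
                  clip_ratio_low=0.2, clip_ratio_high=0.2):
    """Mean clipped PPO surrogate loss over tokens.

    Hypothetical helper for illustration only, not a Tinker function.
    Assumes the importance ratio is clipped to
    [1 - clip_ratio_low, 1 + clip_ratio_high].
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance ratio between the current and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Asymmetric clipping of the ratio.
        clipped = min(max(ratio, 1.0 - clip_ratio_low), 1.0 + clip_ratio_high)
        # Pessimistic (minimum) surrogate, negated so that lower is better.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Same token (ratio 1.5, positive advantage) under two upper clip ratios.
loss_default = ppo_clip_loss([math.log(1.5)], [0.0], [1.0])
loss_higher = ppo_clip_loss([math.log(1.5)], [0.0], [1.0], clip_ratio_high=0.28)
```

With a larger clip_ratio_high, a token with positive advantage is allowed a larger ratio before the objective is clipped, which is why being able to set the two values independently matters for exploration.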