Hi, thanks for your great work. I have some questions about stage 1 patch embedder training.
- In stage 1, does noise need to be added to the latent (as in the diffusion forward process) before it is sent to the patch embedder, when calculating the MSE loss?
- If I want to train for an I2V task, do I need to concatenate all the conditions along the channel dimension before feeding them into the patch embedder for training?
- What is the value of the MSE loss after convergence in your training setting?
Can you share more information? Thanks~