Why do you use the outputs for the class and box predictions in one case, but use detection for the same thing in the other?