Trainer.test() on ddp can not get entire dataset. #5263
Answered by carmocca
MarsSu0618 asked this question in DDP / multi-GPU / multi-node
Problem

Hi everyone, I have a question about DDP. When I run Trainer.test() with accelerator='ddp', each process only receives part of the test dataset, but I have to write the prediction results for the entire dataset to a file. I hope someone can help me solve this.

Environment

Sample Code

```python
class ExampleModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.original_file = open('/path/to/file', 'a')

    def training_step(...):
        ...

    def test_step(self, batch, batch_idx):
        init_ids = batch['init_ids']
        attention_mask = batch['attention_mask']
        token_type_ids = batch['token_type_ids']
        predictions = self.model(init_ids, attention_mask, token_type_ids)
        ...
        self.original_file.write(convert_ids_to_str(init_ids.cpu().numpy()) + '\t' + str(seg.cpu().numpy()[0]) + '\n')


trainer = pl.Trainer(max_epochs=EPOCH,
                     gpus=[0, 1],
                     num_nodes=1,
                     auto_select_gpus=True,
                     num_sanity_val_steps=0,
                     accelerator='ddp',
                     callbacks=[modelcheckpoint_callback, earlystopping_callback])
trainer.test(model=model, test_dataloaders=test_dataloader)
```

Hope someone can help or answer how to do it.
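For context, with accelerator='ddp' Lightning wraps the test dataloader in a DistributedSampler, so each rank's test_step only sees its own shard. Below is a minimal sketch (not from the thread) of one common workaround: return tensors from test_step, all_gather them in test_epoch_end, and write the file on rank 0 only. It reuses the asker's convert_ids_to_str helper and assumes the model output is a plain tensor.

```python
import torch
import pytorch_lightning as pl


class ExampleModel(pl.LightningModule):
    def test_step(self, batch, batch_idx):
        predictions = self.model(batch['init_ids'],
                                 batch['attention_mask'],
                                 batch['token_type_ids'])
        # Return tensors so they can be gathered across processes later.
        return {'init_ids': batch['init_ids'], 'predictions': predictions}

    def test_epoch_end(self, outputs):
        init_ids = torch.cat([o['init_ids'] for o in outputs])
        preds = torch.cat([o['predictions'] for o in outputs])
        # all_gather adds a leading world-size dimension under DDP; flatten it.
        init_ids = self.all_gather(init_ids).reshape(-1, init_ids.shape[-1])
        preds = self.all_gather(preds).reshape(-1, *preds.shape[1:])
        # Note: DistributedSampler may pad the last batch, so a few duplicate
        # rows can appear at the end of the gathered results.
        if self.trainer.is_global_zero:
            with open('/path/to/file', 'a') as f:
                for ids, p in zip(init_ids.cpu().numpy(), preds.cpu().numpy()):
                    f.write(convert_ids_to_str(ids) + '\t' + str(p) + '\n')
```

Gathering a whole epoch of outputs in memory is fine for small test sets; for large ones, writing one file per rank and merging afterwards scales better.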
Answered by carmocca on Apr 20, 2021
Replies: 1 comment
Please, check out trainer.predict() after 1.3 is released!
0 replies
Answer selected by carmocca
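As a hedged sketch of what that suggestion could look like on PyTorch Lightning >= 1.3 (this is not carmocca's code): implement predict_step and use a BasePredictionWriter callback to write one file per rank, then merge the per-rank files afterwards. The output directory and the reuse of convert_ids_to_str are assumptions carried over from the question.

```python
import os

import pytorch_lightning as pl
from pytorch_lightning.callbacks import BasePredictionWriter


class PerRankWriter(BasePredictionWriter):
    def __init__(self, output_dir):
        super().__init__(write_interval="batch")
        self.output_dir = output_dir

    def write_on_batch_end(self, trainer, pl_module, prediction,
                           batch_indices, batch, batch_idx, dataloader_idx):
        # One file per process avoids concurrent writes; merge the files afterwards.
        path = os.path.join(self.output_dir,
                            f"predictions_rank{trainer.global_rank}.txt")
        with open(path, "a") as f:
            for ids, pred in zip(batch["init_ids"].cpu().numpy(),
                                 prediction.cpu().numpy()):
                f.write(convert_ids_to_str(ids) + "\t" + str(pred) + "\n")


class ExampleModel(pl.LightningModule):
    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        return self.model(batch["init_ids"],
                          batch["attention_mask"],
                          batch["token_type_ids"])


trainer = pl.Trainer(gpus=[0, 1],
                     accelerator="ddp",
                     callbacks=[PerRankWriter("/path/to/output_dir")])
trainer.predict(model, dataloaders=test_dataloader)
```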