Best practice: where to count number of samples per class #15199
Unanswered
mfoglio asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hello, I am looking for best practices here, as I know there could be multiple ways to solve this problem. I would just like to understand whether there is a Lightning approach that should be preferred.
In my `LightningModule` I initialize a `CrossEntropyLoss` with a specific `weight` to handle imbalanced classes: `torch.nn.CrossEntropyLoss(weight=my_weights)`. The weight for each class is defined as `1 / number_of_samples_in_the_class`. In order to do this, I need to supply my `LightningModule` instance with the number of samples per class. However, you would usually load the data (and therefore count the number of samples per class) in the `setup` function of the `LightningDataModule` instance. So here's the problem: usually, when you initialize the `LightningModule`, you haven't loaded the data yet. Example:
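Roughly something like this (a minimal, self-contained sketch; `MyModel`, `MyDataModule`, and the dummy data are made up just to illustrate the ordering problem):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

NUM_CLASSES = 3

class MyModel(pl.LightningModule):
    """Needs the per-class sample counts at construction time."""
    def __init__(self, samples_per_class):
        super().__init__()
        # weight for each class = 1 / number_of_samples_in_the_class
        weight = 1.0 / torch.as_tensor(samples_per_class, dtype=torch.float)
        self.criterion = torch.nn.CrossEntropyLoss(weight=weight)
        self.layer = torch.nn.Linear(8, NUM_CLASSES)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return self.criterion(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

class MyDataModule(pl.LightningDataModule):
    """The counts only become known here, inside setup()."""
    def setup(self, stage=None):
        x = torch.randn(100, 8)
        y = torch.randint(0, NUM_CLASSES, (100,))  # imbalanced in practice
        self.train_dataset = TensorDataset(x, y)
        self.samples_per_class = torch.bincount(y, minlength=NUM_CLASSES)

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=32)

# The usual flow constructs the model before setup() has run, so the
# counts are not yet available when MyModel.__init__ needs them:
datamodule = MyDataModule()
# model = MyModel(samples_per_class=???)  # not known yet!
# trainer = pl.Trainer(); trainer.fit(model, datamodule=datamodule)
```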
As possible solutions, I could manually call `my_data_module.setup()`, or I could compute the number of samples inside the `__init__` function of `MyDataModule`, but neither way seems to follow the PyTorch Lightning philosophy. What would be the cleanest way to solve this? Thank you!
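For reference, the first workaround would look roughly like this, continuing the hypothetical sketch above:

```python
# Workaround mentioned in the question: call setup() by hand before
# constructing the module, so the counts exist when __init__ runs.
datamodule = MyDataModule()
datamodule.setup(stage="fit")  # normally invoked by the Trainer
model = MyModel(samples_per_class=datamodule.samples_per_class)

# The Trainer may invoke setup() again during fit(); in this sketch
# that is harmless because it just rebuilds the same dataset.
trainer = pl.Trainer(max_epochs=1)
trainer.fit(model, datamodule=datamodule)
```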