
Commit 2ca211d

Improve documentation (#372)
* Fix sweep to keep the best model and add best_score of the first model
* Improve documentation
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Remove wrong changes
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 5c09a7d commit 2ca211d

15 files changed (+142 additions, -135 deletions)


README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 
 PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:
 
-- Low Resistance Useability
+- Low Resistance Usability
 - Easy Customization
 - Scalable and Easier to Deploy
 
docs/models.md

Lines changed: 21 additions & 21 deletions
@@ -27,7 +27,7 @@ While there are separate config classes for each model, all of them share a few
 
 - `learning_rate`: float: The learning rate of the model. Defaults to 1e-3.
 
-- `loss`: Optional\[str\]: The loss function to be applied. By Default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification
+- `loss`: Optional\[str\]: The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification
 
 - `metrics`: Optional\[List\[str\]\]: The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is `accuracy` if classification and `mean_squared_error` for regression
 
@@ -55,13 +55,13 @@ That's it, Thats the most basic necessity. All the rest is intelligently inferre
 
 Adam Optimizer and the `learning_rate` of 1e-3 is a default that is set in PyTorch Tabular. It's a rule of thumb that works in most cases and a good starting point which has worked well empirically. If you want to change the learning rate(which is a pretty important hyperparameter), this is where you should. There is also an automatic way to derive a good learning rate which we will talk about in the TrainerConfig. In that case, Pytorch Tabular will ignore the learning rate set through this parameter
 
-Another key component of the model is the `loss`. Pytorch Tabular can use any loss function from standard PyTorch([`torch.nn`](https://pytorch.org/docs/stable/nn.html#loss-functions)) through this config. By default it is set to `MSELoss` for regression and `CrossEntropyLoss` for classification, which works well for those use cases and are the most popular loss functions used. If you want to use something else specficaly, like `L1Loss`, you just need to mention it in the `loss` parameter
+Another key component of the model is the `loss`. Pytorch Tabular can use any loss function from standard PyTorch([`torch.nn`](https://pytorch.org/docs/stable/nn.html#loss-functions)) through this config. By default, it is set to `MSELoss` for regression and `CrossEntropyLoss` for classification, which works well for those use cases and are the most popular loss functions used. If you want to use something else specficaly, like `L1Loss`, you just need to mention it in the `loss` parameter
 
 ```python
 loss = "L1Loss
 ```
 
-PyTorch Tabular also accepts custom loss functions(which are drop in replacements for the standard loss functions) through the `fit` method in the `TabularModel`.
+PyTorch Tabular also accepts custom loss functions (which are drop in replacements for the standard loss functions) through the `fit` method in the `TabularModel`.
 
 !!! warning
 
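To make the `loss` behaviour described above concrete, here is a minimal sketch, assuming the usual `CategoryEmbeddingModelConfig` and `TabularModel` entry points; the column names and the toy `train_df` are placeholders for illustration only:

```python
import pandas as pd
import torch.nn as nn

from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import CategoryEmbeddingModelConfig

# Placeholder training data with two continuous features and a regression target
train_df = pd.DataFrame(
    {"f1": [0.1, 0.2, 0.3, 0.4], "f2": [1.0, 2.0, 3.0, 4.0], "target": [1.1, 1.9, 3.2, 3.8]}
)

# Built-in loss: name any torch.nn loss class in the model config
model_config = CategoryEmbeddingModelConfig(
    task="regression",
    loss="L1Loss",        # instead of the default MSELoss
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=DataConfig(target=["target"], continuous_cols=["f1", "f2"]),
    model_config=model_config,
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(),
)

# Custom loss: pass a drop-in replacement for a standard loss module to `fit`
tabular_model.fit(train=train_df, loss=nn.SmoothL1Loss())
```

Built-in losses are referred to by their `torch.nn` class name, while the object passed to `fit` only needs to behave like a standard loss module.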
@@ -113,7 +113,7 @@ All the parameters have intelligent default values. Let's look at few of them:
 - `use_batch_norm`: bool: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to `False`
 - `dropout`: float: The probability of the element to be zeroed. This applies to all the linear layers. Defaults to `0.0`
 
-**For a complete list of parameters refer to the API Docs**
+**For a complete list of parameters refer to the API Docs**
 [pytorch_tabular.models.CategoryEmbeddingModelConfig][]
 
 ### Gated Adaptive Network for Deep Automated Learning of Features (GANDALF)
@@ -141,7 +141,7 @@ All the parameters have beet set to recommended values from the paper. Let's loo
 GANDALF can be considered as a more light and more performant Gated Additive Tree Ensemble (GATE). For most purposes, GANDALF is a better choice than GATE.
 
 
-**For a complete list of parameters refer to the API Docs**
+**For a complete list of parameters refer to the API Docs**
 [pytorch_tabular.models.GANDALFConfig][]
 
 
@@ -165,14 +165,14 @@ All the parameters have beet set to recommended values from the paper. Let's loo
 
 - `share_head_weights`: bool: If True, we will share the weights between the heads. Defaults to True
 
-**For a complete list of parameters refer to the API Docs**
+**For a complete list of parameters refer to the API Docs**
 [pytorch_tabular.models.GatedAdditiveTreeEnsembleConfig][]
 
 ### Neural Oblivious Decision Ensembles (NODE)
 
-[Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data](https://arxiv.org/abs/1909.06312) is a model presented in ICLR 2020 and according to the authors have beaten well-tuned Gradient Boosting models on many datasets. It uses a Neural equivalent of Oblivious Trees(the kind of trees Catboost uses) as the basic building blocks of the architecture. You can use it by choosing `NodeConfig`.
+[Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data](https://arxiv.org/abs/1909.06312) is a model presented in ICLR 2020 and according to the authors have beaten well-tuned Gradient Boosting models on many datasets. It uses a Neural equivalent of Oblivious Trees (the kind of trees Catboost uses) as the basic building blocks of the architecture. You can use it by choosing `NodeConfig`.
 
-The basic block, or a "layer" looks something like below(from the paper)
+The basic block, or a "layer" looks something like below (from the paper)
 
 ![NODE Architecture](imgs/node_arch.png)
 
@@ -185,37 +185,37 @@ All the parameters have beet set to recommended values from the paper. Let's loo
 - `num_layers`: int: Number of Oblivious Decision Tree Layers in the Dense Architecture. Defaults to `1`
 - `num_trees`: int: Number of Oblivious Decision Trees in each layer. Defaults to `2048`
 - `depth`: int: The depth of the individual Oblivious Decision Trees. Parameters increase exponentially with the increase in depth. Defaults to `6`
-- `choice_function`: str: Generates a sparse probability distribution to be used as feature weights(aka, soft feature selection). Choices are: `entmax15` `sparsemax`. Defaults to `entmax15`
-- `bin_function`: str: Generates a sparse probability distribution to be used as tree leaf weights. Choices are: `entmax15` `sparsemax`. Defaults to `entmax15`
+- `choice_function`: str: Generates a sparse probability distribution to be used as feature weights (aka, soft feature selection). Choices are: `entmax15` `sparsemax`. Defaults to `entmax15`
+- `bin_function`: str: Generates a sparse probability distribution to be used as tree leaf weights. Choices are: `entmoid15` `sparsemoid`. Defaults to `entmoid15`
 - `additional_tree_output_dim`: int: The additional output dimensions which is only used to pass through different layers of the architectures. Only the first output_dim outputs will be used for prediction. Defaults to `3`
 - `input_dropout`: float: Dropout which is applied to the input to the different layers in the Dense Architecture. The probability of the element to be zeroed. Defaults to `0.0`
 
 
-**For a complete list of parameters refer to the API Docs**
+**For a complete list of parameters refer to the API Docs**
 [pytorch_tabular.models.NodeConfig][]
 
 !!! note
 
-NODE model has a lot of parameters and therefore takes up a lot of memory. Smaller batchsizes(like 64 or 128) makes the model manageable in a smaller GPU(~4GB).
+NODE model has a lot of parameters and therefore takes up a lot of memory. Smaller batchsizes (like 64 or 128) makes the model manageable in a smaller GPU(~4GB).
 
 ### TabNet
 
 - [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442) is another model coming out of Google Research which uses Sparse Attention in multiple steps of decision making to model the output. You can use it by choosing `TabNetModelConfig`.
 
-The architecture is as shown below(from the paper)
+The architecture is as shown below (from the paper)
 
 ![TabNet Architecture](imgs/tabnet_architecture.png)
 
 All the parameters have beet set to recommended values from the paper. Let's look at few of them:
 
 - `n_d`: int: Dimension of the prediction layer (usually between 4 and 64). Defaults to `8`
 - `n_a`: int: Dimension of the attention layer (usually between 4 and 64). Defaults to `8`
-- `n_steps`: int: Number of sucessive steps in the newtork (usually betwenn 3 and 10). Defaults to `3`
+- `n_steps`: int: Number of successive steps in the network (usually between 3 and 10). Defaults to `3`
 - `n_independent`: int: Number of independent GLU layer in each GLU block. Defaults to `2`
 - `n_shared`: int: Number of independent GLU layer in each GLU block. Defaults to `2`
 - `virtual_batch_size`: int: Batch size for Ghost Batch Normalization. BatchNorm on large batches sometimes does not do very well and therefore Ghost Batch Normalization which does batch normalization in smaller virtual batches is implemented in TabNet. Defaults to `128`
 
-**For a complete list of parameters refer to the API Docs**
+**For a complete list of parameters refer to the API Docs**
 [pytorch_tabular.models.TabNetModelConfig][]
 
 ### Automatic Feature Interaction Learning via Self-Attentive Neural Networks(AutoInt)
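A rough sketch of how the NODE and TabNet hyperparameters documented above map onto their config classes; the values echo the stated defaults (except a smaller `num_trees` for memory) and are illustrative assumptions rather than tuned settings:

```python
from pytorch_tabular.models import NodeConfig, TabNetModelConfig

# NODE: fewer trees and small batch sizes keep memory manageable on a ~4GB GPU
node_config = NodeConfig(
    task="classification",
    num_layers=1,
    num_trees=1024,               # documented default is 2048
    depth=6,
    choice_function="entmax15",   # soft feature selection
    bin_function="entmoid15",     # tree leaf weights
    additional_tree_output_dim=3,
    input_dropout=0.0,
)

# TabNet: n_d / n_a are the prediction and attention dimensions
tabnet_config = TabNetModelConfig(
    task="classification",
    n_d=8,
    n_a=8,
    n_steps=3,
    n_independent=2,
    n_shared=2,
    virtual_batch_size=128,       # Ghost Batch Normalization batch size
)
```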
@@ -228,9 +228,9 @@ All the parameters have beet set to recommended values from the paper. Let's loo
 
 - `num_heads`: int: The number of heads in the Multi-Headed Attention layer. Defaults to 2
 
-- `num_attn_blocks`: int: The number of layers of stacked Multi-Headed Attention layers. Defaults to 2
+- `num_attn_blocks`: int: The number of layers of stacked Multi-Headed Attention layers. Defaults to 3
 
-**For a complete list of parameters refer to the API Docs**
+**For a complete list of parameters refer to the API Docs**
 [pytorch_tabular.models.AutoIntConfig][]
 
 ### DANETs: Deep Abstract Networks for Tabular Data Classification and Regression
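Similarly, an assumed sketch of the AutoInt attention settings listed in the hunk above:

```python
from pytorch_tabular.models import AutoIntConfig

autoint_config = AutoIntConfig(
    task="classification",
    num_heads=2,          # heads in each Multi-Headed Attention layer
    num_attn_blocks=3,    # stacked attention layers, per the corrected default above
)
```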
@@ -239,18 +239,18 @@ All the parameters have beet set to recommended values from the paper. Let's loo
 
 All the parameters have beet set to recommended values from the paper. Let's look at them:
 
-- `n_layers`: int: Number of Blocks in the DANet. Defaults to 16
+- `n_layers`: int: Number of Blocks in the DANet. Each block has 2 Abstlay Blocks each. Defaults to 8
 
 - `abstlay_dim_1`: int: The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32
 
-- `abstlay_dim_2`: int: The dimension for the intermediate output in the second ABSTLAY layer in a Block. Defaults to 64
+- `abstlay_dim_2`: int: The dimension for the intermediate output in the second ABSTLAY layer in a Block. If None, it will be twice abstlay_dim_1. Defaults to None
 
 - `k`: int: The number of feature groups in the ABSTLAY layer. Defaults to 5
 
 - `dropout_rate`: float: Dropout to be applied in the Block. Defaults to 0.1
 
 
-**For a complete list of parameters refer to the API Docs**
+**For a complete list of parameters refer to the API Docs**
 [pytorch_tabular.models.DANetConfig][]
 
 ## Implementing New Architectures
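And a short, assumed sketch of a `DANetConfig` built from the parameters just documented; the values mirror the stated defaults:

```python
from pytorch_tabular.models import DANetConfig

danet_config = DANetConfig(
    task="regression",
    n_layers=8,            # each Block holds 2 Abstlay blocks
    abstlay_dim_1=32,
    abstlay_dim_2=None,    # None resolves to twice abstlay_dim_1
    k=5,
    dropout_rate=0.1,
)
```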
@@ -308,7 +308,7 @@ In addition to the model, you will also need to define a config. Configs are pyt
 
 **Key things to note:**
 
-1. All the different parameters in the different configs(like TrainerConfig, OptimizerConfig, etc) are all available in `config` before calling `super()` and in `self.hparams` after.
+1. All the different parameters in the different configs (like TrainerConfig, OptimizerConfig, etc) are all available in `config` before calling `super()` and in `self.hparams` after.
 1. the input batch at the `forward` method is a dictionary with keys `continuous` and `categorical`
 1. In the `\_build_network` method, save every component that you want access in the `forward` to `self`

src/pytorch_tabular/config/config.py

Lines changed: 13 additions & 13 deletions
@@ -68,31 +68,31 @@ class DataConfig:
 introduction_date and with a monthly frequency like "2023-12" should have
 an entry ('intro_date','M','%Y-%m')
 
-encode_date_columns (bool): Whether or not to encode the derived variables from date
+encode_date_columns (bool): Whether to encode the derived variables from date
 
 validation_split (Optional[float]): Percentage of Training rows to keep aside as validation. Used
 only if Validation Data is not given separately
 
-continuous_feature_transform (Optional[str]): Whether or not to transform the features before
-modelling. By default it is turned off.. Choices are: [`None`,`yeo-johnson`,`box-
-cox`,`quantile_normal`,`quantile_uniform`].
+continuous_feature_transform (Optional[str]): Whether to transform the features before
+modelling. By default, it is turned off. Choices are: [`None`,`yeo-johnson`,`box-cox`,
+`quantile_normal`,`quantile_uniform`].
 
 normalize_continuous_features (bool): Flag to normalize the input features(continuous)
 
 quantile_noise (int): NOT IMPLEMENTED. If specified fits QuantileTransformer on data with added
 gaussian noise with std = :quantile_noise: * data.std ; this will cause discrete values to be more
-separable. Please not that this transformation does NOT apply gaussian noise to the resulting
+separable. Please note that this transformation does NOT apply gaussian noise to the resulting
 data, the noise is only applied for QuantileTransformer
 
 num_workers (Optional[int]): The number of workers used for data loading. For windows always set to
 0
 
-pin_memory (bool): Whether or not to pin memory for data loading.
+pin_memory (bool): Whether to pin memory for data loading.
 
-handle_unknown_categories (bool): Whether or not to handle unknown or new values in categorical
+handle_unknown_categories (bool): Whether to handle unknown or new values in categorical
 columns as unknown
 
-handle_missing_values (bool): Whether or not to handle missing values in categorical columns as
+handle_missing_values (bool): Whether to handle missing values in categorical columns as
 unknown
 """
 
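To ground the docstring above, a hedged sketch of a `DataConfig` that exercises the documented flags; the column names are placeholders:

```python
from pytorch_tabular.config import DataConfig

data_config = DataConfig(
    target=["price"],                               # placeholder target column
    continuous_cols=["area", "age"],                # placeholder numeric features
    categorical_cols=["city"],                      # placeholder categorical feature
    encode_date_columns=True,
    validation_split=0.2,
    continuous_feature_transform="quantile_normal",
    normalize_continuous_features=True,
    num_workers=0,                                  # keep 0 on Windows, per the note above
    pin_memory=True,
    handle_unknown_categories=True,
    handle_missing_values=True,
)
```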

@@ -146,7 +146,7 @@ class DataConfig:
 )
 normalize_continuous_features: bool = field(
     default=True,
-    metadata={"help": "Flag to normalize the input features(continuous)"},
+    metadata={"help": "Flag to normalize the input features (continuous)"},
 )
 quantile_noise: int = field(
     default=0,
@@ -264,7 +264,7 @@ class TrainerConfig:
 Choices are: [`cpu`,`gpu`,`tpu`,`ipu`,'mps',`auto`].
 
 devices (Optional[int]): Number of devices to train on (int). -1 uses all available devices. By
-default uses all available devices (-1)
+default, uses all available devices (-1)
 
 devices_list (Optional[List[int]]): List of devices to train on (list). If specified, takes
 precedence over `devices` argument. Defaults to None
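A brief, assumed sketch of the `devices` / `devices_list` semantics described above:

```python
from pytorch_tabular.config import TrainerConfig

# -1 (the default) trains on every available device of the chosen accelerator
trainer_config = TrainerConfig(accelerator="gpu", devices=-1)

# An explicit list takes precedence over the `devices` argument
trainer_config_pinned = TrainerConfig(accelerator="gpu", devices_list=[0, 1])
```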
@@ -563,7 +563,7 @@ class ExperimentConfig:
 this defines the folder under which the logs will be saved and for W&B it defines the project name
 
 run_name (Optional[str]): The name of the run; a specific identifier to recognize the run. If left
-blank, will be assigned a auto-generated name
+blank, will be assigned an auto-generated name
 
 exp_watch (Optional[str]): The level of logging required. Can be `gradients`, `parameters`, `all`
 or `None`. Defaults to None. Choices are: [`gradients`,`parameters`,`all`,`None`].
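And a hedged sketch of the `ExperimentConfig` fields mentioned here; `log_target` is assumed from the wider library API rather than shown in this hunk:

```python
from pytorch_tabular.config import ExperimentConfig

experiment_config = ExperimentConfig(
    project_name="tabular-experiments",  # log folder, or the W&B project name
    run_name=None,                       # left blank -> an auto-generated name
    exp_watch="gradients",               # level of logging to watch
    log_target="wandb",
)
```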
@@ -695,7 +695,7 @@ def __init__(
     exp_version_manager: str = ".pt_tmp/exp_version_manager.yml",
 ) -> None:
     """The manages the versions of the experiments based on the name. It is a simple dictionary(yaml) based lookup.
-    Primary purpose is to avoid overwriting of saved models while runing the training without changing the
+    Primary purpose is to avoid overwriting of saved models while running the training without changing the
     experiment name.
 
     Args:
@@ -752,7 +752,7 @@ class ModelConfig:
 
 learning_rate (float): The learning rate of the model. Defaults to 1e-3.
 
-loss (Optional[str]): The loss function to be applied. By Default it is MSELoss for regression and
+loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
 CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
 or L1Loss for regression and CrossEntropyLoss for classification
 