 class TabularClassificationTask(BaseTask):
     """
     Tabular Classification API to the pipelines.
+
     Args:
-        seed (int), (default=1): seed to be used for reproducibility.
-        n_jobs (int), (default=1): number of consecutive processes to spawn.
-        n_threads (int), (default=1):
+        seed (int: default=1):
+            seed to be used for reproducibility.
+        n_jobs (int: default=1):
+            number of consecutive processes to spawn.
+        n_threads (int: default=1):
             number of threads to use for each process.
         logging_config (Optional[Dict]):
-            specifies configuration for logging, if None, it is loaded from the logging.yaml
-        ensemble_size (int), (default=50):
+            Specifies configuration for logging, if None, it is loaded from the logging.yaml
+        ensemble_size (int: default=50):
             Number of models added to the ensemble built by
             Ensemble selection from libraries of models.
             Models are drawn with replacement.
-        ensemble_nbest (int), (default=50):
-            only consider the ensemble_nbest
+        ensemble_nbest (int: default=50):
+            Only consider the ensemble_nbest
             models to build the ensemble
-        max_models_on_disc (int), (default=50):
-            maximum number of models saved to disc.
-            Also, controls the size of the ensemble as any additional models will be deleted.
+        max_models_on_disc (int: default=50):
+            Maximum number of models saved to disc.
+            Also, controls the size of the ensemble
+            as any additional models will be deleted.
             Must be greater than or equal to 1.
         temporary_directory (str):
-            folder to store configuration output and log file
+            Folder to store configuration output and log file
         output_directory (str):
-            folder to store predictions for optional test set
+            Folder to store predictions for optional test set
         delete_tmp_folder_after_terminate (bool):
-            determines whether to delete the temporary directory, when finished
+            Determines whether to delete the temporary directory,
+            when finished
         include_components (Optional[Dict]):
-            If None, all possible components are used. Otherwise
-            specifies set of components to use.
+            If None, all possible components are used.
+            Otherwise specifies set of components to use.
         exclude_components (Optional[Dict]):
-            If None, all possible components are used. Otherwise
-            specifies set of components not to use. Incompatible
-            with include components
+            If None, all possible components are used.
+            Otherwise specifies set of components not to use.
+            Incompatible with include components.
         search_space_updates (Optional[HyperparameterSearchSpaceUpdates]):
             search space updates that can be used to modify the search
             space of particular components or choice modules of the pipeline
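The `ensemble_size` / `ensemble_nbest` parameters above refer to ensemble selection from libraries of models, drawn with replacement. The following is a minimal, self-contained sketch of that greedy selection idea with toy binary-probability predictions; all names are illustrative, not Auto-PyTorch's internal ensemble builder:

```python
# Toy sketch of "ensemble selection from libraries of models" with
# replacement, as referenced by ensemble_size/ensemble_nbest above.
# All names are illustrative; Auto-PyTorch's internal builder differs.

def greedy_ensemble_selection(model_preds, y_true, ensemble_size=50):
    """Greedily add (with replacement) the model whose inclusion
    maximises ensemble accuracy on the validation labels."""
    chosen = []                        # indices of selected models (repeats allowed)
    running_sum = [0.0] * len(y_true)  # sum of selected models' probabilities

    def accuracy(pred_sum, n_selected):
        # average the summed probabilities, threshold at 0.5 for binary labels
        correct = sum(
            int((s / n_selected >= 0.5) == bool(y)) for s, y in zip(pred_sum, y_true)
        )
        return correct / len(y_true)

    for _ in range(ensemble_size):
        best_idx, best_acc = None, -1.0
        for idx, preds in enumerate(model_preds):
            trial = [s + p for s, p in zip(running_sum, preds)]
            acc = accuracy(trial, len(chosen) + 1)
            if acc > best_acc:
                best_idx, best_acc = idx, acc
        chosen.append(best_idx)
        running_sum = [s + p for s, p in zip(running_sum, model_preds[best_idx])]
    return chosen

# Two toy "models": one accurate, one that always outputs 0.5
preds = [[0.9, 0.1, 0.8, 0.2], [0.5, 0.5, 0.5, 0.5]]
labels = [1, 0, 1, 0]
selection = greedy_ensemble_selection(preds, labels, ensemble_size=5)
```

Because models are drawn with replacement, a single strong model can be picked repeatedly; `max_models_on_disc` then bounds how many distinct fitted models are kept around for this process.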
@@ -102,6 +107,16 @@ def __init__(
         )
 
     def build_pipeline(self, dataset_properties: Dict[str, Any]) -> TabularClassificationPipeline:
+        """
+        Build pipeline according to current task and for the passed dataset properties
+
+        Args:
+            dataset_properties (Dict[str, Any]):
+
+        Returns:
+            TabularClassificationPipeline:
+                Pipeline compatible with the given dataset properties.
+        """
         return TabularClassificationPipeline(dataset_properties=dataset_properties)
 
     def search(
@@ -143,38 +158,38 @@ def search(
             budget_type (str):
                 Type of budget to be used when fitting the pipeline.
                 It can be one of:
-                + 'epochs': The training of each pipeline will be terminated after
-                    a number of epochs have passed. This number of epochs is determined by the
-                    budget argument of this method.
-                + 'runtime': The training of each pipeline will be terminated after
-                    a number of seconds have passed. This number of seconds is determined by the
-                    budget argument of this method. The overall fitting time of a pipeline is
-                    controlled by func_eval_time_limit_secs. 'runtime' only controls the allocated
-                    time to train a pipeline, but it does not consider the overall time it takes
-                    to create a pipeline (data loading and preprocessing, other i/o operations, etc.).
-                budget_type will determine the units of min_budget/max_budget. If budget_type=='epochs'
-                is used, min_budget will refer to epochs whereas if budget_type=='runtime' then
-                min_budget will refer to seconds.
+                + `epochs`: The training of each pipeline will be terminated after
+                    a number of epochs have passed. This number of epochs is determined by the
+                    budget argument of this method.
+                + `runtime`: The training of each pipeline will be terminated after
+                    a number of seconds have passed. This number of seconds is determined by the
+                    budget argument of this method. The overall fitting time of a pipeline is
+                    controlled by func_eval_time_limit_secs. 'runtime' only controls the allocated
+                    time to train a pipeline, but it does not consider the overall time it takes
+                    to create a pipeline (data loading and preprocessing, other i/o operations, etc.).
+                budget_type will determine the units of min_budget/max_budget. If budget_type=='epochs'
+                is used, min_budget will refer to epochs whereas if budget_type=='runtime' then
+                min_budget will refer to seconds.
             min_budget (int):
-                Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>_` to
+                Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>`_ to
                 trade-off resources between running many pipelines at min_budget and
                 running the top performing pipelines on max_budget.
                 min_budget states the minimum resource allocation a pipeline should have
                 so that we can compare and quickly discard bad performing models.
                 For example, if the budget_type is epochs, and min_budget=5, then we will
                 run every pipeline to a minimum of 5 epochs before performance comparison.
             max_budget (int):
-                Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>_` to
+                Auto-PyTorch uses `Hyperband <https://arxiv.org/abs/1603.06560>`_ to
                 trade-off resources between running many pipelines at min_budget and
                 running the top performing pipelines on max_budget.
                 max_budget states the maximum resource allocation a pipeline is going to
                 be ran. For example, if the budget_type is epochs, and max_budget=50,
                 then the pipeline training will be terminated after 50 epochs.
-            total_walltime_limit (int), (default=100): Time limit
-                in seconds for the search of appropriate models.
+            total_walltime_limit (int: default=100):
+                Time limit in seconds for the search of appropriate models.
                 By increasing this value, autopytorch has a higher
                 chance of finding better models.
-            func_eval_time_limit_secs (int), (default=None):
+            func_eval_time_limit_secs (Optional[int]):
                 Time limit for a single call to the machine learning model.
                 Model fitting will be terminated if the machine
                 learning algorithm runs over the time limit. Set
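The min_budget/max_budget trade-off described in the docstring can be illustrated with a small sketch of the geometric budget ladder that Hyperband-style successive halving climbs. The growth factor `eta=3` is an assumption for illustration only; the actual schedule is decided by the SMAC intensifier:

```python
# Sketch of the geometric budget ladder implied by min_budget/max_budget
# under Hyperband-style successive halving. eta=3 is assumed purely for
# illustration; the real schedule is chosen by the SMAC intensifier.

def budget_ladder(min_budget, max_budget, eta=3):
    """Budgets at which surviving pipelines are re-evaluated, growing by eta."""
    budgets = []
    b = float(min_budget)
    while b < max_budget:
        budgets.append(round(b, 2))
        b *= eta
    budgets.append(float(max_budget))  # top rung is always max_budget
    return budgets

# budget_type='epochs', min_budget=5, max_budget=50, as in the docstring example:
ladder = budget_ladder(5, 50)
```

Every pipeline gets at least the bottom rung (5 epochs here) before comparison, and only the top performers are trained all the way to `max_budget`.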
@@ -185,47 +200,54 @@ def search(
                 total_walltime_limit // 2 to allow enough time to fit
                 at least 2 individual machine learning algorithms.
                 Set to np.inf in case no time limit is desired.
-            enable_traditional_pipeline (bool), (default=True):
+            enable_traditional_pipeline (bool: default=True):
                 We fit traditional machine learning algorithms
                 (LightGBM, CatBoost, RandomForest, ExtraTrees, KNN, SVM)
-                before building PyTorch Neural Networks. You can disable this
+                prior to building PyTorch Neural Networks. You can disable this
                 feature by turning this flag to False. All machine learning
                 algorithms that are fitted during search() are considered for
                 ensemble building.
-            memory_limit (Optional[int]), (default=4096):
-                Memory limit in MB for the machine learning algorithm. autopytorch
-                will stop fitting the machine learning algorithm if it tries
-                to allocate more than memory_limit MB. If None is provided,
-                no memory limit is set. In case of multi-processing, memory_limit
-                will be per job. This memory limit also applies to the ensemble
-                creation process.
+            memory_limit (Optional[int]: default=4096):
+                Memory limit in MB for the machine learning algorithm.
+                Autopytorch will stop fitting the machine learning algorithm
+                if it tries to allocate more than memory_limit MB. If None
+                is provided, no memory limit is set. In case of multi-processing,
+                memory_limit will be per job. This memory limit also applies to
+                the ensemble creation process.
             smac_scenario_args (Optional[Dict]):
                 Additional arguments inserted into the scenario of SMAC. See the
-                [SMAC documentation](https://automl.github.io/SMAC3/master/options.html?highlight=scenario#scenario)
+                `SMAC documentation <https://automl.github.io/SMAC3/master/options.html?highlight=scenario#scenario>`_
+                for a list of available arguments.
             get_smac_object_callback (Optional[Callable]):
                 Callback function to create an object of class
-                [smac.optimizer.smbo.SMBO](https://automl.github.io/SMAC3/master/apidoc/smac.optimizer.smbo.html).
+                `smac.optimizer.smbo.SMBO <https://automl.github.io/SMAC3/master/apidoc/smac.optimizer.smbo.html>`_.
                 The function must accept the arguments scenario_dict,
                 instances, num_params, runhistory, seed and ta. This is
                 an advanced feature. Use only if you are familiar with
-                [SMAC](https://automl.github.io/SMAC3/master/index.html).
-            all_supported_metrics (bool), (default=True):
-                if True, all metrics supporting current task will be calculated
+                `SMAC <https://automl.github.io/SMAC3/master/index.html>`_.
+            tae_func (Optional[Callable]):
+                TargetAlgorithm to be optimised. If None, `eval_function`
+                available in autoPyTorch/evaluation/train_evaluator is used.
+                Must be child class of AbstractEvaluator.
+            all_supported_metrics (bool: default=True):
+                If True, all metrics supporting current task will be calculated
                 for each pipeline and results will be available via cv_results
-            precision (int), (default=32): Numeric precision used when loading
-                ensemble data. Can be either '16', '32' or '64'.
+            precision (int: default=32):
+                Numeric precision used when loading ensemble data.
+                Can be either '16', '32' or '64'.
             disable_file_output (Union[bool, List]):
-            load_models (bool), (default=True):
+            load_models (bool: default=True):
                 Whether to load the models after fitting AutoPyTorch.
-            portfolio_selection (str), (default=None):
+            portfolio_selection (Optional[str]):
                 This argument controls the initial configurations that
                 AutoPyTorch uses to warm start SMAC for hyperparameter
                 optimization. By default, no warm-starting happens.
                 The user can provide a path to a json file containing
                 configurations, similar to (...herepathtogreedy...).
                 Additionally, the keyword 'greedy' is supported,
                 which would use the default portfolio from
-                `AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`
+                `AutoPyTorch Tabular <https://arxiv.org/abs/2006.13799>`_.
+
         Returns:
             self
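As a hedged illustration of `portfolio_selection`, a user-supplied portfolio is a json file of configurations whose path is passed to `search()`. The sketch below shows only the file-handling mechanics; the configuration keys used here are purely hypothetical, since the real schema is defined by the repository's greedy portfolio file:

```python
import json
import tempfile

# Hypothetical configurations; the real schema follows the repository's
# greedy portfolio file, not these made-up keys.
portfolio = [
    {"network_backbone": "ShapedMLPBackbone", "lr_scheduler": "CosineAnnealingLR"},
    {"network_backbone": "ResNetBackbone", "lr_scheduler": "CyclicLR"},
]

# Write the portfolio to disk; its path would then be passed as
# portfolio_selection=portfolio_path (or portfolio_selection='greedy'
# to use the built-in default portfolio).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(portfolio, f)
    portfolio_path = f.name

with open(portfolio_path) as f:
    loaded = json.load(f)
```

Each entry warm-starts SMAC with one initial configuration, so a small, diverse portfolio can noticeably shorten the early search phase.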
@@ -281,6 +303,16 @@ def predict(
         batch_size: Optional[int] = None,
         n_jobs: int = 1
     ) -> np.ndarray:
+        """Generate the estimator predictions.
+        Generate the predictions based on the given examples from the test set.
+
+        Args:
+            X_test (np.ndarray):
+                The test set examples.
+
+        Returns:
+            Array with estimator predictions.
+        """
         if self.InputValidator is None or not self.InputValidator._is_fitted:
             raise ValueError("predict() is only supported after calling search. Kindly call first "
                              "the estimator fit() method.")
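The guard above enforces a fit-before-predict contract: `predict()` raises a ValueError unless `search()` has run. The pattern can be sketched with a toy estimator (illustrative only, not Auto-PyTorch internals):

```python
# Toy illustration of the fit-before-predict guard shown above:
# calling predict() before search() raises a ValueError.

class ToyEstimator:
    def __init__(self):
        self._fitted = False  # stand-in for InputValidator._is_fitted

    def search(self, X, y):
        self._fitted = True   # stand-in for the real optimisation loop
        return self

    def predict(self, X_test):
        if not self._fitted:
            raise ValueError("predict() is only supported after calling search.")
        return [0 for _ in X_test]  # dummy predictions

est = ToyEstimator()
try:
    est.predict([[1.0]])
    raised = False
except ValueError:
    raised = True

preds = est.search([[1.0]], [0]).predict([[1.0], [2.0]])
```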