diff --git a/_pkgdown.yml b/_pkgdown.yml index 4e3e7a1..1219684 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -21,6 +21,7 @@ guides: - functional_api - workflows_sequential - workflows_functional + - tuning_fit_compile_args # examples: @@ -88,6 +89,9 @@ navbar: href: articles/workflows_sequential.html - text: "Functional API" href: articles/workflows_functional.html + - text: "Tuning" + - text: "Tuning Fit and Compile Arguments" + href: articles/tuning_fit_compile_args.html github: icon: fa-github href: https://github.com/davidrsch/kerasnip diff --git a/vignettes/tuning_fit_compile_args.Rmd b/vignettes/tuning_fit_compile_args.Rmd new file mode 100644 index 0000000..1432d37 --- /dev/null +++ b/vignettes/tuning_fit_compile_args.Rmd @@ -0,0 +1,194 @@ +--- +title: "Tuning Fit and Compile Arguments" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Tuning Fit and Compile Arguments} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = reticulate::py_module_available("keras") +) +# Suppress verbose Keras output for the vignette +options(keras.fit_verbose = 0) +set.seed(123) +``` + +## Introduction + +While `kerasnip` makes it easy to tune the architecture of a Keras model (e.g., the number of layers or the number of units in a layer), it is often just as important to tune the parameters that control the training process itself. `kerasnip` exposes these parameters through special `fit_*` and `compile_*` arguments in the model specification. + +This vignette provides a comprehensive example of how to tune these arguments within a `tidymodels` workflow. We will tune: + +* **`fit_epochs`**: The number of training epochs. +* **`fit_batch_size`**: The number of samples per gradient update. +* **`compile_optimizer`**: The optimization algorithm (e.g., "adam", "sgd"). +* **`compile_loss`**: The loss function used for training. +* **`learn_rate`**: The learning rate for the optimizer. + +## Setup + +First, we load the necessary packages. + +```{r load-packages} +library(kerasnip) +library(tidymodels) +library(keras3) +``` + +## Data Preparation + +We will use the classic `iris` dataset for this example. It's a simple, small dataset, which is ideal for demonstrating the tuning process without long training times. + +```{r data-prep} +# Split data into training and testing sets +set.seed(123) +iris_split <- initial_split(iris, prop = 0.8, strata = Species) +iris_train <- training(iris_split) +iris_test <- testing(iris_split) + +# Create cross-validation folds for tuning +iris_folds <- vfold_cv(iris_train, v = 3, strata = Species) +``` + +## Define a `kerasnip` Model + +We'll create a very simple sequential model with a single dense layer. This keeps the focus on tuning the `fit_*` and `compile_*` arguments rather than the model architecture. + +```{r define-kerasnip-model} +# Define layer blocks +input_block <- function(model, input_shape) { + keras_model_sequential(input_shape = input_shape) +} +dense_block <- function(model, units = 10) { + model |> layer_dense(units = units, activation = "relu") +} +output_block <- function(model, num_classes) { + model |> layer_dense(units = num_classes, activation = "softmax") +} + +# Create the kerasnip model specification function +create_keras_sequential_spec( + model_name = "iris_mlp", + layer_blocks = list( + input = input_block, + dense = dense_block, + output = output_block + ), + mode = "classification" +) +``` + +## Define the Tunable Specification + +Now, we create an instance of our `iris_mlp` model. We set the arguments we want to optimize to `tune()`. + +```{r define-tune-spec} +# Define the tunable model specification +tune_spec <- iris_mlp( + dense_units = 16, # Keep architecture fixed for this example + fit_epochs = tune(), + fit_batch_size = tune(), + compile_optimizer = tune(), + compile_loss = tune(), + learn_rate = tune() +) |> + set_engine("keras") + +print(tune_spec) +``` + +## Create Workflow and Tuning Grid + +Next, we create a `workflow` and define the search space for our hyperparameters using `dials`. `kerasnip` provides special `dials` parameter functions for `optimizer` and `loss`. + +```{r create-workflow-grid} +# Create a simple recipe +iris_recipe <- recipe(Species ~ ., data = iris_train) |> + step_normalize(all_numeric_predictors()) + +# Create the workflow +tune_wf <- workflow() |> + add_recipe(iris_recipe) |> + add_model(tune_spec) + +# Define the tuning grid +params <- extract_parameter_set_dials(tune_wf) |> + update( + fit_epochs = epochs(c(10, 30)), + fit_batch_size = batch_size(c(16, 64), trans = NULL), + compile_optimizer = optimizer_function(values = c("adam", "sgd", "rmsprop")), + compile_loss = loss_function_keras(values = c("categorical_crossentropy", "kl_divergence")), + learn_rate = learn_rate(c(0.001, 0.01), trans = NULL) + ) + +set.seed(456) +tuning_grid <- grid_regular(params, levels = 2) + +tuning_grid +``` + +## Tune the Model + +With the workflow and grid defined, we can now run the hyperparameter tuning using `tune_grid()`. + +```{r tune-model, cache=TRUE} +tune_res <- tune_grid( + tune_wf, + resamples = iris_folds, + grid = tuning_grid, + metrics = metric_set(accuracy, roc_auc), + control = control_grid(save_pred = FALSE, save_workflow = TRUE, verbose = FALSE) +) +``` + +## Inspect the Results + +Let's examine the results to see how the different combinations of fitting and compilation parameters performed. + +```{r inspect-results} +# Show the best performing models based on accuracy +show_best(tune_res, metric = "accuracy") + +# Plot the results +autoplot(tune_res) + theme_minimal() + +# Select the best hyperparameters +best_params <- select_best(tune_res, metric = "accuracy") +print(best_params) +``` + +The results show that `tune` has successfully explored different optimizers, loss functions, learning rates, epochs, and batch sizes, identifying the combination that yields the best accuracy. + +## Finalize and Fit + +Finally, we finalize our workflow with the best-performing hyperparameters and fit the model one last time on the full training dataset. + +```{r finalize-fit} +# Finalize the workflow +final_wf <- finalize_workflow(tune_wf, best_params) + +# Fit the final model +final_fit <- fit(final_wf, data = iris_train) + +print(final_fit) +``` + +We can now use this `final_fit` object to make predictions on the test set. + +```{r predict} +# Make predictions +predictions <- predict(final_fit, new_data = iris_test) + +# Evaluate performance +bind_cols(predictions, iris_test) |> + accuracy(truth = Species, estimate = .pred_class) +``` + +## Conclusion + +This vignette demonstrated how to tune the crucial `fit_*` and `compile_*` arguments of a Keras model within the `tidymodels` framework using `kerasnip`. By exposing these as tunable parameters, `kerasnip` gives you full control over the training process, allowing you to optimize not just the model's architecture, but also how it learns.