Commit 4e04b27
Completing the adaptation of the getting started vignette to mirror the one at keras3
1 parent 10d52af commit 4e04b27

File tree

2 files changed: +105 -104 lines changed

vignettes/getting-started.Rmd

Lines changed: 105 additions & 104 deletions
@@ -29,7 +29,7 @@ This new function behaves just like any other `parsnip` model (e.g., `rand_fores

You can install the development version of `kerasnip` from GitHub. You will also need `keras3` and a backend (like TensorFlow).

-``` r
+```{r}
# install.packages("pak")
pak::pak("davidrsch/kerasnip")
pak::pak("rstudio/keras3")
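Beyond the two packages above, the TensorFlow backend itself still needs to be installed once. That step is not part of this diff; a typical way to do it with `keras3` is shown below.

```r
# Assumed extra step, not shown in this commit: install the TensorFlow backend
# once after installing keras3.
keras3::install_keras()
```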
@@ -48,7 +48,15 @@ library(keras3)

## A `kerasnip` MNIST Example

-Let's replicate the standard Keras introductory example, an MLP on the MNIST dataset, but using the `kerasnip` workflow. This will show how to translate a standard Keras model into a reusable, modular `parsnip` specification.
+Let’s replicate the classic Keras introductory example, training a simple MLP on the MNIST dataset, but using the `kerasnip` workflow. This will demonstrate how to translate a standard Keras model into a reusable, modular `parsnip` specification.
+
+If you’re familiar with Keras, you’ll recognize the structure; if not, this is a perfect place to start. We’ll begin by learning the basics through a simple task: recognizing handwritten digits from the MNIST dataset.
+
+The MNIST dataset contains 28×28 pixel grayscale images of handwritten digits, like these:
+
+![MNIST](images/MNIST.png){fig-alt="A picture showing grayscale images of handwritten digits (5, 0, 4 and 1)"}
+
+Each image comes with a label indicating which digit it represents. For example, the labels for the images above might be 5, 0, 4, and 1.

### Preparing the Data

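The data-preparation code itself is collapsed in this view (the next hunk resumes at the `train_df` line). For orientation, here is a sketch of what that step usually looks like with `keras3`, assuming the standard MNIST recipe; the object names follow the context lines of the next hunk, but the exact code is not part of this commit.

```r
# Illustrative sketch, not part of the commit: the standard keras3 MNIST
# preparation that the collapsed lines broadly correspond to.
library(keras3)

mnist <- dataset_mnist()

# Flatten each 28x28 image into a 784-length vector and rescale to [0, 1].
x_train <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 784)) / 255
x_test <- array_reshape(mnist$test$x, c(nrow(mnist$test$x), 784)) / 255

# parsnip expects a data frame, so the outcome becomes a factor and the
# predictor matrix is wrapped in I() to keep it as a single column.
y_train_factor <- factor(mnist$train$y)
y_test_factor <- factor(mnist$test$y)

train_df <- data.frame(x = I(x_train), y = y_train_factor)
test_df <- data.frame(x = I(x_test), y = y_test_factor)
```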
@@ -79,20 +87,47 @@ train_df <- data.frame(x = I(x_train), y = y_train_factor)
test_df <- data.frame(x = I(x_test), y = y_test_factor)
```

+### The Standard Keras Approach (for comparison)
+
+Before diving into the `kerasnip` workflow, let's quickly look at how this same model is built using standard `keras3` code. This will help highlight the different approach `kerasnip` enables.
+
+```{r keras-standard, eval=FALSE, echo=TRUE, results='hide'}
+# The standard Keras3 approach
+model <- keras_model_sequential(input_shape = 784) |>
+  layer_dense(units = 256, activation = "relu") |>
+  layer_dropout(rate = 0.4) |>
+  layer_dense(units = 128, activation = "relu") |>
+  layer_dropout(rate = 0.3) |>
+  layer_dense(units = 10, activation = "softmax")
+
+summary(model)
+
+model |>
+  compile(
+    loss = "categorical_crossentropy",
+    optimizer = optimizer_rmsprop(),
+    metrics = "accuracy"
+  )
+
+# The model would then be trained with model |> fit(...)
+```
+
+The code above is imperative: you define each layer and add it to the model step-by-step. Now, let's see how `kerasnip` approaches this by defining reusable components for a declarative, `tidymodels`-friendly workflow.
+
### Defining the Model with Reusable Blocks

The original Keras example interleaves `layer_dense()` and `layer_dropout()`. With `kerasnip`, we can encapsulate this pattern into a single, reusable block. This makes the overall architecture cleaner and more modular.

```{r define-blocks}
# An input block to initialize the model.
+# The 'model' argument is supplied implicitly by the kerasnip backend.
mlp_input_block <- function(model, input_shape) {
  keras_model_sequential(input_shape = input_shape)
}

# A reusable "module" that combines a dense layer and a dropout layer.
-# This pattern can now be repeated easily.
-# default values for parameters must be set
-dense_dropout_block <- function(model, units=128, rate=0.1) {
+# All arguments that should be tunable need a default value.
+dense_dropout_block <- function(model, units = 128, rate = 0.1) {
  model |>
    layer_dense(units = units, activation = "relu") |>
    layer_dropout(rate = rate)
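Although it is not part of the commit, it can help to see that these blocks are plain functions: chaining the input block and two dense-dropout blocks by hand reproduces the imperative `keras3` model from the comparison section. The `NULL` passed for the unused `model` argument and the unit and rate values below are assumptions chosen to mirror that example.

```r
# Illustration only, not part of the commit: composing the blocks by hand.
# mlp_input_block() ignores its `model` argument, so NULL is passed here;
# the units and rates mirror the imperative keras3 example above.
model <- mlp_input_block(NULL, input_shape = 784) |>
  dense_dropout_block(units = 256, rate = 0.4) |>
  dense_dropout_block(units = 128, rate = 0.3) |>
  layer_dense(units = 10, activation = "softmax")

summary(model)
```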
@@ -121,7 +156,9 @@ create_keras_sequential_spec(

### Building and Fitting the Model

-We can now use our new `mnist_mlp()` function. To replicate the `keras3` example, we want to repeat our `hidden` block twice with different parameters. `kerasnip` makes this easy: we set `num_hidden = 2` and pass vectors for the `hidden_units` and `hidden_rate` arguments. `kerasnip` will supply the first value to the first instance of the block, the second value to the second instance, and so on.
+We can now use our new `mnist_mlp()` function. Notice how its arguments, such as `hidden_1_units` and `hidden_1_rate`, were automatically generated by `kerasnip`. The names are created by combining the name of the layer block (e.g., `hidden_1`) with the arguments of that block's function (e.g., `units`, `rate`).
+
+To replicate the `keras3` example, we'll use both `hidden` blocks and provide their parameters.

```{r use-spec}
mlp_spec <- mnist_mlp(
@@ -140,36 +177,64 @@ mlp_spec <- mnist_mlp(

# Fit the model
mlp_fit <- fit(mlp_spec, y ~ x, data = train_df)
-keras_model <- mlp_fit$fit$fit
-training_history <- mlp_fit$fit$history
```

```{r model-summarize}
-summary(keras_model)
+mlp_fit |>
+  extract_keras_summary()
```

```{r model-plot}
-plot(keras_model, show_shapes = TRUE)
+mlp_fit |>
+  extract_keras_summary() |>
+  plot(show_shapes = TRUE)
```

```{r model-fit-history}
-plot(training_history)
+mlp_fit |>
+  extract_keras_history() |>
+  plot()
```

-Evaluate the model’s performance on the test data: Evaluate method missing
+### Evaluating Model Performance
+
+The `keras_evaluate()` function provides a straightforward way to assess the model's performance on a test set, using the underlying `keras3::evaluate()` method. It returns the loss and any other metrics that were specified during the model compilation step.

```{r model-evaluate}
-# keras_model |> evaluate(x_test, y_test)
+mlp_fit |> keras_evaluate(x_test, y_test)
+```
+
+### Making Predictions
+
+Once the model is trained, we can use the standard `tidymodels` `predict()` function to generate predictions on new data. By default, `predict()` on a `parsnip` classification model returns the predicted class labels.
+
+```{r model-predict-class}
+# Predict the class for the first 5 images in the test set
+class_preds <- mlp_fit |>
+  predict(new_data = head(test_df))
+class_preds
```

-Generate predictions on new data:
+To get the underlying probabilities for each class, we can set `type = "prob"`. This returns a tibble with a probability column for each of the 10 classes (0-9).

-```{r model-predict}
-probs <- keras_model |> predict(x_test)
+```{r model-predict-prob}
+# Predict probabilities for the first 5 images
+prob_preds <- mlp_fit |> predict(new_data = head(test_df), type = "prob")
+prob_preds
```

-```{r show-predictions}
-max.col(probs) - 1L
+We can then compare the predicted class to the actual class for these images to see how the model is performing.
+
+```{r model-predict-compare}
+# Combine predictions with actuals for comparison
+comparison <- bind_cols(
+  class_preds,
+  prob_preds
+) |>
+  bind_cols(
+    head(test_df[, "y", drop = FALSE])
+  )
+comparison
```

## Example 2: Tuning the Model Architecture
@@ -180,11 +245,12 @@ Using the `mnist_mlp` spec we just created, let's define a tunable model.

```{r tune-spec-mnist}
# Define a tunable specification
+# We set num_hidden_2 = 0 to disable the second hidden block for this tuning example
tune_spec <- mnist_mlp(
  num_hidden_1 = tune(),
  hidden_1_units = tune(),
  hidden_1_rate = tune(),
-  num_hidden2 = 0,
+  num_hidden_2 = 0,
  compile_loss = "categorical_crossentropy",
  compile_optimizer = optimizer_rmsprop(),
  compile_metrics = c("accuracy"),
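The workflow, tuning grid, and `tune_grid()` call that consume this spec sit in the collapsed lines between this hunk and the next. As a rough sketch of what that setup usually looks like in `tidymodels`, the block below reuses the object names the diff refers to (`tune_wf`, `grid`, `tune_res`); the formula preprocessor, grid values, and resampling scheme are assumptions rather than the vignette's actual code.

```r
# Illustration only, not part of the commit: a typical tidymodels tuning setup
# around the spec above. Object names mirror those referenced in the diff;
# the formula, grid values, and resampling scheme are assumptions.
library(tidymodels)

tune_wf <- workflow() |>
  add_model(tune_spec) |>
  add_formula(y ~ x)

# A small assumed grid over depth, width, and dropout rate.
grid <- tidyr::crossing(
  num_hidden_1 = 1:2,
  hidden_1_units = c(64, 128),
  hidden_1_rate = c(0.2, 0.4)
)

folds <- vfold_cv(train_df, v = 3)

tune_res <- tune_grid(
  tune_wf,
  resamples = folds,
  grid = grid,
  metrics = metric_set(accuracy)
)
```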
@@ -231,97 +297,32 @@ Finally, we can inspect the results to find which architecture performed the bes
show_best(tune_res, metric = "accuracy")
```

-Now, let's visualize the top 5 models from the tuning results in detail.
-
-```{r extract-top-models}
-# Get the top 5 results to iterate through
-top_5_results <- show_best(tune_res, metric = "accuracy") |>
-  select(all_of(names(grid)), .config)
-
-finalize_fit_tops <- function(parameters, workflow) {
-  finalize_workflow(x = workflow, parameters = parameters) |>
-    fit(train_df)
-}
-
-fited_tops <- 1:5 |>
-  map(\(x) finalize_fit_tops(parameters = top_5_results[x,], tune_wf))
+Now that we've identified the best-performing hyperparameters, our final step is to create and train the final model. We use `select_best()` to get the top parameters, `finalize_workflow()` to update our workflow with them, and then `fit()` one last time on our full training dataset.

-get_models <- function(fited_model){
-  fited_model$fit$fit$fit$fit
-}
+```{r finalize-best-model}
+# Select the best hyperparameters
+best_hps <- select_best(tune_res, metric = "accuracy")

-models <- fited_tops |> map(get_models)
+# Finalize the workflow with the best hyperparameters
+final_wf <- finalize_workflow(tune_wf, best_hps)

-get_fit_histories <- function(fited_model){
-  fited_model$fit$fit$fit$history
-}
-
-fit_histories <- fited_tops |> map(get_fit_histories)
-
-summary(models[[1]])
-plot(models[[1]], show_shapes = TRUE)
-plot(fit_histories[[1]])
+# Fit the final model on the full training data
+final_fit <- fit(final_wf, data = train_df)
```

-### Top 5 Model Summaries
-
-```{r tops-summary, results='asis'}
-# Loop through each model and print its summary
-for (i in 1:length(models)) {
-  if (i == 1) {
-    cat("::: {.grid}")
-  } else if (i%%2 == 0) {
-    cat("::: {.grid}")
-    cat("::: {.g-col-6}")
-  } else {
-    cat("::: {.g-col-6}")
-  }
-  cat(paste0("\n\n#### Rank ", i, " Model Summary\n\n"))
-  capture.output(summary(models[[i]])) |> cat(sep="\n")
-  if (i == 1 || i%%2 != 0) {
-    cat(":::")
-  } else {
-    cat(":::")
-    cat(":::")
-  }
-}
-```
-
-### Top 5 Model Architectures
-
-```{r tops-models-plot, fig.height=20}
-# Use par(mfrow) to create a grid for the base plots
-mat <- matrix(
-  c(1, 1, 2, 3, 4, 5),
-  nrow = 3,
-  ncol = 2,
-  byrow = TRUE
-)
-
-layout(mat = mat)
-
-for (i in 1:length(models)) {
-  plot(models[[i]], show_shapes = TRUE)
-  title(paste0("Rank ", i, " Model"))
-}
-```
-
-### Top 5 Training Histories
-
-```{r tops-models-fit-history, fig.height=12}
-# The history plots are ggplots, so we use patchwork to combine them
-library(patchwork)
-
-design <- "A#
-BC
-DE"
+We can now inspect our final, tuned model.

-plot_list <- purrr::map(1:length(fit_histories), \(i) {
-  plot(fit_histories[[i]]) + labs(title = paste("Rank", i, "History"))
-})
+```{r inspect-final-model}
+# Print the model summary
+final_fit |>
+  extract_fit_parsnip() |>
+  extract_keras_summary()

-# Combine all plots into a single image
-wrap_plots(plot_list, design = design)
+# Plot the training history
+final_fit |>
+  extract_fit_parsnip() |>
+  extract_keras_history() |>
+  plot()
```

-This result shows that `tune` has tested various network depths (`num_hidden`), widths (`hidden_units`), and dropout rates, successfully finding the best-performing combination within the search space. This demonstrates how `kerasnip` integrates complex architectural tuning directly into the standard `tidymodels` framework.
+This result shows that `tune` has tested various network depths, widths, and dropout rates, successfully finding the best-performing combination within the search space. By using `kerasnip`, we were able to integrate this complex architectural tuning directly into a standard `tidymodels` workflow.
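A natural follow-up, not shown in this excerpt, is to score the finalized workflow on the held-out test set. The sketch below relies only on standard `tidymodels` conventions (`predict()` on a fitted workflow, `.pred_class`, `yardstick::accuracy()`) and is illustrative rather than part of the commit.

```r
# Illustration only, not part of the commit: score the finalized workflow on
# the held-out test set using standard tidymodels tooling.
final_preds <- predict(final_fit, new_data = test_df) |>
  dplyr::bind_cols(test_df["y"])

yardstick::accuracy(final_preds, truth = y, estimate = .pred_class)
```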

vignettes/images/MNIST.png

16 KB