`step_discretize_cart` creates a *specification* of a recipe step that will discretize numeric data (e.g. integers or doubles) into bins in a supervised way using a CART model.

    step_discretize_cart(
      recipe,
      ...,
      role = NA,
      trained = FALSE,
      outcome = NULL,
      cost_complexity = 0.01,
      tree_depth = 10,
      min_n = 20,
      rules = NULL,
      skip = FALSE,
      id = rand_id("discretize_cart")
    )

    # S3 method for step_discretize_cart
    tidy(x, ...)

Argument | Description |
---|---|
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose which variables are affected by the step. See |
role | Defaults to |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
outcome | A call to |
cost_complexity | The regularization parameter. Any split that does not decrease the overall lack of fit by a factor of |
tree_depth | The |
min_n | The number of data points in a node required to continue splitting. Corresponds to |
rules | The splitting rules of the best CART tree to retain for each variable. If length zero, splitting could not be used on that column. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |

An updated version of `recipe` with the new step added to the sequence of existing steps (if any).

`step_discretize_cart()` creates non-uniform bins from numerical variables by utilizing the information about the outcome variable and applying a CART model.

The best set of buckets for each variable is chosen using the standard cost-complexity pruning of CART, which makes this discretization method resistant to overfitting.
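The idea can be sketched outside of a recipe with rpart directly. This is a minimal illustration and not the step's actual implementation: the helper `cart_breaks()` is hypothetical, the built-in `iris` data stands in for a real training set, and the sketch assumes that `cost_complexity`, `tree_depth` and `min_n` map onto rpart's `cp`, `maxdepth` and `minsplit` controls.

```r
library(rpart)

# Hypothetical helper (not part of any package): fit a CART tree with a
# single numeric predictor `x` and return the split points it found.
# Assumption: the three tuning arguments map to rpart's cp, maxdepth, minsplit.
cart_breaks <- function(x, y, cost_complexity = 0.01, tree_depth = 10, min_n = 20) {
  dat <- data.frame(x = x, y = y)
  fit <- rpart(
    y ~ x,
    data = dat,
    control = rpart.control(
      cp = cost_complexity,
      maxdepth = tree_depth,
      minsplit = min_n,
      maxcompete = 0,   # keep only primary splits in fit$splits
      maxsurrogate = 0,
      xval = 0          # no cross-validation needed for this sketch
    )
  )
  # A tree that never split has no `splits` component: return length zero.
  if (is.null(fit$splits)) {
    return(numeric(0))
  }
  sort(unique(fit$splits[, "index"]))
}

# Supervised bins for one predictor, using Species as the outcome.
breaks <- cart_breaks(iris$Petal.Length, iris$Species)
binned <- cut(
  iris$Petal.Length,
  breaks = c(-Inf, breaks, Inf),
  right = FALSE,
  include.lowest = TRUE
)
table(binned, iris$Species)
```

Raising `cost_complexity` or `min_n`, or lowering `tree_depth`, should give fewer and wider bins, since the tree is allowed fewer splits.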

This step requires the rpart package. If rpart is not installed, the step will stop with a message about installing the package.

Note that the original data will be replaced with the new bins.
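The replacement behaviour can be illustrated in base R. The sketch below is an assumption-laden stand-in for what baking does, not the package's code: `apply_breaks()` is a hypothetical helper, the data frame values are made up for illustration, and the `tau` breaks are read off the bin labels printed in the example output on this page.

```r
# Hypothetical helper: overwrite a numeric column with bins built from
# stored split points. A column with no split points is left untouched.
apply_breaks <- function(data, column, breaks) {
  if (length(breaks) == 0) {
    return(data)  # no meaningful split was found: keep the numeric column
  }
  data[[column]] <- cut(
    data[[column]],
    breaks = c(-Inf, breaks, Inf),
    right = FALSE,
    include.lowest = TRUE,
    dig.lab = 4
  )
  data
}

# Made-up values for illustration only.
new_data <- data.frame(
  tau = c(5.9, 6.2, 6.3, 6.5, 7.0),
  age = c(0.98, 0.99, 0.985, 0.99, 0.98)
)

# Breaks taken from the bin labels shown in the example output.
tau_breaks <- c(6.147, 6.25, 6.322, 6.418, 6.661)

binned <- apply_breaks(new_data, "tau", tau_breaks)
binned <- apply_breaks(binned, "age", numeric(0))
str(binned)
```

After this, `tau` is a factor of intervals and the numeric values are gone, while `age` is still numeric. Keeping a copy of the original column before the step is one way to retain both versions.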

    library(recipes)    # recipe(), prep(), bake(), tidy() and the pipe
    library(embed)      # step_discretize_cart()
    library(modeldata)
    data(ad_data)
    library(rsample)

    # split the data into training and test sets, stratified by the outcome
    split <- initial_split(ad_data, strata = "Class")

    ad_data_tr <- training(split)
    ad_data_te <- testing(split)

    cart_rec <-
      recipe(Class ~ ., data = ad_data_tr) %>%
      step_discretize_cart(tau, age, p_tau, Ab_42, outcome = "Class", id = "cart splits")

    # estimate the split points on the training set
    cart_rec <- prep(cart_rec, training = ad_data_tr)
    #> Warning: `step_discretize_cart()` failed to find any meaningful splits for predictor 'age', which will not be binned.

    # inspect the split points (call inferred from the printed output below)
    tidy(cart_rec, id = "cart splits")
    #> # A tibble: 16 x 3
    #>    terms values id
    #>    <chr>  <dbl> <chr>
    #>  1 tau     6.15 cart splits
    #>  2 tau     6.25 cart splits
    #>  3 tau     6.32 cart splits
    #>  4 tau     6.42 cart splits
    #>  5 tau     6.66 cart splits
    #>  6 p_tau   3.90 cart splits
    #>  7 p_tau   4.36 cart splits
    #>  8 p_tau   4.40 cart splits
    #>  9 p_tau   4.49 cart splits
    #> 10 p_tau   4.54 cart splits
    #> 11 p_tau   4.62 cart splits
    #> 12 Ab_42   9.98 cart splits
    #> 13 Ab_42  10.3  cart splits
    #> 14 Ab_42  11.1  cart splits
    #> 15 Ab_42  11.2  cart splits
    #> 16 Ab_42  11.3  cart splits

    # apply the bins to the test set, returning only `tau`
    bake(cart_rec, ad_data_te, tau)
    #> # A tibble: 82 x 1
    #>    tau
    #>    <fct>
    #>  1 [6.147,6.25)
    #>  2 [6.418,6.661)
    #>  3 [-Inf,6.147)
    #>  4 [-Inf,6.147)
    #>  5 [6.147,6.25)
    #>  6 [-Inf,6.147)
    #>  7 [6.25,6.322)
    #>  8 [6.661, Inf]
    #>  9 [-Inf,6.147)
    #> 10 [-Inf,6.147)
    #> # … with 72 more rows