Builds the weight-of-evidence (woe) dictionary of a set of predictor variables against a given binary outcome. The dictionary is convenient for building a woe version of those predictors, and it also lets you tweak individual woe values by hand.
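The quantities in the dictionary can be reproduced by hand. The sketch below is a minimal base-R illustration, not the package's implementation. For each category it counts rows in each outcome class (n_0, n_1), turns these counts into class-conditional proportions p_0 and p_1, and takes woe = log(p_1 / p_0). The helper woe_one() is hypothetical. The sign convention is inferred from the example output, where woe is positive when p_1 > p_0.

```r
# Hypothetical helper (not a package function): woe table of one categorical
# predictor `x` against a binary outcome `y`, without Laplace smoothing.
woe_one <- function(x, y) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  tab <- table(x, y)             # rows: categories of x; columns: outcome classes
  n_0 <- as.numeric(tab[, 1])    # counts in the first outcome class
  n_1 <- as.numeric(tab[, 2])    # counts in the second outcome class
  p_0 <- n_0 / sum(n_0)          # share of class-0 rows falling in each category
  p_1 <- n_1 / sum(n_1)          # share of class-1 rows falling in each category
  data.frame(
    predictor = rownames(tab),
    n_tot = n_0 + n_1,
    n_0 = n_0, n_1 = n_1,
    p_0 = p_0, p_1 = p_1,
    woe = log(p_1 / p_0),        # positive when the category favors class 1
    stringsAsFactors = FALSE
  )
}

res <- woe_one(mtcars$cyl, mtcars$am)
res
```

With am as the outcome, this gives woe values of about 1.36, 0.0918 and -1.41 for cyl 4, 6 and 8, matching the cyl rows of the example output. Smoothing is omitted because cyl has no empty cells, so log(0) cannot occur here.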
Arguments
- .data
A tbl. The data frame that contains the variables.
- outcome
The bare name of the outcome variable with exactly 2 distinct values.
- ...
Bare names of predictor variables, or selectors accepted by dplyr::select().
- Laplace
Defaults to 1e-6. The pseudocount parameter of the Laplace smoothing estimator. It prevents -Inf/Inf woe values for predictor categories that contain only one outcome class. Set to 0 to allow Inf/-Inf.
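To see what the pseudocount does, consider gear = 3 in the example, where all 15 cars are automatic (am = 0) and none are manual, out of 19 automatic and 13 manual cars in total. The sketch assumes smoothed proportions of the form (n + Laplace) / (N + 2 * Laplace). That exact form is an assumption, but with Laplace = 1e-6 it reproduces the printed woe of -16.1. smoothed_woe() is a hypothetical helper.

```r
# Hypothetical helper: woe of a single category from its cell counts.
# n_0, n_1: counts of the category in outcome classes 0 and 1;
# N_0, N_1: class totals. The (n + L) / (N + 2L) smoothing form is assumed.
smoothed_woe <- function(n_0, N_0, n_1, N_1, laplace) {
  log(((n_1 + laplace) / (N_1 + 2 * laplace)) /
      ((n_0 + laplace) / (N_0 + 2 * laplace)))
}

# gear == 3 in mtcars: 15 automatic (am = 0) cars, 0 manual (am = 1) cars
no_smoothing   <- smoothed_woe(15, 19, 0, 13, laplace = 0)     # -Inf, i.e. log(0)
with_smoothing <- smoothed_woe(15, 19, 0, 13, laplace = 1e-6)  # finite, about -16.1
c(no_smoothing, with_smoothing)
```

Because the empty cell contributes roughly log(Laplace) to the result, the magnitude of such values reflects the pseudocount more than the data. With Laplace = 1e-6 they land around 14 to 16 in absolute value, as in the gear 3, gear 5 and carb 3, 6 and 8 rows of the example.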
Details
You can pass a custom dictionary to step_woe(). It must have exactly the same structure as the output of dictionary(). One easy way to build one is to tweak an output returned by dictionary().
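A typical hand tweak is to tame extreme woe values before passing the dictionary to step_woe(). The sketch below caps woe at an arbitrary bound of 3 in absolute value using pmin() and pmax(). So that it runs without any package, it builds a plain data.frame with a few columns copied from the gear rows of the example output. In practice you would edit only the woe column of the tibble that dictionary() returns and leave every other column, and its type, unchanged, since step_woe() expects exactly that structure. The cap of 3 is illustrative only.

```r
# A few columns copied from the gear rows of the example output. In practice
# `dict` would be the full tibble returned by dictionary(); this data.frame
# only mimics the columns needed for the illustration.
dict <- data.frame(
  variable  = "gear",
  predictor = c("3", "4", "5"),
  woe       = c(-16.1, 1.07, 15.8),
  outcome   = "am",
  stringsAsFactors = FALSE
)

# Cap extreme woe values (caused by empty cells plus the tiny pseudocount)
# at an illustrative bound of +/- 3; moderate values are left unchanged.
cap <- 3
dict$woe <- pmin(pmax(dict$woe, -cap), cap)
dict
```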
References
Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, Second Edition. Springer, New York.
Good, I. J. (1985). Weight of evidence: A brief survey. Bayesian Statistics, 2, pp. 249-270.
Examples
library(embed)
mtcars %>% dictionary("am", cyl, gear:carb)
#> # A tibble: 12 × 9
#> variable predictor n_tot n_0 n_1 p_0 p_1 woe outcome
#> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 cyl 4 11 3 8 0.158 0.615 1.36 am
#> 2 cyl 6 7 4 3 0.211 0.231 0.0918 am
#> 3 cyl 8 14 12 2 0.632 0.154 -1.41 am
#> 4 gear 3 15 15 0 0.789 0 -16.1 am
#> 5 gear 4 12 4 8 0.211 0.615 1.07 am
#> 6 gear 5 5 0 5 0 0.385 15.8 am
#> 7 carb 1 7 3 4 0.158 0.308 0.667 am
#> 8 carb 2 10 6 4 0.316 0.308 -0.0260 am
#> 9 carb 3 3 3 0 0.158 0 -14.5 am
#> 10 carb 4 10 7 3 0.368 0.231 -0.468 am
#> 11 carb 6 1 0 1 0 0.0769 14.2 am
#> 12 carb 8 1 0 1 0 0.0769 14.2 am
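A woe version of a predictor replaces each category by its woe value from the dictionary. step_woe() does this inside a recipe. The base-R sketch below shows the idea for cyl, using the three cyl rows printed above (rounded as shown) and matching by category label with match(). It illustrates the mapping only. It is not how step_woe() is implemented, and dict_cyl is a hypothetical name.

```r
# cyl rows copied from the example output above (woe rounded as printed).
dict_cyl <- data.frame(
  predictor = c("4", "6", "8"),
  woe       = c(1.36, 0.0918, -1.41),
  stringsAsFactors = FALSE
)

# Replace each cyl category by its woe: match() finds the dictionary row whose
# `predictor` equals the category converted to character.
cyl_woe <- dict_cyl$woe[match(as.character(mtcars$cyl), dict_cyl$predictor)]
head(data.frame(cyl = mtcars$cyl, cyl_woe = cyl_woe))
```

With this lookup, a category that is missing from the dictionary comes back as NA, so any such categories need their own handling.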