Builds the woe dictionary of a set of predictor variables upon a given binary outcome. Convenient to make a woe version of the given set of predictor variables and also to allow one to tweak some woe values by hand.

dictionary(.data, outcome, ..., Laplace = 1e-06)

Arguments

.data

A tbl. The data.frame where the variables come from.

outcome

The bare name of the outcome variable with exactly 2 distinct values.

...

bare names of predictor variables or selectors accepted by dplyr::select().

Laplace

Default to 1e-6. The pseudocount parameter of the Laplace Smoothing estimator. Value to avoid -Inf/Inf from predictor category with only one outcome class. Set to 0 to allow Inf/-Inf.

Value

a tibble with summaries and woe for every given predictor variable stacked up.

Details

You can pass a custom dictionary to step_woe(). It must have the exactly the same structure of the output of dictionary(). One easy way to do this is by tweaking an output returned from it.

References

Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.

Hastie, T., Tibshirani, R. and Friedman, J. (1986). Elements of Statistical Learning, Second Edition, Springer, 2009.

Good, I. J. (1985), "Weight of evidence: A brief survey", Bayesian Statistics, 2, pp.249-270.

Examples

mtcars %>% dictionary("am", cyl, gear:carb)
#> # A tibble: 12 x 9 #> variable predictor n_tot n_0 n_1 p_0 p_1 woe outcome #> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> #> 1 cyl 4 11 3 8 0.158 0.615 1.36 am #> 2 cyl 6 7 4 3 0.211 0.231 0.0918 am #> 3 cyl 8 14 12 2 0.632 0.154 -1.41 am #> 4 gear 3 15 15 0 0.789 0 -16.1 am #> 5 gear 4 12 4 8 0.211 0.615 1.07 am #> 6 gear 5 5 0 5 0 0.385 15.8 am #> 7 carb 1 7 3 4 0.158 0.308 0.667 am #> 8 carb 2 10 6 4 0.316 0.308 -0.0260 am #> 9 carb 3 3 3 0 0.158 0 -14.5 am #> 10 carb 4 10 7 3 0.368 0.231 -0.468 am #> 11 carb 6 1 0 1 0 0.0769 14.2 am #> 12 carb 8 1 0 1 0 0.0769 14.2 am