`step_lencode_glm` creates a *specification* of a recipe step that
will convert a nominal (i.e. factor) predictor into a single set of
scores derived from a generalized linear model.

    step_lencode_glm(
      recipe,
      ...,
      role = NA,
      trained = FALSE,
      outcome = NULL,
      mapping = NULL,
      skip = FALSE,
      id = rand_id("lencode_glm")
    )

    # S3 method for step_lencode_glm
    tidy(x, ...)

## Arguments

| Argument | Description |
|---|---|
| `recipe` | A recipe object. The step will be added to the sequence of operations for this recipe. |
| `...` | One or more selector functions to choose variables. For `step_lencode_glm`, this indicates the variables to be encoded into a numeric format. See `recipes::selections()` for more details. For the `tidy` method, these are not currently used. |
| `role` | Not used by this step since no new variables are created. |
| `trained` | A logical to indicate if the quantities for preprocessing have been estimated. |
| `outcome` | A call to `vars` to specify which variable is used as the outcome in the generalized linear model. Only numeric and two-level factor outcomes are currently supported. |
| `mapping` | A list of tibble results that define the encoding. This is `NULL` until the step is trained by `recipes::prep.recipe()`. |
| `skip` | A logical. Should the step be skipped when the recipe is baked by `recipes::bake.recipe()`? While all operations are baked when `recipes::prep.recipe()` is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations. |
| `id` | A character string that is unique to this step to identify it. |
| `x` | A `step_lencode_glm` object. |

## Value

An updated version of `recipe` with the new step added
to the sequence of existing steps (if any). For the `tidy`
method, a tibble with columns `terms` (the selectors or
variables for encoding), `level` (the factor levels), and
`value` (the encodings).

## Details

For each factor predictor, a generalized linear model
is fit to the outcome and the coefficients are returned as the
encoding. These coefficients are on the linear predictor scale
so, for factor outcomes, they are in log-odds units. The
coefficients are created using a no-intercept model and, when
a two-level factor outcome is used, the log-odds reflect the event of
interest being the *first* level of the factor.

For novel levels, a slightly trimmed average of the coefficients
is returned.
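
The encoding described above can be approximated by hand with base R. The following is a minimal sketch, not the step's actual implementation: the toy data, the column names `city`, `cls`, and `is_event`, and the trim fraction of `0.1` are all assumptions made for illustration. It fits a no-intercept binomial model in which the event is the first level of the outcome, so each coefficient is the log-odds of that event within one level of the predictor.

```r
# Hypothetical toy data: a three-level predictor and a two-level outcome
# whose first level ("yes") is treated as the event of interest.
toy <- data.frame(
  city = factor(rep(c("a", "b", "c"), each = 10)),
  cls = factor(
    c(rep(c("yes", "no"), times = c(8, 2)),
      rep(c("yes", "no"), times = c(5, 5)),
      rep(c("yes", "no"), times = c(2, 8))),
    levels = c("yes", "no")
  )
)

# TRUE when the outcome equals its first level.
toy$is_event <- toy$cls == levels(toy$cls)[1]

# No-intercept model: one coefficient per level of `city`,
# on the linear predictor (log-odds) scale.
fit <- glm(is_event ~ 0 + city, data = toy, family = binomial)
encoding <- coef(fit)

# Stand-in value for a level not seen during training: a trimmed mean of
# the coefficients. The trim fraction here is an assumed value.
novel_value <- mean(encoding, trim = 0.1)

encoding
novel_value
```

With a single factor predictor and no intercept, each coefficient equals the logit of the observed event proportion in that level, which is a convenient sanity check on the encoding.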

## References

Micci-Barreca D (2001) "A preprocessing scheme for
high-cardinality categorical attributes in classification and
prediction problems," ACM SIGKDD Explorations Newsletter, 3(1),
27-32.

Zumel N and Mount J (2017) "vtreat: a data.frame Processor for
Predictive Modeling," arXiv:1611.09477

## Examples
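
A minimal usage sketch, assuming the step is provided by the embed package and used alongside recipes. The data frame `toy` and its columns are hypothetical, and `dplyr::vars()` is used to build the `outcome` argument.

```r
library(recipes)
library(embed)  # assumed to provide step_lencode_glm()

# Hypothetical toy data: a three-level predictor and a two-level outcome.
toy <- data.frame(
  city = factor(rep(c("a", "b", "c"), each = 10)),
  cls = factor(
    c(rep(c("yes", "no"), times = c(8, 2)),
      rep(c("yes", "no"), times = c(5, 5)),
      rep(c("yes", "no"), times = c(2, 8))),
    levels = c("yes", "no")
  )
)

# Specify the recipe and add the encoding step for `city`.
rec <- recipe(cls ~ city, data = toy)
rec <- step_lencode_glm(rec, city, outcome = dplyr::vars(cls))

# Estimate the encodings from the training data.
rec_trained <- prep(rec, training = toy)

# Inspect the estimated encodings (columns include `terms`, `level`, `value`).
tidy(rec_trained, number = 1)

# Apply the encoding to the training data; `city` becomes numeric.
baked <- bake(rec_trained, new_data = NULL)
head(baked)
```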