Batch Processing of Monotonic Binning
This article is originally published at https://statcompute.wordpress.com
In my GitHub repository (https://github.com/statcompute/MonotonicBinning), multiple R functions have been developed to implement the monotonic binning by using either iterative discretization or isotonic regression. With these functions, we can run the monotonic binning for one independent variable at a time. However, in a real-world production environment, we often would want to apply the binning algorithm to hundreds or thousands of variables at once. In addition, we might be interested in comparing different binning outcomes.
The function batch_bin() is designed to apply a monotonic binning function to all numeric variables in a data frame with the last column as the dependent variable. Currently, four binning algorithms are supported, including qtl_bin() and bad_bin() by iterative discretizations, iso_bin() by isotonic regression, and gbm_bin() by generalized boosted model. Before using these four functions, we need to save related R files in the working folder, which would be sourced by the batch_bin() function. Scripts for R functions can be downloaded from https://github.com/statcompute/MonotonicBinning/tree/master/code.
Below is the demonstrating showing how to use the batch_bin() function, which only requires two input parameters, a data frame and an integer number indicating the binning method. With method = 1, the batch_bin() function implements the iterative discretization by quantiles. With method = 4, the batch_bin() function implements the generalized boosted modelling. As shown below, both KS and IV with method = 4 are higher than with method = 1 due to more granular bins. For instance, while the method = 1 only generates 2 bins, the method = 4 can generate 11 bins.
Please visit source website for post related comments.