References for this project

The initial idea for this project came from the Medium post “Approximate-Predictions” Make Feature Selection Radically Faster from https://medium.com/@mazzanti.sam..

This led me to new ideas about potential uses (even abuses) of Shapley values for machine learning models.

In this project I tried to implement the ideas left by Samuele in the above post and to make them easy and accessible to use in an ML project. I also wanted to improve my data science skills by creating a Python package and, in some way, to give back to the open-source community what I have received from it over the years.

The main premise about users of this package is that you know how to create your ML models and have already cleaned your data. Please don't throw garbage into any automatic process and expect roses to come out at the end of it.

The second big premise is that you are acquainted with Shapley values and/or know how to get a table of Shapley values from some calibration data and the respective predictive model.
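As a reminder of what such a table contains, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions, using only the Python standard library. It is an illustration, not how this package or the shap package computes them: in practice you would normally get the table from shap (for example from its TreeExplainer). The names `shapley_row`, `toy_model`, `calibration`, and `baseline` are hypothetical, and replacing absent features with the column means of the calibration data is an assumption made for simplicity.

```python
from itertools import combinations
from math import factorial


def shapley_row(predict, x, baseline):
    """Exact Shapley values for one row; absent features take baseline values."""
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by every coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                # Row with only the coalition's features "present"
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(row):
    # Hypothetical model: one additive term plus one interaction
    return 2.0 * row[0] + row[1] * row[2]


# Hypothetical calibration data: 3 rows, 3 features
calibration = [[1.0, 2.0, 3.0], [0.0, 1.0, 4.0], [2.0, 0.0, 1.0]]

# Baseline (assumption): column means of the calibration data
baseline = [sum(col) / len(col) for col in zip(*calibration)]

# Shapley values table: one row per observation, one column per feature
shap_table = [shapley_row(toy_model, row, baseline) for row in calibration]
```

Each row of `shap_table` sums to the model prediction for that observation minus the prediction at the baseline. Enumerating coalitions grows exponentially with the number of features, which is why dedicated explainers are used on real models.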

I will add more resources (if I can) here to help you out:

Shapley values

SHAP package used to calculate Shapley values: shap package
Details about the Shapley Tree Explainer: shap tree explainer
Shapley value reference book: Interpretable Machine Learning by Christoph Molnar (https://christophmolnar.com)
Another interesting book about Shapley values: Interpreting Machine Learning Models With SHAP, also by Christoph Molnar

Shap Visualizations

P-values

For creating p-values, please take a look at this chapter from the book Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson.