|
8 | 8 | "\n", |
9 | 9 | "Kernel `Python 3 (Data Science)` works well with this notebook.\n", |
10 | 10 | "\n", |
11 | | - "_This notebook was created and tested on an ml.m5.large notebook instance._\n", |
| 11 | + "_This notebook was created and tested on an ml.m5.xlarge notebook instance._\n", |
12 | 12 | "\n", |
13 | 13 | "## Table of Contents\n", |
14 | 14 | "\n", |
|
101 | 101 | "source": [ |
102 | 102 | "import shap\n", |
103 | 103 | "\n", |
104 | | - "from kernel_explainer_wrapper import KernelExplainerWrapper\n", |
| 104 | + "from shap import KernelExplainer\n", |
105 | 105 | "from shap import sample\n", |
106 | | - "from shap.common import LogitLink, IdentityLink\n", |
107 | 106 | "from scipy.special import expit\n", |
108 | 107 | "\n", |
109 | 108 | "# Initialize plugin to make plots interactive.\n", |
|
235 | 234 | "metadata": {}, |
236 | 235 | "outputs": [], |
237 | 236 | "source": [ |
238 | | - "churn_data = pd.read_csv('./Data sets/churn.txt')\n", |
| 237 | + "churn_data = pd.read_csv('../Data sets/churn.txt')\n", |
239 | 238 | "data_without_target = churn_data.drop(columns=['Churn?'])\n", |
240 | 239 | "\n", |
241 | 240 | "background_data = sample(data_without_target, 50)" |
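The background sampling in the hunk above keeps KernelSHAP tractable: its cost grows with the size of the background set, so a small random subset stands in for the full data distribution. A minimal pandas-only sketch of the same idea (the toy frame and column names are invented; the notebook itself uses `shap.sample` on the churn data):

```python
import pandas as pd

# Toy stand-in for the churn feature table (columns are made up).
df = pd.DataFrame({"day_mins": range(100), "eve_mins": range(100)})

# Draw a small random background set; KernelSHAP evaluates the model once
# per background row per coalition, so fewer rows means faster explanations.
background = df.sample(n=5, random_state=0)
```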
|
252 | 251 | "cell_type": "markdown", |
253 | 252 | "metadata": {}, |
254 | 253 | "source": [ |
255 | | - "Next, we create the `KernelExplainer`. Note that since it's a black box explainer, `KernelExplainer` only requires a handle to the predict (or predict_proba) function and does not require any other information about the model. For classification it is recommended to derive feature importance scores in the log-odds space since additivity is a more natural assumption there thus we use `LogitLink`. For regression `IdentityLink` should be used." |
| 254 | + "Next, we create the `KernelExplainer`. Note that since it's a black box explainer, `KernelExplainer` only requires a handle to the\n", |
| 255 | + "predict (or predict_proba) function and does not require any other information about the model. For classification it is recommended to\n", |
| 256 | + "derive feature importance scores in the log-odds space, since additivity is a more natural assumption there; thus we use `logit`. For\n", |
| 257 | + "regression, `identity` should be used." |
256 | 258 | ] |
257 | 259 | }, |
258 | 260 | { |
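The link-function choice described above can be sketched numerically. In this toy example (the `phi` values are invented, not real SHAP outputs) the point is that contributions add naturally in log-odds space, and the inverse of the `logit` link maps the total back to a probability:

```python
import numpy as np
from scipy.special import expit, logit

base_value = logit(0.5)            # model output with all features missing, in log-odds
phi = np.array([0.8, -0.3, 0.1])   # hypothetical per-feature SHAP values (log-odds space)

log_odds = base_value + phi.sum()  # contributions simply add in the link space
prob = expit(log_odds)             # inverse link maps the total back to a probability
```

For regression the `identity` link leaves model outputs untouched, so SHAP values add directly in the output's own units.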
|
263 | 265 | "source": [ |
264 | 266 | "# Derive link function \n", |
265 | 267 | "problem_type = automl_job.describe_auto_ml_job(job_name=automl_job_name)['ResolvedAttributes']['ProblemType'] \n", |
266 | | - "link_fn = IdentityLink if problem_type == 'Regression' else LogitLink \n", |
| 268 | + "link = \"identity\" if problem_type == 'Regression' else \"logit\"\n", |
267 | 269 | "\n", |
268 | | - "# the handle to predict_proba is passed to KernelExplainerWrapper since KernelSHAP requires the class probability\n", |
269 | | - "explainer = KernelExplainerWrapper(automl_estimator.predict_proba, background_data, link=link_fn())" |
| 270 | + "# The handle to predict_proba is passed to KernelExplainer, since KernelSHAP requires the class probability\n", |
| 271 | + "explainer = KernelExplainer(automl_estimator.predict_proba, background_data, link=link)" |
270 | 272 | ] |
271 | 273 | }, |
272 | 274 | { |
273 | 275 | "cell_type": "markdown", |
274 | 276 | "metadata": {}, |
275 | 277 | "source": [ |
276 | | - "Currently, `shap.KernelExplainer` only supports numeric data. A version of SHAP that supports text will become available soon. A workaround is provided by our wrapper `KernelExplainerWrapper`. Once a new version of SHAP is released, `shap.KernelExplainer` should be used instead of `KernelExplainerWrapper`.\n", |
277 | 278 | "\n", |
278 | 279 | "By analyzing the background data `KernelExplainer` provides us with `explainer.expected_value` which is the model prediction with all features missing. Considering a customer for which we have no data at all (i.e. all features are missing) this should theoretically be the model prediction." |
279 | 280 | ] |
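The `expected_value` described above can be sketched with a hypothetical toy model in place of the AutoML endpoint: it is approximately the average model output over the background data, expressed in the chosen link space.

```python
import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))   # toy background set, 50 rows

def predict_proba_pos(X):
    # Hypothetical positive-class probability from a toy linear model;
    # stands in for automl_estimator.predict_proba in the notebook.
    return expit(X @ np.array([0.5, -0.2, 0.1]))

# Roughly what KernelExplainer reports as expected_value under link="logit":
# the mean model output over the background, in log-odds space.
expected_value = logit(predict_proba_pos(background)).mean()
```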
|
326 | 327 | "outputs": [], |
327 | 328 | "source": [ |
328 | | - "# Since shap_values are provided in the log-odds space, we convert them back to the probability space by using LogitLink\n", |
| 329 | + "# Since shap_values are provided in the log-odds space, we convert them back to the probability space via the logit link\n", |
329 | | - "shap.force_plot(explainer.expected_value, shap_values, x, link=link_fn())" |
| 330 | + "shap.force_plot(explainer.expected_value, shap_values, x, link=link)" |
330 | 331 | ] |
331 | 332 | }, |
332 | 333 | { |
|
348 | 349 | "source": [ |
349 | 350 | "with ManagedEndpoint(ep_name) as mep:\n", |
350 | 351 | " shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='num_features(5)')\n", |
351 | | - "shap.force_plot(explainer.expected_value, shap_values, x, link=link_fn())" |
| 352 | + "shap.force_plot(explainer.expected_value, shap_values, x, link=link)" |
352 | 353 | ] |
353 | 354 | }, |
354 | 355 | { |
|
396 | 397 | "metadata": {}, |
397 | 398 | "outputs": [], |
398 | 399 | "source": [ |
399 | | - "shap.force_plot(explainer.expected_value, shap_values, X, link=link_fn())" |
| 400 | + "shap.force_plot(explainer.expected_value, shap_values, X, link=link)" |
400 | 401 | ] |
401 | 402 | }, |
402 | 403 | { |
|