DALEX is designed to explore and explain the behaviour of machine learning
methods. This function creates a DALEX explainer (see DALEX::explain()),
which can then be queried
by multiple functions to create explanations of the model.
Usage
explain_tidysdm(
model,
data,
y,
predict_function,
predict_function_target_column,
residual_function,
...,
label,
verbose,
precalculate,
colorize,
model_info,
type,
by_workflow
)
# Default S3 method
explain_tidysdm(
model,
data = NULL,
y = NULL,
predict_function = NULL,
predict_function_target_column = NULL,
residual_function = NULL,
...,
label = NULL,
verbose = TRUE,
precalculate = TRUE,
colorize = !isTRUE(getOption("knitr.in.progress")),
model_info = NULL,
type = "classification",
by_workflow = FALSE
)
# S3 method for class 'simple_ensemble'
explain_tidysdm(
model,
data = NULL,
y = NULL,
predict_function = NULL,
predict_function_target_column = NULL,
residual_function = NULL,
...,
label = NULL,
verbose = TRUE,
precalculate = TRUE,
colorize = !isTRUE(getOption("knitr.in.progress")),
model_info = NULL,
type = "classification",
by_workflow = FALSE
)
# S3 method for class 'repeat_ensemble'
explain_tidysdm(
model,
data = NULL,
y = NULL,
predict_function = NULL,
predict_function_target_column = NULL,
residual_function = NULL,
...,
label = NULL,
verbose = TRUE,
precalculate = TRUE,
colorize = !isTRUE(getOption("knitr.in.progress")),
model_info = NULL,
type = "classification",
by_workflow = FALSE
)
Arguments
- model
object - a model to be explained
- data
data.frame or matrix - data which will be used to calculate the explanations. If not provided, it will be extracted from the model. Data should be passed without a target column (which should be provided as the y argument). NOTE: If the target variable is present in data, some of the functionality may not work properly.
- y
numeric vector with outputs/scores. If provided, it should have the same size as data
- predict_function
function that takes two arguments, model and new data, and returns a numeric vector with predictions. By default it is yhat.
- predict_function_target_column
Character or numeric containing either the column name or column number in the model prediction object of the class that should be considered as positive (i.e. the class that is associated with probability 1). If NULL, the second column of the output will be taken for binary classification. In a multiclass classification setting, this parameter causes a switch to binary classification mode with one-vs-others probabilities.
- residual_function
function that takes four arguments: model, data, target vector y and (optionally) predict function. It should return a numeric vector with model residuals for the given data. If not provided, response residuals (\(y-\hat{y}\)) are calculated. By default it is residual_function_default.
- ...
other parameters
- label
character - the name of the model. By default it's extracted from the 'class' attribute of the model
- verbose
logical. If TRUE (default) then diagnostic messages will be printed
- precalculate
logical. If TRUE (default) then predicted_values and residual are calculated when the explainer is created. This will also happen if verbose is TRUE. Set both verbose and precalculate to FALSE to omit calculations.
- colorize
logical. If TRUE then WARNINGS, ERRORS and NOTES are colorized. Will work only in the R console. By default it is FALSE while knitting and TRUE otherwise.
- model_info
a named list (package, version, type) containing information about the model. If NULL, DALEX will seek for information on its own.
- type
type of a model, either classification or regression. If not specified then type will be extracted from model_info.
- by_workflow
boolean determining whether a list of explainers, one per model, should be returned instead of a single explainer for the ensemble
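The contracts for predict_function and residual_function described above can be illustrated with a minimal sketch in base R. The names custom_predict and custom_residual, and the glm fitted on mtcars, are hypothetical stand-ins chosen for illustration; they are not part of tidysdm or DALEX, and the package defaults may differ in detail.

```r
# Hypothetical stand-in model: a logistic regression on a built-in dataset
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# A predict function takes the model and new data,
# and returns a numeric vector of predictions (here, probabilities)
custom_predict <- function(model, newdata) {
  as.numeric(stats::predict(model, newdata = newdata, type = "response"))
}

# A residual function takes model, data, target vector y and
# (optionally) a predict function, and returns numeric residuals (y - yhat)
custom_residual <- function(model, data, y, predict_function = custom_predict) {
  y - predict_function(model, data)
}

# Data is passed without the target column; the target goes in y
x <- mtcars[, c("wt", "hp")]
y <- mtcars$am

preds <- custom_predict(fit, x)
res <- custom_residual(fit, x, y)
```

Functions of this shape could then be supplied through the predict_function and residual_function arguments, although in most cases the defaults should be sufficient.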
Value
explainer object (see DALEX::explain) ready to work with DALEX
Examples
# \donttest{
# using the whole ensemble
lacerta_explainer <- explain_tidysdm(tidysdm::lacerta_ensemble)
#> Preparation of a new explainer is initiated
#> -> model label : data.frame ( default )
#> -> data : 448 rows 4 cols
#> -> data : tibble converted into a data.frame
#> -> target variable : 448 values
#> -> predict function : predict_function
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package tidysdm , ver. 0.9.6.9005 , task classification ( default )
#> -> model_info : type set to classification
#> -> predicted values : numerical, min = 0.02638562 , mean = 0.2951438 , max = 0.7778588
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.6797394 , mean = -0.04514377 , max = 0.7720678
#> A new explainer has been created!
# by workflow
explainer_list <- explain_tidysdm(tidysdm::lacerta_ensemble,
by_workflow = TRUE
)
#> Preparation of a new explainer is initiated
#> -> model label : default_glm
#> -> data : 448 rows 4 cols
#> -> data : tibble converted into a data.frame
#> -> target variable : 448 values
#> -> predict function : yhat.workflow will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package tidymodels , ver. 1.2.0 , task classification ( default )
#> -> model_info : type set to classification
#> -> predicted values : numerical, min = 0.2554356 , mean = 0.75 , max = 0.9838188
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.9314151 , mean = 9.574021e-12 , max = 0.7445644
#> A new explainer has been created!
#> Preparation of a new explainer is initiated
#> -> model label : default_rf
#> -> data : 448 rows 4 cols
#> -> data : tibble converted into a data.frame
#> -> target variable : 448 values
#> -> predict function : yhat.workflow will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package tidymodels , ver. 1.2.0 , task classification ( default )
#> -> model_info : type set to classification
#> -> predicted values : numerical, min = 0.06715952 , mean = 0.7496213 , max = 1
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.6303111 , mean = 0.0003787291 , max = 0.5929444
#> A new explainer has been created!
#> Preparation of a new explainer is initiated
#> -> model label : default_gbm
#> -> data : 448 rows 4 cols
#> -> data : tibble converted into a data.frame
#> -> target variable : 448 values
#> -> predict function : yhat.workflow will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package tidymodels , ver. 1.2.0 , task classification ( default )
#> -> model_info : type set to classification
#> -> predicted values : numerical, min = 0.2205058 , mean = 0.7324831 , max = 0.9634812
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.8709313 , mean = 0.01751694 , max = 0.6855955
#> A new explainer has been created!
#> Preparation of a new explainer is initiated
#> -> model label : default_maxent
#> -> data : 448 rows 4 cols
#> -> data : tibble converted into a data.frame
#> -> target variable : 448 values
#> -> predict function : yhat.workflow will be used ( default )
#> -> predicted values : No value for predict function target column. ( default )
#> -> model_info : package tidymodels , ver. 1.2.0 , task classification ( default )
#> -> model_info : type set to classification
#> -> predicted values : numerical, min = 0.03884931 , mean = 0.5873206 , max = 0.9584494
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.8338183 , mean = 0.1626794 , max = 0.9611507
#> A new explainer has been created!
# }
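Once created, the explainer can be passed to DALEX functions to produce explanations. The following is a sketch that assumes DALEX and tidysdm are installed; it selects the first column of the explainer's data rather than assuming a particular predictor name.

```r
library(DALEX)
library(tidysdm)

# Recreate the ensemble explainer; verbose = FALSE suppresses diagnostic messages
lacerta_explainer <- explain_tidysdm(tidysdm::lacerta_ensemble, verbose = FALSE)

# Permutation-based variable importance for the ensemble
vip <- model_parts(lacerta_explainer)

# Partial dependence profile for a single predictor
# (the first column of the explainer's data is used as an example)
first_var <- colnames(lacerta_explainer$data)[1]
pdp <- model_profile(lacerta_explainer, variables = first_var)
```

Both objects have plot() methods in DALEX, so plot(vip) and plot(pdp) can be used to visualise the results.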