library(sparklyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

partitions <- mtcars_tbl %>%
  sdf_random_split(training = 0.7, test = 0.3, seed = 1111)

mtcars_training <- partitions$training
mtcars_test <- partitions$test

# for multiclass classification
rf_model <- mtcars_training %>%
  ml_random_forest(cyl ~ ., type = "classification")

pred <- ml_predict(rf_model, mtcars_test)

ml_multiclass_classification_evaluator(pred)
#> [1] 1

# for regression
rf_model <- mtcars_training %>%
  ml_random_forest(cyl ~ ., type = "regression")

pred <- ml_predict(rf_model, mtcars_test)

ml_regression_evaluator(pred, label_col = "cyl")
#> [1] 0.4444097

# for binary classification
rf_model <- mtcars_training %>%
  ml_random_forest(am ~ gear + carb, type = "classification")

pred <- ml_predict(rf_model, mtcars_test)

ml_binary_classification_evaluator(pred)
#> [1] 0.96875
Spark ML - Evaluators
R/ml_evaluation_prediction.R
ml_evaluator
Description
A set of functions that calculate performance metrics for prediction models. See also the Spark ML documentation: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.evaluation.package
Usage
ml_binary_classification_evaluator(
x, label_col = "label",
raw_prediction_col = "rawPrediction",
metric_name = "areaUnderROC",
uid = random_string("binary_classification_evaluator_"),
...
)
ml_binary_classification_eval(
x, label_col = "label",
prediction_col = "prediction",
metric_name = "areaUnderROC"
)
ml_multiclass_classification_evaluator(
x, label_col = "label",
prediction_col = "prediction",
metric_name = "f1",
uid = random_string("multiclass_classification_evaluator_"),
...
)
ml_classification_eval(
x, label_col = "label",
prediction_col = "prediction",
metric_name = "f1"
)
ml_regression_evaluator(
x, label_col = "label",
prediction_col = "prediction",
metric_name = "rmse",
uid = random_string("regression_evaluator_"),
...
)
Arguments
Arguments | Description |
---|---|
x | A spark_connection object or a tbl_spark containing label and prediction columns. The latter should be the output of sdf_predict. |
label_col | Name of the column that contains the true labels or values. |
raw_prediction_col | Raw prediction (a.k.a. confidence) column name. |
metric_name | The performance metric. See details. |
uid | A character string used to uniquely identify the ML estimator. |
… | Optional arguments; currently unused. |
prediction_col | Name of the column that contains the predicted label or value, not the scored probability. The column should be of type Double. |
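To illustrate how the arguments fit together, the sketch below scores one set of predictions under each regression metric by changing only metric_name. It assumes a local Spark installation that sparklyr can connect to, and it fits and scores on the same table purely to obtain a prediction column; the object names (scores, metrics) are illustrative and not part of the package.

```r
library(sparklyr)

# Assumption: a local Spark installation is available to sparklyr.
sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

# Fit and score on the same table only to produce a "prediction" column.
rf_model <- ml_random_forest(mtcars_tbl, cyl ~ ., type = "regression")
pred <- ml_predict(rf_model, mtcars_tbl)

# Evaluate the same predictions once per regression metric.
metrics <- c("rmse", "mse", "r2", "mae")
scores <- sapply(metrics, function(m) {
  ml_regression_evaluator(
    pred,
    label_col = "cyl",
    prediction_col = "prediction",
    metric_name = m
  )
})
scores

spark_disconnect(sc)
```

The result is a named numeric vector with one entry per metric; the actual values depend on the fitted model and the Spark version.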
Details
The following metrics are supported:
Binary Classification: areaUnderROC (default) or areaUnderPR (not available in Spark 2.X).
Multiclass Classification: f1 (default), precision, recall, weightedPrecision, weightedRecall or accuracy; for Spark 2.X: f1 (default), weightedPrecision, weightedRecall or accuracy.
Regression: rmse (root mean squared error, default), mse (mean squared error), r2, or mae (mean absolute error).
ml_binary_classification_eval() is an alias for ml_binary_classification_evaluator() for backwards compatibility.
ml_classification_eval() is an alias for ml_multiclass_classification_evaluator() for backwards compatibility.
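As a rough guide to what the regression metric names mean, the following base R sketch computes each one by hand on a small made-up vector. It uses the usual textbook definitions, which are assumed here to match what Spark reports; the data and variable names are hypothetical and no Spark connection is needed.

```r
# Hypothetical true values and predictions, for illustration only.
label      <- c(4, 6, 8, 6)
prediction <- c(4, 7, 8, 5)

residual <- label - prediction                 # 0, -1, 0, 1

mse  <- mean(residual^2)                       # mean squared error
rmse <- sqrt(mse)                              # root mean squared error
mae  <- mean(abs(residual))                    # mean absolute error

# r2: one minus residual sum of squares over total sum of squares.
r2 <- 1 - sum(residual^2) / sum((label - mean(label))^2)

c(rmse = rmse, mse = mse, r2 = r2, mae = mae)
```

For rmse, mse and mae lower is better, while for r2 higher is better, which matters when comparing models across metrics.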
Value
The calculated performance metric.