Extracts metrics from a fitted table

R/ml_metrics.R

ml_metrics_regression

Description

The function works best when passed a tbl_spark created by ml_predict(). The output tbl_spark will contain the correct variable types and format that the given Spark model “evaluator” expects.

Usage

 
ml_metrics_regression( 
  x, 
  truth, 
  estimate = prediction, 
  metrics = c("rmse", "rsq", "mae"), 
  ... 
)

Arguments

Arguments	Description
x	A `tbl_spark` containing the estimate (prediction) and the truth (value of what actually happened)
truth	The name of the column from `x` that contains the value of what actually happened
estimate	The name of the column from `x` that contains the prediction. Defaults to `prediction`, since it is the default that `ml_predict()` uses.
metrics	A character vector with the metrics to calculate. For regression models the possible values are: `rmse` (Root mean squared error), `mse` (Mean squared error),`rsq` (R squared), `mae` (Mean absolute error), and `var` (Explained variance). Defaults to: `rmse`, `rsq`, `mae`
…	Optional arguments; currently unused.

Details

The ml_metrics family of functions implement Spark’s evaluate closer to how the yardstick package works. The functions expect a table containing the truth and estimate, and return a tibble with the results. The tibble has the same format and variable names as the output of the yardstick functions.

Examples

library(sparklyr)
 
sc <- spark_connect("local") 
tbl_iris <- copy_to(sc, iris) 
iris_split <- sdf_random_split(tbl_iris, training = 0.5, test = 0.5) 
training <- iris_split$training 
reg_formula <- "Sepal_Length ~ Sepal_Width + Petal_Length + Petal_Width" 
model <- ml_generalized_linear_regression(training, reg_formula) 
tbl_predictions <- ml_predict(model, iris_split$test) 
tbl_predictions %>% 
  ml_metrics_regression(Sepal_Length) 
#> # A tibble: 3 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 rmse    standard       0.313
#> 2 rsq     standard       0.863
#> 3 mae     standard       0.249