Perform Tidymodels grid search tuning inside Spark
tune_grid_spark
Description
Perform Tidymodels grid search tuning inside Spark
Usage
tune_grid_spark(
sc,
object,
preprocessor,
resamples,
...,
param_info = NULL,
grid = 10,
metrics = NULL,
eval_time = NULL,
control = NULL,
no_tasks = NULL
)
Arguments
| Arguments | Description |
|---|---|
| sc | A Spark Connection |
| object | A parsnip model specification or an unfitted workflow(). |
| preprocessor | A traditional model formula or a recipe created using recipes::recipe(). |
| resamples | An rset resampling object created from an rsample function, such as rsample::vfold_cv(). |
| ... | Not currently used. |
| param_info | A dials::parameters() object or NULL. If none is given, a parameters set is derived from other arguments. Passing this argument can be useful when parameter ranges need to be customized. |
| grid | A data frame of tuning combinations or a positive integer. The data frame should have columns for each parameter being tuned and rows for tuning parameter candidates. An integer denotes the number of candidate parameter sets to be created automatically. |
| metrics | A yardstick::metric_set(), or NULL to compute a standard set of metrics. |
| eval_time | A numeric vector of time points at which dynamic event-time metrics should be computed (e.g., the time-dependent ROC curve). The values must be non-negative and should generally be no greater than the largest event time in the training set. |
| control | An object used to modify the tuning process, likely created by control_grid(). |
| no_tasks | Number of parallel tasks (jobs) to request from Spark. |
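
As a minimal sketch of what the `resamples` and `grid` arguments might look like, the snippet below builds a 5-fold cross-validation object and an explicit grid data frame. The data set (`mtcars`) and the parameter names `cost_complexity` and `tree_depth` are illustrative assumptions; the grid columns would need to match the parameters of whatever model is actually being tuned.

```r
library(rsample)

set.seed(42)

# resamples: an rset object created by an rsample function
folds <- vfold_cv(mtcars, v = 5)

# grid: one column per tuning parameter, one row per candidate combination
# (parameter names are hypothetical, chosen to suit a decision tree)
grid <- expand.grid(
  cost_complexity = c(0.001, 0.01, 0.1),
  tree_depth = c(2, 5, 10)
)

# Number of candidate combinations in the grid
nrow(grid)
```

Passing an integer such as `grid = 10` instead leaves the construction of candidate parameter sets to the function.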
Details
The parsnip model or unfitted workflow(), the preprocessor, and the resamples are uploaded to the Spark cluster as R objects. The Spark cluster then runs the tuning combinations in parallel using R, Tidymodels, and any other modeling packages the workflow requires. The results are downloaded back to the local R session.
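
The flow described above could be sketched roughly as follows. This is an illustration under several assumptions: that parameters to search are marked with `tune()` in the same way as for `tune::tune_grid()`, that a Spark cluster with R and the modeling packages is reachable, and that the data set, model, and the helper name `tune_on_spark()` are hypothetical choices made for the example.

```r
library(tidymodels)
library(sparklyr)

# Model specification; tune() marks the parameters to search
# (assumed to behave as it does for tune::tune_grid())
tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) |>
  set_engine("rpart") |>
  set_mode("regression")

set.seed(42)
folds <- vfold_cv(mtcars, v = 5)

# Hypothetical helper: wrapped in a function so that nothing touches
# Spark until a connection is supplied
tune_on_spark <- function(sc) {
  tune_grid_spark(
    sc,
    object = tree_spec,
    preprocessor = mpg ~ .,
    resamples = folds,
    grid = 10
  )
}

# Usage, assuming a reachable cluster (connection details are placeholders):
# sc <- spark_connect(master = "local")
# results <- tune_on_spark(sc)
# spark_disconnect(sc)
```

Because the tuning runs in R on the cluster, the packages behind the chosen engine (here `rpart`) would presumably need to be installed on the Spark workers as well as locally.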
Value
A tune_results object.
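
Assuming the returned object behaves like the result of `tune::tune_grid()`, the usual tune helpers should apply to it. The sketch below wraps a few of them in a hypothetical helper, `summarise_tuning()`; the default metric name `"rmse"` is an assumption that fits a regression problem.

```r
library(tune)

# Hypothetical helper: summarises a tune_results object
summarise_tuning <- function(results, metric = "rmse") {
  list(
    # Metrics for every candidate, averaged over resamples
    metrics = collect_metrics(results),
    # Top candidates ranked by the chosen metric
    top = show_best(results, metric = metric, n = 5),
    # Single best parameter combination
    best = select_best(results, metric = metric)
  )
}

# Usage, where `results` is the object returned by tune_grid_spark():
# summary <- summarise_tuning(results)
# summary$best
```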