Create a Pipeline Stage Object
R/ml_pipeline_utils.R
spark_pipeline_stage
Description
Helper function to create pipeline stage objects with common parameter setters.
Usage
spark_pipeline_stage(
sc,
class,
uid,
features_col = NULL,
label_col = NULL,
prediction_col = NULL,
probability_col = NULL,
raw_prediction_col = NULL,
k = NULL,
max_iter = NULL,
seed = NULL,
input_col = NULL,
input_cols = NULL,
output_col = NULL,
output_cols = NULL
)
Arguments
Argument | Description |
---|---|
sc | A spark_connection object. |
class | Class name for the pipeline stage. |
uid | A character string used to uniquely identify the ML estimator. |
features_col | Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by ft_r_formula. |
label_col | Label column name. The column should be a numeric column. Usually this column is output by ft_r_formula. |
prediction_col | Prediction column name. |
probability_col | Column name for predicted class conditional probabilities. |
raw_prediction_col | Raw prediction (a.k.a. confidence) column name. |
k | The number of clusters to create. |
max_iter | The maximum number of iterations to use. |
seed | A random seed. Set this value if you need your results to be reproducible across repeated calls. |
input_col | The name of the input column. |
input_cols | Names of input columns. |
output_col | The name of the output column. |
thresholds | Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes. All values must be greater than 0, except that at most one value may be 0. The class with the largest value of p/t is predicted, where p is the original probability of that class and t is the class’s threshold. |
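
The p/t rule described for thresholds can be illustrated in plain R, without a Spark connection. This is a minimal sketch rather than the Spark implementation: the helper name predict_with_thresholds and the probabilities and thresholds below are hypothetical, chosen only to show how a threshold can change the predicted class.

```r
# Hypothetical helper (not part of sparklyr): apply the p/t rule to one
# observation's class probabilities.
predict_with_thresholds <- function(p, t) {
  stopifnot(
    length(p) == length(t),  # one threshold per class
    all(t >= 0),             # thresholds must not be negative
    sum(t == 0) <= 1         # at most one threshold may be 0
  )
  ratio <- p / t             # scale each probability by its threshold
  names(which.max(ratio))    # class with the largest p/t
}

# Illustrative values, not taken from any fitted model.
p <- c(a = 0.5, b = 0.3, c = 0.2)
t <- c(a = 0.8, b = 0.3, c = 0.4)

predict_with_thresholds(p, t)
```

With these values the ratios are 0.625, 1 and 0.5, so class b is selected even though class a has the highest raw probability.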