Create a Pipeline Stage Object

R/ml_pipeline_utils.R

spark_pipeline_stage

Description

Helper function to create pipeline stage objects with common parameter setters.

Usage

spark_pipeline_stage( 
  sc, 
  class, 
  uid, 
  features_col = NULL, 
  label_col = NULL, 
  prediction_col = NULL, 
  probability_col = NULL, 
  raw_prediction_col = NULL, 
  k = NULL, 
  max_iter = NULL, 
  seed = NULL, 
  input_col = NULL, 
  input_cols = NULL, 
  output_col = NULL, 
  output_cols = NULL 
)

Arguments

Arguments	Description
sc	A `spark_connection` object.
class	Class name for the pipeline stage.
uid	A character string used to uniquely identify the ML estimator.
features_col	Features column name, as a length-one character vector. The column should be single vector column of numeric values. Usually this column is output by `ft_r_formula`.
label_col	Label column name. The column should be a numeric column. Usually this column is output by `ft_r_formula`.
prediction_col	Prediction column name.
probability_col	Column name for predicted class conditional probabilities.
raw_prediction_col	Raw prediction (a.k.a. confidence) column name.
k	The number of clusters to create
max_iter	The maximum number of iterations to use.
seed	A random seed. Set this value if you need your results to be reproducible across repeated calls.
input_col	The name of the input column.
input_cols	Names of output columns.
output_col	The name of the output column.
thresholds	Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value `p/t` is predicted, where `p` is the original probability of that class and `t` is the class’s threshold.