Compute (Approximate) Quantiles with a Spark DataFrame
R/sdf_interface.R
sdf_quantile
Description
Given a numeric column within a Spark DataFrame, compute approximate quantiles.
Usage
sdf_quantile(
x,
column, probabilities = c(0, 0.25, 0.5, 0.75, 1),
relative.error = 1e-05,
weight.column = NULL
)
Arguments
Arguments | Description |
---|---|
x | A spark_connection , ml_pipeline , or a tbl_spark . |
column | The column(s) for which quantiles should be computed. Multiple columns are only supported in Spark 2.0+. |
probabilities | A numeric vector of probabilities, for which quantiles should be computed. |
relative.error | The maximal possible difference between the actual percentile of a result and its expected percentile (e.g., if relative.error is 0.01 and probabilities is 0.95, then any value between the 94th and 96th percentile will be considered an acceptable approximation). |
weight.column | If not NULL, then a generalized version of the Greenwald- Khanna algorithm will be run to compute weighted percentiles, with each sample from column having a relative weight specified by the corresponding value in weight.column . The weights can be considered as relative frequencies of sample data points. |