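library(sparklyr)
# Connect to a local Spark instance
sc <- spark_connect(master = "local")
# Copy iris into Spark; sdf_copy_to() replaces the dots in column names with underscores
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

features <- c("Petal_Width", "Petal_Length", "Sepal_Length", "Sepal_Width")

# Pearson correlation matrix of the four numeric columns
ml_corr(iris_tbl, columns = features, method = "pearson")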
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
#> # A tibble: 4 × 4
#> Petal_Width Petal_Length Sepal_Length Sepal_Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.963 0.818 -0.366
#> 2 0.963 1 0.872 -0.428
#> 3 0.818 0.872 1 -0.118
#> 4 -0.366 -0.428 -0.118 1
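As a local sanity check (a sketch, not part of sparklyr), base R's `stats::cor()` computes the same Pearson matrix on the in-memory `iris` data without Spark. The only assumption is the name mapping: `sdf_copy_to()` turned `Petal.Width` into `Petal_Width`, so the code maps the underscores back to dots before subsetting. The rounded values should agree with the Spark output above.

```r
# Cross-check the Pearson matrix locally with base R (no Spark connection needed).
features <- c("Petal_Width", "Petal_Length", "Sepal_Length", "Sepal_Width")

# Map the Spark-style names back to the original iris column names.
local_cols <- gsub("_", ".", features, fixed = TRUE)

m <- stats::cor(datasets::iris[, local_cols], method = "pearson")
dimnames(m) <- list(features, features)
round(m, 3)
```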
Compute correlation matrix
Source: R/ml_stat.R
ml_corr
Description
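Computes the correlation matrix for the specified columns of a Spark DataFrame, using either the Pearson or the Spearman method.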
Usage
ml_corr(x, columns = NULL, method = c("pearson", "spearman"))
Arguments
Argument | Description |
---|---|
x | A tbl_spark. |
columns | The names of the columns to calculate correlations of. If only one column is specified, it must be a vector column (for example, one assembled using ft_vector_assembler()). |
method | The correlation method to use, either "pearson" or "spearman". |
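The single-column case can be sketched as follows: the numeric columns are first packed into one vector column with `ft_vector_assembler()`, and that column's name is then passed to `ml_corr()`. The output column name `features_vec` is an arbitrary, illustrative choice, and the Spearman method is used here only to show the other option; the exact column names of the returned data frame are not assumed.

```r
library(sparklyr)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

features <- c("Petal_Width", "Petal_Length", "Sepal_Length", "Sepal_Width")

# Pack the four numeric columns into a single vector column.
# "features_vec" is an illustrative name, not required by ml_corr().
iris_vec <- ft_vector_assembler(
  iris_tbl,
  input_cols = features,
  output_col = "features_vec"
)

# With only one column specified, ml_corr() expects it to be a vector column.
spearman_corr <- ml_corr(iris_vec, columns = "features_vec", method = "spearman")
spearman_corr

spark_disconnect(sc)
```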
Value
A correlation matrix organized as a data frame, with one row and one column per feature, in the same order.
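Because the returned data frame carries column names but no row labels, it can be convenient to convert it into a labeled numeric matrix. The sketch below assumes, as the example output suggests (ones on the diagonal), that the rows follow the same order as the columns; `corr_df` and `corr_mat` are illustrative variable names.

```r
library(sparklyr)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)
features <- c("Petal_Width", "Petal_Length", "Sepal_Length", "Sepal_Width")

corr_df <- ml_corr(iris_tbl, columns = features, method = "pearson")
spark_disconnect(sc)  # the result is already a local data frame

# Convert to a numeric matrix and reuse the column names as row names,
# assuming rows are in the same order as columns.
corr_mat <- as.matrix(corr_df)
rownames(corr_mat) <- colnames(corr_mat)

# Look up a single pair by name
corr_mat["Petal_Width", "Sepal_Width"]
```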