Compute correlation matrix

R/ml_stat.R

ml_corr

Description

Compute the correlation matrix for a set of columns in a Spark DataFrame.

Usage

 
ml_corr(x, columns = NULL, method = c("pearson", "spearman")) 

Arguments

Arguments Description
x A tbl_spark.
columns The names of the columns to correlate. If only one column is specified, it must be a vector column (for example, one assembled using ft_vector_assembler()).
method The method to use, either "pearson" or "spearman".
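
To make the two method options concrete, here is a minimal local sketch that uses base R's cor() rather than Spark. It assumes nothing about how ml_corr() is implemented internally; it only illustrates the standard definitions, where Pearson measures linear association on the raw values and Spearman is the Pearson correlation of the ranks.

```r
# Local illustration with base R only (no Spark connection needed).
x <- iris[, c("Petal.Width", "Petal.Length", "Sepal.Length", "Sepal.Width")]

pearson  <- cor(x, method = "pearson")
spearman <- cor(x, method = "spearman")

# Spearman is Pearson applied to the column-wise ranks (ties get average ranks).
ranks <- as.data.frame(lapply(x, rank))
spearman_by_hand <- cor(ranks, method = "pearson")

# Compare the built-in Spearman result with the rank-based computation.
all.equal(spearman, spearman_by_hand)
```

Spearman is usually the safer choice when the relationship between columns is monotonic but not linear, or when outliers are a concern.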

Value

A correlation matrix organized as a data frame.

Examples

library(sparklyr)
 
sc <- spark_connect(master = "local") 
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE) 
 
features <- c("Petal_Width", "Petal_Length", "Sepal_Length", "Sepal_Width") 
 
ml_corr(iris_tbl, columns = features, method = "pearson") 
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
#> # A tibble: 4 × 4
#>   Petal_Width Petal_Length Sepal_Length Sepal_Width
#>         <dbl>        <dbl>        <dbl>       <dbl>
#> 1       1            0.963        0.818      -0.366
#> 2       0.963        1            0.872      -0.428
#> 3       0.818        0.872        1          -0.118
#> 4      -0.366       -0.428       -0.118       1
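
The returned tibble has no row labels; as the diagonal of ones in the output above suggests, the rows follow the order of columns. As a hedged sanity check, the same Pearson matrix can be reproduced locally with base R's cor() on the original iris data. The sketch below uses base R only; the renaming step mimics how the dots in the iris column names appear as underscores in iris_tbl, and the variable column is an illustrative addition, not part of the ml_corr() output.

```r
# Base R only; no Spark connection is required for this check.
features <- c("Petal_Width", "Petal_Length", "Sepal_Length", "Sepal_Width")

# Rename iris columns the way they appear in iris_tbl (dots become underscores).
local_iris <- iris[, 1:4]
names(local_iris) <- gsub(".", "_", names(local_iris), fixed = TRUE)

# Pearson correlation matrix in the same column order as the example.
local_corr <- cor(local_iris[, features], method = "pearson")

# Convert to a data frame and add an explicit label for each row.
labelled <- data.frame(variable = features, round(local_corr, 3), row.names = NULL)
labelled
```

The same labelling idea could be applied to the data frame returned by ml_corr() by adding a column that holds the names passed in columns.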