library(sparklyr)
<- spark_connect(master = "local[3]")
sc # copy some test data into a Spark Dataframe
<- sdf_copy_to(sc, iris, overwrite = TRUE)
sdf # create a writer function
<- function(df, path) {
writer write.csv(df, path)
} spark_write(
sdf,
writer, # re-partition sdf into 3 partitions and write them to 3 separate files
paths = list("file:///tmp/file1", "file:///tmp/file2", "file:///tmp/file3"),
) #> list()
spark_write(
sdf,
writer, # save all rows into a single file
paths = list("file:///tmp/all_rows")
) #> list()
Write Spark DataFrame to file using a custom writer
R/data_interface.R
spark_write
Description
Run a custom R function on Spark worker to write a Spark DataFrame into file(s). If Spark’s speculative execution feature is enabled (i.e., spark.speculation
is true), then each write task may be executed more than once and the user-defined writer function will need to ensure no concurrent writes happen to the same file path (e.g., by appending UUID to each file name).
Usage
spark_write(x, writer, paths, packages = NULL)
Arguments
Arguments | Description |
---|---|
x | A Spark Dataframe to be saved into file(s) |
writer | A writer function with the signature function(partition, path) where partition is a R dataframe containing all rows from one partition of the original Spark Dataframe x and path is a string specifying the file to write partition to |
paths | A single destination path or a list of destination paths, each one specifying a location for a partition from x to be written to. If number of partition(s) in x is not equal to length(paths) then x will be re-partitioned to contain length(paths) partition(s) |
packages | Boolean to distribute .libPaths() packages to each node, a list of packages to distribute, or a package bundle created with |