Package index
checkpoint_directory() spark_set_checkpoint_dir() spark_get_checkpoint_dir()
Set/Get Spark checkpoint directory
Collect Spark data serialized in RDS format into R
Compile Scala sources into a Java Archive (jar)
Read configuration values for a connection
Copy an R Data Frame to Spark
Distinct
Downloads default Scala Compilers
dplyr wrappers for Apache Spark higher order functions
Enforce Specific Structure for R Objects
Fill
Filter
Discover the Scala Compiler
Feature Transformation - Binarizer (Transformer)
Feature Transformation - Bucketizer (Transformer)
Feature Transformation - ChiSqSelector (Estimator)
ft_count_vectorizer() ml_vocabulary()
Feature Transformation - CountVectorizer (Estimator)
ft_dct() ft_discrete_cosine_transform()
Feature Transformation - Discrete Cosine Transform (DCT) (Transformer)
Feature Transformation - ElementwiseProduct (Transformer)
Feature Transformation - FeatureHasher (Transformer)
Feature Transformation - HashingTF (Transformer)
Feature Transformation - IDF (Estimator)
Feature Transformation - Imputer (Estimator)
Feature Transformation - IndexToString (Transformer)
Feature Transformation - Interaction (Transformer)
ft_lsh() ft_bucketed_random_projection_lsh() ft_minhash_lsh()
Feature Transformation - LSH (Estimator)
ft_lsh_utils() ml_approx_nearest_neighbors() ml_approx_similarity_join()
Utility functions for LSH models
Feature Transformation - MaxAbsScaler (Estimator)
Feature Transformation - MinMaxScaler (Estimator)
Feature Transformation - NGram (Transformer)
Feature Transformation - Normalizer (Transformer)
Feature Transformation - OneHotEncoder (Transformer)
ft_one_hot_encoder_estimator()
Feature Transformation - OneHotEncoderEstimator (Estimator)
Feature Transformation - PCA (Estimator)
Feature Transformation - PolynomialExpansion (Transformer)
Feature Transformation - QuantileDiscretizer (Estimator)
Feature Transformation - RFormula (Estimator)
Feature Transformation - RegexTokenizer (Transformer)
Feature Transformation - RobustScaler (Estimator)
Feature Transformation - StandardScaler (Estimator)
Feature Transformation - StopWordsRemover (Transformer)
ft_string_indexer() ml_labels() ft_string_indexer_model()
Feature Transformation - StringIndexer (Estimator)
Feature Transformation - Tokenizer (Transformer)
Feature Transformation - VectorAssembler (Transformer)
Feature Transformation - VectorIndexer (Estimator)
Feature Transformation - VectorSlicer (Transformer)
ft_word2vec() ml_find_synonyms()
Feature Transformation - Word2Vec (Estimator)
Full join
Generic Call Interface
get_spark_sql_catalog_implementation()
Retrieve the Spark connection’s SQL catalog implementation property
Infix operator for composing a lambda expression
Runtime configuration interface for Hive
Apply Aggregate Function to Array Column
Sorts array using a custom comparator
Determine Whether Some Element Exists in an Array Column
Filter Array Column
Checks whether all elements in an array satisfy a predicate
Filters a map
Merges two maps into one
Transform Array Column
Transforms keys of a map
Transforms values of a map
Combines 2 Array Columns
Inner join
invoke() invoke_static() invoke_new()
Invoke a Method on a JVM Object
j_invoke() j_invoke_static() j_invoke_new()
Invoke a Java function.
Instantiate a Java array with a specific element type.
Instantiate a Java float type.
Instantiate an Array[Float].
join.tbl_spark() inner_join.tbl_spark() left_join.tbl_spark() right_join.tbl_spark() full_join.tbl_spark()
Join Spark tbls.
Left join
List all sparklyr-*.jar files that have been built
Create a Spark Configuration for Livy
livy_service_start() livy_service_stop()
Start Livy
ml-params() ml_is_set() ml_param_map() ml_param() ml_params()
Spark ML - ML Params
ml-persistence() ml_save() ml_save.ml_model() ml_load()
Spark ML - Model Persistence
ml-transform-methods() is_ml_transformer() is_ml_estimator() ml_fit() ml_fit.default() ml_transform() ml_fit_and_transform() ml_predict() ml_predict.ml_model_classification()
Spark ML - Transform, fit, and predict methods (ml_ interface)
ml-tuning() ml_sub_models() ml_validation_metrics() ml_cross_validator() ml_train_validation_split()
Spark ML - Tuning
ml_aft_survival_regression() ml_survival_regression()
Spark ML - Survival Regression
Spark ML - ALS
ml_als_tidiers() tidy.ml_model_als() augment.ml_model_als() glance.ml_model_als()
Tidying methods for Spark ML ALS
Spark ML - Bisecting K-Means Clustering
Chi-square hypothesis testing for categorical data.
Spark ML - Clustering Evaluator
Compute correlation matrix
ml_decision_tree_classifier() ml_decision_tree() ml_decision_tree_regressor()
Spark ML - Decision Trees
Default stop words
ml_evaluate() ml_evaluate.ml_model_logistic_regression() ml_evaluate.ml_logistic_regression_model() ml_evaluate.ml_model_linear_regression() ml_evaluate.ml_linear_regression_model() ml_evaluate.ml_model_generalized_linear_regression() ml_evaluate.ml_generalized_linear_regression_model() ml_evaluate.ml_model_clustering() ml_evaluate.ml_model_classification() ml_evaluate.ml_evaluator()
Evaluate the Model on a Validation Set
ml_evaluator() ml_binary_classification_evaluator() ml_binary_classification_eval() ml_multiclass_classification_evaluator() ml_classification_eval() ml_regression_evaluator()
Spark ML - Evaluators
ml_feature_importances() ml_tree_feature_importance()
Spark ML - Feature Importance for Tree Models
ml_fpgrowth() ml_association_rules() ml_freq_itemsets()
Frequent Pattern Mining - FPGrowth
Spark ML - Gaussian Mixture clustering.
ml_generalized_linear_regression()
Spark ML - Generalized Linear Regression
ml_glm_tidiers() tidy.ml_model_generalized_linear_regression() tidy.ml_model_linear_regression() augment.ml_model_generalized_linear_regression() augment._ml_model_linear_regression() augment.ml_model_linear_regression() glance.ml_model_generalized_linear_regression() glance.ml_model_linear_regression()
Tidying methods for Spark ML linear models
ml_gbt_classifier() ml_gradient_boosted_trees() ml_gbt_regressor()
Spark ML - Gradient Boosted Trees
Spark ML - Isotonic Regression
ml_isotonic_regression_tidiers() tidy.ml_model_isotonic_regression() augment.ml_model_isotonic_regression() glance.ml_model_isotonic_regression()
Tidying methods for Spark ML Isotonic Regression
ml_kmeans() ml_compute_cost() ml_compute_silhouette_measure()
Spark ML - K-Means Clustering
Evaluate a K-means clustering
ml_lda() ml_describe_topics() ml_log_likelihood() ml_log_perplexity() ml_topics_matrix()
Spark ML - Latent Dirichlet Allocation
ml_lda_tidiers() tidy.ml_model_lda() augment.ml_model_lda() glance.ml_model_lda()
Tidying methods for Spark ML LDA models
Spark ML - Linear Regression
Spark ML - LinearSVC
ml_linear_svc_tidiers() tidy.ml_model_linear_svc() augment.ml_model_linear_svc() glance.ml_model_linear_svc()
Tidying methods for Spark ML LinearSVC
Spark ML - Logistic Regression
ml_logistic_regression_tidiers() tidy.ml_model_logistic_regression() augment.ml_model_logistic_regression() augment._ml_model_logistic_regression() glance.ml_model_logistic_regression()
Tidying methods for Spark ML Logistic Regression
Extracts metrics from a fitted table
Extracts metrics from a fitted table
Extracts metrics from a fitted table
Extracts data associated with a Spark ML model
ml_multilayer_perceptron_classifier() ml_multilayer_perceptron()
Spark ML - Multilayer Perceptron
ml_multilayer_perceptron_tidiers() tidy.ml_model_multilayer_perceptron_classification() augment.ml_model_multilayer_perceptron_classification() glance.ml_model_multilayer_perceptron_classification()
Tidying methods for Spark ML MLP
Spark ML - Naive-Bayes
ml_naive_bayes_tidiers() tidy.ml_model_naive_bayes() augment.ml_model_naive_bayes() glance.ml_model_naive_bayes()
Tidying methods for Spark ML Naive Bayes
Spark ML - OneVsRest
ml_pca_tidiers() tidy.ml_model_pca() augment.ml_model_pca() glance.ml_model_pca()
Tidying methods for Spark ML Principal Component Analysis
Spark ML - Pipelines
Spark ML - Power Iteration Clustering
ml_prefixspan() ml_freq_seq_patterns()
Frequent Pattern Mining - PrefixSpan
ml_random_forest_classifier() ml_random_forest() ml_random_forest_regressor()
Spark ML - Random Forest
Spark ML - Pipeline stage extraction
Spark ML - Extraction of summary metrics
ml_survival_regression_tidiers() tidy.ml_model_aft_survival_regression() augment.ml_model_aft_survival_regression() glance.ml_model_aft_survival_regression()
Tidying methods for Spark ML Survival Regression
ml_tree_tidiers() tidy.ml_model_decision_tree_classification() tidy.ml_model_decision_tree_regression() augment.ml_model_decision_tree_classification() augment._ml_model_decision_tree_classification() augment.ml_model_decision_tree_regression() augment._ml_model_decision_tree_regression() glance.ml_model_decision_tree_classification() glance.ml_model_decision_tree_regression() tidy.ml_model_random_forest_classification() tidy.ml_model_random_forest_regression() augment.ml_model_random_forest_classification() augment._ml_model_random_forest_classification() augment.ml_model_random_forest_regression() augment._ml_model_random_forest_regression() glance.ml_model_random_forest_classification() glance.ml_model_random_forest_regression() tidy.ml_model_gbt_classification() tidy.ml_model_gbt_regression() augment.ml_model_gbt_classification() augment._ml_model_gbt_classification() augment.ml_model_gbt_regression() augment._ml_model_gbt_regression() glance.ml_model_gbt_classification() glance.ml_model_gbt_regression()
Tidying methods for Spark ML tree models
Spark ML - UID
ml_unsupervised_tidiers() tidy.ml_model_kmeans() augment.ml_model_kmeans() glance.ml_model_kmeans() tidy.ml_model_bisecting_kmeans() augment.ml_model_bisecting_kmeans() glance.ml_model_bisecting_kmeans() tidy.ml_model_gaussian_mixture() augment.ml_model_gaussian_mixture() glance.ml_model_gaussian_mixture()
Tidying methods for Spark ML unsupervised models
Mutate
Replace Missing Values in Objects
Nest
Pivot longer
Pivot wider
Random string generation
Reactive spark reader
Register a Parallel Backend
register_extension() registered_extensions()
Register a Package that Implements a Spark Extension
Replace NA
Right join
sdf-saveload() sdf_save_table() sdf_load_table() sdf_save_parquet() sdf_load_parquet()
Save / Load a Spark DataFrame
sdf-transform-methods() sdf_predict() sdf_transform() sdf_fit() sdf_fit_and_transform()
Spark ML - Transform, fit, and predict methods (sdf_ interface)
Create DataFrame for along Object
sdf_bind() sdf_bind_rows() sdf_bind_cols()
Bind multiple Spark DataFrames by row and column
Broadcast hint
Checkpoint a Spark DataFrame
Coalesces a Spark DataFrame
Collect a Spark DataFrame into R.
Copy an Object into Spark
Cross Tabulation
Debug Info for Spark DataFrame
Compute summary statistics for columns of a data frame
sdf_dim() sdf_nrow() sdf_ncol()
Support for Dimension Operations
Invoke distinct on a Spark DataFrame
Remove duplicates from a Spark DataFrame
Create a Spark DataFrame containing all combinations of inputs
Convert column(s) from avro format
Spark DataFrame is Streaming
Returns the last index of a Spark DataFrame
Create DataFrame for Length
Gets number of partitions of a Spark DataFrame
Compute the number of records within each partition of a Spark DataFrame
Persist a Spark DataFrame
Pivot a Spark DataFrame
Project features onto principal components
Compute (Approximate) Quantiles with a Spark DataFrame
sdf_random_split() sdf_partition()
Partition a Spark DataFrame
Generate random samples from a Beta distribution
Generate random samples from a binomial distribution
Generate random samples from a Cauchy distribution
Generate random samples from a chi-squared distribution
Read a Column from a Spark DataFrame
Register a Spark DataFrame
Repartition a Spark DataFrame
sdf_residuals.ml_model_generalized_linear_regression() sdf_residuals.ml_model_linear_regression() sdf_residuals()
Model Residuals
Generate random samples from an exponential distribution
Generate random samples from a Gamma distribution
Generate random samples from a geometric distribution
Generate random samples from a hypergeometric distribution
Generate random samples from a log normal distribution
Generate random samples from the standard normal distribution
Generate random samples from a Poisson distribution
Generate random samples from a t-distribution
Generate random samples from the uniform distribution U(0, 1).
Generate random samples from a Weibull distribution.
Randomly Sample Rows from a Spark DataFrame
Read the Schema of a Spark DataFrame
Separate a Vector Column into Scalar Columns
Create DataFrame for Range
Sort a Spark DataFrame
Spark DataFrame from SQL
Convert column(s) to avro format
Unnest longer
Unnest wider
Perform Weighted Random Sampling on a Spark DataFrame
Add a Sequential ID Column to a Spark DataFrame
Add a Unique ID Column to a Spark DataFrame
Select
Separate
spark-api() spark_context() java_context() hive_context() spark_session()
Access the Spark API
spark-connections() spark_connect() spark_connection_is_open() spark_disconnect() spark_disconnect_all() spark_submit()
Manage Spark Connections
spark_adaptive_query_execution()
Retrieves or sets status of Spark AQE
spark_advisory_shuffle_partition_size()
Retrieves or sets advisory size of the shuffle partition
Apply an R Function in Spark
Create Bundle for Spark Apply
Log Writer for Spark Apply
spark_auto_broadcast_join_threshold()
Retrieves or sets the auto broadcast join threshold
spark_coalesce_initial_num_partitions()
Retrieves or sets initial number of shuffle partitions before coalescing
spark_coalesce_min_num_partitions()
Retrieves or sets the minimum number of shuffle partitions after coalescing
spark_coalesce_shuffle_partitions()
Retrieves or sets whether coalescing contiguous shuffle partitions is enabled
Define a Spark Compilation Specification
Read Spark Configuration
Kubernetes Configuration
Retrieve Available Settings
Runtime configuration interface for the Spark Session
Function that negotiates the connection with the Spark back-end
spark_connection class
Retrieve the Spark Connection Associated with an R Object
Find Spark Connection
Runtime configuration interface for the Spark Context.
Retrieve a Spark DataFrame
spark_default_compilation_spec()
Default Compilation Specification for Spark Extensions
Define a Spark dependency
Fallback to Spark Dependency
Create Spark Extension
Set the SPARK_HOME environment variable
spark_ide_connection_open() spark_ide_connection_closed() spark_ide_connection_updated() spark_ide_connection_actions() spark_ide_objects() spark_ide_columns() spark_ide_preview()
Set of functions to provide integration with the RStudio IDE
Inserts a Spark DataFrame into a Spark table
spark_install() spark_uninstall() spark_install_dir() spark_install_tar() spark_installed_versions() spark_available_versions()
Download and install various versions of Spark
Lets the package know whether it should test a particular functionality
spark_jobj class
Retrieve a Spark JVM Object Reference
Surfaces the last error from Spark captured by internal spark_error function
Reads from a Spark Table into a Spark DataFrame.
View Entries in the Spark Log
Read file(s) into a Spark DataFrame using a custom reader
Read Apache Avro data into a Spark DataFrame.
Read binary data into a Spark DataFrame.
Read a CSV file into a Spark DataFrame
Read from Delta Lake into a Spark DataFrame.
Read image data into a Spark DataFrame.
Read from JDBC connection into a Spark DataFrame.
Read a JSON file into a Spark DataFrame
Read libsvm file into a Spark DataFrame.
Read an ORC file into a Spark DataFrame
Read a Parquet file into a Spark DataFrame
Read from a generic source into a Spark DataFrame.
Reads from a Spark Table into a Spark DataFrame.
Read a Text file into a Spark DataFrame
Saves a Spark DataFrame as a Spark table
Generate random samples from some distribution
Generate a Table Name from Expression
Get the Spark Version Associated with a Spark Connection
Get the Spark Version Associated with a Spark Installation
Open the Spark web interface
Write Spark DataFrame to file using a custom writer
Serialize a Spark DataFrame into Apache Avro format
Write a Spark DataFrame to a CSV
Writes a Spark DataFrame into Delta Lake
Writes a Spark DataFrame into a JDBC table
Write a Spark DataFrame to a JSON file
Write a Spark DataFrame to an ORC file
Write a Spark DataFrame to a Parquet file
Write Spark DataFrame to RDS files
Writes a Spark DataFrame into a generic source
Writes a Spark DataFrame into a Spark table
Write a Spark DataFrame to a Text file
Return the port number of a sparklyr backend.
ft_sql_transformer() ft_dplyr_transformer()
Feature Transformation - SQLTransformer
Show database list
Find Stream
Generate Test Stream
Spark Stream’s Identifier
Apply lag function to columns of a Spark Streaming DataFrame
Spark Stream’s Name
stream_read_csv() stream_read_text() stream_read_json() stream_read_parquet() stream_read_orc() stream_read_kafka() stream_read_socket() stream_read_delta() stream_read_cloudfiles() stream_read_table()
Read files created by the stream
Render Stream
Stream Statistics
Stops a Spark Stream
Spark Stream Continuous Trigger
Spark Stream Interval Trigger
View Stream
Watermark Stream
stream_write_csv() stream_write_text() stream_write_json() stream_write_parquet() stream_write_orc() stream_write_kafka() stream_write_console() stream_write_delta()
Write files to the stream
Write Memory Stream
Write Stream to Table
Subsetting operator for Spark DataFrame
Cache a Spark Table
Use specific database
Uncache a Spark Table
Transform a subset of column(s) in a Spark DataFrame
Unite
Unnest