Utility functions for LSH models

R/ml_feature_lsh_utils.R

ft_lsh_utils

Description

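Utility functions for querying fitted locality-sensitive hashing (LSH) models: approximate nearest-neighbor search and approximate similarity join.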

Usage

ml_approx_nearest_neighbors( 
  model, 
  dataset, 
  key, 
  num_nearest_neighbors, 
  dist_col = "distCol" 
) 

ml_approx_similarity_join( 
  model, 
  dataset_a, 
  dataset_b, 
  threshold, 
  dist_col = "distCol" 
) 

Arguments

Argument               Description
model                  A fitted LSH model, returned by either ft_minhash_lsh() or ft_bucketed_random_projection_lsh().
dataset                The dataset to search for nearest neighbors of the key.
key                    Feature vector representing the item to search for.
num_nearest_neighbors  The maximum number of nearest neighbors to return.
dist_col               Output column for storing the distance between each result row and the key (nearest-neighbor search) or between the two rows of each pair (similarity join).
dataset_a              One of the datasets to join.
dataset_b              The other dataset to join.
threshold              The distance threshold for row pairs; the join returns pairs whose distance is below this value.
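
Examples

The following is a minimal sketch of how the two functions might be called, not an official example. It assumes a local Spark installation reachable via spark_connect(master = "local"). The toy data, the table name "points", the column names, and the bucket_length, num_hash_tables, and threshold values are illustrative assumptions. The model is obtained by creating the estimator on the connection and fitting it with ml_fit().

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Hypothetical toy data: four 2-D points with an id column so that
# result rows can be traced back to their source rows.
df <- data.frame(
  id = 1:4,
  x  = c(1, 1, -1, -1),
  y  = c(1, -1, -1, 1)
)

# Copy to Spark and assemble x and y into a single feature vector column.
points_tbl <- copy_to(sc, df, "points", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")

# Create the LSH estimator on the connection, then fit it to get a model.
# bucket_length and num_hash_tables are illustrative values only.
model <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,
  num_hash_tables = 3
) %>%
  ml_fit(points_tbl)

# Approximate nearest neighbors of the key vector (1, 0);
# distances are stored in the default "distCol" column.
neighbors <- ml_approx_nearest_neighbors(
  model,
  dataset = points_tbl,
  key = c(1, 0),
  num_nearest_neighbors = 2
)

# Approximate self-join: row pairs whose distance is below the threshold.
pairs <- ml_approx_similarity_join(
  model,
  dataset_a = points_tbl,
  dataset_b = points_tbl,
  threshold = 1.5
)

print(neighbors)
print(pairs)

spark_disconnect(sc)
```

Because both functions are approximate, the rows returned can vary with the number of hash tables and the random seed; increasing num_hash_tables generally improves recall at the cost of more computation.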