DEV Community

Cover image for Quick tip: Using $euclideanDistance with SingleStore Kai for MongoDB
Akmal Chaudhri for SingleStore

Posted on • Edited on

Quick tip: Using $euclideanDistance with SingleStore Kai for MongoDB

Abstract

The announcement of SingleStore Kai for MongoDB provides exciting opportunities for turbocharged JSON analytics. Two vector functions are available in the preview release - $euclideanDistance and $dotProduct. In this short article, we'll evaluate $euclideanDistance using some example data from a previous article.

The notebook file used in this article is available on GitHub.

Introduction

SingleStoreDB supports a range of vector functions. In a previous article, we used the EUCLIDEAN_DISTANCE and JSON_ARRAY_PACK functions. In another previous article, we used the DOT_PRODUCT and UNHEX functions. In this short article, we'll use $euclideanDistance from SingleStore Kai for MongoDB.

Create a SingleStoreDB Cloud account

A previous article showed the steps to create a free SingleStoreDB Cloud account. We'll use the following settings:

  • Workspace Group Name: Iris Demo Group
  • Cloud Provider: AWS
  • Region: US East 1 (N. Virginia)
  • Workspace Name: iris-demo
  • Size: S-00
  • Advanced Settings:
    • SingleStore Kai for MongoDB selected
    • MarTech Application deselected

New notebook

A previous article showed the steps to create a new notebook.

We'll call the notebook kai_demo, and select a Blank notebook template from the available options.

Fill out the notebook

Create Table

We'll use the SQL code from a GitHub Gist for our table, as follows:

CREATE DATABASE IF NOT EXISTS iris_db; USE iris_db; DROP TABLE IF EXISTS iris; CREATE TABLE IF NOT EXISTS iris ( vector BLOB, species VARCHAR(20) ); 
Enter fullscreen mode Exit fullscreen mode

Load Data

We'll now load the data into the table, as follows:

USE iris_db; INSERT INTO iris VALUES (JSON_ARRAY_PACK('[5.1,3.5,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.9,3,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.7,3.2,1.3,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.6,3.1,1.5,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3.6,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.4,3.9,1.7,0.4]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.6,3.4,1.4,0.3]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3.4,1.5,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.4,2.9,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.9,3.1,1.5,0.1]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.4,3.7,1.5,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.8,3.4,1.6,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.8,3,1.4,0.1]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.3,3,1.1,0.1]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.8,4,1.2,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.7,4.4,1.5,0.4]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.4,3.9,1.3,0.4]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.1,3.5,1.4,0.3]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.7,3.8,1.7,0.3]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.1,3.8,1.5,0.3]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.4,3.4,1.7,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.1,3.7,1.5,0.4]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.6,3.6,1,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.1,3.3,1.7,0.5]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.8,3.4,1.9,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3,1.6,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3.4,1.6,0.4]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.2,3.5,1.5,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.2,3.4,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.7,3.2,1.6,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.8,3.1,1.6,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.4,3.4,1.5,0.4]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.2,4.1,1.5,0.1]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.5,4.2,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.9,3.1,1.5,0.1]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3.2,1.2,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.5,3.5,1.3,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.9,3.1,1.5,0.1]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.4,3,1.3,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.1,3.4,1.5,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3.5,1.3,0.3]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.5,2.3,1.3,0.3]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.4,3.2,1.3,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3.5,1.6,0.6]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.1,3.8,1.9,0.4]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.8,3,1.4,0.3]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.1,3.8,1.6,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[4.6,3.2,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5.3,3.7,1.5,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[5,3.3,1.4,0.2]'),'Iris-setosa'), (JSON_ARRAY_PACK('[7,3.2,4.7,1.4]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.4,3.2,4.5,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.9,3.1,4.9,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.5,2.3,4,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.5,2.8,4.6,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.7,2.8,4.5,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.3,3.3,4.7,1.6]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[4.9,2.4,3.3,1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.6,2.9,4.6,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.2,2.7,3.9,1.4]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5,2,3.5,1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.9,3,4.2,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6,2.2,4,1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.1,2.9,4.7,1.4]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.6,2.9,3.6,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.7,3.1,4.4,1.4]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.6,3,4.5,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.8,2.7,4.1,1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.2,2.2,4.5,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.6,2.5,3.9,1.1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.9,3.2,4.8,1.8]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.1,2.8,4,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.3,2.5,4.9,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.1,2.8,4.7,1.2]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.4,2.9,4.3,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.6,3,4.4,1.4]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.8,2.8,4.8,1.4]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.7,3,5,1.7]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6,2.9,4.5,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.7,2.6,3.5,1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.5,2.4,3.8,1.1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.5,2.4,3.7,1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.8,2.7,3.9,1.2]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6,2.7,5.1,1.6]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.4,3,4.5,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6,3.4,4.5,1.6]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.7,3.1,4.7,1.5]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.3,2.3,4.4,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.6,3,4.1,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.5,2.5,4,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.5,2.6,4.4,1.2]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.1,3,4.6,1.4]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.8,2.6,4,1.2]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5,2.3,3.3,1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.6,2.7,4.2,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.7,3,4.2,1.2]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.7,2.9,4.2,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.2,2.9,4.3,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.1,2.5,3,1.1]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[5.7,2.8,4.1,1.3]'),'Iris-versicolor'), (JSON_ARRAY_PACK('[6.3,3.3,6,2.5]'),'Iris-virginica'), (JSON_ARRAY_PACK('[5.8,2.7,5.1,1.9]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.1,3,5.9,2.1]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.3,2.9,5.6,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.5,3,5.8,2.2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.6,3,6.6,2.1]'),'Iris-virginica'), (JSON_ARRAY_PACK('[4.9,2.5,4.5,1.7]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.3,2.9,6.3,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.7,2.5,5.8,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.2,3.6,6.1,2.5]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.5,3.2,5.1,2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.4,2.7,5.3,1.9]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.8,3,5.5,2.1]'),'Iris-virginica'), (JSON_ARRAY_PACK('[5.7,2.5,5,2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[5.8,2.8,5.1,2.4]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.4,3.2,5.3,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.5,3,5.5,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.7,3.8,6.7,2.2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.7,2.6,6.9,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6,2.2,5,1.5]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.9,3.2,5.7,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[5.6,2.8,4.9,2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.7,2.8,6.7,2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.3,2.7,4.9,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.7,3.3,5.7,2.1]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.2,3.2,6,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.2,2.8,4.8,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.1,3,4.9,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.4,2.8,5.6,2.1]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.2,3,5.8,1.6]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.4,2.8,6.1,1.9]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.9,3.8,6.4,2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.4,2.8,5.6,2.2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.3,2.8,5.1,1.5]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.1,2.6,5.6,1.4]'),'Iris-virginica'), (JSON_ARRAY_PACK('[7.7,3,6.1,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.3,3.4,5.6,2.4]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.4,3.1,5.5,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6,3,4.8,1.8]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.9,3.1,5.4,2.1]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.7,3.1,5.6,2.4]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.9,3.1,5.1,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[5.8,2.7,5.1,1.9]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.8,3.2,5.9,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.7,3.3,5.7,2.5]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.7,3,5.2,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.3,2.5,5,1.9]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.5,3,5.2,2]'),'Iris-virginica'), (JSON_ARRAY_PACK('[6.2,3.4,5.4,2.3]'),'Iris-virginica'), (JSON_ARRAY_PACK('[5.9,3,5.1,1.8]'),'Iris-virginica'); 
Enter fullscreen mode Exit fullscreen mode

Install Libraries

We'll install a library that we'll use later:

!pip install tabulate --quiet 
Enter fullscreen mode Exit fullscreen mode

Import Libraries

Next, we'll import some libraries, as follows:

import pymongo import struct from pymongo import MongoClient from tabulate import tabulate 
Enter fullscreen mode Exit fullscreen mode

Connect to SingleStore Kai

We'll now connect to our system, as follows:

client = MongoClient(connection_url_kai) 
Enter fullscreen mode Exit fullscreen mode

We'll switch to the Iris database and list the collections, as follows:

db = client["iris_db"] for coll in db.list_collection_names(): print(coll) 
Enter fullscreen mode Exit fullscreen mode

The output should be as follows:

iris 
Enter fullscreen mode Exit fullscreen mode

Example Queries

Query 1

Here is the first SQL query we used in a previous article:

USE iris_db; SELECT species FROM iris WHERE EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.9,3,5.1,1.8]')) = 0; 
Enter fullscreen mode Exit fullscreen mode

The result was:

+----------------+ | species | +----------------+ | Iris-virginica | +----------------+ 
Enter fullscreen mode Exit fullscreen mode

Here is one solution using SingleStore Kai:

vector = [5.1, 3.5, 1.4, 0.2] vector_bytes = struct.pack('f' * len(vector), *vector) query = { "$expr": { "$eq": [ { "$euclideanDistance": ["$vector", vector_bytes] }, 0 ] } } projection = { "species": 1 } document = db.iris.find_one(query, projection) species = document["species"] print(species) 
Enter fullscreen mode Exit fullscreen mode

Since the vector we used was stored in the database and we are looking for a single match, the result should be:

Iris-virginica 
Enter fullscreen mode Exit fullscreen mode

Query 2

Here is the second SQL query we used in a previous article, looking for other nearby flowers:

USE iris_db; SELECT EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.9,3,5.1,1.8]')) AS euclidean_distance, species FROM iris ORDER BY euclidean_distance LIMIT 5; 
Enter fullscreen mode Exit fullscreen mode

The result was:

+---------------------+----------------+ | euclidean_distance | species | +---------------------+----------------+ | 0 | Iris-virginica | | 0.28284244589567653 | Iris-virginica | | 0.31622746208231284 | Iris-virginica | | 0.3316624219760969 | Iris-virginica | | 0.3316624219760969 | Iris-virginica | +---------------------+----------------+ 
Enter fullscreen mode Exit fullscreen mode

Here is one solution using SingleStore Kai:

pipeline = [{ "$project": { "euclidean_distance": { "$euclideanDistance": [ "$vector", vector_bytes ] }, "species": "$species" } }, { "$sort": { "euclidean_distance": 1 } }, { "$limit": 5 } ] cursor = db.iris.aggregate(pipeline) table = [] for document in cursor: species = document["species"] euclidean_distance = document["euclidean_distance"] table.append([euclidean_distance, species]) print(tabulate(table, headers = ["euclidean_distance", "species"])) 
Enter fullscreen mode Exit fullscreen mode

The result should be:

 euclidean_distance species -------------------- -------------- 0 Iris-virginica 0.282842 Iris-virginica 0.316227 Iris-virginica 0.331662 Iris-virginica 0.331662 Iris-virginica 
Enter fullscreen mode Exit fullscreen mode

Query 3

Here is the third SQL query we used in a previous article, using some fictitious data values to make a prediction:

USE iris_db; SELECT EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.2,3.6,1.5,0.3]')) AS euclidean_distance, species FROM iris ORDER BY euclidean_distance LIMIT 5; 
Enter fullscreen mode Exit fullscreen mode

The result was:

+---------------------+-------------+ | euclidean_distance | species | +---------------------+-------------+ | 0.14142129538778386 | Iris-setosa | | 0.1732049874122573 | Iris-setosa | | 0.17320510570613526 | Iris-setosa | | 0.17320538530952567 | Iris-setosa | | 0.19999992325900512 | Iris-setosa | +---------------------+-------------+ 
Enter fullscreen mode Exit fullscreen mode

Here is one solution using SingleStore Kai:

vector = [5.2, 3.6, 1.5, 0.3] vector_bytes = struct.pack('f' * len(vector), *vector) pipeline = [{ "$project": { "euclidean_distance": { "$euclideanDistance": [ "$vector", vector_bytes ] }, "species": "$species" } }, { "$sort": { "euclidean_distance": 1 } }, { "$limit": 5 } ] cursor = db.iris.aggregate(pipeline) table = [] for document in cursor: species = document["species"] euclidean_distance = document["euclidean_distance"] table.append([euclidean_distance, species]) print(tabulate(table, headers = ["euclidean_distance", "species"])) 
Enter fullscreen mode Exit fullscreen mode

The result should be:

 euclidean_distance species -------------------- ----------- 0.141421 Iris-setosa 0.173205 Iris-setosa 0.173205 Iris-setosa 0.173205 Iris-setosa 0.2 Iris-setosa 
Enter fullscreen mode Exit fullscreen mode

Query 4

Finally, here is the fourth SQL query we used in a previous article:

USE iris_db; SELECT species FROM iris ORDER BY EUCLIDEAN_DISTANCE(vector, JSON_ARRAY_PACK('[5.2,3.6,1.5,0.3]')) LIMIT 1; 
Enter fullscreen mode Exit fullscreen mode

The output was:

+-------------+ | species | +-------------+ | Iris-setosa | +-------------+ 
Enter fullscreen mode Exit fullscreen mode

Here is one solution using SingleStore Kai:

pipeline = [{ "$project": { "euclidean_distance": { "$euclideanDistance": [ "$vector", vector_bytes ] }, "species": "$species" } }, { "$sort": { "euclidean_distance": 1 } }, { "$limit": 1 } ] cursor = db.iris.aggregate(pipeline) table = [] for document in cursor: species = document["species"] table.append([species]) print(tabulate(table, headers = ["species"])) 
Enter fullscreen mode Exit fullscreen mode

The result should be:

species ----------- Iris-setosa 
Enter fullscreen mode Exit fullscreen mode

Comparing the SQL results with the SingleStore Kai results, we can see that the $euclideanDistance function in SingleStore Kai is working as expected.

Summary

In this short article, we tested SQL against SingleStore Kai using the $euclideanDistance function. In future articles, we'll try the additional functionality of this new product offering. Stay tuned.

Top comments (0)