Update: As of September 2024, running R is no longer possible on the SingleStore Portal. I will find another way to show the integration between SingleStore and R.
Abstract
SingleStore provides a Jupyter-based notebook environment with support for Python, SQL and Markdown. However, we can also install and use the R programming language. In this article, we'll see how.
The notebook file used in this article is available on GitHub.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We'll use the following settings:
- Workspace Group Name: R Demo Group
- Cloud Provider: AWS
- Region: US East 1 (N. Virginia)
- Workspace Name: r-demo
- Size: S-00
Create a new notebook
From the left navigation pane in the cloud portal, we'll select DEVELOP > Data Studio.
In the top right of the web page, we'll select New Notebook > New Notebook, as shown in Figure 1.
We'll call the notebook r_demo, select a Blank notebook template from the available options, and save it in the Personal location.
Create a database
In our SingleStore Cloud account, let's use the SQL Editor to create a new database. Call this iris_db
, as follows:
DROP DATABASE IF EXISTS iris_db; CREATE DATABASE IF NOT EXISTS iris_db;
Fill out the notebook
First, let's install the R kernel and some other packages we need for this article:
!conda install -y --quiet -c conda-forge r-irkernel r-rjava r-rjdbc r-ggplot2
Next, we need to change the kernel. Refreshing the page will help the notebook detect any changes, including the installation of a new kernel.
In the top right, we can see that Python is currently selected, as shown in Figure 2.
Selecting Python 3 will present a box with a pull-down as shown in Figure 3.
Clicking the pull-down will show some options and R should be one of the options. We'll choose R, as shown in Figure 4.
Next, we'll click the Select button.
To connect to SingleStore, we'll use JDBC, as follows:
library(RJDBC)
This will also load DBI
and rJava
:
Loading required package: DBI Loading required package: rJava
Next, we'll download the SingleStore JDBC Client and save it in a jars
directory:
# URL of the JDBC driver file driver_url <- "https://repo1.maven.org/maven2/com/singlestore/singlestore-jdbc-client/1.2.1/singlestore-jdbc-client-1.2.1.jar" # Set the JDBC driver class name driver <- "com.singlestore.jdbc.Driver" # Local directory to save the driver file local_dir <- "jars" dir.create(local_dir, showWarnings = FALSE, recursive = TRUE) # Check if the driver file already exists driver_file <- file.path( local_dir, "singlestore-jdbc-client-1.2.1.jar" ) if (!file.exists(driver_file)) { # Download the JDBC driver file if it doesn't exist download.file( driver_url, destfile = driver_file, mode = "wb", quiet = TRUE ) } # Check if the driver file has been downloaded successfully if (file.exists(driver_file)) { print("Driver file downloaded successfully") } else { print("Failed to download the driver file") }
Now we'll create the connection details to SingleStore:
host <- "<HOST>" port <- 3306 database <- "iris_db" user <- "admin" password <- "<PASSWORD>" url <- paste0("jdbc:singlestore://", host, ":", port, "/", database)
Replace <HOST>
and <PASSWORD>
with the values for your environment. These values can be obtained from the workspace using Connect > SQL IDE.
Next, let's load the iris
dataset, make some small adjustments to the column names and show the first few rows:
# Load the iris dataset data(iris) # Replace "." with "_" in column names colnames(iris) <- gsub("\\.", "_", colnames(iris)) # Print the first few rows of the dataset head(iris)
Example output:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa 3 4.7 3.2 1.3 0.2 setosa 4 4.6 3.1 1.5 0.2 setosa 5 5.0 3.6 1.4 0.2 setosa 6 5.4 3.9 1.7 0.4 setosa
We'll now prepare the connection to SingleStore, write the iris
dataset to the database and read it back again.
# Establish the JDBC connection conn <- dbConnect( drv = JDBC(driver, driver_file), url = url, user = user, password = password ) # Write the iris dataset to the database dbWriteTable(conn, "iris", iris, overwrite = TRUE) # Read the iris dataset from the database iris_from_db <- dbReadTable(conn, "iris") # Print the first few rows of the dataset read from the database head(iris_from_db) # Close the JDBC connection dbDisconnect(conn)
Example output:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species 1 5.7 2.9 4.2 1.3 versicolor 2 5.1 3.8 1.5 0.3 setosa 3 5.4 3.0 4.5 1.5 versicolor 4 4.3 3.0 1.1 0.1 setosa 5 5.5 2.5 4.0 1.3 versicolor 6 6.4 2.9 4.3 1.3 versicolor
Bonus: Create visualisations
We can easily create some plots using ggplot
:
library(ggplot2)
First, a scatter plot of Sepal Length vs Sepal Width, as shown in Figure 5.
# Scatter plot of Sepal Length vs Sepal Width ggplot(iris_from_db, aes(x = Sepal_Length, y = Sepal_Width, color = Species)) + geom_point() + labs(x = "Sepal Length", y = "Sepal Width", title = "Sepal Length vs Sepal Width")
Next, a box plot of Petal Length by Species, as shown in Figure 6.
# Box plot of Petal Length by Species ggplot(iris_from_db, aes(x = Species, y = Petal_Length, fill = Species)) + geom_boxplot() + labs(x = "Species", y = "Petal Length", title = "Petal Length by Species")
Finally, a histogram of Petal Width, as shown in Figure 7.
# Histogram of Petal Width ggplot(iris_from_db, aes(x = Petal_Width)) + geom_histogram(binwidth = 0.1, fill = "skyblue", color = "black") + labs(x = "Petal Width", y = "Frequency", title = "Petal Width")
Summary
In this short article, we've seen how to install R, how to connect to SingleStore from R, and how to write and read data using R and SingleStore. We've also quickly and easily created several powerful visualisations.
Top comments (0)