Inviting nominations for the Champions of Scale - Nominate a leader building internet-scale data infastructure today! Nominate someone
Developers/Tutorials/SQL/SELECT - Java
Implementing SQL Operations: SELECT
For an interactive Jupyter notebook experience: Binder
This tutorial describes how to implement certain SQL SELECT statements in Aerospike.
This notebook requires the Aerospike Database running locally with Java kernel and Aerospike Java Client. To create a Docker container that satisfies the requirements and holds a copy of Aerospike notebooks, visit the Aerospike Notebooks Repo.
Introduction
In this notebook, we will see how specific SELECT statements in SQL can be implemented in Aerospike.
SQL is a widely known data access language. If you have used SQL, the examples in this notebook will make it easier to implement specific SQL SELECT statements.
This notebook is the first in the SQL Operations series that consists of the following notebooks:
Implementing SQL Operations: UPDATE, CREATE, and DELETE
Implementing SQL Operations: Aggregates
The specific topics and SQL SELECT statements we discuss include:
Components of the SELECT statement
Single record retrieval
Batch retrieval
Predicate based retrieval
Query operation using an index
Query operation using an expression filter
Scan operation using an expression filter
Computed fields
The purpose of this notebook is to provide Aerospike API equivalents for specific SQL operations. Not all SQL operations (such as JOIN) are directly supported in Aerospike API. The SQL syntax in this notebook while largely accurate is meant only to convey the semantics of the operations. Check out Aerospike Presto Connector for ad-hoc SQL access to Aerospike data.
While Aerospike provides both synchronous and asynchronous execution modes for many operations, we use mostly synchronous execution examples in this notebook, leaving asynchronous execution as a separate topic for a future tutorial.
Prerequisites
This tutorial assumes familiarity with the following topics:
The test data has ten records with user-key “id-1” through “id-10”, two integer bins (fields) “bin1” and “bin2”, in the namespace “test” and sets “sql-select-small”and null, and similarly structured 1000 records in set “sql-select-large”.
System.out.println("Initialized the client and connected to the cluster.");
StringNamespace="test";
StringSmallSet="sql-select-small";
StringLargeSet="sql-select-large";
StringNullSet="";
WritePolicywpolicy=newWritePolicy();
wpolicy.sendKey=true;
for (inti=1; i <=10; i++) {
Keykey=newKey(Namespace, SmallSet, "id-"+i);
Binbin1=newBin(new String("bin1"), i);
Binbin2=newBin(new String("bin2"), 1000+i);
client.put(wpolicy, key, bin1, bin2);
}
for (inti=1; i <=10; i++) {
Keykey=newKey(Namespace, NullSet, "id-"+i);
Binbin1=newBin(new String("bin1"), i);
Binbin2=newBin(new String("bin2"), 1000+i);
client.put(wpolicy, key, bin1, bin2);
}
for (inti=1; i <=1000; i++) {
Keykey=newKey(Namespace, LargeSet, "id-"+i);
Binbin1=newBin(new String("bin1"), i);
Binbin2=newBin(new String("bin2"), 1000+i);
client.put(wpolicy, key, bin1, bin2);
}
System.out.format("Test data populated");;
Output:
Initialized the client and connected to the cluster.
Test data populated
Mapping Components of SELECT Statement
Columns and tables
In Aerospike, a relational database or schema maps to a namespace, a table maps to a set, and a column maps to a bin. Thus a query SELECT columns FROM table WHERE condition can be written in Aerospike terminology as SELECT bins FROM namespace.set WHERE condition.
Record id
Records are stored in a namespace, organized in sets, and each record is uniquely identified by a key or id of the record that consists of a triple: (namespace, set, user-key) where user-key is a unique user specified id within the set. The key is closely identified with a record, and can be seen either as a metadata or a primary key field, and is returned in all retrieval APIs.
Record metadata
Each record has generation (or version) and expiration (or time-to-live in seconds) associated with it. This metadata is returned in all retrieval operations. It is possible to retrieve only the metadata without the record’s bins through “getHeader” operation explained below.
A note on Policy
All APIs take a Policy argument. A policy specifies many request parameters such as timeout and maximum retries, as well as operations modifiers such as an expression filter.
Single Record Get
Let’s start with a simple example of a single record retrieval using its key. You can either get the entire record or specific bins.
SELECT * FROM namespace.set WHERE id = key
Record Client::get(Policy policy, Key key)
SELECT bins FROM namsepace.set WHERE id = key
Record Client::get(Policy policy, Key key, String... binNames)
importcom.aerospike.client.Record;
// Read all bins of the record with user-key "id-3" from the small set
A batch operation operates on a list of records identified by the keys provided. This works similar to a single record retrieval, except multiple records are returned.
It is possible to obtain header info or metadata consisting of generation (or version) and expiration time (time-to-live in seconds) for a specified set of records.
SELECT generation, expiration FROM namespace.set WHERE id IN key-list
System.out.format("Key not found: key=%s\n", keys[i].userKey);
}
}
Output:
Batch metadata results:
Key not found: key=id-0
key=id-1 generation=1 expiration=355535130
key=id-2 generation=1 expiration=355535130
key=id-3 generation=1 expiration=355535130
Union of Batch Retrievals
A more general form of batch reads is also available that provides a union of simple batch results with different namespace, set, and bin specification. It populates the argument “records” on return.
(SELECT bins1 FROM namespace1.set1 WHERE id IN key-list1) UNION (SELECT bins2 FROM namespace2.set2 WHERE id IN key-list2) UNION ...
System.out.format("Key not found: set='%s' key=%s\n", key.setName, key.userKey);
}
}
Output:
Union of multi batch results:
set='sql-select-small' key=id-1 bins={bin2=1001}
set='' key=id-1 bins={bin1=1, bin2=1001}
set='sql-select-small' key=id-2 bins=null
set='sql-select-small' key=id-3 bins=null
Key not found: set='sql-select-small' key=no-such-key
Predicate Based Retrieval
In these operations, records matching a general predicate (or a condition) are retrieved.
SELECT bins FROM namespace.set WHERE condition
There are multiple ways of performing this SQL query in Aerospike. They involve query and scan operations.
Query operation using an index and/or expression filter
Scan operation using an expression filter
The query operation must be used when an index is involved, but may be used without an index. The scan operation can only be used without an index.
Query Based on Index
In SQL, an index if applicable is used automatically. In Aerospike, one must know the index and specify it explicitly in the statement argument in a query operation.
To use the query API with index based filter, a secondary index must exist on the filter bin. Here we create a numeric index on “bin1” in “sql-select-small” set.
importcom.aerospike.client.policy.Policy;
importcom.aerospike.client.query.IndexType;
importcom.aerospike.client.task.IndexTask;
importcom.aerospike.client.AerospikeException;
importcom.aerospike.client.ResultCode;
StringIndexName="test_small_bin1_number_idx";
Policypolicy=newPolicy();
policy.socketTimeout=0; // Do not timeout on index create.
Here are some key points to remember about query and scan operations in Aerospike.
To leverage an index, one must use a query operation.
A query takes either or both: an index predicate and an expression filter.
An expression filter may be used instead of an index predicate, but it will not perform as well.
When only an expression filter is needed, either a query or a scan may be used (as shown above).
A null set value when an index predicate is used works on the null set (records belonging to no set), but without an index predicate works on the entire namespace.
An expression filter is specified within the policy, and is applied generally for filtering records beyond query and scan. You can find examples of this outside of this tutorial.
Some of these are illustrated with examples in the following cells.
To leverage an index, one must use a query operation.
If an index predicate is used on an unindexed bin, it results in an error.
Statementstmt=newStatement();
stmt.setNamespace(Namespace);
stmt.setSetName(SmallSet);
// try to use an index predicate on bin2 which has no index
stmt.setFilter(Filter.range("bin2", 1004, 1007));
try {
RecordSetrs=client.query(null, stmt);
System.out.format("Query with index predicate on unindexed bin results:\n");
Query based on index predicate and expression results
key=id-2 bins={bin1=2, bin2=1002}
key=id-3 bins={bin1=3, bin2=1003}
A null set value when an index predicate is used works records belonging to no (null) set, but without an index predicate works on the entire namespace.
The scope of an index is a set. An index must exist on the null set when an index predicate is used with the null set.
// query with a null set
Statementstmt=newStatement();
stmt.setNamespace(Namespace);
stmt.setSetName(NullSet);
// the filter selects records with bin1=3 in all sets
An arbitrary function registered on the server (UDF) is invoked on the specified record. In this tutorial, we deal with a single record oriented functions as opposed to “stream oriented” functions. The latter will be discussed in a subsequent notebook on Aggregates in this series.
The API returns a generic Object which can be anything like a single value or a dictionary. Note, UDFs may not be appropriate for performance sensitive applications; for record-oriented functions, simply retrieving the record and computing the function on the client site may be faster. A read-write function may be alternatively implemented atomically on the client side using the read-modify-write pattern.
Create User Defined Function (UDF)
Examine the following Lua code that takes two bins and returns their sum and product. Create a “udf” directory under “java” and create a file “computed_fields.lua” with this Lua code.
Add the following code to the file “computed_fields.lua” in the sub-directory “udf”:
<pre>
-- computed_fields.lua - return sum and product of specified bins
function sum_and_product(rec, binName1, binName2)
local ret = map() -- Initialize the return value (a map)
Many developers that are familiar with SQL would like to see how SQL operations translate to Aerospike. We looked at how to implement various SELECT statements. This should be generally useful irrespective of the reader’s SQL knowledge. While the examples here use synchronous execution, many operations can also be performed asynchronously.
Visit Aerospike notebooks repo to run additional Aerospike notebooks. To run a different notebook, download the notebook from the repo to your local machine, and then click on File->Open, and select Upload.