
Glossary

The Aerospike schemaless data model gives application designers maximum flexibility. Aerospike uses the following terms to distinguish its concepts from the relational database (RDBMS) world. In our documentation, we introduce Aerospike concepts with their corresponding common RDBMS terms.

A
ACID compliant

ACID compliance refers to database transaction characteristics of Atomicity, Consistency, Isolation, and Durability (ACID).

  • Atomicity: Ensures that the transaction’s commands are executed in order, and that either all record modifications succeed, or all fail and are rolled back to their state before the transaction.

  • Consistency: Ensures that transactions only make changes in predictable ways.

  • Isolation: Ensures that record modifications inside the transaction aren’t viewable from the outside until the transaction is committed, preventing concurrent transactions from interfering with each other. Aerospike transactions guarantee strict serializability.

  • Durability: Ensures that data is saved after a transaction is completed, even if there is system failure such as a power outage.

Adopting ACID principles aligns Aerospike with industry-standard reliability, validity, and accuracy measures. These principles ensure that there is no data loss or corruption due to network errors, disruptions, or hardware failures. Industries that require ACID compliance include financial institutions, manufacturing operations, transportation, IoT environments, and energy production.

AQL

Aerospike Quick Look. A command-line client built around a familiar, common query language. It may feel familiar to SQL users but, by design, does not maintain parity with SQL.

all flash

A deployment mode in which both the primary index and the data are stored on NVMe flash devices, rather than keeping only the primary index in memory.

asadm

Aerospike admin tool. A multifunctional, Python-based utility for extracting and changing configuration, configuring authentication, and analyzing performance and health information from a cluster or a collectinfo file.

asd

Aerospike Daemon. The Aerospike database server process that runs on a server or node.

asmt

Aerospike Shared Memory Tool. Enables primary and secondary indexes to be backed up from shared memory to files and restored from files to shared memory. This allows the database to be restarted and the indexes restored, enabling a fast restart.

available mode

Aerospike’s default consistency mode, representing Available and Partition-tolerant (AP) behavior from the CAP theorem perspective. In the event of a network partition, each sub-cluster of servers claims complete ownership of all data partitions. This is in contrast to Strong Consistency (SC) mode (CP).

B
batch operations

Batch operations are repeating computing tasks that can be kicked off and left unattended until they run to completion. The term arose in the era of punched cards, when the cards carrying instructions for multiple programs were run in batches.

In the database world, batch operations refer to processing a large number of similar tasks (batch reads and batch writes are most common) instead of processing each task separately. Batch updates also fall under batch operations: sets of multiple update statements submitted to the database for processing as a batch.

Batch operations usually save compute resources and time because executing a hundred (or a million) individual reads or writes usually takes much longer than executing those operations in a batch.

batch

In a batch transaction, you already know the keys or digests of the records you want to access. A batch groups multiple operations into one unit and transmits them over a single network socket to each relevant cluster node.
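The per-node grouping described above can be sketched as follows. This is an illustration only, not the client's actual implementation: `partition_for` and the toy partition map are hypothetical stand-ins for the client's real partition table.

```python
def partition_for(digest: bytes, n_partitions: int = 4096) -> int:
    # Illustrative: derive a partition ID from the digest's leading bytes.
    return int.from_bytes(digest[:2], "little") % n_partitions

def group_by_node(digests, partition_map):
    """Group batched digests by the node owning their partition, so each
    node receives a single combined request over one socket."""
    batches = {}
    for d in digests:
        node = partition_map[partition_for(d)]
        batches.setdefault(node, []).append(d)
    return batches

# Toy partition map: even partitions on node "A", odd on node "B".
pmap = {p: ("A" if p % 2 == 0 else "B") for p in range(4096)}
digests = [bytes([i]) + bytes(19) for i in range(4)]  # four 20-byte digests
batches = group_by_node(digests, pmap)
```

With this toy map, digests landing on even partitions batch to node "A" and the rest to node "B", so only two network requests are issued instead of four.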

batched commands

Batch commands are repeating computing tasks that can be kicked off and left unattended until they run to completion.

Batched operations refer to processing a group of commands, such as reads, writes, and deletes, instead of processing each task separately. Batch updates also fall under batched commands: sets of multiple update statements submitted to the database for processing as a batch.

Batched commands usually save compute resources and time because executing multiple individual reads or writes usually takes much longer than executing those operations in a batch.

bin

A sub-object of a record in Aerospike. Each bin has a data type, which does not need to match the data types of bins in other records. In the Aerospike database, each record (similar to a row in a relational database) stores data using one or more bins (like columns in a relational database). The major difference between bins and RDBMS columns is that you don’t need to define a schema. Each record can have multiple bins. Bins accept these data types (which are also referred to as “particles” in documentation and messages about bins):

  • Boolean
  • Bytes
  • Double
  • Geospatial
  • HyperLogLog
  • Integer
  • List
  • Map
  • String

For information about these data types and how bins support them, see “Scalar Data Types”.

Although the bin for a given record or object must be typed, bins in different rows do not have to be the same type. There are some internal performance optimizations for single-bin namespaces.
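In the Python client, for example, a record's bins are expressed as a plain dict whose values can mix the types above. The bin names and values here are illustrative, and the `put` call is shown commented out because it requires a connected client.

```python
# One record's bins: a dict mapping bin names to values of different types.
bins = {
    "name": "Ann",          # String
    "age": 42,              # Integer
    "score": 9.5,           # Double
    "active": True,         # Boolean
    "tags": ["a", "b", 3],  # List, with mixed element types
    "attrs": {"k": 1},      # Map
}

# With a connected client, this record could be written as:
# client.put(("test", "users", "ann"), bins)
```

No schema declaration precedes the write; another record in the same set could use entirely different bin names and types.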

C
CAP theorem

CAP theorem states that distributed systems can provide at most two of the following three properties: Consistency, Availability, and Partition Tolerance. In Aerospike, you can choose Available and Partition-Tolerant (AP) or Consistent and Partition-Tolerant (CP), which is known as strong consistency (SC) in Aerospike. Consistent and Available (CA) cannot be implemented in practice for distributed systems as consistency and availability fail during partition events.

client

A library included by the user’s application, which provides an API that allows the application to perform operations against the Aerospike database cluster. In our documentation, client, API, and application are used interchangeably. Clients are available in languages such as Java, C, C#, Go, Python, Node.js, Ruby, and Rust.

cluster protocol

The protocol by which Aerospike nodes form and maintain a cluster, built on a shared-nothing architecture: a distributed computing design where each node operates independently, owning its own dedicated CPU, RAM, and storage (disk). Every node is a peer. There is no single master node that can fail and bring the system down.

cluster

Aerospike is a distributed database, made up of a collection of one or more database nodes: a cluster. The cluster acts together to distribute and replicate both data and traffic. Client applications use Aerospike APIs to interact with the cluster, rather than with individual nodes, so the application does not need to know the cluster configuration. Data in the cluster is evenly distributed to ensure even resource consumption across nodes. As you add or remove nodes, the cluster dynamically adjusts without needing any application code or configuration changes.

cold start

The server starts and rebuilds its primary index by scanning the data on persistent storage, either disk or flash devices. The cold start process can take hours in some cases. During a cold start, records that were not durably deleted may be resurrected, potentially reverting to older states.

collection data type (CDT)

Collection data types (CDTs) are flexible, schema-free containers that can hold scalar data or nest other collections within them. The elements in a CDT can be of mixed types. CDTs are a superset of JSON, supporting more data types as elements of the collection, such as integers for map keys and binary data as list values.
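For instance, a CDT can nest collections and use element types that JSON cannot, such as integer map keys and raw bytes. Shown here as ordinary Python structures, which a client would serialize for storage; the field values are illustrative.

```python
# A CDT-style value: a map with integer keys (not valid in JSON) whose
# values include binary data and nested, mixed-type collections.
profile = {
    1: b"\x00\x01\x02",           # integer key, bytes value
    2: ["ride", 17, 3.5],         # nested list with mixed element types
    3: {"nested": {"deep": [1]}}, # maps nested inside maps
}
```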

consistency

Requirement in database systems that any given database transaction must change affected data only in allowed ways. Any data written to the database must be valid according to all defined rules.

cross-datacenter replication

Cross-datacenter replication (XDR) lets data be reproduced, or replicated, across clusters that can be located in different clouds and data centers.

The replication guards against data-center failure. It’s also used to supply high-performance access to globally distributed applications that are mission critical. Cross-datacenter replication guarantees continuous service because if one of the data centers has a problem, there is backup data in another center.

Once replications are established, they continuously replicate until paused or deleted.

The telecommunications industry relies on cross-datacenter replication because data availability, consistency, resilience and low latency are critical.

D
data storage layer

A data storage layer is where your gathered data is stored and saved for when it is needed. There are four layers in data warehouse architecture: the data source layer, data staging layer, data storage layer, and data presentation layer. The data storage layer makes it easier to back up files, ensuring they remain safe and can be recovered quickly in the event of an attack or outage.

In the data storage layer, the data is cleaned, transformed and prepared with a specific structure. This enables access by those within a business who require the data for various reasons.

data synchronization

Data synchronization is required when two or more systems want to access and manipulate the same datasets with accuracy and consistency. Data synchronization can take place in memory in the case of a traditional relational database, or it may be required with datasets that are widely distributed – in different cities, regions, or data centers.

To achieve effective data synchronization, a database or data platform must prepare and cleanse data, check for errors or duplication, and ensure consistency before the data can be distributed, replicated, and synchronized. This is important because if synchronized data is changed by any replica, those updates must be reflected throughout the system to avoid errors, prevent fraud, protect private data, and deliver accurate, up-to-date information and insights.

Data synchronization is becoming more vital as the population grows mobile and globalization continues. Data synchronization is also important with the growing accessibility to cloud-based data.

Data synchronization methods include data replication in databases, file synchronization (typically used for home or cloud backups), and version control methods that synchronize files which might be changed by more than one user simultaneously. A distributed file system usually requires that devices be connected in order to sync multiple file versions. Mirror computing provides different sources with the same copy of the data set.

defragmentation

When records are updated or deleted, the percentage of active records in a previously written block may fall below the defrag-lwm-pct threshold. This makes the block eligible for defragmentation, where records from partially empty blocks are read and rewritten to a new write block to optimize space and access efficiency.
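The eligibility test amounts to a threshold comparison. A minimal sketch, with the helper name hypothetical and the default mirroring the `defrag-lwm-pct` configuration parameter:

```python
def eligible_for_defrag(active_records_pct: float,
                        defrag_lwm_pct: float = 50.0) -> bool:
    """A block becomes a defragmentation candidate when the percentage of
    still-active records in it falls below the low-water mark."""
    return active_records_pct < defrag_lwm_pct

# A block whose contents are 30% live is rewritten; a 70% block is left alone.
candidate = eligible_for_defrag(30.0)
untouched = eligible_for_defrag(70.0)
```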

demarshaling

Also known as deserializing. This process converts a serialized data structure, such as from incoming network communication, into an internal data structure. The reverse operation, converting an internal structure to a serial format, is called marshaling or serializing.
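As a generic illustration of the round trip (using Python's json module as the serial format, not Aerospike's actual wire format):

```python
import json

record = {"name": "Ann", "age": 42}

# Marshaling (serializing): internal structure -> byte stream for the wire.
wire_bytes = json.dumps(record).encode("utf-8")

# Demarshaling (deserializing): byte stream -> internal structure.
restored = json.loads(wire_bytes.decode("utf-8"))

assert restored == record  # the round trip preserves the structure
```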

digest

The primary index digest is a 20-byte unique object identifier created on the client side by hashing the user key and, if available, the record’s set name using the RIPEMD-160 algorithm, which takes a key of any length and always returns a 20-byte digest. By default, the record stores the digest but not the key, which saves storage when keys are longer than 20 bytes.

E
eviction

Eviction is the forced, preventative removal of records from the database when memory or disk utilization exceeds a predefined safety threshold. The server evicts records automatically, without explicit, direct user input (such as a client delete command). This process is handled by the namespace supervisor (NSUP) background thread. Only records with a positive time to live (TTL) are eligible for eviction. Records with a TTL of 0, which indicates they never expire, are not evicted.
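The TTL rule can be sketched as a simple filter. The helper is hypothetical; NSUP's real logic also orders candidates by remaining life and respects per-namespace thresholds.

```python
def eviction_candidates(records):
    """Return the keys of records eligible for eviction: positive TTL only.

    records: list of (key, ttl_seconds) pairs; a TTL of 0 means
    'never expire', so those records are protected from eviction."""
    return [key for key, ttl in records if ttl > 0]

records = [("a", 3600), ("b", 0), ("c", 120)]
candidates = eviction_candidates(records)  # "b" is protected by its 0 TTL
```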

H
heartbeat

Heartbeat messages are exchanged between nodes in an Aerospike cluster to confirm that nodes are still active and to monitor and detect changes in cluster membership. The heartbeat protocol can be multicast (using IP multicast) or mesh (using configured IP addresses of peer nodes to connect).

high availability database

A high availability database is a database that is designed to operate with no interruptions in service, even if there are hardware outages or network problems. High availability databases often exceed even what’s stipulated in a service level agreement.

A high availability database ensures greater uptime by eliminating single points of failure, providing reliable crossover between redundant systems, and detecting failures right away, whether caused by environmental problems, hardware, or software.

Typical high availability database features include server or node failover, hot standby, data replication and distributed microservice architecture.

Many businesses today have critical databases and applications, such as data warehouses and ecommerce applications that require high availability. High availability databases are important to reduce the risk of losing revenue or dissatisfied customers.

hotkey

A hotkey (also hot key or hot-key) is a specific key or digest subjected to a disproportionately large number of read/write operations in a short time window. This can occur when multiple clients or processes attempt to access or modify the same data element simultaneously, leading to a concentrated workload on a single node.

When a server node receives too many concurrent requests for the same key, it may reject the request with a KEY_BUSY error to avoid uneven load distribution. This also increments the fail_key_busy statistic for monitoring such scenarios.

Write hotkeys are logged, while read hotkeys are logged only in strong consistency mode. You can also enable key logging by setting the rw-client logging context to detail for hotkey analysis.
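A hotkey can be pictured as a counter over a short access window. This sketch is illustrative only: the server's actual safeguard is a per-key limit on concurrent pending transactions (rejected requests return KEY_BUSY), not a counter like this.

```python
from collections import Counter

def find_hotkeys(accesses, threshold):
    """accesses: iterable of keys touched within one time window.
    Returns the set of keys whose access count meets the threshold."""
    counts = Counter(accesses)
    return {key for key, n in counts.items() if n >= threshold}

# "k1" is touched four times in the window while others are touched once.
window = ["k1", "k2", "k1", "k1", "k3", "k1"]
hot = find_hotkeys(window, threshold=3)
```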

hybrid storage

Hybrid storage is a storage strategy that blends flash storage, solid state drives (SSDs), and mechanical disk drives to provide the optimal combination of cost and performance for a given set of workloads. A hybrid storage approach enables a wide range of applications and use cases to get the storage performance they need at the right price point.

One of the benefits of hybrid storage is that it enables organizations to leverage high performance storage – such as flash drives or SSDs – when it is needed. Organizations can determine whether data is hot, warm or cold and then choose the most appropriate storage medium for the application. This enables businesses to craft a plan about how data will be used and when to achieve the greatest impact and efficiency.

Hybrid storage can sometimes be implemented in a single storage system. This offers users a single point of accountability for hardware and software issues. This can be important when businesses are looking for greater efficiency when data volumes are increasing and storing everything on flash storage can be too expensive.

I
Index

A data structure or mechanism that improves the speed and efficiency of data retrieval operations within a database or other data storage systems.

K
key value NoSQL database

The term key value NoSQL database describes a generation of non-relational databases that use a key-value method to store data as a collection of key-value pairs in order to get fast lookup results on very large datasets. The key becomes the unique identifier. A key value NoSQL database is considered the simplest type of NoSQL database.

A key value NoSQL database offers rapid data storage (writes) and information retrieval (reads) because of its simple data structure and lack of a predefined schema. It also performs well because integrated caching features let users store and retrieve data very quickly. Because of its relative architectural simplicity, a key value NoSQL database can scale out quickly in cloud environments without causing operational disruptions.

key

The unique identifier of a record in Aerospike, similar to how a primary key in an RDBMS identifies a single record in a table. By default, the key is not stored with the record, to optimize storage. The key is the distinct (set, userKey) pair in a specified namespace. The userKey data type can be a string, integer, or bytes (blob). For example, in a namespace user_profiles, a specific user record can be identified by the key (eu-users, 'foo@gmail.com').
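In the Python client, the example above is expressed as a (namespace, set, userKey) tuple:

```python
# The full identity of a record: namespace, set name, and user key.
key = ("user_profiles", "eu-users", "foo@gmail.com")

namespace, set_name, user_key = key
# With a connected client, the record could then be read as:
# (key_meta, meta, bins) = client.get(key)
```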

KNN (exact)

An exact k Nearest Neighbor search, which is an exhaustive search technique that gets the best results but can take a long time to perform.

L
Linearizability

Linearizability is one of the strongest single-object consistency models, and implies that every operation appears to take place atomically, in some order, consistent with the real-time ordering of those operations. This means that if operation A completes before operation B begins, then B should logically take effect after A. Linearizability is a single-object model, but the scope of an object varies. Some systems provide linearizability on individual keys in a key-value store; others might provide linearizable operations on multiple keys in a table, or multiple tables in a database—but not between different tables or databases, respectively. When you need linearizability across multiple objects, try strict serializability. When real-time constraints are not important, but you still want each process to observe a logical order (but allows other processes to observe a different logical order), try sequential consistency.

lut

Last Update Time. The time at which a record, set, or namespace was last updated, expressed in nanoseconds since the Unix epoch.

M
master record

The primary copy of a record in a namespace with a replication factor. For example, with a replication factor of 2, there is one master record and one replica. Writes occur on the master, which may also be referred to as the write-master, while the replica is known as the write-prole.

migration

When nodes are added or removed from a database cluster, data migrates between the remaining nodes. After migrations are complete, the data in the new cluster is evenly distributed. Migrations occur when the cluster topology changes, such as when a node is added or removed or during network issues. During migrations, record data moves as part of the partition it is mapped to via its key hash.

monitor

The monitor keeps track of the records locked and written by a transaction. If the client goes away without fully committing or aborting the transaction, the monitor steps in after the transaction timeout deadline to prevent it from dangling.

multi-model database

A multi-model database is a database management system that supports different data modeling approaches in a single storage engine. This provides a single backend database that can service a wider range of data processing and data retrieval tasks and use cases. This differs from most database management systems that are organized around a single data model, such as relational, key-value, wide-column, document, or graph, that decides how data is organized, stored and manipulated.

Aerospike is a multi-model database supporting key-value, wide-column, document and graph data modeling.

N
namespace

A top-level data container in Aerospike. It is a physical collection of similar records within a storage engine that share common policies, such as replication factor, encryption, and storage type. A namespace is similar to a tablespace in an RDBMS. Aerospike database clusters contain one or more namespaces. Namespaces segregate data with different storage requirements. For example, some data may have high performance/low storage requirements more suitable for RAM, while other data can be stored on SSD storage. The Aerospike schemaless data model allows you to mix data types within a namespace. You can store data on users and URLs in the same namespace, and separate them using sets.

node

An Aerospike database cluster is made of one or more nodes. These are the individual servers that act together as a distributed database to fulfill client requests. Each node holds a portion of the data and contributes to the overall computing power of the cluster.

nsup

Namespace supervisor, the main server thread responsible for handling expirations and evictions within a namespace.

P
particle

Synonym for “data type” in documentation and messages referring to bins. For example, “Boolean particle” means “Boolean data type” in reference to bins.

policy

Policies control the behavior of individual operations against the database, such as reading records or performing read and write operations on distinct data types within a record. They also dictate the operational behavior of a namespace, or of an entire database node or cluster.

primary index

The primary key index (PI) is a set of 20-byte RIPEMD-160 hashes created from the set and identifier portion of the record key tuple (namespace, set, identifier). Twelve bits of this hash determine the partition within the namespace to which the record is assigned. The hash is stored in a hash table that links to a red-black tree data structure called a sprig, containing the data location metadata. When looking up a record using its primary key, a digest is created from the set and identifier, allowing the client to locate the partition and node. Once the request is received, the digest is used to find the record entry in the hash table and retrieve the metadata needed to access the full record.
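The digest-to-partition mapping can be sketched as below. The 12-bit mask over the digest's leading bytes reflects the 4096-partition count; treat the exact byte order as an implementation detail of the real client.

```python
N_PARTITIONS = 4096  # 2**12, hence 12 bits of the digest

def partition_id(digest: bytes) -> int:
    """Map a 20-byte digest to one of 4096 partitions using 12 bits."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

pid = partition_id(bytes(20))  # an all-zero digest maps to partition 0
```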

pristine blocks

Blocks that have never been written to by Aerospike. Aerospike prioritizes writing to blocks that have been cleared by defragmentation before using pristine blocks. This improves cold start performance as unwritten blocks can be skipped during indexing.

provisional record

A record that was locked during a transaction.

Q
query

A request for all records matching specific criteria. Queries can be performed against a primary index (key) or a secondary index (bin value). Primary index queries support read-only operations like fetching records from a namespace or set or those before a given last update time (LUT). They can also perform background read-write queries with UDF-defined actions. Secondary index queries locate records by bin values, with the number and location of matching records often unknown at the time of the query.

R
Retrieval-augmented Generation (RAG)

Retrieval-augmented generation (RAG) is a technique that combines retrieval-based and generation-based approaches, augmenting generative models with retrieved documents to produce more accurate and informative responses.

record block

The initial landing spot for an incoming written record. One record block holds only one record, though one record can span multiple blocks.

record

An object containing data identified by a single key, similar to a row in an RDBMS. Each record is stored in a partition and optionally in a set.

record/object

A record (or object) is similar to a row in an RDBMS. It is a contiguous storage unit for all the data uniquely identified by a single key. A record is subdivided into bins (like columns in an RDBMS).

replication factor (RF)

The number of copies of each record maintained in a namespace.

roll back

An abort of a transaction due to a failure of some sort. An abort rolls back the transaction by removing the provisional records (records that were locked during the transaction). Roll back is a component of an abort. Rolls are performed by either the client or the monitor. The last step is to delete the monitor record.

roll forward

A commit of the provisional records (records that were locked during a transaction). The roll forward command removes the older version. Roll forward is a component of a commit. Rolls are performed by either the client or the monitor. The last step of a commit is to delete the monitor record.

rw-hash

Replica Write hash, a structure used to park transactions that require coordination with another node before responding to the client. It is used for write transactions, read transactions during migrations, and for parking read transactions in strong consistency-enabled namespaces.

S
SC mode

Strong consistency mode ensures that while writes in progress can be reordered, the read mode determines the order in which the application observes writes. With sequential reads, each client sees a progressing version order for each record, but the order across records may differ from client to client. Linearizable reads ensure that all clients see records progress in the same order. From the CAP theorem perspective, SC mode represents Consistent and Partition-tolerant (CP) behavior, as opposed to Aerospike’s default Available and Partition-tolerant (AP) mode.

sequential consistency

Sequential consistency is a strong safety property for concurrent systems. Informally, sequential consistency implies that operations appear to take place in some total order, and that this order is consistent with the order of operations on each individual process. A process in a sequentially consistent system may be far ahead of, or behind, other processes. For example, it may read arbitrarily stale state. However, once a process A has observed some operation from process B, it can never observe a state prior to B. This, combined with the total ordering property, makes sequential consistency a surprisingly strong model for programmers. In Aerospike, only cluster disruptions cause linearizability and sequential/session consistency to differ. While the cluster is stable without any migrations, they are indistinguishable from one another, apart from the linearizability performance penalty. When you need real-time constraints, such as when you want to tell another process about an event using a side channel and have that process observe the event, use linearizability. When a shared total order across all processes isn’t required, use sequential consistency.

serializability

Serializability is a transactional model. Transactions can involve several primitive sub-operations performed in order. Serializability guarantees that transactions take effect atomically: a transaction’s sub-operations do not appear to interleave with sub-operations from other transactions. It is also a multi-object property, applying not only to the particular objects involved in a transaction, but to the system as a whole. Aerospike provides strict serializability and currently does not allow relaxing it. Scans and queries are not allowed as part of transactions.

service thread

A worker thread on a cluster node responsible for receiving client requests and executing transactions.

set index

An index on the records of a set within a namespace. Set indexes reduce the number of full primary index scans needed to find a record. They are most effective for sets smaller than 1% of the namespace.

set

An optional method of logically grouping records within a namespace using a record attribute. Sets function like tables in an RDBMS but do not require a schema. A set is not a distinct storage unit; it is a collection of records within a namespace, and it is the namespace that has its own dedicated storage.

shared-nothing

Aerospike uses a shared-nothing architecture, where memory and storage resources are not shared between nodes in a cluster.

sindex

Secondary Index (SI) locates records within a namespace or set by a bin value. Each node builds its own sindex with references only to local data. A secondary index can include both master and replica records.

sprig

A memory-based binary tree data structure used by Aerospike to store and retrieve primary index data.

storage engine

The physical storage medium and the method by which data is written to the medium.

striped

Relating to SSDs, refers to the way Aerospike distributes data across multiple devices, which typically makes external RAID striping unnecessary and potentially harmful.

strong consistency

Referred to as SC mode in Aerospike Database. Guarantees that all writes to a single record will be applied in a specific, sequential order, and writes will not be re-ordered or skipped to ensure that data is not lost.

subset

A flag used during migrations to indicate that a partition is not yet full. This flag is removed when migration to the partition completes. For example, during an add-node operation, both the replaced and new partitions are marked as subsets until migration finishes.

system metadata

Also known as SMD, system metadata stores critical system information such as secondary indexes, user-defined function definitions, user permissions, and eviction data. It is typically located at /opt/aerospike/smd on the node.

T
tending

The process by which the client discovers the cluster’s addresses and maps partitions to nodes. Tending begins with a seed connection, where the client retrieves a list of cluster node addresses, partitions, and generations. The client regularly checks for partition updates and monitors socket usage.

transaction ID, TxnID, TRID

In Aerospike, a transaction ID (TxnID or TRID) is the identifier for all the commands inside a distributed transaction. A TRID also refers to an identifier the server returns to the client after it launches a query, for the purpose of monitoring its progress.

transaction id / TR ID

An identifier returned by the node to the client for a query. In Aerospike versions after 6.0.0, this ID is returned immediately.

transactional workload

A transactional workload means that over time, the database is getting requests for data and various changes to that data from different users. The modifications that are made are known as transactions.

For example, a transactional workload is built to aid in transactions such as in banking or accounting systems. Relational databases such as MySQL were designed to handle transactional workloads. They can scale as needed, ensure transactional consistency and have quick, responsive queries.

tsvc

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

U
UDF

A User-Defined Function (UDF) is code written by a developer that runs inside the Aerospike database server. UDFs can significantly extend the capability of the Aerospike Database engine in functionality and in performance. Aerospike currently only supports Lua as a UDF language.

W
warm start

The server starts and attempts to recover its primary index from a state stored in persistent memory (PMem) or from a prior in-memory state, allowing it to become active much faster without a full disk scan. Previously known as fast restart.

write block

The storage location for record blocks, also called a streaming write buffer (swb) or wblock. A record cannot span multiple write blocks, so the write block size determines the record size limit. The default size is 1 MiB. Write blocks are flushed when full or after the flush-max-ms interval (default 1 second).
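The flush condition can be sketched as a two-part check. The helper is hypothetical; the default mirrors the `flush-max-ms` configuration parameter.

```python
def should_flush(bytes_used: int, block_size: int,
                 ms_since_first_write: int, flush_max_ms: int = 1000) -> bool:
    """Flush a write block when it is full, or when flush-max-ms has
    elapsed since its first write (so partial blocks still reach disk)."""
    return bytes_used >= block_size or ms_since_first_write >= flush_max_ms

full = should_flush(1 << 20, 1 << 20, 5)    # a full 1 MiB block: flush now
stale = should_flush(4096, 1 << 20, 1500)   # partially full but old: flush
fresh = should_flush(4096, 1 << 20, 10)     # partially full and recent: wait
```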

write queue

A temporary cache in RAM where write blocks are stored before being written to the storage engine.

X
xdr

Cross-Datacenter Replication, which asynchronously replicates records across high-latency network links. XDR can replicate full namespaces, sets within namespaces, or specific bins within records.
