Types

Deep Lake provides a comprehensive type system designed for efficient data storage and retrieval. The type system includes basic numeric types as well as specialized types optimized for common data formats like images, embeddings, and text.

Each type can be specified either using the full type class or a string shorthand:

    # Using type class
    ds.add_column("col1", deeplake.types.Float32())

    # Using string shorthand
    ds.add_column("col2", "float32")

Types determine:

  • How data is stored and compressed
  • What operations are available
  • How the data can be queried and indexed
  • Integration with external libraries and frameworks

Numeric Types

All basic numeric types:

    import deeplake

    # Integers
    ds.add_column("int8", deeplake.types.Int8())        # -128 to 127
    ds.add_column("int16", deeplake.types.Int16())      # -32,768 to 32,767
    ds.add_column("int32", deeplake.types.Int32())      # -2^31 to 2^31-1
    ds.add_column("int64", deeplake.types.Int64())      # -2^63 to 2^63-1

    # Unsigned Integers
    ds.add_column("uint8", deeplake.types.UInt8())      # 0 to 255
    ds.add_column("uint16", deeplake.types.UInt16())    # 0 to 65,535
    ds.add_column("uint32", deeplake.types.UInt32())    # 0 to 2^32-1
    ds.add_column("uint64", deeplake.types.UInt64())    # 0 to 2^64-1

    # Floating Point
    ds.add_column("float16", deeplake.types.Float16())  # Half precision
    ds.add_column("float32", deeplake.types.Float32())  # Single precision
    ds.add_column("float64", deeplake.types.Float64())  # Double precision

    # Boolean
    ds.add_column("is_valid", deeplake.types.Bool())    # True/False values
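The integer ranges in the comments above follow directly from the bit width of each type. As a quick check, here is a minimal sketch in plain Python; the helper name `int_range` is hypothetical and not part of Deep Lake.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values representable in `bits` bits."""
    if signed:
        # Two's complement: half of the bit patterns encode negative values.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(f"int{bits}: {int_range(bits, signed=True)}")
    print(f"uint{bits}: {int_range(bits, signed=False)}")
```

Picking the narrowest type that covers your value range keeps storage small, which is the main reason to choose, say, `uint8` over `int64` for small counters.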

Basic Type Functions

deeplake.types.Int8

Int8(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates an 8-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new 8-bit integer data type.

Examples:

Create a column with 8-bit integer type:

    ds.add_column("col", types.Int8)
    ds.add_column("idx_col", deeplake.types.Int8(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.Int8(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.Int8("Inverted"))

deeplake.types.Int16

Int16(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates a 16-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new 16-bit integer data type.

Examples:

Create a column with 16-bit integer type:

    ds.add_column("col", types.Int16)
    ds.add_column("idx_col", deeplake.types.Int16(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.Int16(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.Int16("Inverted"))

deeplake.types.Int32

Int32(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates a 32-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new 32-bit integer data type.

Examples:

Create a column with 32-bit integer type:

    ds.add_column("col", types.Int32)
    ds.add_column("idx_col", deeplake.types.Int32(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.Int32(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.Int32("Inverted"))

deeplake.types.Int64

Int64(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates a 64-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new 64-bit integer data type.

Examples:

Create a column with 64-bit integer type:

    ds.add_column("col", types.Int64)
    ds.add_column("idx_col", deeplake.types.Int64(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.Int64(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.Int64("Inverted"))

deeplake.types.UInt8

UInt8(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates an unsigned 8-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new unsigned 8-bit integer data type.

Examples:

    ds.add_column("col", types.UInt8)
    ds.add_column("idx_col", deeplake.types.UInt8(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.UInt8(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.UInt8("Inverted"))

deeplake.types.UInt16

UInt16(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates an unsigned 16-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new unsigned 16-bit integer data type.

Examples:

    ds.add_column("col", types.UInt16)
    ds.add_column("idx_col", deeplake.types.UInt16(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.UInt16(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.UInt16("Inverted"))

deeplake.types.UInt32

UInt32(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates an unsigned 32-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new unsigned 32-bit integer data type.

Examples:

    ds.add_column("col", types.UInt32)
    ds.add_column("idx_col", deeplake.types.UInt32(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.UInt32(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.UInt32("Inverted"))

deeplake.types.UInt64

UInt64(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates an unsigned 64-bit integer value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new unsigned 64-bit integer data type.

Examples:

    ds.add_column("col1", types.UInt64)
    ds.add_column("idx_col", deeplake.types.UInt64(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.UInt64(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.UInt64("Inverted"))

deeplake.types.Float16

Float16(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates a 16-bit (half) float value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new 16-bit float data type.

Examples:

Create a column with 16-bit float type:

    ds.add_column("col", types.Float16)
    ds.add_column("idx_col", deeplake.types.Float16(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.Float16(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.Float16("Inverted"))

deeplake.types.Float32

Float32(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates a 32-bit float value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new 32-bit float data type.

Examples:

Create a column with 32-bit float type:

    ds.add_column("col", types.Float32)
    ds.add_column("idx_col", deeplake.types.Float32(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.Float32(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.Float32("Inverted"))

deeplake.types.Float64

Float64(index_type: str | IndexType | NumericIndex | None = None) -> DataType | Type

Creates a 64-bit float value type.

Parameters:

  • index_type (str | IndexType | NumericIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • DataType | Type: A new 64-bit float data type.

Examples:

Create a column with 64-bit float type:

    ds.add_column("col", types.Float64)
    ds.add_column("idx_col", deeplake.types.Float64(deeplake.types.NumericIndex(deeplake.types.Inverted)))
    ds.add_column("idx_col_1", deeplake.types.Float64(deeplake.types.Inverted))
    ds.add_column("idx_col_2", deeplake.types.Float64("Inverted"))

deeplake.types.Bool

Bool() -> DataType 

Creates a boolean value type.

Returns:

  • DataType: A new boolean data type.

Examples:

Create columns with boolean type:

    ds.add_column("col1", types.Bool)
    ds.add_column("col2", "bool")

deeplake.types.ClassLabel

ClassLabel(dtype: DataType | str) -> Type 

Stores categorical labels as numerical values with a mapping to class names.

ClassLabel is designed for classification tasks where you want to store labels as efficient numerical indices while maintaining human-readable class names. The class names are stored in the column's metadata under the key "class_names", and the actual data contains numerical indices pointing to these class names.

Parameters:

  • dtype (DataType | str, required): The datatype for storing the numerical class indices. Common choices are "uint8", "uint16", "uint32" or their DataType equivalents. Choose based on the number of classes you have.

How it works

  1. Define a column with ClassLabel type
  2. Set the "class_names" in the column's metadata as a list of strings
  3. Store numerical indices (0, 1, 2, ...) that map to the class names
  4. When reading, you can use the metadata to convert indices back to class names

Examples:

Basic usage with class labels:

    import numpy as np

    # Create a column for object categories
    ds.add_column("categories", types.ClassLabel(types.Array("uint32", 1)))

    # Define the class names in metadata
    ds["categories"].metadata["class_names"] = ["person", "car", "dog", "cat"]

    # Store numerical indices corresponding to class names
    # 0 = "person", 1 = "car", 2 = "dog", 3 = "cat"
    ds.append({
        "categories": [np.array([0, 1], dtype="uint32")]  # person and car
    })
    ds.append({
        "categories": [np.array([2, 3], dtype="uint32")]  # dog and cat
    })

    # Access the numerical values
    print(ds[0]["categories"])  # Output: [0 1]

    # Get the class names from metadata
    class_names = ds["categories"].metadata["class_names"]
    indices = ds[0]["categories"]
    labels = [class_names[i] for i in indices]
    print(labels)  # Output: ['person', 'car']

Advanced usage from COCO ingestion pattern:

    import deeplake
    import numpy as np
    from deeplake import types

    # This example shows the pattern used in COCO dataset ingestion
    # where you have multiple annotation groups

    # Create dataset
    ds = deeplake.create("tmp://")

    # Add category columns with ClassLabel type
    ds.add_column("categories", types.ClassLabel(types.Array("uint32", 1)))
    ds.add_column("super_categories", types.ClassLabel(types.Array("uint32", 1)))

    # Set class names from COCO categories
    ds["categories"].metadata["class_names"] = [
        "person", "bicycle", "car", "motorcycle", "airplane"
    ]
    ds["super_categories"].metadata["class_names"] = [
        "person", "vehicle", "animal"
    ]

    # Ingest data with numerical indices
    # Categories: [0, 2, 1] maps to ["person", "car", "bicycle"]
    # Super categories: [0, 1, 1] maps to ["person", "vehicle", "vehicle"]
    ds.append({
        "categories": [np.array([0, 2, 1], dtype="uint32")],
        "super_categories": [np.array([0, 1, 1], dtype="uint32")]
    })

Using different data types for different numbers of classes:

    # For datasets with up to 256 classes, use uint8
    ds.add_column("small_set", types.ClassLabel(types.Array("uint8", 1)))
    ds["small_set"].metadata["class_names"] = ["class_a", "class_b"]

    # For datasets with more classes, use uint16 or uint32
    ds.add_column("large_set", types.ClassLabel(types.Array("uint32", 1)))
    ds["large_set"].metadata["class_names"] = [f"class_{i}" for i in range(1000)]
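The choice of index dtype and the index-to-name lookup can both be reasoned about without a dataset. The following is a minimal sketch in plain Python; `smallest_uint_dtype` and `decode_labels` are hypothetical helpers for illustration, not Deep Lake functions.

```python
def smallest_uint_dtype(num_classes: int) -> str:
    """Pick the narrowest unsigned dtype that covers indices 0..num_classes-1."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    for bits in (8, 16, 32, 64):
        if num_classes <= 2 ** bits:
            return f"uint{bits}"
    raise ValueError("too many classes for a 64-bit index")


def decode_labels(indices, class_names):
    """Map stored numerical indices back to their class names."""
    return [class_names[i] for i in indices]


print(smallest_uint_dtype(4))     # uint8
print(smallest_uint_dtype(1000))  # uint16
print(decode_labels([0, 2, 1], ["person", "bicycle", "car"]))
# ['person', 'car', 'bicycle']
```

A wider dtype than strictly necessary (for example "uint32" for 1000 classes) also works; it only costs extra storage per label.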

Numeric Indexing

Numeric columns support indexing for efficient comparison operations:

    # Create numeric column with inverted index for range queries
    ds.add_column("timestamp", deeplake.types.UInt64())

    # Create the index manually
    ds["timestamp"].create_index(
        deeplake.types.NumericIndex(deeplake.types.Inverted)
    )

    # Now you can use efficient comparison operations in queries:
    # - Greater than: WHERE timestamp > 1609459200
    # - Less than: WHERE timestamp < 1640995200
    # - Between: WHERE timestamp BETWEEN 1609459200 AND 1640995200
    # - Value list: WHERE timestamp IN (1609459200, 1640995200)
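To see why an index helps with comparison queries, consider a toy stand-in: values kept in sorted order and searched with binary search instead of scanning every row. This is only an illustration of the general idea in plain Python, not a description of how Deep Lake implements its numeric index; the class `SortedNumericIndex` is hypothetical.

```python
import bisect


class SortedNumericIndex:
    """Toy index: values kept sorted so range lookups use binary search."""

    def __init__(self, values):
        # Pair each value with its row number, then sort by value.
        self._pairs = sorted((v, row) for row, v in enumerate(values))
        self._keys = [v for v, _ in self._pairs]

    def between(self, low, high):
        """Return row numbers whose value lies in [low, high], inclusive."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return sorted(row for _, row in self._pairs[start:stop])


timestamps = [1640995200, 1609459200, 1620000000, 1700000000]
index = SortedNumericIndex(timestamps)
print(index.between(1609459200, 1640995200))  # [0, 1, 2]
```

The lookup touches only the matching slice of the sorted keys, which is what makes range filters cheap once an index exists.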

deeplake.types.Audio

Audio(dtype: DataType | str = "uint8", sample_compression: str = "mp3") -> Type

Creates an audio data type.

Parameters:

  • dtype (DataType | str, default "uint8"): The datatype of the audio samples.
  • sample_compression (str, default "mp3"): The compression format for the audio samples, either wav or mp3.

Returns:

  • Type: A new audio data type.

Examples:

Create an audio column with default settings:

    ds.add_column("col1", types.Audio())

Create an audio column with specific sample compression:

    ds.add_column("col2", types.Audio(sample_compression="wav"))

    # Basic audio storage
    ds.add_column("audio", deeplake.types.Audio())

    # WAV format
    ds.add_column("audio_wav", deeplake.types.Audio(
        sample_compression="wav"
    ))

    # MP3 compression (default)
    ds.add_column("audio_mp3", deeplake.types.Audio(
        sample_compression="mp3"
    ))

    # With specific dtype
    ds.add_column("audio_uint8", deeplake.types.Audio(
        dtype="uint8",
        sample_compression="wav"
    ))

    # Audio with Link for external references
    ds.add_column("audio_links", deeplake.types.Link(
        deeplake.types.Audio(sample_compression="mp3")
    ))

deeplake.types.Image

Image(dtype: DataType | str = "uint8", sample_compression: str = "png") -> Type

An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.

Available sample_compressions:

  • png (default)
  • jpg / jpeg

Parameters:

  • dtype (DataType | str, default "uint8"): The data type of the array elements to return.
  • sample_compression (str, default "png"): The on-disk compression/format of the image.

Examples:

    ds.add_column("col1", types.Image)
    ds.add_column("col2", types.Image(sample_compression="jpg"))

    # Basic image storage
    ds.add_column("images", deeplake.types.Image())

    # JPEG compression
    ds.add_column("images_jpeg", deeplake.types.Image(
        sample_compression="jpeg"
    ))

    # With specific dtype
    ds.add_column("images_uint8", deeplake.types.Image(
        dtype="uint8"  # 8-bit RGB
    ))

deeplake.types.Embedding

Embedding(size: int | None = None, dtype: DataType | str = "float32", index_type: EmbeddingIndexType | QuantizationType | None = None) -> Type

Creates a single-dimensional embedding of a given length.

Parameters:

  • size (int | None, default None): The size of the embedding.
  • dtype (DataType | str, default "float32"): The datatype of the embedding.
  • index_type (EmbeddingIndexType | QuantizationType | None, default None): How to compress the embeddings in the index. The default uses no compression, but it can be set to deeplake.types.QuantizationType.Binary.

Returns:

  • Type: A new embedding data type.

See Also

deeplake.types.Array for a multidimensional array.

Examples:

Create embedding columns:

    ds.add_column("col1", types.Embedding(768))
    ds.add_column("col2", types.Embedding(768, index_type=types.EmbeddingIndex(types.ClusteredQuantized)))

    # Basic embeddings
    ds.add_column("embeddings", deeplake.types.Embedding(768))

    # With binary quantization for faster search
    ds.add_column("embeddings_quantized", deeplake.types.Embedding(
        size=768,
        index_type=deeplake.types.EmbeddingIndex(deeplake.types.ClusteredQuantized)
    ))

    # Custom dtype
    ds.add_column("embeddings_f32", deeplake.types.Embedding(
        size=768,
        dtype="float32"
    ))
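Binary quantization trades some precision for a much smaller index. The sketch below shows the general idea in plain Python, assuming each dimension is reduced to its sign bit and vectors are compared by Hamming distance. The thresholding rule and the function names are assumptions made for illustration; Deep Lake's actual quantization scheme may differ.

```python
def binarize(vector):
    """Keep one bit per dimension: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Count positions where two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


query = [0.9, -0.2, 0.4, -0.7]
docs = {
    "doc_a": [0.8, -0.1, 0.5, -0.6],  # points the same way as the query
    "doc_b": [-0.9, 0.3, -0.4, 0.7],  # points the opposite way
}

q_bits = binarize(query)
ranked = sorted(docs, key=lambda name: hamming(q_bits, binarize(docs[name])))
print(ranked)  # ['doc_a', 'doc_b']
```

Because magnitudes are discarded, two vectors with the same sign pattern become indistinguishable, which is where the small loss of search accuracy comes from.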

deeplake.types.Text

Text(index_type: str | IndexType | TextIndex | None = None, chunk_compression: str | None = "lz4") -> Type

Creates a text data type of arbitrary length.

Parameters:

  • index_type (str | IndexType | TextIndex | None, default None): How to index the data in the column for faster searching. Options are deeplake.types.Inverted, deeplake.types.BM25, and deeplake.types.Exact. The default, None, means "do not index".
  • chunk_compression (str | None, default "lz4"): The compression algorithm for on-disk storage of text data. Supported values are 'lz4', 'zstd', and 'null' (no compression).

Returns:

  • Type: A new text data type.

Examples:

Create text columns with different configurations:

    ds.add_column("col1", types.Text)
    ds.add_column("col2", "text")
    ds.add_column("col3", str)
    ds.add_column("col4", types.Text(index_type=types.Inverted))
    ds.add_column("col5", types.Text(index_type=types.BM25))

    # Basic text
    ds.add_column("text", deeplake.types.Text())

    # Text with BM25 index for relevance-ranked text search
    ds.add_column("text2", deeplake.types.Text(
        index_type=deeplake.types.BM25
    ))

    # Text with inverted index for keyword search
    ds.add_column("text3", deeplake.types.Text(
        index_type=deeplake.types.Inverted
    ))

    # Text with exact index for whole text matching
    ds.add_column("text4", deeplake.types.Text(
        index_type=deeplake.types.Exact
    ))

deeplake.types.Dict

Dict(index_type: str | IndexType | JsonIndex | None = None) -> Type

Creates a type that supports storing arbitrary key/value pairs in each row.

Parameters:

  • index_type (str | IndexType | JsonIndex | None, default None): How to index the data in the column for faster searching. The available option is deeplake.types.Inverted. The default, None, means "do not index".

Returns:

  • Type: A new dictionary data type.

See Also

deeplake.types.Struct for a type that supports defining allowed keys.

Examples:

Create and use a dictionary column:

    ds.add_column("col1", types.Dict)
    ds.append([{"col1": {"a": 1, "b": 2}}])
    ds.append([{"col1": {"b": 3, "c": 4}}])

    # Store arbitrary key/value pairs
    ds.add_column("metadata", deeplake.types.Dict())

    # Add data
    ds.append([{
        "metadata": {
            "timestamp": "2024-01-01",
            "source": "camera_1",
            "settings": {"exposure": 1.5}
        }
    }])

deeplake.types.Array

Array(dtype: DataType | str, dimensions: int) -> DataType
Array(dtype: DataType | str, shape: list[int]) -> DataType
Array(dtype: DataType | str) -> DataType
Array(dtype: DataType | str, dimensions: int | None, shape: list[int] | None) -> DataType

Creates a generic array of data.

Parameters:

  • dtype (DataType | str, required): The datatype of values in the array.
  • dimensions (int | None, required): The number of dimensions/axes in the array. Unlike specifying shape, there is no constraint on the size of each dimension.
  • shape (list[int] | None, required): Constrain the size of each dimension in the array.

Returns:

  • DataType: A new array data type with the specified parameters.

Examples:

Create a three-dimensional array, where each dimension can have any number of elements:

    ds.add_column("col1", types.Array("int32", dimensions=3))

Create a three-dimensional array, where each dimension has a known size:

    ds.add_column("col2", types.Array(types.Float32(), shape=[50, 30, 768]))

    # Fixed-size array
    ds.add_column("features", deeplake.types.Array(
        "float32",
        shape=[512]  # Enforces size
    ))

    # Variable-size array
    ds.add_column("sequences", deeplake.types.Array(
        "int32",
        dimensions=1  # Allows any size
    ))
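The difference between the two constraints is easy to state in code: `dimensions` fixes only the number of axes, while `shape` also fixes the length of each axis. The sketch below checks nested Python lists against either constraint. The helpers `array_shape` and `matches` are hypothetical, assume rectangular (non-ragged) input, and are not part of Deep Lake.

```python
def array_shape(value):
    """Return the shape of a rectangular nested list as a list of ints."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return shape


def matches(value, dimensions=None, shape=None):
    """Check a nested list against a `dimensions` or `shape` constraint."""
    actual = array_shape(value)
    if shape is not None:
        return actual == list(shape)
    if dimensions is not None:
        return len(actual) == dimensions
    return True  # no constraint given


sequence = [1, 2, 3, 4, 5]
print(matches(sequence, dimensions=1))  # True: any length is accepted
print(matches(sequence, shape=[512]))   # False: length must be exactly 512
```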

Numeric Indexes

Deep Lake supports indexing numeric columns for faster lookup operations:

    from deeplake.types import NumericIndex, Inverted

    # Add numeric column and create an inverted index
    ds.add_column("scores", "float32")
    ds["scores"].create_index(NumericIndex(Inverted))

    # Use with TQL for efficient filtering
    results = ds.query("SELECT * WHERE CONTAINS(scores, 0.95)")

deeplake.types.Bytes

Bytes() -> DataType 

Creates a byte array value type. This is useful for storing raw binary data.

Returns:

  • DataType: A new byte array data type.

Examples:

Create columns with byte array type:

    ds.add_column("col1", types.Bytes)
    ds.add_column("col2", "bytes")

Append raw binary data to a byte array column:

    ds.append([{"col1": b"hello", "col2": b"world"}])

deeplake.types.BinaryMask

BinaryMask(sample_compression: str | None = None, chunk_compression: str | None = None) -> Type

In a binary mask, each pixel value is a boolean indicating whether an object of a given class is present.

NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

  • sample_compression (str | None, default None): How to compress each row's value. Supported values are lz4, zstd, and null (no compression).
  • chunk_compression (str | None, default None): The compression algorithm for on-disk storage of mask data. Supported values are lz4, zstd, and null (no compression).

Examples:

    import numpy as np

    ds.add_column("col1", types.BinaryMask(sample_compression="lz4"))
    ds.append([{"col1": np.zeros((512, 512, 5), dtype="bool")}])

    # Basic binary mask
    ds.add_column("masks", deeplake.types.BinaryMask())

    # With compression
    ds.add_column("masks_lz4", deeplake.types.BinaryMask(
        sample_compression="lz4"
    ))

deeplake.types.SegmentMask

SegmentMask(dtype: DataType | str = "uint8", sample_compression: str | None = None, chunk_compression: str | None = None) -> Type

Segmentation masks are 2D representations of class labels, where a numerical class value is encoded in an array of the same shape as the image.

NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

  • sample_compression (str | None, default None): How to compress each row's value. Supported values are lz4, zstd, and null (no compression).
  • chunk_compression (str | None, default None): The compression algorithm for on-disk storage of mask data. Supported values are lz4, zstd, png, nii, nii.gz, and null (no compression).

Examples:

    import numpy as np

    ds.add_column("col1", types.SegmentMask(sample_compression="lz4"))
    ds.append([{"col1": np.zeros((512, 512, 3))}])

    # Basic segmentation mask
    ds.add_column("segmentation", deeplake.types.SegmentMask())

    # With compression
    ds.add_column("segmentation_lz4", deeplake.types.SegmentMask(
        dtype="uint8",
        sample_compression="lz4"
    ))
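Binary masks and segmentation masks carry related information in different encodings: one boolean per pixel and class versus one class value per pixel. The sketch below converts between them using plain nested lists, assuming class values run from 0 to num_classes - 1 and that the binary layout is (height, width, num_classes), as the (512, 512, 5) BinaryMask example suggests. The function name is hypothetical and the layout is an assumption.

```python
def segment_to_binary(segment_mask, num_classes):
    """Expand a 2D mask of class values into per-pixel boolean flags.

    Result layout is (height, width, num_classes): entry [y][x][c] is True
    when pixel (y, x) belongs to class c.
    """
    return [
        [[value == c for c in range(num_classes)] for value in row]
        for row in segment_mask
    ]


segment_mask = [
    [0, 0, 1],
    [2, 1, 1],
]
binary = segment_to_binary(segment_mask, num_classes=3)
print(binary[1][0])  # [False, False, True]
```

The binary form grows with the number of classes, which is one reason compression is recommended for mask columns.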

deeplake.types.BoundingBox

BoundingBox(dtype: DataType | str = "float32", format: str | None = None, bbox_type: str | None = None) -> Type

Stores an array of values specifying the bounding boxes of an image.

Parameters:

  • dtype (DataType | str, default "float32"): The datatype of values.
  • format (str | None, default None): The bounding box format. Possible values: ccwh, ltwh, ltrb, unknown.
  • bbox_type (str | None, default None): The pixel type. Possible values: pixel, fractional.

Examples:

    ds.add_column("col1", types.BoundingBox())
    ds.add_column("col2", types.BoundingBox(format="ltwh"))

    # Basic bounding boxes
    ds.add_column("boxes", deeplake.types.BoundingBox())

    # With specific format
    ds.add_column("boxes_ltwh", deeplake.types.BoundingBox(
        format="ltwh"  # left, top, width, height
    ))
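The format names abbreviate the meaning of the four stored values. The sketch below converts between them in plain Python, assuming the customary readings: ccwh is (center x, center y, width, height), ltwh is (left, top, width, height), and ltrb is (left, top, right, bottom). Only the ltwh reading is stated in this document; the other two, the fractional scaling, and all function names are assumptions for illustration.

```python
def ltwh_to_ltrb(box):
    """(left, top, width, height) -> (left, top, right, bottom)."""
    left, top, width, height = box
    return [left, top, left + width, top + height]


def ccwh_to_ltwh(box):
    """(center x, center y, width, height) -> (left, top, width, height)."""
    cx, cy, width, height = box
    return [cx - width / 2, cy - height / 2, width, height]


def pixel_to_fractional(box_ltwh, image_width, image_height):
    """Scale pixel coordinates to the 0..1 range relative to the image size."""
    left, top, width, height = box_ltwh
    return [left / image_width, top / image_height,
            width / image_width, height / image_height]


box = [10.0, 20.0, 30.0, 40.0]  # ltwh, in pixels
print(ltwh_to_ltrb(box))                       # [10.0, 20.0, 40.0, 60.0]
print(ccwh_to_ltwh([25.0, 40.0, 30.0, 40.0]))  # [10.0, 20.0, 30.0, 40.0]
print(pixel_to_fractional(box, 100, 200))      # [0.1, 0.1, 0.3, 0.2]
```

Declaring the format and bbox_type on the column records which of these conventions the stored numbers follow, so downstream code does not have to guess.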

deeplake.types.Point

Point(dimensions: int = 2) -> Type

Point datatype for storing points, with the ability to visualize them.

Parameters:

  • dimensions (int, default 2): The dimension of the point, for example 2 for 2D points or 3 for 3D points.

Examples:

    ds.add_column("col1", types.Point())
    ds.append([{"col1": [[1.0, 2.0], [0.0, 1.0]]}])

deeplake.types.Polygon

Polygon() -> Type 

Polygon datatype for storing polygons with ability to visualize them.

Examples:

    import numpy as np

    ds.add_column("col1", deeplake.types.Polygon())
    poly1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    poly2 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    ds.append({"col1": [[poly1, poly2], [poly1, poly2]]})

    print(ds[0]["col1"])
    # Output: [[[1. 2.]
    #           [3. 4.]
    #           [5. 6.]]
    #
    #          [[1. 2.]
    #           [3. 4.]
    #           [5. 6.]]]

    print(ds[1]["col1"])
    # Output: [[[1. 2.]
    #           [3. 4.]
    #           [5. 6.]]
    #
    #          [[1. 2.]
    #           [3. 4.]
    #           [5. 6.]]]

deeplake.types.Video

Video(compression: str = 'mp4') -> Type 

Video datatype for storing videos.

Parameters:

  • compression (str, default 'mp4'): The compression format. Only the H264 codec is supported at the moment.

Examples:

    ds.add_column("video", types.Video(compression="mp4"))

    with open("path/to/video.mp4", "rb") as f:
        bytes_data = f.read()
        ds.append([{"video": bytes_data}])

deeplake.types.Medical

Medical(compression: str) -> Type 

Medical datatype for storing medical images.

Available compressions:

  • nii
  • nii.gz
  • dcm

Parameters:

  • compression (str, required): How to compress each row's value. Possible values: dcm, nii, nii.gz.

Examples:

    ds.add_column("col1", types.Medical(compression="dcm"))

    with open("path/to/dicom/file.dcm", "rb") as f:
        bytes_data = f.read()
        ds.append([{"col1": bytes_data}])

deeplake.types.Struct

Struct(fields: dict[str, DataType | str | Type]) -> Type 

Defines a custom datatype with specified keys.

See deeplake.types.Dict for a type that supports different key/value pairs per value.

Parameters:

  • fields (dict[str, DataType | str | Type], required): A dict where the key is the name of the field, and the value is the datatype definition for it.

Examples:

    ds.add_column("col1", types.Struct({
        "field1": types.Int16(),
        "field2": "text",
    }))

    ds.append([{"col1": {"field1": 3, "field2": "a"}}])
    print(ds[0]["col1"]["field1"])  # Output: 3

    # Define fixed structure with specific types
    ds.add_column("info", deeplake.types.Struct({
        "id": deeplake.types.Int64(),
        "name": "text",
        "score": deeplake.types.Float32()
    }))

    # Add data
    ds.append([{
        "info": {
            "id": 1,
            "name": "sample",
            "score": 0.95
        }
    }])

deeplake.types.Sequence

Sequence(nested_type: DataType | str | Type) -> Type 

Creates a sequence type that represents an ordered list of other data types.

A sequence maintains the order of its values, making it suitable for time-series data like videos (sequences of images).

Parameters:

  • nested_type (DataType | str | Type, required): The data type of the values in the sequence. Can be any data type, not just primitive types.

Returns:

  • Type: A new sequence data type.

Examples:

Create a sequence of images:

    ds.add_column("col1", types.Sequence(types.Image(sample_compression="jpg")))

    # Sequence of images (e.g., video frames)
    ds.add_column("frames", deeplake.types.Sequence(
        deeplake.types.Image(sample_compression="jpeg")
    ))

    # Sequence of embeddings
    ds.add_column("token_embeddings", deeplake.types.Sequence(
        deeplake.types.Embedding(768)
    ))

    # Add data (frame1..frame3 and emb1..emb3 are placeholders for your own arrays)
    ds.append([{
        "frames": [frame1, frame2, frame3],  # List of images
        "token_embeddings": [emb1, emb2, emb3]  # List of embeddings
    }])

deeplake.types.Link

Link(type: DataType | Type) -> Type

A link to an external resource. The value returned will be a reference to the external resource rather than the raw data.

Link only supports the Bytes DataType and the Image, SegmentMask, Medical, and Audio Types.

Parameters:

  • type (DataType | Type, required): The type of the linked data. Must be the Bytes DataType or one of the following Types: Image, SegmentMask, Medical, or Audio.

Examples:

    ds.add_column("col1", types.Link(types.Image()))

Index Types

Deep Lake supports several index types for optimizing queries on different data types.

IndexType Enum

deeplake.types.IndexType

Enumeration of available text/numeric/JSON/embeddings/embeddings matrix indexing types.

Attributes:

  • Inverted (IndexType): An index that supports keyword lookup. Can be used with CONTAINS(column, 'wanted_value').
  • BM25 (IndexType): A BM25-based index of text data. Can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.
  • Exact (IndexType): An exact match index for text data.
  • PooledQuantized (IndexType): A pooled quantized index for 2D embeddings matrices. Can be used with MAXSIM(column, query_embeddings) for ColBERT-style maximum similarity search.
  • Clustered (IndexType): Clusters embeddings in the index to speed up search. This is the default index type for embeddings.
  • ClusteredQuantized (IndexType): Stores a binary quantized representation of the original embedding in the index rather than a full copy of the embedding. This slightly decreases accuracy of searches, while significantly improving query time.
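The ColBERT-style maximum similarity mentioned for PooledQuantized scores a document by matching every query vector to its best document vector and summing those best matches. The sketch below shows that scoring rule in plain Python with dot products; it illustrates the scoring definition only, not how Deep Lake's MAXSIM function or its index is implemented, and the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_embeddings, doc_embeddings):
    """Sum, over query vectors, of the best dot product with any document vector."""
    return sum(
        max(dot(q, d) for d in doc_embeddings)
        for q in query_embeddings
    )


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]
print(maxsim(query, doc_a))  # 2.0
print(maxsim(query, doc_b))  # 1.0
```

Because each row holds a matrix of vectors rather than a single vector, this kind of search needs an index built for 2D embeddings.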

BM25 class-attribute

BM25: IndexType 

Clustered class-attribute

Clustered: IndexType 

ClusteredQuantized class-attribute

ClusteredQuantized: IndexType 

Exact class-attribute

Exact: IndexType 

Inverted class-attribute

Inverted: IndexType 

PooledQuantized class-attribute

PooledQuantized: IndexType 

__hash__

__hash__() -> int 

__index__

__index__() -> int 

__init__

__init__(value: int) -> None 

__int__

__int__() -> int 

__members__ class-attribute

__members__: dict[str, IndexType] 

name property

name: str 

value property

value: int 

Returns:

Name Type Description
int int

The integer value of the index type.

Text Index Types

deeplake.types.TextIndex

Represents a text column index type.

Used to create indexes on text columns for faster query performance. Supports inverted indexing (CONTAINS), BM25 similarity search, and exact matching.

__hash__ class-attribute

__hash__: None = None 

__init__

__init__(type: IndexType) -> None 

deeplake.types.Inverted module-attribute

Inverted: Inverted 

A text index that supports keyword lookup.

This index can be used with CONTAINS(column, 'wanted_value').
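As a rough picture of what keyword lookup through an inverted index involves, the sketch below builds a token-to-rows map in plain Python. It is a conceptual illustration under simple assumptions (whitespace tokenization, lowercase matching) and does not reflect how Deep Lake tokenizes or stores its index; `build_inverted_index` and `contains` are hypothetical helper names.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase token to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return index


def contains(index, keyword):
    """Return the sorted row ids whose text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


rows = ["red apple pie", "green apple", "blueberry pie"]
index = build_inverted_index(rows)
apple_rows = contains(index, "apple")
pie_rows = contains(index, "pie")
missing_rows = contains(index, "cherry")
```

A lookup touches only the entry for the requested keyword, so its cost does not grow with the number of rows that lack the keyword.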

deeplake.types.BM25 module-attribute

BM25: BM25 

A BM25-based index of text data.

This index can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.

See Also

BM25 Algorithm: https://en.wikipedia.org/wiki/Okapi_BM25
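For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring formula in plain Python. The parameter values `k1=1.5` and `b=0.75` are common defaults chosen for illustration, the tokenization is a simple whitespace split, and nothing here is taken from Deep Lake's internal implementation; `bm25_scores` is a hypothetical helper name.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "deep lake stores tensors",
    "a lake is a body of water",
    "vector search with embeddings",
]
scores = bm25_scores(docs, "deep lake")
# Highest score first, mirroring an ORDER BY on the similarity score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The first document matches both query terms and is shorter than average, so it ranks first; the third document shares no terms with the query and scores zero.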

deeplake.types.Exact module-attribute

Exact: Exact 

A text index that supports whole text lookup.

This index can be used with EQUALS(column, 'wanted_value').

Numeric Index Types

deeplake.types.NumericIndex

Represents a numeric column index type.

Used to create indexes on numeric columns for faster query performance. Supports inverted indexing for CONTAINS operations.

__hash__ class-attribute

__hash__: None = None 

__init__

__init__(type: IndexType) -> None 

JSON Index Types

deeplake.types.JsonIndex

Represents a Dict column index type.

Used to create indexes on Dict columns for faster query performance. Supports inverted indexing for CONTAINS operations on JSON fields.

__hash__ class-attribute

__hash__: None = None 

__init__

__init__(type: IndexType) -> None 

Embedding Index Types

deeplake.types.EmbeddingIndexType

Represents an embedding index type.

__init__

__init__(type: IndexType) -> None 
__init__(quantization: QuantizationType) -> None 
__init__(type: IndexType | QuantizationType) -> None 

deeplake.types.EmbeddingIndex

EmbeddingIndex(
    type: IndexType | QuantizationType | None = None,
) -> EmbeddingIndexType

Creates an embedding index.

Parameters:

Name Type Description Default
type IndexType | QuantizationType | None

The index type for embeddings. Can be:

  • deeplake.types.IndexType.Clustered - Default clustered index
  • deeplake.types.IndexType.ClusteredQuantized - Quantized clustered index
  • deeplake.types.QuantizationType.Binary - Binary quantization (maps to ClusteredQuantized)
None

Returns:

Name Type Description
Type EmbeddingIndexType

EmbeddingIndexType.

Examples:

Create embedding columns with different index types:

# Using IndexType enum
ds.add_column("col1", types.Embedding(768, index_type=types.EmbeddingIndex(types.IndexType.ClusteredQuantized)))

# Using QuantizationType for backward compatibility
ds.add_column("col2", types.Embedding(768, index_type=types.EmbeddingIndex(types.QuantizationType.Binary)))
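To illustrate the accuracy and speed trade-off that binary quantization implies, here is a small pure-Python sketch. It assumes a sign-based scheme (one bit per dimension) compared with Hamming distance, which is one common way to binarize embeddings; the exact scheme Deep Lake uses is not described in this reference, and `binary_quantize` and `hamming_distance` are hypothetical helper names.

```python
def binary_quantize(vector):
    """Pack the sign of each component into an int, one bit per dimension."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two quantized codes differ."""
    return bin(a ^ b).count("1")


vectors = {
    "a": [0.9, -0.2, 0.4, -0.7],
    "b": [0.8, -0.1, -0.3, -0.6],
    "c": [-0.5, 0.6, -0.4, 0.2],
}
# Each 4-dimensional float vector collapses to a 4-bit code.
codes = {name: binary_quantize(vec) for name, vec in vectors.items()}

query_code = binary_quantize([1.0, -0.3, 0.5, -0.9])
nearest = min(codes, key=lambda name: hamming_distance(codes[name], query_code))
```

The codes are far smaller than the original vectors and cheap to compare, but magnitudes are discarded, which is why searches over quantized data are slightly less accurate.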

deeplake.types.EmbeddingsMatrixIndexType

Represents a 2D embeddings matrix index type.

Used for ColBERT-style maximum similarity search on 2D embedding matrices. Supports pooled quantized indexing for efficient MAXSIM queries.

__init__

__init__() -> None 

deeplake.types.EmbeddingsMatrixIndex

EmbeddingsMatrixIndex() -> EmbeddingsMatrixIndexType 

Creates an embeddings matrix index.
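The ColBERT-style maximum similarity score that this index targets can be sketched in a few lines of plain Python. The version below computes the score exactly with dot products and ignores pooling and quantization, so it shows the scoring rule only, not the index; `dot` and `maxsim` are hypothetical helper names and the sample matrices are made up for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def maxsim(query_matrix, doc_matrix):
    """Sum, over query vectors, of the best dot product with any document vector."""
    return sum(max(dot(q, d) for d in doc_matrix) for q in query_matrix)


# Each row is one token-level embedding.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]
doc_b = [[0.5, 0.5], [0.4, 0.4]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

Here `doc_a` scores higher because each query vector finds a closely aligned document vector, while `doc_b` offers only partial matches for both.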

Generic Index Wrapper

deeplake.types.Index

Represents all available index types in Deep Lake. This is a polymorphic wrapper that can hold any specific index type.

__hash__ class-attribute

__hash__: None = None 

__init__

__init__(
    index_type: (
        TextIndex
        | EmbeddingIndexType
        | EmbeddingsMatrixIndexType
        | JsonIndex
        | NumericIndex
    ),
) -> None

# Create numeric index for efficient range queries
ds.add_column("age", deeplake.types.Int32())
ds["age"].create_index(
    deeplake.types.NumericIndex(deeplake.types.Inverted)
)

# Use in queries with comparison operators
results = ds.query("SELECT * WHERE age > 25")
results = ds.query("SELECT * WHERE age BETWEEN 18 AND 65")
results = ds.query("SELECT * WHERE age IN (25, 30, 35)")