Introduction

Cube is the agentic analytics platform built on top of the open-source semantic layer.

Cube enables AI agents and users to query, explore, and manipulate data models — transforming the semantic layer into a dynamic, governed workspace for generating insights, automating workflows, and building data products.

Cube is a new generation of BI platform, built for both humans and AI agents. It empowers different personas across your organization:

  • Data Engineers can quickly curate data models with AI assistance, accelerating the development and maintenance of semantic layers
  • Data Analysts can perform deep analysis with AI assistance, diving into complex data relationships and patterns
  • Business Users benefit from workbooks and dashboards that Cube can automatically build and maintain
  • AI Agents can be powered by Cube features through MCP (Model Context Protocol) and A2A (Agent2Agent) integrations, enabling automated data discovery, analysis, and reporting workflows

With Cube, you can power copilots, automate data workflows, and create interactive analytics experiences—all grounded in a consistent and governed data model.

Semantic layer

At the foundation of Cube's agentic analytics platform is an open-source semantic layer: the critical infrastructure that enables both AI agents and humans to work with trusted, consistent data.

The semantic layer provides the governed data foundation that makes agentic analytics possible. It organizes data from your cloud data warehouses into centralized, consistent definitions that AI agents can reliably query, explore, and reason about. Without a semantic layer, AI agents would struggle with inconsistent metrics, scattered business logic, and ungoverned data access—making their outputs unreliable and potentially dangerous.

By establishing a single source of truth for metrics, relationships, and business logic, the semantic layer ensures that AI agents and users work with the same trusted definitions. This consistency is essential for agentic analytics: when an AI agent generates insights or automates workflows, it relies on the semantic layer's data model to understand what metrics mean, how entities relate, and what data users are authorized to access.

The semantic layer also provides the performance and governance infrastructure needed for agentic workflows. Through caching and pre-aggregations, it ensures AI agents can respond quickly without overwhelming your data warehouse. Through access controls, it guarantees that agents respect the same data security policies as human users.

Data engineers use Cube's semantic layer to build and maintain data models, manage access control and caching, and expose data through REST, GraphQL, and SQL APIs—creating the governed foundation that powers agentic analytics experiences, traditional BI tools, and custom data applications.

Code-first

A code-first approach is essential for both traditional data engineering and agentic analytics. Managing data models, configurations, and policies as code enables the same proven practices that power modern software development: version control for collaboration and code reviews, automated testing and documentation, and established patterns for reusability and maintainability.

For agentic analytics specifically, a code-first semantic layer creates new possibilities. AI agents can help curate and maintain data models themselves, accelerating development while maintaining quality through git workflows. The structured, version-controlled nature of code makes it easier for agents to understand changes, suggest improvements, and even implement modifications autonomously.

Everything within Cube—from configurations to data models to access control policies—is managed through code. This foundation enables both human data engineers and AI agents to collaborate on building and maintaining the semantic layer that powers agentic analytics.

Four pillars of the semantic layer

The semantic layer that powers Cube's agentic analytics platform is built on four essential pillars: data modeling, access control, caching, and APIs. Each pillar plays a critical role in enabling AI agents and users to work with data reliably, securely, and efficiently.

Data Modeling

The data model provides the knowledge graph that AI agents use to understand your business. It centralizes metric definitions, entity relationships, and business logic upstream from all consumption tools—whether those are AI agents, BI tools, or custom applications. This centralization is critical for agentic analytics: AI agents need a structured understanding of what metrics mean, how entities relate, and what calculations are valid.

When an AI agent analyzes sales performance or answers questions about customer behavior, it relies on the semantic layer's data model to understand that "revenue" is calculated consistently, that customers have orders, and that orders contain line items. This structured knowledge enables agents to generate reliable insights and navigate complex data relationships autonomously.

Cube's data model is code-first. Data teams define data models with YAML or JavaScript code, managed through version control systems. This enables AI-assisted development where agents can help curate and maintain the semantic layer itself, accelerating model development while maintaining quality through git workflows and multiple isolated environments.

Cube's data model is dataset-centric, inspired by and expanding upon dimensional modeling. You work with two types of objects:

Cubes represent business entities such as customers, line items, and orders. They define all calculations within measures and dimensions, as well as relationships between entities. These relationships form the knowledge graph that AI agents traverse when exploring data and generating insights.

Views sit on top of the data graph of cubes, creating facades that data consumers interact with. Think of views as the final data products for AI agents, BI users, and applications. Views select measures and dimensions from connected cubes and present them as unified datasets, providing AI agents with the right context and scope for specific analytical tasks.
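The relationship between cubes and views can be sketched in a few lines of Python. This is a conceptual illustration only, not Cube's data model syntax (which is YAML or JavaScript): the `Cube` and `View` classes, the `reachable` and `resolve_view` helpers, and the `orders`/`customers` members are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cube:
    """A business entity with its measures, dimensions, and joins."""
    name: str
    measures: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)
    joins: list = field(default_factory=list)  # names of directly joined cubes


@dataclass
class View:
    """A facade exposing selected members of cubes connected to a root cube."""
    name: str
    root: str
    includes: dict  # cube name -> member names to expose


def reachable(root, cubes):
    """Walk the join graph and return every cube reachable from `root`."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(cubes[name].joins)
    return seen


def resolve_view(view, cubes):
    """Return fully qualified members, rejecting unknown or unconnected ones."""
    connected = reachable(view.root, cubes)
    members = []
    for cube_name, names in view.includes.items():
        if cube_name not in connected:
            raise ValueError(f"{cube_name} is not joined to {view.root}")
        cube = cubes[cube_name]
        available = set(cube.measures) | set(cube.dimensions)
        for name in names:
            if name not in available:
                raise KeyError(f"{cube_name}.{name} is not defined")
            members.append(f"{cube_name}.{name}")
    return members


# Hypothetical model: orders join to customers.
cubes = {
    "customers": Cube("customers", dimensions=["city"]),
    "orders": Cube("orders", measures=["count", "revenue"],
                   dimensions=["status"], joins=["customers"]),
}
orders_view = View("orders_view", root="orders",
                   includes={"orders": ["revenue", "status"],
                             "customers": ["city"]})
print(resolve_view(orders_view, cubes))
# ['orders.revenue', 'orders.status', 'customers.city']
```

The point of the sketch is the division of labor: cubes own the definitions and joins, while a view only selects from what the connected cubes already define.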

Access Control

Access control ensures that AI agents respect the same data security policies as human users. This is critical for agentic analytics: when AI agents autonomously query and analyze data, they must be bound by the same governance rules that apply to human users, whether that's row-level security, column-level restrictions, or data masking.

By centralizing access control in the semantic layer, you ensure that all data consumption—whether by AI agents, BI tools, or custom applications—goes through a single governed checkpoint. This provides comprehensive oversight and prevents agents from inadvertently exposing sensitive data or violating security policies.

Cube's code-first approach enables data teams to define access control policies with Python or JavaScript, ranging from simple row-level access rules to completely custom data models per tenant backed by different data sources. These policies apply uniformly to all consumers of the semantic layer, ensuring AI agents operate within the same security boundaries as human users.
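As a rough illustration of a simple row-level rule, the sketch below rewrites an incoming query so that it is always filtered by the caller's tenant. The function is named after Cube's `query_rewrite` configuration option, but it is written here as a standalone function: how it would be registered in a Cube configuration file, the `securityContext` key, the `tenant_id` claim, and the `orders.tenant_id` dimension are assumptions made for the example and should be checked against the configuration reference.

```python
import copy


def query_rewrite(query: dict, ctx: dict) -> dict:
    """Restrict a query to the tenant found in the security context."""
    security_context = ctx.get("securityContext", {})
    tenant_id = security_context.get("tenant_id")  # hypothetical claim name
    if tenant_id is None:
        # Fail closed: a request without a tenant must not see any rows.
        raise PermissionError("No tenant_id in security context")

    rewritten = copy.deepcopy(query)  # leave the caller's query untouched
    rewritten.setdefault("filters", []).append({
        "member": "orders.tenant_id",  # hypothetical dimension
        "operator": "equals",
        "values": [str(tenant_id)],
    })
    return rewritten


# The same rule applies whether the query comes from a person or an agent.
query = {"measures": ["orders.count"]}
ctx = {"securityContext": {"tenant_id": 42}}
secured = query_rewrite(query, ctx)
print(secured["filters"])
```

Because the filter is added in the semantic layer rather than in each client, an agent cannot omit it by constructing its own query.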

Caching

Caching enables AI agents to deliver fast, interactive experiences without overwhelming your data infrastructure. For agentic analytics to be effective, AI agents must respond quickly to user questions, iteratively explore data, and generate insights in real time. Without caching, every agent query would hit your data warehouse directly, adding latency and potentially significant cost.

The semantic layer acts as a performance buffer between AI agents and your data sources. Through intelligent caching, it ensures agents can work interactively while protecting your cloud data warehouse from unnecessary and redundant load.

Cube implements caching through an aggregate awareness framework called pre-aggregations. Data teams define pre-aggregations in the data model as rollup tables that include measures and dimensions. Cube builds and refreshes these pre-aggregations in the background by querying your cloud data warehouse and storing the results in Cube Store, Cube's purpose-built caching engine backed by distributed file storage such as S3. Pre-aggregations can be refreshed on a schedule or as part of workflow orchestration.

When an AI agent sends a query to Cube, the aggregate awareness engine determines whether an existing, fresh pre-aggregation can serve that query. If one can, Cube answers from it instead of querying the warehouse, which reduces both latency and data warehouse costs. That is essential for the iterative, exploratory workflows that characterize agentic analytics.
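The matching idea can be sketched as follows. This is a deliberately simplified model, not Cube's actual matching algorithm: it assumes a rollup can serve a query when it contains every requested measure and dimension and was built recently enough, which only holds for additive measures such as counts and sums. The `Rollup` class, `pick_rollup` function, and member names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Rollup:
    """A pre-built rollup table and the members it contains."""
    name: str
    measures: frozenset
    dimensions: frozenset
    built_at: datetime
    max_age: timedelta


def pick_rollup(measures, dimensions, rollups, now):
    """Return the first fresh rollup that covers the query, else None."""
    for rollup in rollups:
        covers = (set(measures) <= rollup.measures
                  and set(dimensions) <= rollup.dimensions)
        fresh = now - rollup.built_at <= rollup.max_age
        if covers and fresh:
            return rollup
    return None  # caller falls back to querying the warehouse


now = datetime(2024, 1, 1, 12, 0)
rollups = [
    # Covers city-level queries but was built a day and a half ago.
    Rollup("orders_by_city",
           frozenset({"orders.count"}),
           frozenset({"orders.status", "orders.city"}),
           built_at=datetime(2023, 12, 31), max_age=timedelta(hours=1)),
    # Built 30 minutes ago, so still within its one-hour freshness window.
    Rollup("orders_by_status",
           frozenset({"orders.count", "orders.revenue"}),
           frozenset({"orders.status"}),
           built_at=datetime(2024, 1, 1, 11, 30), max_age=timedelta(hours=1)),
]

match = pick_rollup(["orders.count"], ["orders.status"], rollups, now)
print(match.name if match else "no match")
```

In this example the stale rollup is skipped even though it covers the query, and the fresh one is chosen instead.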

APIs

APIs enable AI agents, applications, and tools to interact with the semantic layer through standard protocols. For agentic analytics to work across diverse use cases—from AI-powered workbooks to embedded analytics to traditional BI—the semantic layer must provide universal interoperability. AI agents need to query data, introspect the data model, and integrate with other systems without requiring custom integrations for every tool or framework.

Rather than inventing proprietary protocols, Cube implements widely adopted standards: REST, GraphQL, and SQL.

REST and GraphQL provide modern API interfaces for building custom applications and enabling programmatic access. These APIs power agentic workflows, allowing AI agents to query data, retrieve results, and build interactive experiences.
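A minimal sketch of what a REST request might look like, assuming the default `/cubejs-api/v1/load` path and a JSON query passed in the `query` parameter. The host `cube.example.com` and the `orders` members are hypothetical, and the example only builds the URL: it does not send the request or attach the authorization token a real deployment would require.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_load_url(base_url: str, query: dict) -> str:
    """Encode a JSON query into a URL for the load endpoint."""
    params = urlencode({"query": json.dumps(query)})
    return f"{base_url.rstrip('/')}/cubejs-api/v1/load?{params}"


query = {
    "measures": ["orders.count"],       # hypothetical measure
    "dimensions": ["orders.status"],    # hypothetical dimension
    "limit": 10,
}
url = build_load_url("https://cube.example.com/", query)
print(url)

# Sanity check: decoding the URL gives back the original query.
decoded = json.loads(parse_qs(urlsplit(url).query)["query"][0])
```

An agent would typically generate the `query` dictionary from a user's question and leave SQL generation to the semantic layer.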

SQL is universally adopted across the data stack. Nearly every BI tool, visualization platform, and data application can query a SQL data source. Cube implements Postgres-compatible SQL and extends it to support semantic layer concepts like measures, which are special types that know how to evaluate themselves based on data model definitions. Any tool that can connect to Postgres or Redshift can connect to Cube, making the semantic layer accessible to both AI agents and traditional analytics tools.
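To make the measure extension concrete, the sketch below composes the kind of SQL a client might send, wrapping each measure in a `MEASURE()` call and grouping by the selected dimensions. The `measure_query` helper and the `orders_view` members are hypothetical, and the helper does no identifier escaping, so it is only suitable for trusted, hard-coded names.

```python
def measure_query(view, dimensions, measures):
    """Build a SELECT that evaluates measures via MEASURE()."""
    select = list(dimensions) + [f"MEASURE({m})" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {view}"
    if dimensions:
        # Group by the positions of the dimension columns: 1, 2, ...
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


sql = measure_query("orders_view", ["status"], ["revenue"])
print(sql)
# SELECT status, MEASURE(revenue) FROM orders_view GROUP BY 1
```

The query never states how `revenue` is calculated. That definition stays in the data model, which is what keeps results consistent across tools.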

Data model introspection through the meta API is essential for agentic analytics. It enables AI agents to discover available metrics, understand entity relationships, and determine valid queries—providing the context agents need to navigate the semantic layer autonomously. This same introspection capability allows BI tools to automatically map to data model objects and helps applications build dynamic interfaces.
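As an illustration of how an agent might use introspection, the sketch below turns a metadata payload into a simple catalog of queryable members. The payload shape (a top-level `cubes` list whose entries carry `measures` and `dimensions` with fully qualified names) is an assumption, and `sample_meta` is hand-written example data, not real API output.

```python
def list_members(meta: dict) -> dict:
    """Map each cube or view name to its measure and dimension names."""
    catalog = {}
    for cube in meta.get("cubes", []):
        catalog[cube["name"]] = {
            "measures": [m["name"] for m in cube.get("measures", [])],
            "dimensions": [d["name"] for d in cube.get("dimensions", [])],
        }
    return catalog


# Hypothetical payload in the assumed shape.
sample_meta = {
    "cubes": [
        {
            "name": "orders",
            "measures": [{"name": "orders.count", "type": "number"}],
            "dimensions": [{"name": "orders.status", "type": "string"}],
        }
    ]
}

catalog = list_members(sample_meta)
print(catalog)
```

An agent could place such a catalog in its prompt or tool description so that it only proposes queries over members that actually exist.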