YugabyteDB distributed SQL features

Franck Pachot

Developer Advocate at 🍃 MongoDB 🔶AWS Data Hero,…

Published Jul 22, 2022

There is a new database every day and, because this is clearly where the technology is going, they all claim to be cloud native and distributed. Some are active-passive and, in an attempt to scale, distribute reads in a non-ACID way. Some distribute the storage but not the transactions, with writes still going to a monolithic master. Some distribute reads and writes to multiple shards, but with no global transactions, global indexes, or global integrity constraints. Distributed SQL is the term to use when all SQL features (including ACID transactions, consistent secondary indexes, and foreign keys) are provided in a scale-out infrastructure, as one logical database.

I'm writing this because I've seen a recent publication comparing one of those few real Distributed SQL databases to one that is not (according to the points mentioned)... but the heading for the latter was "Yugabyte"! It may be a typo made at some point when a marketing team was comparing multiple databases. I'm quite sure it is a mistake because all points, except one left with a question mark, were 100% wrong. Factual errors are easy to check. I'm not talking about subjective benchmarks here.

I'll explain only the features listed in the article I've read, all of which are available in YugabyteDB because it is a Distributed SQL database, but there are many more of course. I chose not to link to this article because it will probably be fixed at some point, and I don't want to link to a web archive version of it. You will find it if you want, but I don't think there is a good reason to. I'm writing this post for those who have read those claims first and want to check the facts by "googling" the terms. I don't want to start a comparison war, because all the features mentioned are, in my opinion, the basic features that should be found in all distributed SQL databases.

I'll quote the terms used in the other publication and link to the documentation for each main point, so that the facts can be checked (they may evolve in future versions).

Database Horizontal Scale

YugabyteDB goes beyond "Node based, automated for both reads and writes" because distribution and replication are done at the tablet level, which makes the cluster more agile to extend.

https://docs.yugabyte.com/preview/explore/linear-scalability/sharding-data/
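To make the tablet-level idea concrete, here is a minimal conceptual sketch in Python. It is not YugabyteDB code: the 16-bit hash space, the CRC32 stand-in hash, the round-robin placement, and all function names are assumptions chosen for illustration. It only shows that a row key maps to a tablet, and that tablets (not whole nodes) are the unit that gets spread across the cluster when a node is added.

```python
import zlib
from typing import Dict, List

HASH_SPACE = 0x10000  # assumed 16-bit hash space: 0x0000..0xFFFF


def tablet_for_key(key: str, num_tablets: int) -> int:
    """Map a row key to a tablet by splitting the hash space into equal ranges."""
    # CRC32 is only a stand-in hash for this sketch, not the database's own function.
    hash_code = zlib.crc32(key.encode("utf-8")) & 0xFFFF  # 0..65535
    return hash_code * num_tablets // HASH_SPACE


def place_tablets(num_tablets: int, nodes: List[str]) -> Dict[int, str]:
    """Spread tablets over nodes round-robin (a simplified stand-in for a load balancer)."""
    return {tablet: nodes[tablet % len(nodes)] for tablet in range(num_tablets)}


def tablets_per_node(placement: Dict[int, str]) -> Dict[str, int]:
    """Count how many tablets each node hosts."""
    counts: Dict[str, int] = {}
    for node in placement.values():
        counts[node] = counts.get(node, 0) + 1
    return counts


if __name__ == "__main__":
    num_tablets = 12
    before = place_tablets(num_tablets, ["node1", "node2", "node3"])
    after = place_tablets(num_tablets, ["node1", "node2", "node3", "node4"])
    tablet = tablet_for_key("user-42", num_tablets)
    # The key stays in the same tablet; only the tablet's host may change.
    print(tablet, before[tablet], after[tablet])
    print(tablets_per_node(before), tablets_per_node(after))
```

Note that the key-to-tablet mapping does not depend on the number of nodes, so extending the cluster moves tablets without re-sharding rows. A real balancer would also move as few tablets as possible, which this round-robin placement does not attempt.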

Database Load Balancing

YugabyteDB has "Detailed options to optimize storage, compute and latency": it distributes connections, data, and SQL processing to all nodes, and smart drivers and PostgreSQL tablespaces allow more specific control over placement.

https://docs.yugabyte.com/preview/explore/multi-region-deployments/

Failover

YugabyteDB is "Fully automated for both reads and writes", with the Raft protocol ensuring that there is only one leader per tablet and electing a new leader in 3 seconds.

https://docs.yugabyte.com/preview/explore/fault-tolerance/
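The single-leader guarantee comes from majority voting: a candidate needs votes from more than half of a tablet's replicas, and two majorities always overlap, so two leaders cannot be elected for the same term. The arithmetic can be sketched as follows. The function names are hypothetical and this is only the quorum calculation, not a Raft implementation.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replication_factor // 2 + 1


def can_elect_leader(replication_factor: int, live_replicas: int) -> bool:
    """A tablet can elect a leader only while a majority of its replicas is reachable."""
    return live_replicas >= majority(replication_factor)


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be lost while the tablet stays available."""
    return replication_factor - majority(replication_factor)


if __name__ == "__main__":
    for rf in (3, 5, 7):
        print(rf, majority(rf), tolerated_failures(rf))
```

With a replication factor of 3, a tablet keeps accepting reads and writes after losing one replica. With 5, it survives the loss of two.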

Automated Repair and RPO

YugabyteDB is "Automated Repair RPO = zero", with an RTO in seconds (3 seconds for Raft leader election, plus tunable TCP timeouts) for the impacted tablets. This is within one distributed cluster. On top of that, YugabyteDB adds asynchronous replication between clusters, which can provide RPO <10 sec for regions that you don't want to participate in the Raft quorum.

https://docs.yugabyte.com/preview/architecture/docdb-replication/replication/#rpo-and-rto-on-zone-outage

Distributed Reads, Distributed Transactions, and no potential data issues

Those are the basics of distributed SQL, and the article got them right.

https://www.yugabyte.com/tech/distributed-acid-transactions/

Database Isolation Levels

YugabyteDB provides all SQL isolation levels, from Read Committed to the "Highest: Serializable". It would not be PostgreSQL compatible without Read Committed, which is the default in PostgreSQL and, as a consequence, the most used by existing applications. Serializable has the advantage of not requiring additional code and tests to prevent anomalies, but it requires more code to implement retry logic. Read Committed is free from serialization errors.

https://docs.yugabyte.com/preview/architecture/transactions/isolation-levels/
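The retry logic mentioned for Serializable usually takes the shape of a loop around the whole transaction. Below is a minimal sketch of that pattern in Python. To keep it self-contained it does not use a real driver: `SerializationFailure` is a hypothetical stand-in for whatever exception your driver raises for SQLSTATE 40001 (serialization_failure), and `run_with_retry` and its parameters are illustrative names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

    sqlstate = "40001"


def run_with_retry(
    transaction: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run the whole transaction again when it fails with a serialization error."""
    for attempt in range(1, max_attempts + 1):
        try:
            # The callable is expected to begin, execute, and commit the transaction.
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            # Exponential backoff with random jitter before the next attempt.
            sleep(base_delay * (2 ** (attempt - 1)) * random.random())
    raise ValueError("max_attempts must be at least 1")
```

The important design point is that the retry wraps the entire transaction, not a single statement: after a serialization error the transaction has been rolled back, so all its reads and writes must be replayed.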

SQL

YugabyteDB is PostgreSQL compatible through one of its APIs (YSQL), in addition to the Cassandra-compatible one (YCQL). This goes far beyond being wire compatible with PostgreSQL because YugabyteDB actually reuses PostgreSQL: stored procedures, extensions, read committed...

https://docs.yugabyte.com/preview/explore/ysql-language-features/#sql-features-in-ysql

Data Geo-Partitioning

YugabyteDB can partition to zones, regions, or clouds, at the "Row level", thanks to PostgreSQL declarative partitioning and tablespaces.

https://docs.yugabyte.com/preview/explore/multi-region-deployments/row-level-geo-partitioning/#example-scenario

Multi-region and Multi-cloud

YugabyteDB can be deployed anywhere, with sync replication for both reads and writes, across regions, cloud providers, and even hybrid with some nodes on-premises. The optimizer (which is the PostgreSQL query planner enhanced to be cluster-aware) knows the topology. And, because multi-region adds some latency, there are many possibilities to favor local reads, described in the series of articles "Local reads in geo-distributed YugabyteDB".

Upgrade Method

YugabyteDB upgrade is "Online, Rolling" and, even if the managed services make it single-click, this, like all features, is free and open source. There are no copyleft licenses or commercial editions for enterprise features. YugabyteDB is under the Apache License v2.

https://docs.yugabyte.com/preview/manage/upgrade-deployment/

https://blog.yugabyte.com/why-we-changed-yugabyte-db-licensing-to-100-open-source/

I've left for the end the two points where YugabyteDB support is not a 100% "yes":

Database Schema Change

To be fully Online, there's still work in progress for schema migrations. That's another reason why I think the article I've read just made a mistake mentioning Yugabyte, because this was flagged as "Online, Active, Dynamic", which is not correct (at least for Online, because I don't really know what Active, Dynamic means here). As everything is open source, even the roadmap and architecture choices, the status is easy to check on GitHub:

https://github.com/yugabyte/yugabyte-db/issues/4192

Cost Based Optimization

Here again, a lot of work is in progress. YugabyteDB is primarily optimized for OLTP, but the performance of analytic queries requires continuous improvement to the query planner (https://github.com/yugabyte/yugabyte-db/issues/11842) and executor (https://github.com/yugabyte/yugabyte-db/issues/12294).

Facts vs guesses

🤔 I tried to guess which database the article I read was mixing it up with. It looks like they compared with one that does not distribute writes (limited write, multi-region but not for writes), uses async streaming (Automated Repair RPO <10 sec), and is not PostgreSQL compatible (SQL with limitations). Well... there are plenty of possibilities. Actually, any traditional SQL database that supports async streaming replication to a standby could match what they have put, incorrectly, under a "Yugabyte" header. And the question mark (?) for Cost Based Optimization may be a clue that they were not comparing with a database where everything is documented, open source, and easy to check.

I'm a Developer Advocate at Yugabyte, which is why I know the features pretty well 🚀 but anyone can do their own research before reading or publishing comparisons. The beauty of Open Source is that you can base your opinion on facts, by testing the latest version. This is the docker-compose environment I use to demo those features: https://github.com/FranckPachot/ybdemo/tree/main/docker/yb-lab

You may also learn a lot by listening to the engineers at the YFTT live event each Friday, starting with the first episode, where the founders explain the key concepts.

10 Comments

SlashDB

It is so extremely hard to bring to market yet alone profitably a better database. Good luck Yugabyte. We are rooting for you. Good thing it is wire compatible with PostgreSQL. It should work like a charm with SlashDB.

Mark Callaghan

Databases & math

For MyRocks, table stats are updated during compaction. Eventually there were a few perf problems that required getting stats for new tables or small tables that have many rows in the memtable. But AFAIK all of that is automated. Hopefully other LSM-based DBMS use a similar approach.

Mark Callaghan

Databases & math

Nice article although I hope the name changes from distributed SQL back to NewSQL - less typing.

Hussein Nasser

Software Engineer | Talks about backend, databases and operating systems

So refreshing to read a non-marketing and purely technical article. These database terms have been overloaded and stretched out to oblivion. We need more people like you Franck who are in the weeds with databases to put out more articles like this. Good stuff! My favorite part was the intro.

Scott McNealy

Founder, Advisor | Sun Microsystems

Going to the source is always a good move. Thanks Franck.
