YugabyteDB distributed SQL features
There is a new database every day and, because this is clearly where the technology is going, they all claim to be cloud native and distributed. Some are active-passive and, in an attempt to scale, distribute reads in a non-ACID way. Some distribute the storage but not the transactions, with writes still going to a monolithic master. Some distribute reads and writes to multiple shards, but with no global transactions, global indexes, or global integrity constraints. Distributed SQL is the term to use when all SQL features (including ACID transactions, consistent secondary indexes, and foreign keys) are provided in a scale-out infrastructure, as one logical database.
I'm writing this because I've seen a recent publication comparing one of those few real Distributed SQL databases to one that is not (according to the points mentioned)... but the heading for the latter was "Yugabyte"! It may be a typo made at some point when a marketing team was comparing multiple databases. I'm quite sure it is a mistake because all points, except one left with a question mark, were 100% wrong. Factual errors are easy to check. I'm not talking about subjective benchmarks here.
I'll explain only the features listed in the article I've read, which are available in YugabyteDB as a Distributed SQL database, but there are many more of course. I chose not to link to this article because it will probably be fixed at some point, and I don't want to link to a web archive version of it. You will find it if you want, but I don't think there is a good reason to. I'm writing this post for those who have read those claims first and want to check the facts by "googling" the terms. I don't want to start a comparison war, because all the features mentioned are, in my opinion, the basic features that should be found in all distributed SQL databases.
I'll mention the terms used in the other publication in italics and link to the documentation for each main point, so that the facts can be checked (they may evolve in future versions).
Database Horizontal Scale
YugabyteDB goes beyond Node based, automated for both reads and writes: distribution and replication are done at the tablet level, which makes it more agile when extending the cluster.
https://docs.yugabyte.com/preview/explore/linear-scalability/sharding-data/
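To make the tablet-level idea concrete, here is a minimal sketch in Python. It assumes hash sharding over a 16-bit hash space split into contiguous ranges, one per tablet, as the sharding documentation describes. The hash function is a stand-in (not the one YugabyteDB uses), the round-robin leader placement is a simplification of the real load balancer, and all function names are hypothetical.

```python
import hashlib

HASH_SPACE = 0x10000  # 16-bit hash space: 0x0000..0xFFFF


def tablet_ranges(num_tablets):
    """Split the hash space into contiguous, near-equal [low, high) ranges."""
    bounds = [i * HASH_SPACE // num_tablets for i in range(num_tablets + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_tablets)]


def hash_code(key):
    """Stand-in 16-bit hash (assumption: NOT YugabyteDB's actual function)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for(key, ranges):
    """Return the index of the tablet whose hash range contains the key."""
    h = hash_code(key)
    for index, (low, high) in enumerate(ranges):
        if low <= h < high:
            return index
    raise AssertionError("hash ranges must cover the whole hash space")


def place_leaders(num_tablets, nodes):
    """Simplified placement: spread tablet leaders round-robin over nodes."""
    return {tablet: nodes[tablet % len(nodes)] for tablet in range(num_tablets)}


if __name__ == "__main__":
    ranges = tablet_ranges(6)
    print("tablet for 'user-42':", tablet_for("user-42", ranges))
    print("3 nodes:", place_leaders(6, ["n1", "n2", "n3"]))
    # Adding a node moves whole tablets; the key-to-tablet mapping is unchanged.
    print("4 nodes:", place_leaders(6, ["n1", "n2", "n3", "n4"]))
```

The point of the sketch is that rows map to tablets, and tablets map to nodes. When a node is added, only the second mapping changes, so rebalancing moves tablets rather than re-hashing rows.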
Database Load Balancing
YugabyteDB has Detailed options to optimize storage, compute and latency: it distributes connections, data, and SQL processing to all nodes, and uses smart drivers and PostgreSQL tablespaces to be more specific about the placement.
https://docs.yugabyte.com/preview/explore/multi-region-deployments/
Failover
YugabyteDB failover is Fully automated for both reads and writes, with the Raft protocol ensuring that there's only one leader per tablet and electing a new leader in 3 seconds.
https://docs.yugabyte.com/preview/explore/fault-tolerance/
Automated Repair and RPO
YugabyteDB provides Automated Repair RPO = zero, with RTO in seconds (3 seconds for Raft leader election, plus tunable TCP timeouts) for the impacted tablets. This is for one distributed cluster. On top of it, YugabyteDB adds asynchronous replication between clusters, which can provide RPO <10 sec for regions that you don't want to participate in the Raft quorum.
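One way to see why RPO can be zero within a cluster: in Raft, a write is acknowledged only once a majority of the tablet's replicas have it, and a new leader can only be elected by a majority, so it already holds every acknowledged write. Here is a minimal sketch of that arithmetic, assuming a replication factor (RF) per tablet. The function names are hypothetical and this is not YugabyteDB code.

```python
def majority(replication_factor):
    """Smallest number of replicas that forms a Raft quorum."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be lost while a quorum is still reachable."""
    return replication_factor - majority(replication_factor)


def is_acknowledged(replica_acks, replication_factor):
    """A write is acknowledged to the client once a majority has stored it."""
    return replica_acks >= majority(replication_factor)


if __name__ == "__main__":
    for rf in (3, 5, 7):
        print(f"RF={rf}: quorum={majority(rf)}, "
              f"tolerated failures={tolerated_failures(rf)}")
```

With RF=3, two replicas must store a write before it is acknowledged, so losing any single node cannot lose an acknowledged write. The asynchronous replication between clusters does not wait for such a quorum, which is why its RPO is not zero.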
Distributed Reads, Distributed Transactions, and no potential data issues
Those are the basics of distributed SQL, and the article got them right.
https://www.yugabyte.com/tech/distributed-acid-transactions/
Database Isolation Levels
YugabyteDB provides all SQL isolation levels, from Read Committed to the Highest: Serializable. It would not be PostgreSQL compatible without Read Committed, which is the default in PostgreSQL and, as a consequence, the most used by existing applications. Serializable has the advantage of not requiring additional code and tests to prevent anomalies, but it requires more code to implement retry logic for serialization errors. Read Committed is free from serialization errors.
https://docs.yugabyte.com/preview/architecture/transactions/isolation-levels/
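The retry logic mentioned for Serializable can be sketched as follows. This is a minimal illustration, not a driver API: SerializationFailure is a hypothetical stand-in for whatever exception your driver raises with SQLSTATE 40001 (serialization_failure in PostgreSQL), and run_with_retry, its parameters, and the backoff values are assumptions to tune for your application.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run a callable that executes one whole transaction, retrying on 40001.

    The callable must be safe to re-run from the start: the failed attempt
    is assumed to have been rolled back entirely.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            # Exponential backoff with jitter to avoid retrying in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            sleep(delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def transfer():
        # Simulated transaction: conflicts twice, then commits.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise SerializationFailure("could not serialize access")
        return "committed"

    print(run_with_retry(transfer), "after", attempts["count"], "attempts")
```

The whole transaction is retried, not only the failed statement. This is the extra application code that Serializable requires.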
SQL
YugabyteDB is PostgreSQL compatible through one of its APIs (YSQL), in addition to the Cassandra-compatible one (YCQL). This goes far beyond being wire compatible with PostgreSQL because YugabyteDB actually reuses PostgreSQL: stored procedures, extensions, read committed...
https://docs.yugabyte.com/preview/explore/ysql-language-features/#sql-features-in-ysql
Data Geo-Partitioning
YugabyteDB can partition to zones, regions, or clouds, at the Row level, thanks to PostgreSQL declarative partitioning and tablespaces.
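As an illustration of how partitions and tablespaces combine, here is a small Python sketch that only builds the DDL text and does not connect to a database. It assumes the replica_placement tablespace option of YSQL and a parent table declared with PARTITION BY LIST. The table, tablespace, cloud, region, and zone names are made up for the example, and the exact syntax should be checked against the tablespaces documentation for your version.

```python
import json


def tablespace_ddl(name, cloud, region, zones):
    """Build a CREATE TABLESPACE statement with one replica per zone.

    Identifiers are interpolated as-is: use trusted names only.
    """
    placement = {
        "num_replicas": len(zones),
        "placement_blocks": [
            {"cloud": cloud, "region": region, "zone": zone, "min_num_replicas": 1}
            for zone in zones
        ],
    }
    return (
        f"CREATE TABLESPACE {name} WITH "
        f"(replica_placement='{json.dumps(placement)}');"
    )


def partition_ddl(parent, partition, value, tablespace):
    """Build a list partition that stores its rows in the given tablespace."""
    return (
        f"CREATE TABLE {partition} PARTITION OF {parent} "
        f"FOR VALUES IN ('{value}') TABLESPACE {tablespace};"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(tablespace_ddl("eu_ts", "aws", "eu-west-1",
                         ["eu-west-1a", "eu-west-1b", "eu-west-1c"]))
    print(partition_ddl("orders", "orders_eu", "EU", "eu_ts"))
```

The partition key value of each row decides which partition, and therefore which tablespace and which placement, the row goes to. This is what makes the placement Row level.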
Multi-region and Multi-cloud
YugabyteDB can be deployed anywhere, with sync replication for both reads and writes, across regions, cloud providers, and even hybrid with some nodes on-premises. The optimizer (which is the PostgreSQL query planner enhanced to be cluster-aware) knows the topology. And, because multi-region adds some latency, there are many possibilities to favor local reads, described in the "Local reads in geo-distributed YugabyteDB" series of articles.
Upgrade Method
YugabyteDB upgrade is Online, Rolling and, even if the managed services make it single-click, this, like all features, is free and open source. There's nothing like copyleft licenses or commercial editions for enterprise features. YugabyteDB is licensed under the Apache License 2.0.
https://docs.yugabyte.com/preview/manage/upgrade-deployment/
https://blog.yugabyte.com/why-we-changed-yugabyte-db-licensing-to-100-open-source/
I've left for the end the two points where YugabyteDB support is not a 100% "yes":
Database Schema Change
To be fully Online, there's still work in progress for schema migrations. That's another reason why I think the article I've read just made a mistake mentioning Yugabyte, because this was flagged as Online, Active, Dynamic, which is not correct (at least for Online, because I don't really know what Active, Dynamic means here). As everything is open source, even the roadmap and architecture choices, the status is easy to check on GitHub:
https://github.com/yugabyte/yugabyte-db/issues/4192
Cost Based Optimization
Here again, a lot of work is in progress. YugabyteDB is primarily optimized for OLTP, but the performance of analytic queries requires continuous improvement to the query planner (https://github.com/yugabyte/yugabyte-db/issues/11842) and executor (https://github.com/yugabyte/yugabyte-db/issues/12294).
Facts vs guesses
🤔 I tried to guess which database the article I read had mixed it up with. It looks like they compared with one that does not distribute writes (limited write, multi-region but not for writes), uses async streaming (Automated Repair RPO <10 sec), and is not PostgreSQL compatible (SQL with limitations). Well... there are plenty of possibilities. Actually, all traditional SQL databases that support async streaming replication to a standby could match what they have put, incorrectly, under a "Yugabyte" header. And the question mark (?) for Cost Based Optimization may be a clue that they were not comparing with one where everything is documented and open source, and easy to check.
I'm a Developer Advocate at Yugabyte and this is why I know the features pretty well 🚀 but anyone can do their own research before reading or publishing comparisons. The beauty of Open Source is that you can base your opinion on facts, by testing it on the latest version. This is the docker-compose environment I use to demo those features: https://github.com/FranckPachot/ybdemo/tree/main/docker/yb-lab
You may also learn a lot by listening to the engineers at the YFTT live event each Friday, starting with the first episode, where the founders explain the key concepts.