Bambang Purnomosidi D. P. for Zimera Corporation

Posted on Aug 28, 2021

Using YugabyteDB in Python App Development

#python #yugabytedb #cassandra #postgres

As a long time PostgreSQL user, it's normal that YugabyteDB got my attention. It's a distributed SQL database which features-compatible with PostgreSQL so that I don't need to lose my investation. YugabyteDB reuses PostgreSQL's query layer, so I can use my development tools as usual. It gives me more since there is also Cassandra query layer. Using YugabyteDB I can reuse my SQL skill and development tools, enhanced with NOSQL data model (especially wide-column store), while I have NewSQL's resiliency, scalability, and high performance.

In this article, I am going to setup YugabyteDB for local cluster, populate data, and access the data using Python3.

Install and Configure YugabyteDB

Installation is very easy and straightforward. No source code compilation (well, unless you really want to dig that deep). Just download YugabyteDB, extract, then execute bin/post-install.sh. When this step has finished, make sure that path/to/extracted/yugabytedb/bin is in the $PATH. Here's what I did:

Fish Shell

$ set -x path/to/extracted/yugabytedb/bin $PATH

Bash Shell

$ export PATH=path/to/extracted/yugabytedb/bin:$PATH

Note: to avoid writing the same thing over and over again, I usually put the set or export statements above inside a file and then whenever I want to use YugabyteDB, I just source the-file.sh.

Next, we need to configure YugabyteDB. To configure YugabyteDB, see YugabyteDB configuration. The documentation is complete but to have everything works, all I need to do is changing /etc/security/limits.conf to:

# /etc/security/limits.conf # #Each line describes a limit for a user in the form: ... ... ... #<domain> <type> <item> <value> # * - core unlimited * - data unlimited * - fsize unlimited * - sigpending 119934 * - memlock 64 * - rss unlimited * - nofile 1048576 * - msgqueue 819200 * - stack 8192 * - cpu unlimited * - nproc 12000 * - locks unlimited # End of file

Depends on your situation, maybe you need to restart or just logout-login back.

All data, logs, configurations, etc for YugabyteDB reside in $HOME/var/. Do check $HOME/var/conf/yugabytedb.conf for more configuration:

{ "tserver_webserver_port": 9000, "master_rpc_port": 7100, "universe_uuid": "dabc3d28-6982-4585-8b10-5faa7352da02", "webserver_port": 7200, "ysql_enable_auth": false, "cluster_member": true, "ycql_port": 9042, "data_dir": "/home/bpdp/var/data", "tserver_uuid": "71ad70b8eef149ae945842572e0fff75", "use_cassandra_authentication": false, "log_dir": "/home/bpdp/var/logs", "polling_interval": "5", "listen": "0.0.0.0", "callhome": true, "master_webserver_port": 7000, "master_uuid": "1ef618e573a04e1d835f4ed4364825d7", "master_flags": "", "node_uuid": "6ae31951-7199-4c22-b30b-e8f235cef7db", "join": "", "ysql_port": 5433, "tserver_flags": "", "tserver_rpc_port": 9100 }

Note: PostgreSQL usually uses port 5432, but YugabyteDB default port is 5433. Pay attention to this since we are going to use this when we write our code.

So many things we can do with YugabyteDB, but for this article, I will concentrate more on Python app development. Therefore, it's enough now to have local cluster. Let's set it up!.

Let's run YugabyteDB:

$ yugabyted start Starting yugabyted... ✅ System checks +--------------------------------------------------------------------------------------------------+ | yugabyted | +--------------------------------------------------------------------------------------------------+ | Status : Running. Leader Master is present | | Web console : http://127.0.0.1:7000 | | JDBC : jdbc:postgresql://127.0.0.1:5433/yugabyte?user=yugabyte&password=yugabyte | | YSQL : bin/ysqlsh -U yugabyte -d yugabyte | | YCQL : bin/ycqlsh -u cassandra | | Data Dir : /home/bpdp/var/data | | Log Dir : /home/bpdp/var/logs | | Universe UUID : dabc3d28-6982-4585-8b10-5faa7352da02 | +--------------------------------------------------------------------------------------------------+ 🚀 yugabyted started successfully! To load a sample dataset, try 'yugabyted demo'. 🎉 Join us on Slack at https://www.yugabyte.com/slack 👕 Claim your free t-shirt at https://www.yugabyte.com/community-rewards/ $

Check the status:

$ yugabyted status +--------------------------------------------------------------------------------------------------+ | yugabyted | +--------------------------------------------------------------------------------------------------+ | Status : Running. Leader Master is present | | Web console : http://127.0.0.1:7000 | | JDBC : jdbc:postgresql://127.0.0.1:5433/yugabyte?user=yugabyte&password=yugabyte | | YSQL : bin/ysqlsh -U yugabyte -d yugabyte | | YCQL : bin/ycqlsh -u cassandra | | Data Dir : /home/bpdp/var/data | | Log Dir : /home/bpdp/var/logs | | Universe UUID : dabc3d28-6982-4585-8b10-5faa7352da02 | +--------------------------------------------------------------------------------------------------+

Just in case you need to shutdown YugabyteDB:

$ yugabyted stop Stopped yugabyted using config /home/bpdp/var/conf/yugabyted.conf. $

Ok, now let YugabyteDB runs. We will use that for later processes.

Data Preparation

Now, it gets more interesting. Using one YugabyteDB server instance, we can use both SQL and Cassandra data model. Let's aggregate some data into PostgreSQL layer and Cassandra layer. For this purpose, we still use default user and password. Later on, you can manage the security side of YugabyteDB.

Note: default username and password for PostgreSQL layer: yugabyte:yugabyte, while for Cassandra: cassandra:cassandra.

SQL Data

YugabyteDB provides sample datasets for SQL data. We are going to use Northwind Traders Database. Get the DDL and Data scripts from the Northwind sample datasets URL. Follow this sceen dump to prepare the database, tables, and populate the data. The # pompt is the place to write command.

$ ysqlsh -U yugabyte ysqlsh (11.2-YB-2.7.2.0-b0) Type "help" for help. yugabyte=# create database northwind; CREATE DATABASE yugabyte=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------------+----------+----------+---------+-------------+----------------------- northwind | yugabyte | UTF8 | C | en_US.UTF-8 | postgres | postgres | UTF8 | C | en_US.UTF-8 | system_platform | postgres | UTF8 | C | en_US.UTF-8 | template0 | postgres | UTF8 | C | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | UTF8 | C | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres yugabyte | postgres | UTF8 | C | en_US.UTF-8 | (6 rows) yugabyte=# \c northwind You are now connected to database "northwind" as user "yugabyte". northwind=# \i northwind_ddl.sql  SET SET SET SET SET SET SET SET DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE DROP TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE northwind=# \d List of relations Schema | Name | Type | Owner --------+------------------------+-------+---------- public | categories | table | yugabyte public | customer_customer_demo | table | yugabyte public | customer_demographics | table | yugabyte public | customers | table | yugabyte public | employee_territories | table | yugabyte public | employees | table | yugabyte public | order_details | table | yugabyte public | orders | table | yugabyte public | products | table | yugabyte public | region | table | yugabyte public | shippers | table | yugabyte public | suppliers | table | yugabyte public | territories | table | yugabyte public | us_states | table | yugabyte (14 rows) northwind=# \i northwind_data.sql ... ... ... INSERT 0 1 INSERT 0 1 INSERT 0 1 northwind=# select * from products; product_id | product_name | supplier_id | category_id | quantity_per_unit | unit_price | units_in_s tock | units_on_order | reorder_level | discontinued ------------+----------------------------------+-------------+-------------+----------------------+------------+----------- -----+----------------+---------------+-------------- 4 | Chef Anton's Cajun Seasoning | 2 | 2 | 48 - 6 oz jars | 22 | 53 | 0 | 0 | 0 46 | Spegesild | 21 | 8 | 4 - 450 g glasses | 12 | 95 | 0 | 0 | 0 73 | Röd Kaviar | 17 | 8 | 24 - 150 g jars | 15 | 101 | 0 | 5 | 0 29 | Thüringer Rostbratwurst | 12 | 6 | 50 bags x 30 sausgs. | 123.79 | 0 | 0 | 0 | 1 70 | Outback Lager | 7 | 1 | 24 - 355 ml bottles | 15 | 15 | 10 | 30 | 0 25 | NuNuCa Nuß-Nougat-Creme | 11 | 3 | 20 - 450 g glasses | 14 | 76 | 0 | 30 | 0 54 | Tourtière | 25 | 6 | 16 pies | 7.45 | 21 | 0 | 10 | 0 ... ... ... 17 | Alice Mutton | 7 | 6 | 20 - 1 kg tins | 39 | 0 | 0 | 0 | 1 59 | Raclette Courdavault | 28 | 4 | 5 kg pkg. | 55 | 79 | 0 | 0 | 0 (77 rows) northwind=#

Finish with SQL data, it's about the time to populate column-wide - Cassandra data.

NOSQL Column-wide Data - Apache Cassandra

We will use a simple keyspace: one keyspace, consists of one table. Create a CQL script file (here, the file name is zimera-employees.cql):

CREATE KEYSPACE zimera WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3}; USE zimera; CREATE TABLE employees( e_id int PRIMARY KEY, e_fullname text, e_address text, e_dept text, e_role text ); INSERT INTO employees (e_id, e_fullname, e_address, e_dept, e_role) VALUES(1,'Zaky A. Aditya', 'Dusun Medelan, RT 01, Umbulmartani, Ngemplak, Sleman, DIY, Indonesia', 'Information Technology', 'Machine Learning Developer');

Execute the script:

$ ycqlsh -f zimera-employees.cql $

Check whether succeed or not:

$ ycqlsh -u cassandra Password: Connected to local cluster at 127.0.0.1:9042. [ycqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4] Use HELP for help. cassandra@ycqlsh> use zimera; cassandra@ycqlsh:zimera> select * from employees; e_id | e_fullname | e_address | e_dept | e_role ------+----------------+----------------------------------------------------------------------+------------------------+---------------------------- 1 | Zaky A. Aditya | Dusun Medelan, RT 01, Umbulmartani, Ngemplak, Sleman, DIY, Indonesia | Information Technology | Machine Learning Developer (1 rows) cassandra@ycqlsh:zimera> $

Python - Drivers

Since we are going to use both PostgreSQL and Apache Cassandra data model, we need to install those two drivers: psycopg2 for PostgreSQL and Python Driver for Apache Cassandra.

Note: currently, psycopg is still developing psycopg3. We do not use psycopg3 since it is still in early development stage, but once psycopg3 is released, there should be easy to migrate.

Install Cassandra Driver

$ conda install cassandra-driver ... ... ... added / updated specs: - cassandra-driver The following packages will be downloaded: package | build ---------------------------|----------------- cassandra-driver-3.25.0 | py39h27cfd23_0 3.0 MB ------------------------------------------------------------ Total: 3.0 MB The following NEW packages will be INSTALLED: blas pkgs/main/linux-64::blas-1.0-mkl cassandra-driver pkgs/main/linux-64::cassandra-driver-3.25.0-py39h27cfd23_0 intel-openmp pkgs/main/linux-64::intel-openmp-2021.3.0-h06a4308_3350 libev pkgs/main/linux-64::libev-4.33-h7b6447c_0 mkl pkgs/main/linux-64::mkl-2021.3.0-h06a4308_520 mkl-service pkgs/main/linux-64::mkl-service-2.4.0-py39h7f8727e_0 mkl_fft pkgs/main/linux-64::mkl_fft-1.3.0-py39h42c9631_2 mkl_random pkgs/main/linux-64::mkl_random-1.2.2-py39h51133e4_0 numpy pkgs/main/linux-64::numpy-1.20.3-py39hf144106_0 numpy-base pkgs/main/linux-64::numpy-base-1.20.3-py39h74d4b33_0 six pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_0 Proceed ([y]/n)? y ... ... ... $

Install PostgreSQL Driver

$ conda install psycopg2 ... ... ... added / updated specs: - psycopg2 The following NEW packages will be INSTALLED: krb5 pkgs/main/linux-64::krb5-1.19.2-hac12032_0 libedit pkgs/main/linux-64::libedit-3.1.20210714-h7f8727e_0 libpq pkgs/main/linux-64::libpq-12.2-h553bfba_1 psycopg2 pkgs/main/linux-64::psycopg2-2.8.6-py39h3c74f83_1 Proceed ([y]/n)? y ... ... $

Note: I use conda for package management, but you don't need to. You can also use pip, in this case just pip install psycopg2 and pip install cassandra-driver.

Let's Code!

Accessing SQL Data

# code-sql.py import psycopg2 conn = psycopg2.connect( host="localhost", database="northwind", user="yugabyte", port="5433", password="yugabyte") # Open a cursor to perform database operations cur = conn.cursor() # Execute a query cur.execute("SELECT * FROM products") # Retrieve query results records = cur.fetchall() print(records[0])

Results:

$ python code-sql.py (4, "Chef Anton's Cajun Seasoning", 2, 2, '48 - 6 oz jars', 22.0, 53, 0, 0, 0) $

Accessing Cassandra Data Model

from cassandra.cluster import Cluster cluster = Cluster() session = cluster.connect('zimera') rows = session.execute('SELECT e_fullname, e_address, e_dept, e_role FROM employees') for row in rows: print(row.e_fullname, row.e_address, row.e_dept, row.e_role)

Results:

$ python code-cassandra.py Zaky A. Aditya Dusun Medelan, RT 01, Umbulmartani, Ngemplak, Sleman, DIY, Indonesia Information Technology Machine Learning Developer $

What if we want to use both data model in one python source code? Here you go:

# code-all.py import psycopg2 from cassandra.cluster import Cluster conn = psycopg2.connect( host="localhost", database="northwind", user="yugabyte", port="5433", password="yugabyte") # Open a cursor to perform database operations cur = conn.cursor() # Execute a query cur.execute("SELECT * FROM products") # Retrieve query results records = cur.fetchall() print(records[0]) cluster = Cluster() session = cluster.connect('zimera') rows = session.execute('SELECT e_fullname, e_address, e_dept, e_role FROM employees') for row in rows: print(row.e_fullname, row.e_address, row.e_dept, row.e_role)

Results:

$ python code-all.py (4, "Chef Anton's Cajun Seasoning", 2, 2, '48 - 6 oz jars', 22.0, 53, 0, 0, 0) Zaky A. Aditya Dusun Medelan, RT 01, Umbulmartani, Ngemplak, Sleman, DIY, Indonesia Information Technology Machine Learning Developer $