Cassandra backup and recovery

This section discusses how to configure data backup and recovery for the Apache Cassandra database ring installed in the Apigee hybrid runtime plane. See also Cassandra database.

What you need to know about Cassandra backups

Cassandra is a replicated database that is configured to keep at least three copies of your data in each region or data center. Cassandra uses streaming replication and read repair to maintain the data replicas in each region or data center at any given point in time.

In hybrid, Cassandra backups are not enabled by default. It's a good practice, however, to enable Cassandra backups in case your data is accidentally deleted.

What is backed up?

The backup configuration described in this topic backs up the following entities:

  • Cassandra schema including the user schema (Apigee keyspace definitions)
  • Cassandra partition token information per node
  • A snapshot of the Cassandra data

Where is backup data stored?

Backed up data is stored in a Google Cloud Storage (GCS) bucket that you must create. Bucket creation and configuration is covered in this topic.

Scheduling Cassandra backups

Backups are scheduled as cron jobs in the runtime plane. To schedule Cassandra backups:

  1. Run the create-service-account command in the hybrid installation root directory to create a GCP service account (SA) with the standard roles/storage.objectAdmin role. This role allows the backup job to write backup data to Google Cloud Storage (GCS):
    ./tools/create-service-account apigee-cassandra output-dir
    For example:
    ./tools/create-service-account apigee-cassandra ./service-accounts
    For more information about GCP service accounts, see Creating and managing service accounts.
  2. The create-service-account command saves a JSON file containing the service account private key. The file is saved in the same directory where the command executes. You will need the path to this file in the following steps.
  3. Create a GCS bucket. Specify a reasonable data retention policy for the bucket. Apigee recommends a data retention policy of 15 days.
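    As a sketch, a bucket with a 15-day cleanup rule could be created along these lines. The bucket name and location are examples, and interpreting the retention policy as a lifecycle rule that deletes objects older than 15 days is an assumption; adjust for your project's requirements:

    ```shell
    # Hypothetical bucket name and location; substitute your own.
    gsutil mb -l us-central1 gs://myname-cassandra-backup

    # Assumed interpretation of the 15-day retention recommendation:
    # a lifecycle rule that deletes backup objects older than 15 days.
    cat > lifecycle.json <<'EOF'
    {
      "rule": [
        {"action": {"type": "Delete"}, "condition": {"age": 15}}
      ]
    }
    EOF
    gsutil lifecycle set lifecycle.json gs://myname-cassandra-backup
    ```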
  4. Open your overrides.yaml file.
  5. Add the following cassandra.backup properties to enable backup. Do not remove any of the properties that are already configured.
    cassandra:
      ...
      backup:
        enabled: true
        serviceAccountPath: sa_json_file_path
        dbStorageBucket: gcs_bucket_path
        schedule: backup_schedule_code
      ...
    Where:
    Property Description
    enabled Backup is disabled by default. You must set this property to true to enable backups.
    serviceAccountPath The path on your filesystem to the service account JSON file that was downloaded when you ran ./tools/create-service-account
    dbStorageBucket GCS storage bucket path in this format: gs://bucket_name. The gs:// is required.
    schedule The time when the backup starts, specified in standard crontab syntax. Default: 0 2 * * *

    Note: Avoid scheduling a backup that starts a short time after you apply the backup configuration to your cluster. When you apply the backup configuration, Kubernetes recreates the Cassandra nodes. If the backup starts before the nodes restart (possibly several minutes) the backup will fail.

    For example:
    ...
    cassandra:
      storage:
        type: gcepd
        capacity: 50Gi
        gcepd:
          replicationType: regional-pd
      sslRootCAPath: "/Users/myhome/ssh/cassandra.crt"
      sslCertPath: "/Users/myhome/ssh/cassandra.crt"
      sslKeyPath: "/Users/myhome/ssh/cassandra.key"
      auth:
        default:
          password: "abc123"
        admin:
          password: "abc234"
        ddl:
          password: "abc345"
        dml:
          password: "abc456"
      nodeSelector:
        key: cloud.google.com/gke-nodepool
        value: apigee-data
      backup:
        enabled: true
        serviceAccountPath: "/Users/myhome/.ssh/my_cassandra_backup.json"
        dbStorageBucket: "gs://myname-cassandra-backup"
        schedule: "45 23 * * 6"
    ...
  6. Apply the configuration changes to your cluster. For example:
    ./apigeectl apply -c 2_cassandra -v beta2
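After applying the configuration, you can confirm that the backup cron job was created. This is a sketch; the cron job name shown is an assumption, so check the actual name in your cluster:

```shell
# Hypothetical check: list cron jobs and look for the Cassandra backup job.
kubectl get cronjobs | grep cassandra-backup
```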

Restoring backups

Restoration takes the data from the backup location and restores it into a new Cassandra cluster with the same number of pods. The new cluster must use a namespace that is different from the one used by your runtime plane cluster.

To restore Cassandra backups:

  1. Create a new Kubernetes cluster with a new namespace. You cannot use the same cluster/namespace that you used for the original hybrid installation.
  2. In the root hybrid installation directory, create a new overrides-restore.yaml file.
  3. Copy the complete Cassandra configuration from your original overrides.yaml file into the new one.
  4. Add a namespace element. Do not use the same namespace you used for your original cluster.
  5. Add the cassandra.restore properties to enable restoration:
    namespace: your-restore-namespace
    cassandra:
      storage:
        type: gcepd
        capacity: 50Gi
        gcepd:
          replicationType: regional-pd
      nodeSelector:
        key: cloud.google.com/gke-nodepool
        value: apigee-data
      sslRootCAPath: path_to_root_ca_file
      sslCertPath: path_to_ssl_cert_file
      sslKeyPath: path_to_ssl_key_file
      auth:
        default:
          password: your_cassandra_password
        admin:
          password: admin_password
        ddl:
          password: ddl_password
        dml:
          password: dml_password
      restore:
        enabled: true
        snapshotTimestamp: timestamp
        serviceAccountPath: sa_json_file_path
        dbStorageBucket: gcs_bucket_path
        image:
          pullPolicy: Always
    Where:
    Property Description
    ssl*Path, auth.* Use the same TLS auth credentials you used to create the original Cassandra database.
    snapshotTimestamp The timestamp of the backup snapshot to restore.
    serviceAccountPath The path on your filesystem to the service account you created for the backup.
    dbStorageBucket GCS storage bucket path where your backup is stored, in this format: gs://bucket_name. The gs:// is required.
    For example:
    namespace: cassandra-restore
    cassandra:
      storage:
        type: gcepd
        capacity: 50Gi
        gcepd:
          replicationType: regional-pd
      sslRootCAPath: "/Users/myhome/ssh/cassandra.crt"
      sslCertPath: "/Users/myhome/ssh/cassandra.crt"
      sslKeyPath: "/Users/myhome/ssh/cassandra.key"
      auth:
        default:
          password: "abc123"
        admin:
          password: "abc234"
        ddl:
          password: "abc345"
        dml:
          password: "abc456"
      nodeSelector:
        key: cloud.google.com/gke-nodepool
        value: apigee-data
      restore:
        enabled: true
        snapshotTimestamp: "20190417002207"
        serviceAccountPath: "/Users/myhome/.ssh/my_cassandra_backup.json"
        dbStorageBucket: "gs://myname-cassandra-backup"
        image:
          pullPolicy: Always

    Where snapshotTimestamp is the timestamp associated with the backup you are restoring.
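    If you are not sure which snapshot timestamps are available, one way to find them (a sketch, assuming the backup archive naming shown in the sample logs in this topic, backup_timestamp_schema.tgz, and an example bucket path) is to list the bucket and extract the 14-digit timestamp from an archive name:

    ```shell
    # Example bucket path; substitute your own.
    gsutil ls gs://myname-cassandra-backup/apigeecluster/dc-1/

    # Extract the timestamp from an archive name such as
    # backup_20190405011309_schema.tgz:
    echo "backup_20190405011309_schema.tgz" | sed -E 's/backup_([0-9]{14})_.*/\1/'
    # → 20190405011309
    ```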

  6. Create the new Cassandra cluster:
     ./apigeectl apply -c 2_cassandra -v beta2 -f ./overrides-restore.yaml
     ./apigeectl apply -c 2_cassandra-role -v beta2

Viewing the restore logs

You can check the restore job logs and grep for errors to make sure the restore completed without problems.
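For example (the pod name here comes from the sample output in this topic; yours will differ):

```shell
# Search the restore job logs for errors; no output means none were logged.
# Note that grep exits non-zero when it finds no matches.
kubectl logs apigee-cassandra-restore-b4lgf | grep -i error
```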

Verify the restore completed

To check if the restore operation completed:

 kubectl get pods
 NAME                             READY   STATUS      RESTARTS   AGE
 apigee-cassandra-0               1/1     Running     0          1h
 apigee-cassandra-1               1/1     Running     0          1h
 apigee-cassandra-2               1/1     Running     0          59m
 apigee-cassandra-restore-b4lgf   0/1     Completed   0          51m

View the restore logs

To view the restore logs:

 kubectl logs -f apigee-cassandra-restore-b4lgf
 Restore Logs:
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 to download file gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1/backup_20190405011309_schema.tgz
 INFO: download sucessfully extracted the backup files from gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
 finished downloading schema.cql
 to create schema from 10.32.0.28
 Warnings :
 dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0
 dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0
 Warnings :
 dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0
 dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0
 INFO: the schema has been restored
 starting apigee-cassandra-0 in default
 starting apigee-cassandra-1 in default
 starting apigee-cassandra-2 in default
 84
 95
 106
 waiting on waiting nodes $pid to finish 84
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 INFO: restore downloaded tarball and extracted the file from gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
 INFO: restore downloaded tarball and extracted the file from gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
 INFO: restore downloaded tarball and extracted the file from gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
 INFO 12:02:28 Configuration location: file:/etc/cassandra/cassandra.yaml
 ...
 INFO 12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
 Summary statistics:
    Connections per host    : 3
    Total files transferred : 2
    Total bytes transferred : 0.378KiB
    Total duration          : 5048 ms
    Average transfer rate   : 0.074KiB/s
    Peak transfer rate      : 0.075KiB/s
 progress: [/10.32.1.155]0:1/1 100% 1:1/1 100% [/10.32.0.28]1:1/1 100% 0:1/1 100% [/10.32.3.220]0:1/1 100% 1:1/1 100% total: 100% 0.000KiB/s (avg: 0.074KiB/s)
 INFO 12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
 progress: [/10.32.1.155]0:1/1 100% 1:1/1 100% [/10.32.0.28]1:1/1 100% 0:1/1 100% [/10.32.3.220]0:1/1 100% 1:1/1 100% total: 100% 0.000KiB/s (avg: 0.074KiB/s)
 INFO 12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
 INFO 12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
 INFO: ./apigee/data/cassandra/data/ks1/user-9fbae960571411e99652c7b15b2db6cc restored successfully
 INFO: Restore 20190405011309 completed
 INFO: ./apigee/data/cassandra/data/ks1/user-9fbae960571411e99652c7b15b2db6cc restored successfully
 INFO: Restore 20190405011309 completed
 waiting on waiting nodes $pid to finish 106
 Restore finished

Verify backup job

You can also verify the backup job after the backup cron job has been scheduled. Once the cron job triggers a run, you should see something like this:

 kubectl get pods
 NAME                                       READY   STATUS    RESTARTS   AGE
 apigee-cassandra-0                         1/1     Running   0          2h
 apigee-cassandra-1                         1/1     Running   0          2h
 apigee-cassandra-2                         1/1     Running   0          2h
 apigee-cassandra-backup-1554515580-pff6s   0/1     Running   0          54s

Check the backup logs

The backup job:

  • Creates a schema.cql file.
  • Uploads it to your storage bucket.
  • Backs up the data on each node and uploads it at the same time.
  • Waits until all of the data is uploaded.
 kubectl logs -f apigee-cassandra-backup-1554515580-pff6s
 starting apigee-cassandra-0 in default
 starting apigee-cassandra-1 in default
 starting apigee-cassandra-2 in default
 35
 46
 57
 waiting on process 35
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
 Snapshot directory: 20190406190808
 INFO: backup created cassandra snapshot 20190406190808
 tar: Removing leading `/' from member names
 /apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/
 /apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/20190406190808/
 /apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Data.db
 Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
 Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
 Snapshot directory: 20190406190808
 INFO: backup created cassandra snapshot 20190406190808
 tar: Removing leading `/' from member names
 /apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/
 /apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/20190406190808/
 /apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/20190406190808/manifest.json
 /apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/
 /apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/20190406190808/
 /apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/20190406190808/manifest.json
 /apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/
 /apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/20190406190808/
 /apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/20190406190808/manifest.json
 /apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/
 /apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/20190406190808/
 /apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/20190406190808/manifest.json
 /apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/
 /apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/20190406190808/
 /apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/20190406190808/manifest.json
 ...
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Filter.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-CompressionInfo.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Index.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Statistics.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Data.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Index.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Statistics.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-TOC.txt
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Statistics.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Summary.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Filter.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Summary.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Index.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/manifest.json
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Filter.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Digest.crc32
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Summary.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Data.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-TOC.txt
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/schema.cql
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-CompressionInfo.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Digest.crc32
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-TOC.txt
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Data.db
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Digest.crc32
 /apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-CompressionInfo.db
 ...
 /tmp/tokens.txt
 / [1 files][    0.0 B/    0.0 B]
 Operation completed over 1 objects.
 / [1 files][    0.0 B/    0.0 B]
 Operation completed over 1 objects.
 INFO: backup created tarball and transfered the file to gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
 INFO: removing cassandra snapshot
 INFO: backup created tarball and transfered the file to gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
 INFO: removing cassandra snapshot
 Requested clearing snapshot(s) for [all keyspaces]
 INFO: Backup 20190406190808 completed
 waiting on process 46
 Requested clearing snapshot(s) for [all keyspaces]
 INFO: Backup 20190406190808 completed
 Requested clearing snapshot(s) for [all keyspaces]
 waiting on process 57
 INFO: Backup 20190406190808 completed
 waiting result to get schema from 10.32.0.28
 INFO: /tmp/schema.cql has been generated
 Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
 tar: removing leading '/' from member names
 tmp/schema.cql
 Copying from ...
 / [1 files][    0.0 B/    0.0 B]
 Operation completed over 1 objects.
 INFO: backup created tarball and transfered the file to gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
 finished uploading schema.cql
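To confirm that a backup run finished, one option (a sketch; the pod name is from the sample output above, and the completion string matches the per-node messages in these logs) is to grep for the completion marker:

```shell
# Look for the per-node "Backup <timestamp> completed" messages.
kubectl logs apigee-cassandra-backup-1554515580-pff6s | grep "completed"
```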