The Crypto Shredding sample app demonstrates how you can make use of MongoDB's Client-Side Field Level Encryption (CSFLE) to strengthen procedures for removing sensitive data.
About the Sample Application
The right to erasure, also known as the right to be forgotten, is a right granted to individuals under laws and regulations such as GDPR. This means that companies storing an individual's personal data must be able to delete it on request. Because data can be spread across several systems, it can be technically challenging for these companies to identify and remove it from all places. Even when properly executed, there is the risk that deleted data can be restored from backups in the future, potentially creating legal and financial risks.
Warning
MongoDB provides no guarantees that the solution and techniques described in this article fulfill all regulatory requirements around the right to erasure. Your organization must determine appropriate, sufficient measures to comply with regulatory requirements such as GDPR.
The Crypto Shredding sample application demonstrates one way to implement a right to erasure. The demo application is a Python (Flask) Web application with a front end for adding users, logging in, and entering data. It also includes an "Admin" page to showcase crypto shredding functionality.
You can install and run the application by following the instructions in the GitHub repository.
What is Crypto-Shredding?
Crypto-shredding, also called cryptographic erasure, is a data destruction technique where, instead of destroying encrypted data, you destroy the encryption keys necessary to decrypt it. This makes the data indecipherable.
For example, imagine you are storing data for multiple users. You start by giving each user their own unique Data Encryption Key (DEK), and mapping it to that customer.
In the diagram, "User A" and "User B" each have their own unique DEK in the key store. Each key is used to encrypt or decrypt data for its respective user:

Let's assume that you want to remove all data for User B. If you remove User B's DEK, you can no longer decrypt their data. Everything in the datastore becomes indecipherable ciphertext. User A's data is unaffected, since their DEK still exists:

What is CSFLE?
With CSFLE, applications can encrypt sensitive fields in documents prior to transmitting data to the server. Even when data is being used by the database in memory, it is never in plain text. The database stores and transmits encrypted data that is only deciphered by the client.
CSFLE uses envelope encryption, which is the practice of encrypting plaintext data with a data key, that is itself encrypted by a top level envelope key (also known as a "Customer Master Key", or CMK).

Encryption Key Management
CMKs are usually managed by a Key Management Service (KMS). CSFLE supports multiple KMSs, including Amazon Web Services (AWS), Azure Key Vault, Google Cloud Platform (GCP), and keystores that support the KMIP standard, such as Hashicorp Keyvault. The sample app uses Amazon Web Services as the KMS.
Automatic and Explicit Encryption
CSFLE can be used in automatic or explicit mode, or a combination of both. The sample app uses explicit encryption.
With automatic encryption, you perform encrypted read and write operations based on a defined encryption schema, so you don't need application code to specify how to encrypt or decrypt fields.
With explicit encryption, you use the MongoDB driver's encryption library to manually encrypt or decrypt fields in your application.
Sample App Walkthrough
The sample app uses CSFLE with explicit encryption, and Amazon Web Services as the KMS:

Adding Users
The app instantiates the ClientEncryption
class by initializing an app.mongodb_encryption_client
object. This encryption client is responsible for generating DEKs, and then encrypting them using a CMK from the AWS KMS.
When a user signs up, the application generates a unique DEK for them using the create_data_key
method, then returns the data_key_id
:
# flaskapp/db_queries.py def create_key(userId): data_key_id = \app.mongodb_encryption_client.create_data_key (kms_provider, master_key, key_alt_names=[userId]) return data_key_id
The app then uses this method when saving user information:
# flaskapp/user.py def save(self): dek_id = db_queries.create_key(self.username) result = app.mongodb[db_name].user.insert_one( { "username": self.username, "password_hash": self.password_hash, "dek_id": dek_id, "createdAt": datetime.now(), } ) if result: self.id = result.inserted_id return True else: return False
Adding and Encrypting Data
Once registered, a user can log in and enter data as key-value pairs via an input form:

The database stores this data in a MongoDB collection named “data,” where each document includes the username and the key-value pair:
{ "name": "shoe size", "value": "10", "username": "tom" }
The sample app encrypts the value
and username
fields, but not the name
. The app encrypts fields with the user's DEK and a specified encryption algorithm:
# flaskapp/db_queries.py # Fields to encrypt, and the algorithm to encrypt them with ENCRYPTED_FIELDS = { # Deterministic encryption for username, because we need to search on it "username": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic, # Random encryption for value, as we don't need to search on it "value": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random, }
The insert_data
function takes an unencrypted document and loops over the ENCRYPTED_FIELDS
to encrypt them:
# flaskapp/db_queries.py def insert_data(document): document["username"] = current_user.username # Loop over the field names (and associated algorithm) we want to encrypt for field, algo in ENCRYPTED_FIELDS.items(): # if the field exists in the document, encrypt it if document.get(field): document[field] = encrypt_field(document[field], algo) # Insert document (now with encrypted fields) to the data collection app.data_collection.insert_one(document)
If the specified field exists in the document, the function calls encrypt_field
to encrypt it using the specified algorithm:
# flaskapp/db_queries.py # Encrypt a single field with the given algorithm def encrypt_field(field, algorithm): try: field = app.mongodb_encryption_client.encrypt( field, algorithm, key_alt_name=current_user.username, ) return field except pymongo.errors.EncryptionError as ex: # Catch this error in case the DEK doesn't exist. Log a warning and # re-raise the exception if "not all keys requested were satisfied" in ex._message: app.logger.warn( f"Encryption failed: could not find data encryption key for user: {current_user.username}" ) raise ex
After adding data, you can see it in the Web app:

Deleting an Encryption Key
Now let's see what happens when you delete the DEK. The sample app does this from an admin page, which should be restricted to only those individuals with authorization to manage keys:

The "Delete data encryption key" option removes the DEK, but leaves the user's encrypted data in place. After that, the application can no longer decrypt the data. Trying to retrieve the data for the logged in user throws an error:

Note
After deleting the DEK, the application may still be able to decrypt and show data until its cache expires, up to 60 seconds later.
But what is actually left in the database? You can review the information by returning to the Admin page and clicking Fetch data for all users. This view doesn't throw an exception if the application can't decrypt the data. Instead, it shows exactly what is stored in the database.
Even though you haven't actually deleted the user's data, because the data encryption key no longer exists, the application can only show the ciphertext for the encrypted fields "username" and "value".

Here is the code used to fetch this data. It uses similar logic to the encrypt
method shown earlier. The application runs a find
operation without any filters to retrieve all data, then loops over the ENCRYPTED_FIELDS
dictionary to decrypt fields:
# flaskapp/db_queries.py def fetch_all_data_unencrypted(decrypt=False): results = list(app.data_collection.find()) if decrypt: for field in ENCRYPTED_FIELDS.keys(): for result in results: if result.get(field): result[field], result["encryption_succeeded"] = decrypt_field(result[field]) return results
The decrypt_field
function is called for each field to be decrypted, but in this case the application catches the error if it can't successfully decrypt it due to a missing DEK:
# flaskapp/db_queries.py # Try to decrypt a field, returning a tuple of (value, status). This will be either (decrypted_value, True), or (raw_cipher_text, False) if we couldn't decrypt def decrypt_field(field): try: # We don't need to pass the DEK or algorithm to decrypt a field field = app.mongodb_encryption_client.decrypt(field) return field, True # Catch this error in case the DEK doesn't exist. except pymongo.errors.EncryptionError as ex: if "not all keys requested were satisfied" in ex._message: app.logger.warn( "Decryption failed: could not find data encryption key to decrypt the record." ) # If we can't decrypt due to missing DEK, return the "raw" value. return field, False raise ex
You can also use the mongosh shell to check directly in the database, to prove that there's nothing readable:

At this point, the user's encrypted data is still present. Someone could gain access to it by restoring their encryption key, such as from a database backup.
To prevent this, the sample application uses two separate database clusters: one for storing data, and one for storing DEKs (the "key vault"). Using separate clusters decouples the restoration of backups for application data and the key vault. Restoring the data cluster from a backup doesn't restore any DEKs that were deleted from the key vault cluster.
Conclusion
Client-Side Field Level Encryption can simplify the task of "forgetting" certain data. By deleting data keys, you can effectively forget data that exists across different databases, collections, backups, and logs.
In a production application, you might also delete the encrypted data itself, on top of removing the encryption key. This "defense in depth" approach helps ensure that data is really gone. Implementing crypto shredding on top of data deletion minimizes the impact if a delete operation fails, or doesn't include data that should have been wiped.