
INTPYTHON-527 Add Queryable Encryption support #329


Open: wants to merge 30 commits into main


@aclark4life (Collaborator) commented Jun 27, 2025

(see previous attempts in #318, #319 and #323 for additional context)

With this PR, I am able to get Django to create an encrypted collection when the schema editor runs create_model() on an EncryptedModel containing an EncryptedCharField (e.g. see db.enxcol_.encryption__person.ecoc below):

Enterprise a49c6bfb-b6b3-4711-bd5d-c6ecf0611a4c [direct: secondary] test> use test_djangotests
switched to db test_djangotests
Enterprise a49c6bfb-b6b3-4711-bd5d-c6ecf0611a4c [direct: secondary] test_djangotests> db.
db.auth_group_permissions            db.django_session
db.auth_user                         db.enxcol_.encryption__person.ecoc
db.auth_group                        db.django_site
db.django_migrations                 db.django_content_type
db.auth_user_groups                  db.enxcol_.encryption__person.esc
db.auth_permission                   db.auth_user_user_permissions
db.django_admin_log

Questions

  • To manage both encrypted and unencrypted connections, should we keep the _nodb_cursor functionality in this PR, do something in init_connection_state as @timgraham suggests, or do something else?
  • As @ShaneHarvey suggests, ask the encryption folks about "command not supported for auto encryption: buildinfo". It happens when Django attempts to get the server version via the encrypted connection, which is why we need to manage both encrypted and unencrypted connections. Are most commands supported for auto encryption or not?
  • What does EncryptedModel support for EmbeddedModel look like? What are the specific use cases for integrating EncryptedModel and EmbeddedModel? Should we be able to mix in EncryptedModel and EmbeddedModel and then include that model in an EmbeddedModelField?

Todo

  • Helpers need a home
  • Add additional encrypted fields
    • EncryptedModel
    • EncryptedCharField
  • Migrations
  • Querying
  • Docs
    • Limitations
    • Mention that the pymongocrypt wheel includes libmongocrypt (crypt_shared must be downloaded separately)!
  • More tests
  • More KMS support ("local" only in this PR)

Helpers

The included helpers are also used by the test runner, e.g. in these test settings:

import os

from django_mongodb_backend import encryption, parse_uri

kms_providers = encryption.get_kms_providers()
auto_encryption_opts = encryption.get_auto_encryption_opts(
    kms_providers=kms_providers,
)

DATABASE_URL = os.environ.get("MONGODB_URI", "mongodb://localhost:27017/djangotests")
DATABASES = {
    "default": parse_uri(
        DATABASE_URL, options={"auto_encryption_opts": auto_encryption_opts}
    ),
}
DEFAULT_AUTO_FIELD = "django_mongodb_backend.fields.ObjectIdAutoField"
PASSWORD_HASHERS = ("django.contrib.auth.hashers.MD5PasswordHasher",)
SECRET_KEY = "django_tests_secret_key"
USE_TZ = False
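For the "local" KMS provider (the only one supported in this PR), a get_kms_providers()-style helper only needs to return a 96-byte master key under the "local" entry. The following is a minimal sketch under that assumption; get_local_kms_providers is a hypothetical name and not necessarily how the PR's helper is written. A random key is only suitable for throwaway test databases, since anything encrypted with it becomes unreadable once the key is lost.

```python
import os

# The "local" KMS provider expects a 96-byte master key.
LOCAL_MASTER_KEY_LENGTH = 96


def get_local_kms_providers(key=None):
    """Hypothetical helper returning a kms_providers mapping for the "local" provider.

    If no key is given, a random one is generated; that is only suitable for
    throwaway test databases because the data can't be decrypted without it.
    """
    if key is None:
        key = os.urandom(LOCAL_MASTER_KEY_LENGTH)
    if len(key) != LOCAL_MASTER_KEY_LENGTH:
        raise ValueError(
            f"local master key must be {LOCAL_MASTER_KEY_LENGTH} bytes, got {len(key)}"
        )
    return {"local": {"key": key}}
```

A real deployment would load a persisted key (for example from a secrets store) and pass it as `key`.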

# Build a map of encrypted fields
encrypted_fields = {
"fields": {
Collaborator Author

Add query conditions

return ClientEncryption(kms_providers, key_vault_namespace, encrypted_connection, codec_options)


def get_auto_encryption_opts(crypt_shared_lib_path=None, kms_providers=None):
@aclark4life (Collaborator Author) commented Jun 27, 2025

The crypt_shared library is in the pymongocrypt wheel, which is much easier than downloading it separately and telling MongoClient where it is.

@aclark4life (Collaborator Author) commented Jun 28, 2025

More to this story:

  • libmongocrypt is in the pymongocrypt wheel, not crypt_shared, which must always be downloaded and configured manually.
  • libmongocrypt works here because mongocryptd is running on Enterprise.

We should document this.

(via @ShaneHarvey, thanks!)
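Since crypt_shared has to be downloaded separately, one option is to let users point the driver at it explicitly. The sketch below is hypothetical: the helper name and the CRYPT_SHARED_LIB_PATH environment variable are assumptions, while the keyword names it returns (kms_providers, key_vault_namespace, crypt_shared_lib_path, crypt_shared_lib_required) are parameters of PyMongo's AutoEncryptionOpts. When no path is given, the driver keeps its defaults, which worked here because mongocryptd is available on Enterprise.

```python
import os


def build_auto_encryption_kwargs(kms_providers, key_vault_namespace, crypt_shared_lib_path=None):
    """Return keyword arguments for pymongo's AutoEncryptionOpts (hypothetical helper)."""
    kwargs = {
        "kms_providers": kms_providers,
        "key_vault_namespace": key_vault_namespace,
    }
    # Hypothetical environment variable used as a fallback for the library path.
    path = crypt_shared_lib_path or os.environ.get("CRYPT_SHARED_LIB_PATH")
    if path:
        kwargs["crypt_shared_lib_path"] = path
        # Raise if crypt_shared can't be loaded instead of silently using mongocryptd.
        kwargs["crypt_shared_lib_required"] = True
    return kwargs
```

Usage would be along the lines of `AutoEncryptionOpts(**build_auto_encryption_kwargs(kms_providers, "encryption.__keyVault"))`, where the key vault namespace is only an example value.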

Comment on lines 434 to 435
self.connection.features.supports_encryption
and self.connection._settings_dict.get("OPTIONS", {}).get("auto_encryption_opts")
Collaborator

I don't think we want encrypted models to silently fall back to working as unencrypted models.

Collaborator Author

No, we don't, but I'm not sure why you're making that comment here … as of 65bd15a I'm creating two connections and using encrypted_connection only when needed. Is there a fallback scenario I'm missing? It seems that with two connections we'll have to check every use of self.connection to make sure we're using the right one.

Collaborator

In _create_collection() you're guarding the creation of an encrypted model based on this method, so if features.supports_encryption = False but the model has encrypted fields, it's going to incorrectly use create_collection() instead.
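One way to avoid that silent fallback is to make the unsupported case an error. A minimal sketch, assuming models are marked with an `encrypted` class attribute (as suggested later in this thread) and that the feature flag is named supports_queryable_encryption; create_plain and create_encrypted are hypothetical callables standing in for create_collection() and the encrypted-collection path, and EncryptionNotSupportedError stands in for something like django.db.NotSupportedError.

```python
class EncryptionNotSupportedError(Exception):
    """Stand-in for django.db.NotSupportedError in this sketch."""


def create_collection_for(model, features, create_plain, create_encrypted):
    """Create a model's collection without ever downgrading an encrypted model."""
    if getattr(model, "encrypted", False):
        if not features.supports_queryable_encryption:
            # Refuse instead of quietly creating an unencrypted collection.
            raise EncryptionNotSupportedError(
                f"{model.__name__} has encrypted fields, but this database "
                "doesn't support Queryable Encryption."
            )
        return create_encrypted(model)
    return create_plain(model)
```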


class EncryptedCharField(models.CharField):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.encrypted = True
Collaborator

I'd think this could be a class-level variable.
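With a class-level flag there is no need to override `__init__`, and code can inspect the field class without instantiating it. A minimal sketch; `CharField` here is a stand-in so the example runs without Django (in the PR the base class would be django.db.models.CharField).

```python
class CharField:
    """Minimal stand-in for django.db.models.CharField so this sketch runs without Django."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


class EncryptedCharField(CharField):
    # Class-level flag: shared by every instance, no __init__ override required.
    encrypted = True
```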


# Use the encrypted connection and auto_encryption_opts to create an encrypted client
encrypted_client = get_encrypted_client(auto_encryption_opts, encrypted_connection)

with contextlib.suppress(EncryptedCollectionError):
Collaborator

Need a comment about why the error should be suppressed.


Collaborator

There shouldn't be a case where we're trying to create a collection that already exists. It would be correct to surface that error to the user because their migrations are out of sync with their database.

@aclark4life (Collaborator Author) commented

I used the wrong commit message for 65bd15a, and I don't want to force-push yet. It should have said:

"Only create an encrypted connection once then reuse it."

I'm aware that _nodb_cursor is slated for removal, but in the meantime I can keep going with other fixes using this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

  • An unencrypted connection, used unless we need encryption
  • An encrypted connection that can be used when we need it
@timgraham (Collaborator) commented Jun 27, 2025

> I'm aware that _nodb_cursor is slated for removal, but in the meantime I can keep going with other fixes using this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default), with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure whether using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.
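The "encrypted by default, plain only when needed" design can be sketched as a small selector over two clients. This is a hypothetical illustration, not the patch referenced above: DualClient and UNENCRYPTED_COMMANDS are made-up names, and the command set contains only buildinfo because that's the one failure reported in this thread.

```python
class DualClient:
    """Pick the encrypted client by default and the plain one only for rejected commands."""

    # Commands that auto encryption rejects; buildinfo is the one seen in this thread.
    UNENCRYPTED_COMMANDS = frozenset({"buildinfo"})

    def __init__(self, encrypted_client, plain_client):
        self.encrypted_client = encrypted_client
        self.plain_client = plain_client

    def client_for(self, command_name):
        # Compare case-insensitively so "buildInfo" and "buildinfo" are treated alike.
        if command_name.lower() in self.UNENCRYPTED_COMMANDS:
            return self.plain_client
        return self.encrypted_client
```

With PyMongo clients, a version check would then look like `dual.client_for("buildInfo").admin.command("buildInfo")`, while everything else goes through the encrypted client.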

@aclark4life (Collaborator Author) commented

> It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

I don't disagree, but it feels a lot like _start_transaction_under_autocommit, which gets called by start_transaction_under_autocommit because autocommit is False. Django appears to stumble into _nodb_cursor when the encrypted connection fails to get the database version. While we don't use a cursor in this backend, we do have a "nosql" cursor that has __enter__ and __exit__ (I assume) to meet Django's expectations, and that gives us an opportunity to modify the connection. @Jibola mentioned yesterday that this design is suspect, and I agree with both of you, particularly about the desire to start with, and maintain, an encrypted connection first.

> Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

Yes, it works by design, not as a side effect. I deep-copy settings_dict when DatabaseWrapper is initialized, so when DatabaseWrapper.connection is initialized it's unencrypted. When the schema editor needs encryption later, auto_encryption_opts is retrieved from _settings_dict.
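The idea of deriving unencrypted settings without touching the original can be sketched as follows, assuming the settings layout used in this PR (auto_encryption_opts under OPTIONS); split_encryption_settings is a hypothetical name. Unlike the deep copy described above, this sketch copies only the two dict levels it changes, so the options object itself is passed through as-is rather than copied. Either way, the original settings dict keeps its auto_encryption_opts, avoiding the side effect Tim describes.

```python
def split_encryption_settings(settings_dict):
    """Return (unencrypted_settings, auto_encryption_opts) without mutating settings_dict."""
    unencrypted = dict(settings_dict)  # New top-level dict.
    options = dict(unencrypted.get("OPTIONS", {}))  # New OPTIONS dict.
    auto_encryption_opts = options.pop("auto_encryption_opts", None)
    unencrypted["OPTIONS"] = options
    return unencrypted, auto_encryption_opts
```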

> I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default), with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure whether using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.

I'd made a few passes at it but didn't get anywhere. I'll try again, though.

@timgraham (Collaborator) commented

Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

@aclark4life (Collaborator Author) commented Jun 28, 2025

> Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

Copy that, thanks!

I've removed _nodb_cursor in 8e83ada and discovered the version check is the only time that error occurs. I now get errors like:

Traceback (most recent call last):
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
    yield
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
    return run_state_machine(ctx, self.callback)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
    result = callback.mark_command(ctx.database, mongocryptd_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
    res = self.mongocryptd_client[database].command(
        inflated_cmd, codec_options=DEFAULT_RAW_BSON_OPTIONS
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
    return self._command(
        connection,
    ...<7 lines>...
        **kwargs,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
    return conn.command(
        self._name,
    ...<8 lines>...
        client=self._client,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
    return func(*args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
    return command(
        self,
    ...<20 lines>...
        write_concern=write_concern,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
    helpers_shared._check_command_response(
        response_doc,
    ...<2 lines>...
        parse_write_concern_error=parse_write_concern_error,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection., full error: RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))

Still working on an unencrypted connection, but perhaps the only time we need it is for the version check.

    @cached_property
    def supports_encryption(self):
        """
        Encryption is supported if the server is Atlas or Enterprise
Collaborator

todo: this doesn't check for Atlas (at least on my Atlas VM, build_info.get("modules") == [])

Collaborator Author

Can we use supports_atlas_search for that?

Collaborator

Possibly, though we'd probably want to rename it or add a new, more semantic name (e.g. is_atlas). Similarly, we might add is_enterprise as a separate property in case it can be reused later.

It seems that a check for MongoDB 7.0+ is also needed: https://www.mongodb.com/docs/manual/core/queryable-encryption/reference/compatibility/#queryable-encryption-compatibility (and perhaps supports_encryption should be renamed to supports_automatic_encryption).

Collaborator Author

I renamed it to supports_queryable_encryption because that's what I originally called it, and automatic encryption is only half of QE; we will eventually support explicit encryption too.

Collaborator

Per the linked compatibility table, the requirements are different for explicit encryption (it's supported by Community edition too), so we'll presumably need a separate supports_explicit_encryption feature flag.
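Putting the points in this thread together, the checks might look like the following sketch. It works on a buildInfo response document (modules and versionArray are real buildInfo fields), but the function names are hypothetical, and is_atlas is taken as an input because, as noted above, modules is empty on Atlas. The 7.0 minimum and the edition requirements follow the compatibility table linked above.

```python
QE_MIN_VERSION = (7, 0)


def is_enterprise(build_info):
    """Enterprise builds list "enterprise" in buildInfo's modules."""
    return "enterprise" in build_info.get("modules", [])


def server_version(build_info):
    """Return (major, minor) from buildInfo's versionArray, e.g. [7, 0, 2, 0] -> (7, 0)."""
    return tuple(build_info.get("versionArray", [0, 0])[:2])


def supports_queryable_encryption(build_info, is_atlas=False):
    """Automatic Queryable Encryption: MongoDB 7.0+ on Enterprise or Atlas."""
    return server_version(build_info) >= QE_MIN_VERSION and (
        is_atlas or is_enterprise(build_info)
    )


def supports_explicit_encryption(build_info):
    """Explicit Queryable Encryption: MongoDB 7.0+, Community edition included."""
    return server_version(build_info) >= QE_MIN_VERSION
```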

Comment on lines 21 to 44
class EncryptionRouter:
    """
    Routes database operations for 'encrypted' models to the 'encryption' DB.
    """

    def db_for_read(self, model, **hints):
        if getattr(model, "encrypted_fields_map", False):
            return "encryption"
        return None

    def db_for_write(self, model, **hints):
        if getattr(model, "encrypted_fields_map", False):
            return "encryption"
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        Ensure that the 'encrypted' models only appear in the 'encryption' DB,
        and not in the default DB.
        """
        model = hints.get("model")
        if model and getattr(model, "encrypted_fields_map", False):
            return db == "encryption"
        return None
Collaborator

I think we should provide an example in the documentation but probably not here as a public API since making a sufficiently generic router seems questionable (for example, hardcoding "encryption" as the name of the alias). I'd suggest putting this code in tests/encryption_/routers.py and using it there with @override_settings(DATABASE_ROUTERS=.... (And as discussed in the design doc, stop using encrypted_fields_map to detect encrypted models. At the least, adding a encrypted = True class attribute on EncryptedModel would be consistent with EncryptedField and wouldn't be so ugly until we decide on a final solution.)

@aclark4life (Collaborator Author) commented Jul 1, 2025

Where in the schema editor do I move encrypted_fields_map? I'm not that familiar with that code yet.

Collaborator

I would add SchemaEditor._get_encrypted_fields_map(self, model). There, self.connection gives you the connection to pass to field.db_type().


Collaborator Author

> I think we should provide an example in the documentation but probably not here as a public API since making a sufficiently generic router seems questionable (for example, hardcoding "encryption" as the name of the alias).

I think we can make the alias configurable, and I definitely want to make a public API, because I'm fairly certain we are not going to tell users to create a custom DatabaseRouter to use this feature. That said, we have some time to discuss; in the short term, I may add one to the helpers. It could also end up in the project template.
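A router with a configurable alias might look like the sketch below. It assumes encrypted models carry an `encrypted = True` class attribute (Tim's suggestion) rather than relying on encrypted_fields_map; the class name and the default alias "encrypted" are hypothetical. allow_migrate returns a boolean for encrypted models and None (no opinion) for everything else.

```python
class EncryptedRouter:
    """Route models marked with ``encrypted = True`` to a configurable database alias."""

    def __init__(self, alias="encrypted"):
        self.alias = alias

    @staticmethod
    def _is_encrypted(model):
        return bool(getattr(model, "encrypted", False))

    def db_for_read(self, model, **hints):
        return self.alias if self._is_encrypted(model) else None

    def db_for_write(self, model, **hints):
        return self.alias if self._is_encrypted(model) else None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        model = hints.get("model")
        if model is not None and self._is_encrypted(model):
            # Encrypted models only live in the encrypted database.
            return db == self.alias
        return None  # No opinion: let other routers or the default decide.
```

Since Django accepts router instances in DATABASE_ROUTERS (as in the test settings later in this thread), configuration could be `DATABASE_ROUTERS = [EncryptedRouter(alias="encrypted")]`.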

if not hasattr(model, "encrypted"):
    self.get_database().create_collection(model._meta.db_table)
else:
    # TODO: Route to the encrypted database connection.
Collaborator

A correctly configured database router will take care of it.

@aclark4life (Collaborator Author) commented Jul 2, 2025

@ShaneHarvey @Jibola @timgraham FYI here is the pipeline that causes the let error:

(Pdb) pprint.pprint(pipeline)
[{'$lookup': {'as': 'django_content_type',
              'from': 'django_content_type',
              'let': {'parent__field__0': '$content_type_id'},
              'pipeline': [{'$match': {'$expr': {'$and': [{'$eq': ['$$parent__field__0',
                                                                   '$_id']}]}}}]}},
 {'$unwind': '$django_content_type'},
 {'$match': {'$expr': {'$in': ['$content_type_id',
                               (ObjectId('6864933ec7cf8179e3ef1f8d'),)]}}},
 {'$project': {'codename': 1,
               'content_type_id': 1,
               'django_content_type': {'app_label': 1, 'model': 1}}},
 {'$sort': SON([('django_content_type.app_label', 1),
                ('django_content_type.model', 1),
                ('codename', 1)])}]

And here is the error again with some additional debug:

(Pdb) errmsg
"Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection."
(Pdb) code
51208
(Pdb) response
RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))
(Pdb) max_wire_version
26

And the full traceback:

Running post-migrate handlers for application contenttypes
Traceback (most recent call last):
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
    yield
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
    return run_state_machine(ctx, self.callback)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
    result = callback.mark_command(ctx.database, mongocryptd_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
    res = self.mongocryptd_client[database].command(
        inflated_cmd, codec_options=DEFAULT_RAW_BSON_OPTIONS
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
    return self._command(
        connection,
    ...<7 lines>...
        **kwargs,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
    return conn.command(
        self._name,
    ...<8 lines>...
        client=self._client,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
    return func(*args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
    return command(
        self,
    ...<20 lines>...
        write_concern=write_concern,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
    helpers_shared._check_command_response(
        response_doc,
    ...<2 lines>...
        parse_write_concern_error=parse_write_concern_error,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection., full error: RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))

Test settings:

import os

from django_mongodb_backend import encryption, parse_uri

kms_providers = encryption.get_kms_providers()
auto_encryption_opts = encryption.get_auto_encryption_opts(
    kms_providers=kms_providers,
)

DATABASE_URL = os.environ.get("MONGODB_URI", "mongodb://localhost:27017")
DATABASES = {
    "default": parse_uri(
        DATABASE_URL,
        db_name="djangotests",
    ),
    "encrypted": parse_uri(
        DATABASE_URL,
        options={"auto_encryption_opts": auto_encryption_opts},
        db_name="encrypted_djangotests",
    ),
}
DEFAULT_AUTO_FIELD = "django_mongodb_backend.fields.ObjectIdAutoField"
PASSWORD_HASHERS = ("django.contrib.auth.hashers.MD5PasswordHasher",)
SECRET_KEY = "django_tests_secret_key"
USE_TZ = False

This is happening in the encryption_ tests with a database router configured to use the encrypted database, but it happens before any tests are run or any routing occurs. I've confirmed that the encrypted database is created, so it appears that something needs to change in either our backend or PyMongo; the ideal candidate is perhaps a change to the MQL in the pipeline, if possible.
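For a simple join like the pipeline above, one possible MQL change is the localField/foreignField form of $lookup, which has no `let`. This is a sketch under an unverified assumption: that the server accepts this form on an encrypted connection for this query. rewrite_simple_lookup is a hypothetical name, and it only handles a single-variable, single-$eq join, returning anything else unchanged. localField/foreignField matching may also differ from $expr/$eq for arrays (an array localField matches any element) and for null or missing values, so a real rewrite would need to account for that.

```python
def rewrite_simple_lookup(stage):
    """Turn a let/pipeline $lookup that joins on one equality into localField/foreignField form."""
    lookup = stage.get("$lookup")
    if not lookup or "let" not in lookup:
        return stage
    let, pipeline = lookup["let"], lookup.get("pipeline", [])
    if len(let) != 1 or len(pipeline) != 1:
        return stage
    var_name, local_expr = next(iter(let.items()))
    expr = pipeline[0].get("$match", {}).get("$expr", {})
    conditions = expr.get("$and", [expr])
    if len(conditions) != 1 or "$eq" not in conditions[0]:
        return stage
    operands = conditions[0]["$eq"]
    variable = "$$" + var_name
    if len(operands) != 2 or variable not in operands:
        return stage
    foreign_expr = operands[1] if operands[0] == variable else operands[0]

    def is_field_path(value):
        # "$field" is a field path; "$$name" is a variable reference.
        return isinstance(value, str) and value.startswith("$") and not value.startswith("$$")

    if not (is_field_path(local_expr) and is_field_path(foreign_expr)):
        return stage
    return {
        "$lookup": {
            "from": lookup["from"],
            "localField": local_expr[1:],
            "foreignField": foreign_expr[1:],
            "as": lookup["as"],
        }
    }
```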

@aclark4life (Collaborator Author) commented

> This is happening in the encryption_ tests with a database router configured to use the encrypted database, but it happens before any tests are run or any routing occurs. I've confirmed that the encrypted database is created, so it appears that something needs to change in either our backend or PyMongo; the ideal candidate is perhaps a change to the MQL in the pipeline, if possible.

I can confirm that the test settings suggested by @timgraham allow me to proceed with testing:

class TestRouter:
    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if db == "encrypted":
            if app_label != "encryption_":
                return False
        return None


DATABASE_ROUTERS = [TestRouter()]

Unfortunately that still brings me back to this:

Running post-migrate handlers for application encryption_
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
    yield
  File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 43, in encrypt
    with self.mongocrypt.encryption_context(database, cmd) as ctx:
  File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/mongocrypt.py", line 228, in encryption_context
    return EncryptionContext(
  File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/mongocrypt.py", line 426, in __init__
    self._raise_from_status()
  File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/mongocrypt.py", line 355, in _raise_from_status
    raise exc
pymongocrypt.errors.MongoCryptError: command not supported for auto encryption: buildinfo

But I can work around that issue by having the version check return (8, 1, 1).

Since auto_encryption_opts is provided in the test settings, we get a key vault database that persists whether we like it or not. It would be nice if that weren't the case, but it's probably OK for now.
Via Django settings. With this change we don't need to provide helpers for `kms_providers` and `key_vault_namespace` because they can be configured in Django settings and retrieved by the schema during `client_encryption` and `create_encrypted_collection`.
I was unable to do this in `init_connection_state`, so I tried to do the next best thing.
Comment on lines +8 to +9
    def __init__(self, *args, **kwargs):
        self.queries = kwargs.pop("queries", [])
Collaborator

Suggested change
-    def __init__(self, *args, **kwargs):
-        self.queries = kwargs.pop("queries", [])
+    def __init__(self, *args, queries=None, **kwargs):
+        self.queries = queries

When adding new arguments, you'll also need to add a deconstruct() method. Here's an example from CharField, which has an extra db_collation argument:

def deconstruct(self):
    name, path, args, kwargs = super().deconstruct()
    if self.db_collation:
        kwargs["db_collation"] = self.db_collation
    return name, path, args, kwargs
Comment on lines +26 to +28
with connection.schema_editor() as editor:
encrypted_field_names = editor._get_encrypted_fields_map(self.person).get("fields")
self.assertNotIn("name", encrypted_field_names)
Collaborator

By asserting the expected mapping in the above test, it also confirms that non-encrypted fields aren't included.

Comment on lines +12 to +13
    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return "encrypted"
Collaborator

This method is expected to return a boolean.


class Person(EncryptedModel):
    name = models.CharField("name", max_length=100)
    ssn = EncryptedCharField("ssn", max_length=11, queries=["equality"])
Collaborator

Is the syntax for queries correct? According to https://www.mongodb.com/docs/manual/core/queryable-encryption/qe-create-encrypted-collection/ it looks like [{"queryType": "equality"}] or [{ "queryType": "range", "sparsity": 1, "min": 100, "max": 2000, "trimFactor": 4 }]. I guess the user provides those sorts of values here.

Collaborator Author

I'm suggesting we implement a list syntax for Django users; then _get_encrypted_fields_map builds the dictionary syntax from the values the user provides in the list.

Collaborator

How do you propose the optional parameters be supplied? (I suggest clarifying this in the design doc.)
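One possible answer, sketched under assumptions: accept either a bare query type string such as "equality" or a full query dict carrying the optional parameters (as in the range example above), and normalize both into the encryptedFields shape from the linked docs. build_encrypted_fields, normalize_query, and the (path, bson_type, queries) tuples are hypothetical; keyId is left as None on the assumption that ClientEncryption.create_encrypted_collection() creates the missing data keys.

```python
def normalize_query(query):
    """Accept "equality" or {"queryType": ..., <optional params>} and return the dict form."""
    if isinstance(query, str):
        return {"queryType": query}
    if isinstance(query, dict) and "queryType" in query:
        return dict(query)  # Copy so later changes don't leak into the field definition.
    raise TypeError(f"Unsupported query specification: {query!r}")


def build_encrypted_fields(fields):
    """Build an encryptedFields document from (path, bson_type, queries) tuples."""
    entries = []
    for path, bson_type, queries in fields:
        entry = {"path": path, "bsonType": bson_type, "keyId": None}
        if queries:
            entry["queries"] = [normalize_query(q) for q in queries]
        entries.append(entry)
    return {"fields": entries}
```

For the Person model above, only ssn would be passed in, as `[("ssn", "string", ["equality"])]`.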

Comment on lines +50 to +52
# TODO: Add setUp? `del connection.features.supports_queryable_encryption` returns
# `AttributeError: 'DatabasesFeatures' object has no attribute 'supports_queryable_encryption'`
# even though it does have it upon inspection in `pdb`.
Collaborator

It's generated on first access, which is why you see it in pdb. Since it may not exist yet in the first setUp(), you can use connection.features.__dict__.pop("supports_queryable_encryption", None).
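The behavior described here can be reproduced with the standard library's functools.cached_property, which, like Django's django.utils.functional.cached_property, stores the computed value in the instance `__dict__` on first access. The Features class below is a stand-in for connection.features, and reset_cached_feature is a hypothetical helper name.

```python
from functools import cached_property


class Features:
    """Stand-in for connection.features."""

    def __init__(self):
        self.checks = 0

    @cached_property
    def supports_queryable_encryption(self):
        self.checks += 1  # Count how often the (normally expensive) check runs.
        return True


def reset_cached_feature(features, name):
    # pop() with a default is safe whether or not the value has been computed;
    # `del features.<name>` raises AttributeError before the first access.
    features.__dict__.pop(name, None)
```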

"""

db = self.get_database()
if not hasattr(model, "encrypted"):
Collaborator

It won't do the correct thing if model.encrypted = False. ;-)
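Checking the attribute's value instead of its presence handles an explicit `encrypted = False`. A minimal sketch with stand-in model classes; is_encrypted is a hypothetical helper name.

```python
class EncryptedModel:
    """Stand-in for the PR's EncryptedModel base class."""

    encrypted = True


class Person(EncryptedModel):
    pass


class Draft(EncryptedModel):
    encrypted = False  # Explicitly opted out, yet hasattr() would still report True.


class Tag:
    pass


def is_encrypted(model):
    # Treat a missing attribute the same as encrypted = False.
    return bool(getattr(model, "encrypted", False))
```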

aclark4life and others added 3 commits July 7, 2025 09:02
Co-authored-by: Tim Graham <timograham@gmail.com>
Co-authored-by: Tim Graham <timograham@gmail.com>
Co-authored-by: Tim Graham <timograham@gmail.com>