- Notifications
You must be signed in to change notification settings - Fork 23
INTPYTHON-527 Add Queryable Encryption support #329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
django_mongodb_backend/models.py Outdated
| ||
# Build a map of encrypted fields | ||
encrypted_fields = { | ||
"fields": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add query conditions
django_mongodb_backend/encryption.py Outdated
return ClientEncryption(kms_providers, key_vault_namespace, encrypted_connection, codec_options) | ||
| ||
| ||
def get_auto_encryption_opts(crypt_shared_lib_path=None, kms_providers=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crypt_shared
library is in the pymongocrypt
wheel, which is much easier than downloading separately and telling MongoClient
where it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More to this story:
libmongocrypt
is in thepymongocrypt
wheel, notcrypt_shared
which must always be downloaded and configured manually.libmongocrypt
works becausemongocryptd
is running on enterprise.
We should document this.
(via @ShaneHarvey, thanks!)
django_mongodb_backend/schema.py Outdated
self.connection.features.supports_encryption | ||
and self.connection._settings_dict.get("OPTIONS", {}).get("auto_encryption_opts") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want encrypted models to silently fallback to working as unencrypted models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we don't but I'm not sure why you are making that comment here … as of 65bd15a I'm creating two connections and using the encrypted_connection
only when needed. Is there a fallback scenario I'm missing? Seems like with two connections we're going to have to check every use of self.connection
to make sure we're using the right one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In _create_collection()
you're guarding the creation of an encrypted model based on this method, so if features.supports_encryption = False
but the model has encrypted fields, it's going to incorrectly use create_collection()
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class EncryptedCharField(models.CharField): | ||
def __init__(self, *args, **kwargs): | ||
super().__init__(*args, **kwargs) | ||
self.encrypted = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd think this could be a class-level variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
django_mongodb_backend/schema.py Outdated
# Use the encrypted connection and auto_encryption_opts to create an encrypted client | ||
encrypted_client = get_encrypted_client(auto_encryption_opts, encrypted_connection) | ||
| ||
with contextlib.suppress(EncryptedCollectionError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a comment about why the error should be suppressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be a case where we're trying to create a collection that already exists. It would be correct to surface that error to the user because their migrations are out of sync with their database.
Wrong commit message for 65bd15a and I don't want to force push yet. It should have said:
I'm aware that
|
It's not working as you think it is. As I said elsewhere, Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because I'd suggest to use my patch is as a starting point for maintaining two connections. |
I don't disagree, but it feels a lot like
Yes it works by design, not a side effect. I'm
I'd make a few passes at it but did not get anywhere, I'll try again though. |
Your "stumble" theory of how it's working isn't correct. |
Copy that, thanks! I've removed
Still working on an unencrypted connection, but perhaps the only time we need it is for the version check. |
django_mongodb_backend/features.py Outdated
@cached_property | ||
def supports_encryption(self): | ||
""" | ||
Encryption is supported if the server is Atlas or Enterprise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: this doesn't check for Atlas (at least on my Atlas VM, build_info.get("modules") == []
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use supports_atlas_search
for that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, though we'd probably want to rename it or add a new more semantic name (e.g. is_atlas
? Similarly, we might add is_enterprise
as a separate property in case it can be reused later. )
It seems that a check for MongoDB 7.0+ is also needed: https://www.mongodb.com/docs/manual/core/queryable-encryption/reference/compatibility/#queryable-encryption-compatibility (and perhaps supports_encryption
should be renamed to supports_automatic_encryption
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to supports_queryable_encryption
because that's what I originally called it and automatic is only 1/2 of QE and we will eventually support explicit too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the compatibility linked table, the requirements are different for explicit encryption (it's supported by community edition too), thus we'll presumably need a separate supports_explicit_encryption
feature flag.
django_mongodb_backend/routers.py Outdated
class EncryptionRouter: | ||
""" | ||
Routes database operations for 'encrypted' models to the 'encryption' DB. | ||
""" | ||
| ||
def db_for_read(self, model, **hints): | ||
if getattr(model, "encrypted_fields_map", False): | ||
return "encryption" | ||
return None | ||
| ||
def db_for_write(self, model, **hints): | ||
if getattr(model, "encrypted_fields_map", False): | ||
return "encryption" | ||
return None | ||
| ||
def allow_migrate(self, db, app_label, model_name=None, **hints): | ||
""" | ||
Ensure that the 'encrypted' models only appear in the 'encryption' DB, | ||
and not in the default DB. | ||
""" | ||
model = hints.get("model") | ||
if model and getattr(model, "encrypted_fields_map", False): | ||
return db == "encryption" | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should provide an example in the documentation but probably not here as a public API since making a sufficiently generic router seems questionable (for example, hardcoding "encryption" as the name of the alias). I'd suggest putting this code in tests/encryption_/routers.py
and using it there with @override_settings(DATABASE_ROUTERS=...
. (And as discussed in the design doc, stop using encrypted_fields_map
to detect encrypted models. At the least, adding a encrypted = True
class attribute on EncryptedModel
would be consistent with EncryptedField
and wouldn't be so ugly until we decide on a final solution.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in schema do I move the encrypted_fields_map
? I'm not that familiar with that code yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add SchemaEditor._get_encrypted_fields_map(self, model)
. You have access to the connection to pass to field.db_type()
using self.connection
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should provide an example in the documentation but probably not here as a public API since making a sufficiently generic router seems questionable (for example, hardcoding "encryption" as the name of the alias).
I think we can make the alias configurable and I definitely want to make a public API because I'm fairly certain we are not going to tell users to create a custom DatabaseRouter
to use this feature. That said we have some time to discuss, but in the short term I may see one added to the helpers. It could also end up in the project template.
django_mongodb_backend/schema.py Outdated
if not hasattr(model, "encrypted"): | ||
self.get_database().create_collection(model._meta.db_table) | ||
else: | ||
# TODO: Route to the encrypted database connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A correctly configured database router will take care of it.
@ShaneHarvey @Jibola @timgraham FYI here is the
And here is the error again with some additional debug:
And the full traceback:
Test settings:
This is happening in the |
Confirming that test settings suggested by @timgraham allow me to proceed with testing:
Unfortunately that still brings me back to this:
But I can work around that issue by returning |
Since auto_encryption_opts is provided in test settings, that means we get a key vault database that persists whether we like it or not. Would be nice if that were not the case, but probably OK for now.
Via Django settings. With this change we don't need to provide helpers for `kms_providers` and `key_vault_namespace` because they can be configured in Django settings and retrieved by the schema during `client_encryption` and `create_encrypted_collection`.
I was unable to do this in `init_connection_state` so I tried to do the next best thing.
def __init__(self, *args, **kwargs): | ||
self.queries = kwargs.pop("queries", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __init__(self, *args, **kwargs): | |
self.queries = kwargs.pop("queries", []) | |
def __init__(self, *args, queries=None, **kwargs): | |
self.queries = queries |
When adding new arguments, you'll also need to add a deconstruct()
method. Here's an example from CharField
which has a extra db_collation
argument:
def deconstruct(self): name, path, args, kwargs = super().deconstruct() if self.db_collation: kwargs["db_collation"] = self.db_collation return name, path, args, kwargs
with connection.schema_editor() as editor: | ||
encrypted_field_names = editor._get_encrypted_fields_map(self.person).get("fields") | ||
self.assertNotIn("name", encrypted_field_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By asserting the expected mapping in the above test, it also confirms that non-encrypted fields aren't included.
def allow_migrate(self, db, app_label, model_name=None, **hints): | ||
return "encrypted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is expected to return a boolean.
| ||
class Person(EncryptedModel): | ||
name = models.CharField("name", max_length=100) | ||
ssn = EncryptedCharField("ssn", max_length=11, queries=["equality"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the syntax for queries
correct? According to https://www.mongodb.com/docs/manual/core/queryable-encryption/qe-create-encrypted-collection/ it looks like [{"queryType": "equality"}]
or [{ "queryType": "range", "sparsity": 1, "min": 100, "max": 2000, "trimFactor": 4 }]
. I guess the user provides those sort of values here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting we implement a list syntax for Django users, then in _get_encrypted_fields_map
the dictionary syntax is used along with the values provided by the user in the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you propose the optional parameters be supplied? (I suggest clarifying this in the design doc.)
# TODO: Add setUp? `del connection.features.supports_queryable_encryption` returns | ||
# `AttributeError: 'DatabasesFeatures' object has no attribute 'supports_queryable_encryption'` | ||
# even though it does have it upon inspection in `pdb`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generated on first access, thus why you see it using pdb. Since it may not exist in the first setUp()
, you can use connection.features.__dict__.pop("supports_queryable_encryption", None)
.
""" | ||
| ||
db = self.get_database() | ||
if not hasattr(model, "encrypted"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't do the correct thing if model.encrypted = False
. ;-)
Co-authored-by: Tim Graham <timograham@gmail.com>
Co-authored-by: Tim Graham <timograham@gmail.com>
Co-authored-by: Tim Graham <timograham@gmail.com>
(see previous attempts in #318, #319 and #323 for additional context)
With this PR I am able to get Django to create an encrypted collection when the schema code is running
create_model
on anEncryptedModel
containing anEncryptedCharField
e.g. seedb.enxcol_.encryption__person.ecoc
belowQuestions
_nodb_cursor
functionality in this PR or do something ininit_connection_state
as @timgraham suggests, or do something else?command not supported for auto encryption: buildinfo
which happens when Django attempts to get the server version via encrypted connection, thus necessitating the need to manage both encrypted and unencrypted connections. Are most commands supported for auto encryption or not?EncryptedModel
support forEmbeddedModel
look like? What are the specific use cases for integration ofEncryptedModel
andEmbeddedModel
? Should we be able to mixinEncryptedModel
andEmbeddedModel
then include that model in anEmbeddedModelField
?Todo
pymongocrypt
wheel includescrypt_shared
library!Helpers
Included helpers are also used by the test runner e.g.