<!-- markdownlint-disable MD051 -->
The data masking utility provides a simple solution to obfuscate (mask or encrypt) incoming data so that sensitive information is not passed downstream or logged.

```mermaid
stateDiagram-v2
    direction LR
    Source: Customer information <br/><br/> Sensitive data <br/><br/> PII <br/><br/>
    LambdaInit: Lambda invocation
    Processor: Data Masker
    Handler: Your function
    YourLogic: Your logic to mask or encrypt data
    LambdaResponse: Logs

    Source --> LambdaInit

    LambdaInit --> Processor
    Processor --> Handler

    state Processor {
        [*] --> Handler
        Handler --> YourLogic
    }

    Handler --> Processor: Collect results
    Processor --> LambdaResponse: Obfuscated data
```

## Key features
## Terminology
**Masking** irreversibly replaces sensitive information with a non-sensitive placeholder or mask. For example, you can display only the last four digits of a credit card number, as in `"**** **** **** 1234"`.
**Encrypting** transforms plaintext into ciphertext using an encryption algorithm and a cryptographic key. Encryption can be reversed with the correct decryption key. This allows you to encrypt any PII (personally identifiable information) and make sure only users with the appropriate permissions can decrypt it to view the plaintext.
**Decrypting** reverses the encryption process, converting ciphertext back into its original plaintext using a decryption algorithm and the correct decryption key.
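To make the masking definition concrete, here is a minimal, stdlib-only sketch that keeps the last four digits of a card number and irreversibly replaces the rest. The function name `mask_card_number`, its parameters, and the regrouping into blocks of four are assumptions for illustration; this is not the utility's own masking code.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Irreversibly mask all but the last `visible` digits (hypothetical helper)."""
    # Keep only digits so inputs like "4111-1111-1111-1234" are handled too
    digits = [c for c in card_number if c.isdigit()]
    keep = min(visible, len(digits))
    # The masked positions are discarded: nothing here can restore them
    masked = [mask_char] * (len(digits) - keep) + digits[len(digits) - keep:]
    # Regroup into blocks of four, e.g. "**** **** **** 1234"
    return " ".join("".join(masked[i:i + 4]) for i in range(0, len(masked), 4))


print(mask_card_number("4111 1111 1111 1234"))  # **** **** **** 1234
```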
## Getting started
### Install
If you are only masking data and not using any encryption service, your Lambda function does not need any additional permissions or resources to use this utility.
#### Using AWS Encryption SDK
To use the AWS Encryption SDK, your Lambda function's IAM role must have the `kms:Decrypt` and `kms:GenerateDataKey` permissions.
You must also have an AWS KMS key with full read/write permissions. You can create one and learn more on the [AWS KMS console](https://us-east-1.console.aws.amazon.com/kms/home?region=us-east-1#/kms/home){target="_blank" rel="nofollow"}.
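As a sketch, the two KMS permissions above can be expressed as a standard IAM policy document scoped to a single key. The key ARN below is a hypothetical placeholder, and how you attach the policy to your function's execution role (console, SAM, CDK) is left to you; the helper `build_data_masking_policy` is illustrative only.

```python
import json

# Hypothetical key ARN: replace with the ARN of your own KMS key
KMS_KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"


def build_data_masking_policy(key_arn: str) -> dict:
    """Build an IAM policy document granting the KMS actions named above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # GenerateDataKey is needed to encrypt, Decrypt to decrypt
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                # Scope the permissions to one key instead of "*"
                "Resource": key_arn,
            }
        ],
    }


print(json.dumps(build_data_masking_policy(KMS_KEY_ARN), indent=2))
```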
#### Using a custom encryption provider
For any other encryption provider, make sure your Lambda function's role has the permissions that provider requires.
### Working with nested data
#### JSON
When using the data masking utility with dictionaries or JSON strings, you can provide a list of keys to obfuscate the corresponding values. If no fields are provided, the entire data object will be masked or encrypted. You can obfuscate values of nested keys by using dot notation.
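The dot-notation behavior described above can be sketched with the standard library alone. This is a hypothetical helper, not the utility's implementation: the `mask_fields` name, the `"*****"` placeholder, and the choice to return the same type that was passed in (dict or JSON string) are assumptions, and lists inside the data are not handled.

```python
import copy
import json
from typing import Any, Dict, List, Optional, Union

# Assumed placeholder; the utility's actual mask value may differ
MASK = "*****"


def mask_fields(
    data: Union[Dict[str, Any], str], fields: Optional[List[str]] = None
) -> Union[Dict[str, Any], str]:
    """Mask values at dot-notation paths, or the whole object when no fields are given."""
    if not fields:
        # No fields provided: obfuscate the entire data object
        return MASK
    was_json = isinstance(data, str)
    # Parse JSON strings, and deep-copy dicts so the caller's data is not mutated
    obj = json.loads(data) if was_json else copy.deepcopy(data)
    for path in fields:
        *parents, leaf = path.split(".")
        node = obj
        for key in parents:
            node = node[key]  # raises KeyError if a parent key does not exist
        node[leaf] = MASK
    # Return the same type that was passed in
    return json.dumps(obj) if was_json else obj


order = {"customer": {"name": "Jane", "address": {"street": "1 Main St"}}, "total": 42}
print(mask_fields(order, fields=["customer.name", "customer.address.street"]))
# {'customer': {'name': '*****', 'address': {'street': '*****'}}, 'total': 42}
```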
???+ note
    If you're using our example [AWS Serverless Application Model (SAM) template](#using-a-custom-encryption-provider), you will notice we have configured the Lambda function to use a memory size of 1024 MB. We compared the performance of Lambda functions at several different memory sizes and concluded that 1024 MB was the optimal size for this feature. For more information, see the full reports of our [load tests](https://github.com/aws-powertools/powertools-lambda-python/pull/2197#issuecomment-1730571597) and [traces](https://github.com/aws-powertools/powertools-lambda-python/pull/2197#issuecomment-1732060923).
=== "AWS Serverless Application Model (SAM) example"
In order to encrypt data, you must use either our out-of-the-box integration with the AWS Encryption SDK, or install another encryption provider of your own. You can still use the masking feature while using any encryption provider.
### Adjusting configurations for AWS Encryption SDK
You can modify some of the default configuration values we use when connecting to the AWS Encryption SDK. You can find and modify the following values in `utilities/data_masking/provider/kms/aws_encryption_sdk.py`.
#### Caching
The `CACHE_CAPACITY` value is currently set to `100`. It represents the maximum number of entries that can be retained in the local cryptographic materials cache. See the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.caches.local.html){target="_blank" rel="nofollow"} for more information.
The `MAX_CACHE_AGE_SECONDS` value is currently set to `300`. It represents the maximum time (in seconds) that a cache entry may be kept in the cache. See the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.materials_managers.caching.html#module-aws_encryption_sdk.materials_managers.caching){target="_blank" rel="nofollow"} for more information.
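To illustrate what `CACHE_CAPACITY` and `MAX_CACHE_AGE_SECONDS` bound, here is a toy, stdlib-only model of a cache limited both by entry count and by entry age. It is a sketch of the semantics only, not how the AWS Encryption SDK's local cache works internally; the `ToyMaterialsCache` class, its least-recently-used eviction policy, and the injectable `clock` are our own assumptions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional

# Default values quoted in this section
CACHE_CAPACITY = 100
MAX_CACHE_AGE_SECONDS = 300


class ToyMaterialsCache:
    """Hypothetical cache bounded by entry count and entry age (not the SDK's code)."""

    def __init__(
        self,
        capacity: int = CACHE_CAPACITY,
        max_age: float = MAX_CACHE_AGE_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.capacity = capacity
        self.max_age = max_age
        self.clock = clock
        self._entries = OrderedDict()  # key -> (created_at, value)

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self.clock(), value)
        self._entries.move_to_end(key)
        # Evict least recently used entries once capacity is exceeded
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, key: str) -> Optional[Any]:
        item = self._entries.get(key)
        if item is None:
            return None
        created_at, value = item
        # Entries older than max_age are expired and dropped
        if self.clock() - created_at > self.max_age:
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return value


# Usage with a fake clock so expiry can be shown without waiting
now = [0.0]
cache = ToyMaterialsCache(capacity=2, clock=lambda: now[0])
cache.put("a", "materials-a")
cache.put("b", "materials-b")
cache.put("c", "materials-c")  # exceeds capacity 2, so "a" is evicted
print(cache.get("a"))  # None
now[0] = MAX_CACHE_AGE_SECONDS + 1  # advance past the maximum age
print(cache.get("b"))  # None
```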
#### Limit messages
The `MAX_MESSAGES_ENCRYPTED` value is currently set to `200`. It represents the maximum number of messages that may be encrypted under a single cache entry. See the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.materials_managers.caching.html#module-aws_encryption_sdk.materials_managers.caching){target="_blank" rel="nofollow"} for more information.
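A minimal sketch of the message limit, assuming the intent is that a cached entry stops being reused once it has encrypted `MAX_MESSAGES_ENCRYPTED` messages, after which fresh materials must be obtained. The `ToyCacheEntry` class and its `use_for_encryption` method are hypothetical and only model the counting rule, not the SDK's internals.

```python
MAX_MESSAGES_ENCRYPTED = 200


class ToyCacheEntry:
    """Hypothetical cache entry that may be reused for a bounded number of messages."""

    def __init__(self, materials: str, max_messages: int = MAX_MESSAGES_ENCRYPTED):
        self.materials = materials
        self.max_messages = max_messages
        self.messages_encrypted = 0

    def use_for_encryption(self) -> bool:
        """Return True and record the use if one more message may be encrypted."""
        if self.messages_encrypted >= self.max_messages:
            # Limit reached: the caller should fetch fresh materials instead
            return False
        self.messages_encrypted += 1
        return True


entry = ToyCacheEntry("data-key-1", max_messages=3)
results = [entry.use_for_encryption() for _ in range(4)]
print(results)  # [True, True, True, False]
```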
### Create your own encryption provider
Here is an example of implementing a custom encryption provider using an external library like [ItsDangerous](https://itsdangerous.palletsprojects.com/en/2.1.x/){target="_blank" rel="nofollow"}, a widely used library for cryptographically signing data.
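
For comparison, here is a stdlib-only sketch of the same idea using HMAC signing, the technique ItsDangerous is built on. It assumes a provider only needs `encrypt` and `decrypt` methods; those method names and the `HmacSigningProvider` class are hypothetical, so check the provider base class for the exact signatures. Like ItsDangerous, this signs rather than encrypts: the payload stays readable, and "decrypting" means verifying the signature and returning the original value.

```python
import base64
import hashlib
import hmac
import json
from typing import Any


class HmacSigningProvider:
    """Hypothetical custom provider that signs (not encrypts) data with HMAC-SHA256."""

    def __init__(self, secret_key: bytes):
        self.secret_key = secret_key

    def _signature(self, payload: bytes) -> str:
        digest = hmac.new(self.secret_key, payload, hashlib.sha256).digest()
        return base64.urlsafe_b64encode(digest).decode()

    def encrypt(self, data: Any) -> str:
        # Serialize and append a signature; the payload itself is only encoded
        payload = base64.urlsafe_b64encode(json.dumps(data).encode())
        return f"{payload.decode()}.{self._signature(payload)}"

    def decrypt(self, token: str) -> Any:
        payload, _, signature = token.rpartition(".")
        # Constant-time comparison detects tampering with payload or signature
        if not hmac.compare_digest(signature, self._signature(payload.encode())):
            raise ValueError("Signature does not match: data was tampered with")
        return json.loads(base64.urlsafe_b64decode(payload))


provider = HmacSigningProvider(secret_key=b"hypothetical-secret")
token = provider.encrypt({"card": "4111 1111 1111 1234"})
print(provider.decrypt(token))  # {'card': '4111 1111 1111 1234'}
```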