Commit 9f0acb2

Reorganized data masking docs
1 parent e87b05a commit 9f0acb2

9 files changed

+157
-131
lines changed

docs/utilities/data_masking.md

Lines changed: 86 additions & 38 deletions
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,31 @@ description: Utility
55

66
<!-- markdownlint-disable MD051 -->
77

8-
The data masking utility provides a simple solution to conceal incoming data so that sensitive information is not passed downstream or logged.
8+
The data masking utility provides a simple solution to obfuscate (mask or encrypt) incoming data so that sensitive information is not passed downstream or logged.
9+
10+
```mermaid
11+
stateDiagram-v2
12+
direction LR
13+
Source: Customer information <br/><br/> Sensitive data <br/><br/> PII <br/><br/>
14+
LambdaInit: Lambda invocation
15+
Processor: Data Masker
16+
Handler: Your function
17+
YourLogic: Your logic to mask or encrypt data
18+
LambdaResponse: Logs
19+
20+
Source --> LambdaInit
21+
22+
LambdaInit --> Processor
23+
Processor --> Handler
24+
25+
state Processor {
26+
[*] --> Handler
27+
Handler --> YourLogic
28+
}
29+
30+
Handler --> Processor: Collect results
31+
Processor --> LambdaResponse: Obfuscated data
32+
```
933

1034
## Key features
1135

@@ -15,39 +39,67 @@ The data masking utility provides a simple solution to conceal incoming data so
1539

1640
## Terminology
1741

18-
**Mask**: This refers to concealing or partially replacing sensitive information with a non-sensitive placeholder or mask. The key characteristic of this operation is that it is irreversible, meaning the original sensitive data cannot be retrieved from the masked data. Masking is commonly applied when displaying data to users or for anonymizing data in non-reversible scenarios. For example, display the last four digits of a credit card number as "**** **** **** 1234".
42+
**Masking** irreversibly replaces sensitive information with a non-sensitive placeholder or mask. For example, display the last four digits of a credit card number as `"**** **** **** 1234"`.
1943

20-
**Encrypt**: This is the process of transforming plaintext data into a ciphertext format using an encryption algorithm and a cryptographic key. Encryption is a reversible process, meaning the original data can be retrieved (decrypted) using the appropriate decryption key. You can use this, for instance, to encrypt any PII (personally identifiable information) of your customers and make sure only the people with the right permissions are allowed to decrypt and view the plaintext PII data, in accordance with GDPR.
44+
**Encrypting** transforms plaintext into ciphertext using an encryption algorithm and a cryptographic key. Encryption can be reversed with the correct decryption key. This allows you to encrypt any PII (personally identifiable information) and make sure only users with appropriate permissions can decrypt it to view the plaintext.
2145

22-
**Decrypt**: This is the process of reversing the encryption process, converting ciphertext back into its original plaintext using a decryption algorithm and the correct decryption key that only authorized personnel should have access to.
46+
**Decrypting** reverses the encryption process, converting ciphertext back into its original plaintext using a decryption algorithm and the correct decryption key.
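
To make the masking definition above concrete, here is a minimal standard-library sketch that keeps only the last four digits of a card number. The function name `mask_card_number` and its parameters are hypothetical and are not part of the data masking utility; the point is only that the original digits cannot be recovered from the result.

```python
def mask_card_number(card_number: str, visible: int = 4, placeholder: str = "*") -> str:
    """Replace every digit except the last `visible` ones with a placeholder.

    Hypothetical helper for illustration only; not part of the utility's API.
    """
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_seen = 0
    masked = []
    for ch in card_number:
        if ch.isdigit():
            digits_seen += 1
            # Keep only the trailing `visible` digits; hide all earlier ones.
            keep = digits_seen > total_digits - visible
            masked.append(ch if keep else placeholder)
        else:
            # Preserve separators such as spaces or dashes.
            masked.append(ch)
    return "".join(masked)


print(mask_card_number("4111 1111 1111 1234"))  # prints "**** **** **** 1234"
```

Because the hidden digits are discarded rather than transformed with a key, there is nothing to decrypt later, which is the practical difference from encrypting.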
2347

2448
## Getting started
2549

26-
### IAM Permissions
50+
### Install
51+
52+
If not using any encryption services and only masking data, your Lambda function does not need any additional permissions or resources to use this utility.
53+
54+
#### Using AWS Encryption SDK
2755

2856
To use the AWS Encryption SDK, your Lambda function IAM Role must have `kms:Decrypt` and `kms:GenerateDataKey` IAM permissions.
2957

58+
You must also have an AWS KMS key with full read/write permissions. You can create one and learn more on the [AWS KMS console](https://us-east-1.console.aws.amazon.com/kms/home?region=us-east-1#/kms/home){target="_blank" rel="nofollow"}.
59+
60+
#### Using a custom encryption provider
61+
3062
For any other encryption provider, make sure your role has the permissions that provider requires.
3163

32-
If not using any encryption services and only masking data, your Lambda does not need any additional permissions to use this utility.
64+
### Working with nested data
3365

34-
### Required resources
66+
#### JSON
67+
When using the data masking utility with dictionaries or JSON strings, you can provide a list of keys to obfuscate the corresponding values. If no fields are provided, the entire data object will be masked or encrypted. You can obfuscate values of nested keys by using dot notation.
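
The dot-notation behaviour described above can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the utility's implementation: `mask_fields` and the `MASK` placeholder value are hypothetical names, and a missing key simply raises `KeyError` here.

```python
import copy
from typing import Any, Dict, List, Optional

MASK = "*****"  # hypothetical placeholder; the utility's actual mask value may differ


def mask_fields(data: Dict[str, Any], fields: Optional[List[str]] = None) -> Any:
    """Mask the values at the given dot-notation paths.

    With no fields, the whole object is replaced by the placeholder.
    """
    if not fields:
        return MASK

    result = copy.deepcopy(data)  # leave the caller's data untouched
    for path in fields:
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node[key]  # walk down one nesting level per path segment
        node[leaf] = MASK
    return result


event_body = {
    "email": "johndoe@example.com",
    "address": {"street": "123 Main St", "city": "Anytown"},
}
print(mask_fields(event_body, ["email", "address.street"]))
```

Masking a parent key such as `address` replaces the whole nested object, while `address.street` touches only that one leaf.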
3568

36-
To use the AWS Encryption SDK, you must have an AWS KMS key with full read/write permissions. You can create one and learn more on the [AWS KMS console](https://us-east-1.console.aws.amazon.com/kms/home?region=us-east-1#/kms/home){target="_blank" rel="nofollow"}.
69+
???+ note
70+
If you're using our example [AWS Serverless Application Model (SAM) template](#using-a-custom-encryption-provider), you will notice we have configured the Lambda function to use a memory size of 1024 MB. We compared the performance of Lambda functions at several memory sizes and concluded that 1024 MB was the optimal size for this feature. For more information, see the full reports of our [load tests](https://github.com/aws-powertools/powertools-lambda-python/pull/2197#issuecomment-1730571597) and [traces](https://github.com/aws-powertools/powertools-lambda-python/pull/2197#issuecomment-1732060923).
3771

38-
For any other encryption provider, you must have the resources required for that provider.
72+
=== "AWS Serverless Application Model (SAM) example"
73+
```yaml hl_lines="11-23 30 33-39 46"
74+
--8<-- "examples/data_masking/sam/template.yaml"
75+
```
76+
77+
=== "input.json"
78+
```json
79+
--8<-- "examples/data_masking/src/large_data_input.json"
80+
```
3981

40-
## Using the utility
82+
=== "data_masking_function_example.py"
83+
```python hl_lines="8 20-22"
84+
--8<-- "examples/data_masking/src/data_masking_function_example.py"
85+
```
4186

42-
#### Working with JSON
43-
When using the data masking utility with dictionaries or JSON objects, you can provide a list of keys to conceal the corresponding values. If no fields are provided, the entire data object will be masked or encrypted. You can conceal values of nested keys by using dot notation.
87+
=== "output.json"
88+
```json
89+
--8<-- "examples/data_masking/src/data_masking_function_example_output.json"
90+
```
4491

4592
### Masking data
4693

4794
You can mask data without having to install any encryption library.
4895

96+
=== "input.json"
97+
```json
98+
--8<-- "examples/data_masking/src/generic_data_input.json"
99+
```
100+
49101
=== "getting_started_mask_data.py"
50-
```python hl_lines="1 6 27"
102+
```python hl_lines="1 6 10"
51103
--8<-- "examples/data_masking/src/getting_started_mask_data.py"
52104
```
53105

@@ -60,8 +112,13 @@ You can mask data without having to install any encryption library.
60112

61113
In order to encrypt data, you must use either our out-of-the-box integration with the AWS Encryption SDK, or install another encryption provider of your own. You can still use the masking feature while using any encryption provider.
62114

115+
=== "input.json"
116+
```json
117+
--8<-- "examples/data_masking/src/generic_data_input.json"
118+
```
119+
63120
=== "getting_started_encrypt_data.py"
64-
```python hl_lines="3-4 6 29 32 34"
121+
```python hl_lines="3-4 12-13"
65122
--8<-- "examples/data_masking/src/getting_started_encrypt_data.py"
66123
```
67124

@@ -75,35 +132,21 @@ In order to encrypt data, you must use either our out-of-the-box integration wit
75132
--8<-- "examples/data_masking/src/decrypt_data_output.json"
76133
```
77134

78-
79-
### SAM template example
80-
=== "template.yaml"
81-
```yaml hl_lines="11-23 30 33-39 46"
82-
--8<-- "examples/data_masking/sam/template.yaml"
83-
```
84-
85-
=== "data_masking_function_example.py"
86-
```python hl_lines="8 47-50"
87-
--8<-- "examples/data_masking/src/data_masking_function_example.py"
88-
```
89-
90-
=== "output.json"
91-
```json
92-
--8<-- "examples/data_masking/src/data_masking_function_example_output.json"
93-
```
94-
95135
## Advanced
96136

97137
### Adjusting configurations for AWS Encryption SDK
98138

99-
You have the option to modify some of the configurations we have set as defaults when connecting to the AWS Encryption SDK. You can find and modify these values at `utilities/data_masking/constants.py`.
139+
You have the option to modify some of the configurations we have set as defaults when connecting to the AWS Encryption SDK. You can find and modify the following values in `utilities/data_masking/provider/kms/aws_encryption_sdk.py`.
140+
141+
#### Caching
100142

101-
The `CACHE_CAPACITY` value is currently set at `100`. This value represents the maximum number of entries that can be retained in the local cryptographic materials cache. Please see the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.caches.local.html){target="_blank" rel="nofollow"} for more information.
143+
The `CACHE_CAPACITY` value is currently set to `100`. This value represents the maximum number of entries that can be retained in the local cryptographic materials cache. Please see the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.caches.local.html){target="_blank" rel="nofollow"} for more information.
102144

103-
The `MAX_CACHE_AGE_SECONDS` value is currently set at `300`. It represents the maximum time (in seconds) that a cache entry may be kept in the cache.
145+
The `MAX_CACHE_AGE_SECONDS` value is currently set to `300`. It represents the maximum time (in seconds) that a cache entry may be kept in the cache. Please see the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.materials_managers.caching.html#module-aws_encryption_sdk.materials_managers.caching){target="_blank" rel="nofollow"} for more information about this.
104146

105-
The `MAX_MESSAGES_ENCRYPTED` value is currently set at `200`. It represents the maximum number of messages that may be encrypted under a cache entry. Please see the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.materials_managers.caching.html#module-aws_encryption_sdk.materials_managers.caching){target="_blank" rel="nofollow"} for more information about this and `MAX_CACHE_AGE_SECONDS`.
147+
#### Limit messages
106148

149+
The `MAX_MESSAGES_ENCRYPTED` value is currently set to `200`. It represents the maximum number of messages that may be encrypted under a cache entry. Please see the [AWS Encryption SDK documentation](https://aws-encryption-sdk-python.readthedocs.io/en/latest/generated/aws_encryption_sdk.materials_managers.caching.html#module-aws_encryption_sdk.materials_managers.caching){target="_blank" rel="nofollow"} for more information about this.
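
As a rough mental model of how the three limits above interact, the following standard-library sketch shows a toy cache in which an entry is replaced once it is too old or has been used for too many messages, and the cache holds a bounded number of entries. It is not the AWS Encryption SDK's implementation: the class `ToyMaterialsCache`, its method names, and the evict-oldest policy are assumptions made for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable

# Default values quoted from the documentation above.
CACHE_CAPACITY = 100
MAX_CACHE_AGE_SECONDS = 300
MAX_MESSAGES_ENCRYPTED = 200


class ToyMaterialsCache:
    """Toy model of a bounded cache with age and usage limits (illustrative only)."""

    def __init__(
        self,
        capacity: int = CACHE_CAPACITY,
        max_age: float = MAX_CACHE_AGE_SECONDS,
        max_messages: int = MAX_MESSAGES_ENCRYPTED,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._max_age = max_age
        self._max_messages = max_messages
        self._clock = clock
        self._entries: "OrderedDict[str, list]" = OrderedDict()  # key -> [material, created_at, uses]

    def get(self, key: str, make_material: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            _, created_at, uses = entry
            expired = now - created_at >= self._max_age
            exhausted = uses >= self._max_messages
            if expired or exhausted:
                del self._entries[key]  # force fresh material
                entry = None
        if entry is None:
            if len(self._entries) >= self._capacity:
                self._entries.popitem(last=False)  # assumption: evict the oldest entry
            entry = [make_material(), now, 0]
            self._entries[key] = entry
        entry[2] += 1  # count this message against the entry
        return entry[0]
```

Lower values trade fewer reuses of the same material for more frequent requests for new material; the linked SDK documentation describes the actual behaviour.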
107150

108151
### Create your own encryption provider
109152

@@ -140,14 +183,19 @@ You can then use this custom encryption provider class as the `provider` argumen
140183

141184
Here is an example of implementing a custom encryption provider using an external library like [ItsDangerous](https://itsdangerous.palletsprojects.com/en/2.1.x/){target="_blank" rel="nofollow"}, a widely used library for signing serialized data. Note that ItsDangerous signs data rather than encrypting it, so the payload remains readable.
142185

186+
=== "input.json"
187+
```json
188+
--8<-- "examples/data_masking/src/generic_data_input.json"
189+
```
190+
143191
=== "working_with_own_provider.py"
144-
```python hl_lines="1-2 25 28 30"
192+
```python hl_lines="1-2 9-10"
145193
--8<-- "examples/data_masking/src/working_with_own_provider.py"
146194
```
147195

148196
=== "custom_provider.py"
149-
```python hl_lines="1 3 6 8 11 16"
150-
--8<-- "examples/data_masking/src/custom_provider.py"
197+
```python hl_lines="1 3 8"
198+
--8<-- "examples/data_masking/src/custom_data_masking_provider.py"
151199
```
152200

153201
=== "encrypted_output.json"

examples/data_masking/sam/template.yaml

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Description: >
66
Globals: # https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-specification-template-anatomy-globals.html
77
Function:
88
Timeout: 5
9-
Runtime: python3.10
9+
Runtime: python3.11
1010
Tracing: Active
1111
Resources:
1212
MyKMSKey:
@@ -27,7 +27,7 @@ Resources:
2727
Handler: data_masking_function_example.lambda_handler
2828
CodeUri: ../src
2929
Description: Data Masking Function Example
30-
MemorySize: 128
30+
MemorySize: 1024
3131
Architectures:
3232
- x86_64
3333
Policies:
Lines changed: 5 additions & 3 deletions
@@ -1,18 +1,20 @@
1-
import json
1+
from itsdangerous.url_safe import URLSafeSerializer
2+
23
from aws_lambda_powertools.utilities._data_masking.provider import BaseProvider
34

45

56
class MyCustomEncryption(BaseProvider):
67
def __init__(self, secret):
78
super().__init__()
89
self.secret = secret
10+
self.serializer = URLSafeSerializer(self.secret)
911

1012
def encrypt(self, data: str) -> str:
1113
if data is None:
1214
return data
13-
return json.dumps(data)
15+
return self.serializer.dumps(data)
1416

1517
def decrypt(self, data: str) -> str:
1618
if data is None:
1719
return data
18-
return json.loads(data)
20+
return self.serializer.loads(data)

examples/data_masking/src/data_masking_function_example.py

Lines changed: 5 additions & 32 deletions
@@ -5,37 +5,7 @@
55
from aws_lambda_powertools.utilities._data_masking.provider.kms.aws_encryption_sdk import AwsEncryptionSdkProvider
66
from aws_lambda_powertools.utilities.typing import LambdaContext
77

8-
KMS_KEY_ARN = os.environ["KMS_KEY_ARN"]
9-
10-
json_blob = {
11-
"id": 1,
12-
"name": "John Doe",
13-
"age": 30,
14-
"email": "johndoe@example.com",
15-
"address": {"street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345"},
16-
"phone_numbers": ["+1-555-555-1234", "+1-555-555-5678"],
17-
"interests": ["Hiking", "Traveling", "Photography", "Reading"],
18-
"job_history": {
19-
"company": {
20-
"company_name": "Acme Inc.",
21-
"company_address": "5678 Interview Dr.",
22-
},
23-
"position": "Software Engineer",
24-
"start_date": "2015-01-01",
25-
"end_date": "2017-12-31",
26-
},
27-
"about_me": """
28-
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla tincidunt velit quis
29-
sapien mollis, at egestas massa tincidunt. Suspendisse ultrices arcu a dolor dapibus,
30-
ut pretium turpis volutpat. Vestibulum at sapien quis sapien dignissim volutpat ut a enim.
31-
Praesent fringilla sem eu dui convallis luctus. Donec ullamcorper, sapien ut convallis congue,
32-
risus mauris pretium tortor, nec dignissim arcu urna a nisl. Vivamus non fermentum ex. Proin
33-
interdum nisi id sagittis egestas. Nam sit amet nisi nec quam pharetra sagittis. Aliquam erat
34-
volutpat. Donec nec luctus sem, nec ornare lorem. Vivamus vitae orci quis enim faucibus placerat.
35-
Nulla facilisi. Proin in turpis orci. Donec imperdiet velit ac tellus gravida, eget laoreet tellus
36-
malesuada. Praesent venenatis tellus ac urna blandit, at varius felis posuere. Integer a commodo nunc.
37-
""",
38-
}
8+
KMS_KEY_ARN = os.getenv("KMS_KEY_ARN")
399

4010
tracer = Tracer()
4111
logger = Logger()
@@ -44,7 +14,10 @@
4414
@tracer.capture_lambda_handler
4515
def lambda_handler(event: dict, context: LambdaContext) -> dict:
4616
logger.info("Hello world function - HTTP 200")
17+
18+
data = event["body"]
19+
4720
data_masker = DataMasking(provider=AwsEncryptionSdkProvider(keys=[KMS_KEY_ARN]))
48-
encrypted = data_masker.encrypt(json_blob, fields=["address.street", "job_history.company.company_name"])
21+
encrypted = data_masker.encrypt(data, fields=["address.street", "job_history.company.company_name"])
4922
decrypted = data_masker.decrypt(encrypted, fields=["address.street", "job_history.company.company_name"])
5023
return {"Decrypted_json": decrypted}
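
One consequence of the change from `os.environ["KMS_KEY_ARN"]` to `os.getenv("KMS_KEY_ARN")` in this file: a missing variable no longer raises `KeyError` at import time but yields `None`, which would then be passed along inside the `keys` list. A minimal sketch of a fail-fast lookup follows; the helper name `require_env` and its optional `env` parameter are hypothetical and are not part of the commit.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error.

    `env` defaults to os.environ; passing a mapping makes the helper easy to test.
    """
    source = os.environ if env is None else env
    value = source.get(name)
    if not value:
        # Covers both an unset variable and an empty string.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example with an explicit mapping instead of the real environment:
print(require_env("KMS_KEY_ARN", {"KMS_KEY_ARN": "arn:aws:kms:example"}))
```

Whether to fail at import time or defer the error is a design choice; `os.getenv` keeps the module importable in environments where the variable is absent, such as documentation builds or unit tests.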
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
{
2+
"body":
3+
{
4+
"id": 1,
5+
"name": "John Doe",
6+
"age": 30,
7+
"email": "johndoe@example.com",
8+
"address": {
9+
"street": "123 Main St",
10+
"city": "Anytown",
11+
"state": "CA",
12+
"zip": "12345"
13+
},
14+
"company_address": {
15+
"street": "456 ACME Ave",
16+
"city": "Anytown",
17+
"state": "CA",
18+
"zip": "12345"
19+
}
20+
}
21+
}

examples/data_masking/src/getting_started_encrypt_data.py

Lines changed: 2 additions & 19 deletions
@@ -3,28 +3,11 @@
33
from aws_lambda_powertools.utilities._data_masking import DataMasking
44
from aws_lambda_powertools.utilities._data_masking.provider.kms.aws_encryption_sdk import AwsEncryptionSdkProvider
55

6-
KMS_KEY_ARN = os.environ["KMS_KEY_ARN"]
6+
KMS_KEY_ARN = os.getenv("KMS_KEY_ARN")
77

88
def lambda_handler(event, context):
99

10-
data = {
11-
"id": 1,
12-
"name": "John Doe",
13-
"age": 30,
14-
"email": "johndoe@example.com",
15-
"address": {
16-
"street": "123 Main St",
17-
"city": "Anytown",
18-
"state": "CA",
19-
"zip": "12345",
20-
},
21-
"company_address": {
22-
"street": "456 ACME Ave",
23-
"city": "Anytown",
24-
"state": "CA",
25-
"zip": "12345",
26-
},
27-
}
10+
data = event["body"]
2811

2912
encryption_provider = AwsEncryptionSdkProvider(keys=[KMS_KEY_ARN])
3013
data_masker = DataMasking(provider=encryption_provider)

examples/data_masking/src/getting_started_mask_data.py

Lines changed: 1 addition & 18 deletions
@@ -5,23 +5,6 @@ def lambda_handler(event, context):
55

66
data_masker = DataMasking()
77

8-
data = {
9-
"id": 1,
10-
"name": "John Doe",
11-
"age": 30,
12-
"email": "johndoe@example.com",
13-
"address": {
14-
"street": "123 Main St",
15-
"city": "Anytown",
16-
"state": "CA",
17-
"zip": "12345",
18-
},
19-
"company_address": {
20-
"street": "456 ACME Ave",
21-
"city": "Anytown",
22-
"state": "CA",
23-
"zip": "12345",
24-
},
25-
}
8+
data = event["body"]
269

2710
data_masker.mask(data=data, fields=["email", "address.street", "company_address"])
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
{
2+
"body":
3+
{
4+
"id": 1,
5+
"name": "John Doe",
6+
"age": 30,
7+
"email": "johndoe@example.com",
8+
"address": {"street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345"},
9+
"phone_numbers": ["+1-555-555-1234", "+1-555-555-5678"],
10+
"interests": ["Hiking", "Traveling", "Photography", "Reading"],
11+
"job_history": {
12+
"company": {
13+
"company_name": "Acme Inc.",
14+
"company_address": "5678 Interview Dr."
15+
},
16+
"position": "Software Engineer",
17+
"start_date": "2015-01-01",
18+
"end_date": "2017-12-31"
19+
},
20+
"about_me": """
21+
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla tincidunt velit quis
22+
sapien mollis, at egestas massa tincidunt. Suspendisse ultrices arcu a dolor dapibus,
23+
ut pretium turpis volutpat. Vestibulum at sapien quis sapien dignissim volutpat ut a enim.
24+
Praesent fringilla sem eu dui convallis luctus. Donec ullamcorper, sapien ut convallis congue,
25+
risus mauris pretium tortor, nec dignissim arcu urna a nisl. Vivamus non fermentum ex. Proin
26+
interdum nisi id sagittis egestas. Nam sit amet nisi nec quam pharetra sagittis. Aliquam erat
27+
volutpat. Donec nec luctus sem, nec ornare lorem. Vivamus vitae orci quis enim faucibus placerat.
28+
Nulla facilisi. Proin in turpis orci. Donec imperdiet velit ac tellus gravida, eget laoreet tellus
29+
malesuada. Praesent venenatis tellus ac urna blandit, at varius felis posuere. Integer a commodo nunc.
30+
"""
31+
}
32+
}
