DEV Community

selvakumar palanisamy

Data sharing using AWS Lake Formation

AWS Lake Formation enables organisations to securely share data between business units and to scale that sharing as more teams are added. The data can be stored in different AWS accounts belonging to different teams.

AWS services that enable data sharing
1) AWS Glue
2) AWS Lake Formation

AWS Glue

AWS Glue is a managed service that crawls data repositories to help build a data catalogue.
Glue also provides jobs, which are its Extract, Transform, and Load (ETL) tools. You can share access in Glue using role-based access control with IAM roles and policies, but one difficulty is that this requires knowledge of the underlying storage mechanism:
you must create policies for both the Glue Catalog and the S3 bucket.
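To make the two-policy requirement concrete, here is a minimal sketch of a CloudFormation IAM managed policy that grants read access using Glue and IAM alone. The resource name `GlueReadOnlyAccessPolicy` and the bucket name `my-source-bucket` are assumptions for illustration, and the action list is a plausible minimum rather than a definitive set.

```yaml
# Hypothetical fragment: names are illustrative, not part of the article's stack.
Resources:
  GlueReadOnlyAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Read access to one Glue database and its underlying S3 data
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # Catalog side: read database, table, and partition metadata.
          - Effect: Allow
            Action:
              - 'glue:GetDatabase'
              - 'glue:GetTable'
              - 'glue:GetTables'
              - 'glue:GetPartitions'
            Resource:
              - !Sub 'arn:aws:glue:${AWS::Region}:${AWS::AccountId}:catalog'
              - !Sub 'arn:aws:glue:${AWS::Region}:${AWS::AccountId}:database/my-source-glue-database-demo'
              - !Sub 'arn:aws:glue:${AWS::Region}:${AWS::AccountId}:table/my-source-glue-database-demo/*'
          # Storage side: the policy author must know which bucket holds the data.
          - Effect: Allow
            Action: 's3:ListBucket'
            Resource: 'arn:aws:s3:::my-source-bucket'
          - Effect: Allow
            Action: 's3:GetObject'
            Resource: 'arn:aws:s3:::my-source-bucket/*'
```

The second half of the policy is the part Lake Formation grants are meant to hide from the consumer.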

AWS Lake Formation

AWS Lake Formation simplifies access management and resource sharing across accounts. It offers a straightforward granting mechanism that SQL experts will recognise.
These grants can be made to IAM identities, AWS accounts, or an entire AWS Organization or organizational unit (OU).

After you create a cross-account grant, Lake Formation uses AWS Resource Access Manager (RAM) to create a cross-account resource share.
The shared catalogue resources then become visible to Lake Formation administrators in the target account, in that account's local data catalogue.

Solution Overview

Share data across AWS accounts to enable a multi-source data analytics solution.

Solution Components

Centralized data lake account

1) Store the data in S3
2) Catalog that data, so that it is visible and its schema is known
3) Share that data with other AWS accounts

Consuming data account
1) Query the data in the source account

The diagram below shows how those components can work together to provide this solution:

Image description

Setting up Lake Formation

1) A Lake Formation Administrator should be assigned.
The administrator will then be able to manage access to data catalogue resources both within and across accounts.
Lake Formation administrators can be either IAM users or IAM roles.

2) Change the Lake Formation permission model from IAM to Lake Formation native grants

Image description
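The same switch can probably be expressed in CloudFormation instead of the console. The sketch below assumes that the `CreateDatabaseDefaultPermissions` and `CreateTableDefaultPermissions` properties of `AWS::LakeFormation::DataLakeSettings` are available in your region; the role name `MyLakeAdminRole` is hypothetical.

```yaml
# Hypothetical fragment: an alternative to toggling the setting in the console.
Resources:
  LakeformationSettings:
    Type: AWS::LakeFormation::DataLakeSettings
    Properties:
      Admins:
        - DataLakePrincipalIdentifier: arn:aws:iam::XXXXXXXXX:role/MyLakeAdminRole  # hypothetical role
      # Empty lists mean newly created databases and tables no longer receive
      # the default IAM-only permissions, so Lake Formation grants apply.
      CreateDatabaseDefaultPermissions: []
      CreateTableDefaultPermissions: []
```

This only affects databases and tables created afterwards; anything that already exists keeps its current permissions until they are changed explicitly.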

3) Create the centralized data lake
4) Upload the file to the S3 bucket using the AWS CLI or from the S3 console (replace my-source-bucket with the name of your data lake bucket):

aws s3 sync . s3://my-source-bucket

5) Add the crawler and give it permission to read the bucket and write to the catalog.
Use this CloudFormation stack to deploy the resources mentioned in the steps above:

AWSTemplateFormatVersion: '2010-09-09'
Description: My data lake source
Resources:
  LakeformationSettings:
    Type: AWS::LakeFormation::DataLakeSettings
    Properties:
      Admins:
        - DataLakePrincipalIdentifier: arn:aws:iam::XXXXXXXXX:role/aws-reserved/sso.amazonaws.com/ap-southeast-2/AWSReservedSSO_AWSAdministratorAccess_85c5426c350156b8
  MySourceDataStore:
    Type: AWS::S3::Bucket
    DeletionPolicy: Delete
    Properties:
      AccessControl: Private
      BucketName: !Sub 'my-source-data-store-${AWS::Region}-${AWS::AccountId}'
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
  MySourceGlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: my-source-glue-database-demo
        Description: String
  MySourceCrawlerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: 'Allow'
            Principal:
              Service:
                - 'glue.amazonaws.com'
            Action:
              - 'sts:AssumeRole'
      Path: '/'
      Policies:
        - PolicyName: 'root'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'glue:*'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                  - 'logs:AssociateKmsKey'
                Resource: '*'
              - Effect: Allow
                Action: 's3:ListBucket'
                Resource: !GetAtt MySourceDataStore.Arn
              - Effect: Allow
                Action: 's3:GetObject'
                Resource: !Sub
                  - '${Bucket}/*'
                  - { Bucket: !GetAtt MySourceDataStore.Arn }
  MySourceCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: my-source-data-crawler
      Role: !GetAtt MySourceCrawlerRole.Arn
      DatabaseName: !Ref MySourceGlueDatabase
      Targets:
        S3Targets:
          - Path: !Ref MySourceDataStore
      SchemaChangePolicy:
        UpdateBehavior: 'UPDATE_IN_DATABASE'
        DeleteBehavior: 'LOG'
  SourceCrawlerLakeGrants:
    Type: AWS::LakeFormation::Permissions
    Properties:
      DataLakePrincipal:
        DataLakePrincipalIdentifier: !GetAtt MySourceCrawlerRole.Arn
      Permissions:
        - ALTER
        - DROP
        - CREATE_TABLE
      Resource:
        DatabaseResource:
          Name: !Ref MySourceGlueDatabase
  DatalakeLocation:
    Type: AWS::LakeFormation::Resource
    Properties:
      ResourceArn: !GetAtt MySourceDataStore.Arn
      RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess
      UseServiceLinkedRole: true

6) When you have deployed the crawler, you should be able to see and run it in the Glue Console

Image description

7) Cross-account grant
To enable cross-account access, you will need to add a Lake Formation grant and specify the consumer account number.

  CrossAccountLakeGrants:
    Type: AWS::LakeFormation::Permissions
    Properties:
      DataLakePrincipal:
        DataLakePrincipalIdentifier: "XXXXXXXXXX" # Consumer account number
      Permissions:
        - SELECT
      PermissionsWithGrantOption:
        - SELECT
      Resource:
        TableResource:
          DatabaseName: !Ref MySourceGlueDatabase
          Name: !Sub 'my_source_data_store_ap_southeast_2_${AWS::AccountId}'

8) Set up permissions in the consumer account

Log in to the consumer account and set up the Lake Formation base settings:

1) Set up a Lake Formation administrator

AWSTemplateFormatVersion: '2010-09-09'
Description: My consumer data lake setup
Resources:
  LakeformationSettings:
    Type: AWS::LakeFormation::DataLakeSettings
    Properties:
      Admins:
        - DataLakePrincipalIdentifier: arn:aws:iam::XXXXXXXXXX:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AWSAdministratorAccess_56cabj890003333

2) Turn on Lake Formation grants:

Image description

Create a resource link to the database in the data lake account. Unfortunately, at the time of writing this is not available via CloudFormation. In the Lake Formation console, click Databases and then the Create database button.

Image description

Image description
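For readers who prefer to keep everything in templates: the `AWS::Glue::Database` resource documents a `TargetDatabase` property on `DatabaseInput`. Assuming that property is supported in your region (please verify against the current CloudFormation reference), a resource link might be sketched as follows. The link name `my-source-glue-database-link` is hypothetical.

```yaml
# Hypothetical fragment for the consumer account; verify TargetDatabase support first.
Resources:
  SharedDatabaseLink:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId            # consumer account catalog
      DatabaseInput:
        Name: my-source-glue-database-link      # hypothetical link name
        TargetDatabase:
          CatalogId: 'XXXXXXXXXX'               # data lake account number
          DatabaseName: my-source-glue-database-demo
```

If the stack fails on this property, fall back to the console steps described above.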

3) Open the Athena console; you should be able to see your linked database and the table schema. All that remains is to query the table and make sure it returns results.

Image description
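As an optional extra, the check query can be saved in Athena so that it is easy to rerun. This is a sketch only: it assumes the resource link is named `my-source-glue-database-link` (a hypothetical name) and that the crawler produced the table name used in the cross-account grant, with `XXXXXXXXXX` standing for the data lake account number.

```yaml
# Hypothetical fragment for the consumer account: saves a query, does not run it.
Resources:
  SharedTableSmokeTest:
    Type: AWS::Athena::NamedQuery
    Properties:
      Name: shared-table-smoke-test
      Description: Returns a few rows from the shared table
      Database: my-source-glue-database-link    # hypothetical resource link name
      QueryString: >-
        SELECT *
        FROM "my-source-glue-database-link"."my_source_data_store_ap_southeast_2_XXXXXXXXXX"
        LIMIT 10
```

The saved query then appears under saved queries in the Athena console, where it can be opened and run by hand.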
