CICD Pipeline and Delivery of Apache Spark Applications on the Cloud Using AWS
Maria Fung, Data Warehouse Development Lead, California State University, Office of the Chancellor, mfung@calstate.edu
Babu Repaka, Business Intelligence Solution Architect, California State University, Office of the Chancellor, brepaka@calstate.edu
October 25, 2020
• The largest and most diverse system of 4-year higher education in the U.S.
• 23 campuses
• ~50K employees
• Nearly 500K enrolled students this year
CICD Overview
• CSU past technologies
• CSU present and future: cloud with Agile methodology
  - DevOps
  - DataOps
  - CICD
Develop → Build → Package → Test → Deploy → Operate
AWS CICD Components
• S3
• CodeCommit
• CodeBuild
• CodePipeline
• CloudFormation
• CloudWatch
• EMR
• Lambda
• SNS
• SQS
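Current CICD Process Overview
• Developer A checks code in to Git (CodeCommit), then creates a pull request and resolves conflicts.
• CloudWatch (event rule, alarm) and an SNS topic email notify peers (Developers B, C, D) to review the pull request.
• CodeBuild runs code testing (Pytest.py) and code coverage (Coverage.py), then submits EMR steps to the Dev EMR cluster.
• Test result? Failed: the developer reviews comments and makes changes. Passed: the pull request goes to approval.
• Approve? No: the developer reviews comments and makes changes. Yes: the pull request is merged automatically.
• Production deployment: CodePipeline runs the Source, Test, Build, and Deploy stages; CodeBuild builds and packages dependencies from Git; CloudFormation deploys to EMR.
• AWS services shown: S3 bucket, CodeCommit, CloudWatch, SNS, Lambda, CodeBuild, CodePipeline, CloudFormation, EMR.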
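The pull request notification step above could be wired up roughly as follows. This is a minimal sketch, not the presenters' configuration: the rule name, repository ARN, target ARN, and the `pr-handler` target ID are placeholders, and the client is passed in (for example `boto3.client("events")`) so the pattern-building logic can be checked without AWS access.

```python
import json


def pull_request_event_pattern(repository_arn):
    """Build an event pattern that matches new and updated pull requests.

    The repository ARN is a placeholder supplied by the caller.
    """
    return {
        "source": ["aws.codecommit"],
        "detail-type": ["CodeCommit Pull Request State Change"],
        "resources": [repository_arn],
        "detail": {
            "event": ["pullRequestCreated", "pullRequestSourceBranchUpdated"]
        },
    }


def register_pull_request_rule(events_client, rule_name, repository_arn, target_arn):
    """Create the rule and point it at a target (an SNS topic or a Lambda function).

    `events_client` is assumed to be a CloudWatch Events client,
    e.g. boto3.client("events").
    """
    pattern = pull_request_event_pattern(repository_arn)
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "pr-handler" is a hypothetical target ID chosen for this sketch.
        Targets=[{"Id": "pr-handler", "Arn": target_arn}],
    )
    return pattern
```

Keeping the pattern in its own function makes it easy to unit test alongside the application code in the same pipeline.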
Security for the CI/CD Pipeline
Develop → Build → Package → Test → Deploy → Operate, with security applied across all stages
Security Checklist
• Multi-factor authentication (MFA)
• Role assignments and segregation of duties
• Test cases and acceptable outcomes
• Secured code repositories
• Secured build and package environment
• Sensitive information
• Weekly security assessment review using AWS Trusted Advisor, Amazon Inspector, and Amazon GuardDuty
AWS Trusted Advisor is used to review the following checklists:
• Cost
• Fault tolerance
• Performance
• Security
• Management and governance
Any exceptions raised by the tool are investigated and addressed right away.
Security, Identity, & Compliance: Amazon GuardDuty and Amazon Inspector
Development Pipeline
• Start: the developer creates a pull request for changes from the developer's branch to the dev branch.
• CodeBuild runs unit testing (Pytest) and code coverage, then submits EMR steps on the dev EMR cluster.
• Test result? Failed: the developer reviews comments and commits changes. Success: approvers are notified to approve the pull request.
• Pull request approved? No: the developer reviews comments and commits changes. Yes: the pull request comment is updated automatically and the developer's changes are merged into the dev branch.
• AWS services shown: CodeCommit, CloudWatch event rule, SNS topic email, Lambda, S3, CodeBuild, EMR.
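Submitting EMR steps from the build stage, as described above, might look like the sketch below. It assumes the Spark script has already been uploaded to S3 and that the cluster ID is known; the step name, script URI, and `ActionOnFailure` choice are illustrative, and `emr_client` stands in for `boto3.client("emr")`.

```python
def spark_submit_step(name, script_s3_uri, args=()):
    """Describe one EMR step that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        # CONTINUE keeps the dev cluster alive after a failed step (an assumption).
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri, *args],
        },
    }


def submit_steps(emr_client, cluster_id, steps):
    """Add the steps to a running cluster and return the new step IDs.

    `emr_client` is assumed to be an EMR client, e.g. boto3.client("emr").
    """
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=list(steps))
    return response["StepIds"]
```

The returned step IDs can then be polled (for example with the EMR client's `describe_step`) to decide whether the test stage passed or failed.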
Development Pipeline: Pytest and code coverage
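A unit test of the kind this stage runs might resemble the sketch below. The function `clean_campus_code` and its rules are hypothetical, invented only to show the shape of a pytest-style test; they are not taken from the presenters' code. Running `coverage run -m pytest` followed by `coverage report -m` would execute such tests and report line coverage.

```python
def clean_campus_code(raw):
    """Normalize a campus code: trim whitespace and upper-case it.

    Hypothetical helper; returns None for missing or blank input.
    """
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    return cleaned or None


def test_clean_campus_code_trims_and_uppercases():
    assert clean_campus_code("  slo ") == "SLO"


def test_clean_campus_code_handles_missing_values():
    assert clean_campus_code(None) is None
    assert clean_campus_code("   ") is None
```

Keeping transformation logic in plain functions like this lets it be tested without starting a Spark session, which keeps the build stage fast.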
Development Pipeline: EMR
Production Deployment
• The DevOps admin submits a pull request to merge from the dev branch to the master branch.
• The manager is notified for approval.
• Pull request approved? No: the developer reviews comments and suggested changes. Yes: the pull request comment is updated and the changes are merged to the master branch.
• All dependencies are built and all files are packaged to the production S3 bucket.
• A CloudFormation stack is deployed to provision the production EMR cluster and run steps (PROD stack: create change set, execute change set).
• Accounts and access: Dev account and Prod account, with IAM role permissions, cross-account role permissions, STS AssumeRole, and a KMS key.
• AWS services shown: CodeCommit, CloudWatch event rule, SNS topic email, Lambda, CodeBuild, CodePipeline, S3, CloudFormation, EMR.
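The cross-account part of the production deployment (assume a role in the Prod account, create a change set, execute it) can be sketched as below. This uses direct API calls for illustration, whereas the slide shows CodePipeline driving the same steps. The role ARN, session name, stack and change set names, template URL, and the `CAPABILITY_NAMED_IAM` capability are all assumptions, and the clients are passed in (for example `boto3.client("sts")` and `boto3.client("cloudformation", **credentials)`).

```python
def assume_role_credentials(sts_client, role_arn, session_name="cicd-deploy"):
    """Assume the cross-account role and return keyword arguments for boto3.client()."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def deploy_with_change_set(cfn_client, stack_name, template_url, change_set_name):
    """Create a change set for an existing stack, wait for it, then execute it."""
    cfn_client.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        TemplateURL=template_url,
        ChangeSetType="UPDATE",  # use "CREATE" when the stack does not exist yet
        Capabilities=["CAPABILITY_NAMED_IAM"],  # assumed; depends on the template
    )
    # Block until CloudFormation has finished computing the change set.
    cfn_client.get_waiter("change_set_create_complete").wait(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    cfn_client.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
```

Splitting creation from execution leaves room for a review or manual approval between the two calls, which is the usual reason to deploy through change sets.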
• Lessons learned
  - Continuous learning and exploration
• Challenges
  - Lack of job orchestration in AWS EMR
  - AWS CodePipeline cross-account roles are not transparent in the AWS console
• Next steps
  - Explore better job orchestration
  - Analyze and plan an implementation on AWS EKS
Thank you for watching
Maria Fung, mfung@calstate.edu, maria-fung-7931001b4
Babu Repaka, brepaka@calstate.edu, babu-repaka-3167534

Editor's Notes

  • #3 The CSU awards nearly half of the state’s baccalaureate degrees. 1 in 10 employed graduates came from CSU… representing 1 in 20 college degree holders nationwide! We produce in the neighborhood of 120,000 graduates per year and our more than 3.4 MILLION living alumni are employed in every field across the world
  • #16 Image source: https://www.shutterstock.com/image-photo/abstract-source-code-background-writing-program-1475214392