BIG DATA WORKSHOP Comprehensive developer workshop for all Spark-based certifications
Agenda ■ About ITVersity ■ Introduction to Big Data ■ Big Data Developer Certifications ■ Curriculum ■ Course Details ■ Resources
About ITVersity ■ A Dallas-based company focusing on – Engineering – Infrastructure – Training – Staffing ■ We have operations in Dallas, US as well as Hyderabad, IN ■ Training – focus areas – Product Engineering using Full-Stack Development – Data Engineering using the Big Data ecosystem – DevOps Engineering including Cloud
Introduction to Big Data ■ Please go through this video at your leisure to get a brief introduction to Data Engineering using Big Data – https://www.youtube.com/watch?v=Do-c4HeyLEI
Big Data Developer Certifications ■ Following are the popular Big Data Developer Certifications – CCA 175 Spark and Hadoop Developer – HDPCD:Spark – CCA 159 Data Analyst (Hive and Sqoop) – O'Reilly/Databricks Certified Developer – MapR Certified Spark Developer
Curriculum ■ Linux Essentials ■ Database Essentials (SQL) ■ Basics of Python Programming ■ Overview of the Big Data ecosystem ■ Apache Sqoop ■ Core Spark ■ Spark SQL and DataFrames (includes Hive) ■ Streaming analytics using Flume, Kafka and Spark Streaming ■ Spark MLlib
Course Details ■ Start Date: November 7th (India) and November 6th (US), tentatively ■ 4 days a week; the course can take up to 8 weeks ■ Timings: – 8 AM to 9:30 AM India time (Tuesday to Friday) – 9:30 PM to 11:00 PM US Eastern time (Monday to Thursday) ■ Course Fee – $495 per person for those based outside India – INR 25,000 + GST per person for those in India – College-going students can attend live sessions for free (if they have the $74.95 student plan for the lab)
Resources ■ Videos will be recorded and streamed to YouTube ■ Pre-recorded courses for all certifications will be available on Udemy as well as on YouTube ■ 3 to 4 months of lab access for those who paid in full ■ Certification simulator ■ Forum to discuss any issues related to the training. A new group will be created and tracked for the batch.
DATABASE ESSENTIALS SQL using Oracle and Application Express
Agenda ■ Introduction ■ Setup Environment ■ Basics of Normalization ■ Data Modeling concepts and keywords ■ Interacting with database ■ SQL – Structured Query Language ■ Types of Databases ■ Dimensional Modeling for DSS
Introduction ■ About me – https://www.linkedin.com/in/durga0gadiraju/ ■ Why Database/RDBMS and SQL? – RDBMS stands for Relational Database Management System – e.g. Oracle, MySQL, SQL Server, DB2 and more – All mission-critical systems are built using RDBMS – SQL is the most popular way to interact with an RDBMS – One of the essentials for any IT professional (programming and Linux are the others) ■ Why Oracle and Application Express? – Oracle is the market leader in RDBMS – Application Express is a web application which can be accessed from anywhere – We can practice SQL using Oracle from anywhere without setting it up locally
Introduction ■ About the course – Normalization principles, briefly – DDL and DML – SQL – Data Warehousing principles – Introduction to NoSQL
Setup Environment ■ Using apex.oracle.com – Create a workspace – Brief overview of Application Express – Set up the HR schema in the workspace – Validate by running a basic query
Basics of Normalization ■ First normal form – Data has structure but is redundant ■ Second normal form – Redundancy is removed ■ Third normal form – No transitive dependencies ■ BCNF – Boyce-Codd Normal Form ■ Most data models will be in either Third normal form or BCNF
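The normalization steps above can be sketched with a small worked example. This is illustrative only: the deck uses Oracle, but SQLite stands in here so the sketch is self-contained, and all table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1NF but redundant: the department name repeats on every employee row,
# and dept_location depends on dept_name (a transitive dependency).
cur.execute("""CREATE TABLE emp_flat (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    dept_name TEXT,
    dept_location TEXT)""")

# 3NF: department attributes move to their own table; employees
# reference it through a foreign key, removing the redundancy.
cur.execute("""CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    dept_name TEXT,
    dept_location TEXT)""")
cur.execute("""CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    dept_id INTEGER REFERENCES departments (dept_id))""")

cur.execute("INSERT INTO departments VALUES (10, 'HR', 'Dallas')")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, 'Durga', 10), (2, 'Asha', 10)])

# The department location is now stored exactly once and reached via a join.
row = cur.execute("""SELECT e.emp_name, d.dept_location
                     FROM employees e JOIN departments d
                       ON e.dept_id = d.dept_id
                     WHERE e.emp_id = 1""").fetchone()
print(row)  # ('Durga', 'Dallas')
```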
Data Modeling ■ Logical Modeling – ER Diagram ■ Physical Modeling – DDL Scripts ■ Entity → Table ■ Attribute → Column ■ Relationships – one to many, one to one, many to many – Parent Key – Foreign Key
Sample Data Model – HR
Sample Data Model – RETAIL_DB
Interacting with database ■ DDL – Creating physical structures (often tables) ■ DML – Manipulating data in tables (the CUD of CRUD operations) ■ SQL – Reading/querying data from tables (the R of CRUD operations) ■ We will review DDL and DML from the scripts used to set up the HR schema ■ We will explore SQL in a bit more detail
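A minimal sketch of DDL, DML and a query side by side. SQLite is used here so the example is runnable as-is; the table and data are made up for illustration, not taken from the HR-schema scripts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create the physical structure.
cur.execute("CREATE TABLE regions ("
            "region_id INTEGER PRIMARY KEY, region_name TEXT)")

# DML: the Create, Update and Delete of CRUD.
cur.executemany("INSERT INTO regions VALUES (?, ?)",
                [(1, 'Europe'), (2, 'Americas'), (3, 'Asia')])
cur.execute("UPDATE regions SET region_name = 'APAC' WHERE region_id = 3")
cur.execute("DELETE FROM regions WHERE region_id = 1")

# SQL query: the Read of CRUD.
names = [r[0] for r in cur.execute(
    "SELECT region_name FROM regions ORDER BY region_id")]
print(names)  # ['Americas', 'APAC']
```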
SQL – Structured Query Language ■ Different Clauses – SELECT, FROM, WHERE, GROUP BY, HAVING, JOIN, ORDER BY ■ Simple query using DUAL (Oracle specific) ■ Functions (string manipulation as well as date manipulation) ■ NVL, DECODE and CASE ■ Filtering – using the WHERE clause ■ Aggregations – using GROUP BY and HAVING ■ Joining data sets – using JOIN ■ Sorting data – using ORDER BY ■ Analytic functions – aggregation, ranking as well as lead/lag functions
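Most of these clauses can be exercised in a single query. A hedged sketch follows, run against SQLite rather than Oracle, so COALESCE and CASE stand in for NVL and DECODE (which SQLite does not provide); table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emps (name TEXT, dept TEXT, salary REAL)")
cur.executemany("INSERT INTO emps VALUES (?, ?, ?)", [
    ('a', 'SALES', 100.0),
    ('b', 'SALES', 200.0),
    ('c', 'IT', 400.0),
    ('d', 'IT', None),      # unknown salary, handled with COALESCE below
    ('e', 'HR', 50.0),
])

rows = cur.execute("""
    SELECT dept,
           SUM(COALESCE(salary, 0)) AS total,           -- NVL equivalent
           CASE WHEN SUM(COALESCE(salary, 0)) >= 300
                THEN 'high' ELSE 'low' END AS band      -- DECODE-style
    FROM emps
    WHERE name <> 'e'                                   -- filtering
    GROUP BY dept                                       -- aggregation
    HAVING COUNT(*) > 1                                 -- group filter
    ORDER BY total DESC                                 -- sorting
    """).fetchall()
print(rows)  # [('IT', 400.0, 'high'), ('SALES', 300.0, 'high')]
```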
Types of Databases ■ OLTP – Primarily implemented using RDBMS ■ OLAP and DSS – Teradata, Greenplum, Vertica, etc. ■ NoSQL ■ Search based ■ Graph based ■ In memory ■ And more
Dimensional Modeling for DSS ■ Dimension ■ Fact ■ Measure ■ Hierarchy ■ Star Schema ■ Snowflake Schema ■ Inmon methodology ■ Kimball methodology
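The star-schema terms above fit together as one fact table of measures surrounded by dimension tables. A minimal sketch, again using SQLite and entirely hypothetical names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes, including hierarchy levels
# (year > month on the date dimension, category > product on the other).
cur.execute("""CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER)""")
cur.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)""")

# Fact table: foreign keys into each dimension plus the measure (revenue).
cur.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date (date_key),
    product_key INTEGER REFERENCES dim_product (product_key),
    revenue REAL)""")

cur.execute("INSERT INTO dim_date VALUES (20171101, 2017, 11)")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Gadgets')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(20171101, 1, 10.0), (20171101, 1, 5.0)])

# A typical DSS query: roll the measure up along dimension attributes.
total = cur.execute("""
    SELECT d.year, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, p.category""").fetchone()
print(total)  # (2017, 'Gadgets', 15.0)
```

A snowflake schema would further normalize the dimensions (e.g. category into its own table); the fact table stays the same.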

Big Data Certifications Workshop - 201711 - Introduction and Database Essentials
