BIG DATA WORKSHOP Comprehensive developer workshop for all Spark-based certifications
Agenda ■ About ITVersity ■ Introduction to Big Data ■ Big Data Developer Certifications ■ Curriculum ■ Course Details ■ Resources
About ITVersity ■ A Dallas-based company focusing on – Engineering – Infrastructure – Training – Staffing ■ We have operations in Dallas, US as well as Hyderabad, IN ■ Training – focus areas – Product Engineering using Full-Stack Development – Data Engineering using the Big Data ecosystem – DevOps Engineering including Cloud
Introduction to Big Data ■ Please go through this video at your leisure to get a brief introduction to Data Engineering using Big Data – https://www.youtube.com/watch?v=Do-c4HeyLEI
Big Data Developer Certifications ■ Following are the popular Big Data Developer Certifications – CCA 175 Spark and Hadoop Developer – HDPCD:Spark – CCA 159 Data Analyst (Hive and Sqoop) – O'Reilly/Databricks Certified Developer – MapR Certified Spark Developer
Curriculum ■ Linux Essentials ■ Database Essentials (SQL) ■ Basics of Python Programming ■ Overview of the Big Data ecosystem ■ Apache Sqoop ■ Core Spark ■ Spark SQL and DataFrames (includes Hive) ■ Streaming analytics using Flume, Kafka and Spark Streaming ■ Spark MLlib
Course Details ■ Start Date: November 7th (India) and November 6th (US), tentatively ■ 4 days a week; the course can take up to 8 weeks ■ Timings: – 8 AM to 9:30 AM India time (Tuesday to Friday) – 9:30 PM to 11:00 PM US Eastern time (Monday to Thursday) ■ Course Fee – $495 per person for those based outside India – INR 25,000 + GST per person for those in India – College-going students can attend live sessions for free (if they have the $74.95 student plan for the lab)
Resources ■ Videos will be recorded and streamed to YouTube ■ Pre-recorded courses for all certifications will be available on Udemy as well as on YouTube ■ 3 to 4 months of lab access for those who paid in full ■ Certification simulator ■ Forum to discuss any issues related to the training. A new group will be created and tracked for the batch.
DATABASE ESSENTIALS SQL using Oracle and Application Express
Agenda ■ Introduction ■ Setup Environment ■ Basics of Normalization ■ Data Modeling concepts and keywords ■ Interacting with database ■ SQL – Structured Query Language ■ Types of Databases ■ Dimensional Modeling for DSS
Introduction ■ About me – https://www.linkedin.com/in/durga0gadiraju/ ■ Why Database/RDBMS and SQL? – RDBMS stands for Relational Database Management System – e.g. Oracle, MySQL, SQL Server, DB2 and more – All mission-critical systems are built using RDBMS – SQL is the most popular way to interact with an RDBMS – One of the essentials for any IT professional (programming and Linux are the others) ■ Why Oracle and Application Express? – Oracle is the market leader in RDBMS – Application Express is a web application which can be accessed from anywhere – We can practice SQL using Oracle from anywhere without setting it up locally
Introduction ■ About the course – Normalization principles, briefly – DDL and DML – SQL – Data Warehousing principles – Introduction to NoSQL
Setup Environment ■ Using apex.oracle.com – Create a workspace – Brief overview of Application Express – Set up the HR schema in the workspace – Validate by running a basic query
Basics of Normalization ■ First normal form – Data has structure but is redundant ■ Second normal form – Redundancy is removed ■ Third normal form – No transitive dependencies ■ BCNF – Boyce-Codd Normal Form ■ Most data models will be in either Third normal form or BCNF
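The normalization steps above can be sketched with a small worked example. This is illustrative only: the deck uses Oracle, but SQLite stands in here so the sketch is self-contained, and all table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1NF but redundant: the department name repeats on every employee row,
# and dept_location depends on dept_name (a transitive dependency).
cur.execute("""CREATE TABLE emp_flat (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    dept_name TEXT,
    dept_location TEXT)""")

# 3NF: department attributes move to their own table; employees
# reference it through a foreign key, removing the redundancy.
cur.execute("""CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    dept_name TEXT,
    dept_location TEXT)""")
cur.execute("""CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    dept_id INTEGER REFERENCES departments (dept_id))""")

cur.execute("INSERT INTO departments VALUES (10, 'HR', 'Dallas')")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, 'Durga', 10), (2, 'Asha', 10)])

# The department location is now stored exactly once and reached via a join.
row = cur.execute("""SELECT e.emp_name, d.dept_location
                     FROM employees e JOIN departments d
                       ON e.dept_id = d.dept_id
                     WHERE e.emp_id = 1""").fetchone()
print(row)  # ('Durga', 'Dallas')
```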
Data Modeling ■ Logical Modeling – ER Diagram ■ Physical Modeling – DDL Scripts ■ Entity → Table ■ Attribute → Column ■ Relationships – one to many, one to one, many to many – Parent Key – Foreign Key
Sample Data Model – HR
Sample Data Model – RETAIL_DB
Interacting with database ■ DDL – Creating physical structures (often tables) ■ DML – Manipulating data in tables (the CUD of CRUD operations) ■ SQL – Reading/querying data from tables (the R of CRUD operations) ■ We will review DDL and DML from the scripts used to set up the HR schema ■ We will explore SQL in a bit more detail
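A minimal sketch of DDL, DML and a query side by side. SQLite is used here so the example is runnable as-is; the table and data are made up for illustration, not taken from the HR-schema scripts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create the physical structure.
cur.execute("CREATE TABLE regions ("
            "region_id INTEGER PRIMARY KEY, region_name TEXT)")

# DML: the Create, Update and Delete of CRUD.
cur.executemany("INSERT INTO regions VALUES (?, ?)",
                [(1, 'Europe'), (2, 'Americas'), (3, 'Asia')])
cur.execute("UPDATE regions SET region_name = 'APAC' WHERE region_id = 3")
cur.execute("DELETE FROM regions WHERE region_id = 1")

# SQL query: the Read of CRUD.
names = [r[0] for r in cur.execute(
    "SELECT region_name FROM regions ORDER BY region_id")]
print(names)  # ['Americas', 'APAC']
```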
SQL – Structured Query Language ■ Different Clauses – SELECT, FROM, WHERE, GROUP BY, HAVING, JOIN, ORDER BY ■ Simple query using DUAL (Oracle specific) ■ Functions (string manipulation as well as date manipulation) ■ NVL, DECODE and CASE ■ Filtering – using the WHERE clause ■ Aggregations – using GROUP BY and HAVING ■ Joining data sets – using JOIN ■ Sorting data – using ORDER BY ■ Analytic functions – aggregation, ranking as well as lead/lag functions
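Most of these clauses can be exercised in a single query. A hedged sketch follows, run against SQLite rather than Oracle, so COALESCE and CASE stand in for NVL and DECODE (which SQLite does not provide); table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emps (name TEXT, dept TEXT, salary REAL)")
cur.executemany("INSERT INTO emps VALUES (?, ?, ?)", [
    ('a', 'SALES', 100.0),
    ('b', 'SALES', 200.0),
    ('c', 'IT', 400.0),
    ('d', 'IT', None),      # unknown salary, handled with COALESCE below
    ('e', 'HR', 50.0),
])

rows = cur.execute("""
    SELECT dept,
           SUM(COALESCE(salary, 0)) AS total,           -- NVL equivalent
           CASE WHEN SUM(COALESCE(salary, 0)) >= 300
                THEN 'high' ELSE 'low' END AS band      -- DECODE-style
    FROM emps
    WHERE name <> 'e'                                   -- filtering
    GROUP BY dept                                       -- aggregation
    HAVING COUNT(*) > 1                                 -- group filter
    ORDER BY total DESC                                 -- sorting
    """).fetchall()
print(rows)  # [('IT', 400.0, 'high'), ('SALES', 300.0, 'high')]
```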
Types of Databases ■ OLTP – Primarily implemented using RDBMS ■ OLAP and DSS – Teradata, Greenplum, Vertica, etc. ■ NoSQL ■ Search based ■ Graph based ■ In memory ■ And more
Dimensional Modeling for DSS ■ Dimension ■ Fact ■ Measure ■ Hierarchy ■ Star Schema ■ Snowflake Schema ■ Inmon methodology ■ Kimball methodology
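The star-schema terms above fit together as one fact table of measures surrounded by dimension tables. A minimal sketch, again using SQLite and entirely hypothetical names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes, including hierarchy levels
# (year > month on the date dimension, category > product on the other).
cur.execute("""CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER)""")
cur.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)""")

# Fact table: foreign keys into each dimension plus the measure (revenue).
cur.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date (date_key),
    product_key INTEGER REFERENCES dim_product (product_key),
    revenue REAL)""")

cur.execute("INSERT INTO dim_date VALUES (20171101, 2017, 11)")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Gadgets')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(20171101, 1, 10.0), (20171101, 1, 5.0)])

# A typical DSS query: roll the measure up along dimension attributes.
total = cur.execute("""
    SELECT d.year, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, p.category""").fetchone()
print(total)  # (2017, 'Gadgets', 15.0)
```

A snowflake schema would further normalize the dimensions (e.g. category into its own table); the fact table stays the same.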

Big Data Certifications Workshop - 201711 - Introduction and Database Essentials
