What's in it for you?
1. History of Spark
2. What is Spark?
3. Hadoop vs Spark
4. Components of Apache Spark (Spark Core, Spark SQL, Spark Streaming, Spark MLlib, GraphX)
5. Spark Architecture
6. Applications of Spark
7. Spark Use Case
History of Apache Spark
2009 – Started as a project at UC Berkeley AMPLab
2010 – Open-sourced under a BSD license
2013 – Spark became an Apache top-level project
2014 – Used by Databricks to sort large-scale datasets and set a new world record
What is Apache Spark?
Apache Spark is an open-source data processing engine that stores and processes data in real time across clusters of computers using simple programming constructs.
- It supports various programming languages
- Developers and data scientists incorporate Spark into their applications to rapidly query, analyze, and transform data at scale
Hadoop vs Spark
- Hadoop: processing data with MapReduce is slow. Spark: processes data up to 100 times faster than MapReduce because the work is done in memory.
- Hadoop: performs batch processing of data. Spark: performs both batch and real-time processing of data.
- Hadoop: has more lines of code and, being written in Java, takes more time to execute. Spark: has fewer lines of code, as it is implemented in Scala.
- Hadoop: supports Kerberos authentication, which is difficult to manage. Spark: supports authentication via a shared secret, and can also run on YARN to leverage Kerberos.
Spark Features
- Fast processing: Resilient Distributed Datasets (RDDs) cut the time spent on read and write operations, so Spark runs roughly ten to a hundred times faster than Hadoop
- In-memory computing: data is cached in RAM, so Spark can access it quickly and accelerate analytics
- Flexible: Spark supports multiple languages, letting developers write applications in Java, Scala, R, or Python
- Fault tolerance: RDDs are designed to handle the failure of any worker node in the cluster, minimizing data loss
- Better analytics: Spark ships with a rich set of SQL queries, machine learning algorithms, complex analytics, and more
Components of Apache Spark
- Spark Core
- Spark SQL
- Spark Streaming
- Spark MLlib
- GraphX
Components of Spark – Spark Core
Spark Core is the base engine for large-scale parallel and distributed data processing.
It is responsible for:
- memory management
- fault recovery
- scheduling, distributing, and monitoring jobs on a cluster
- interacting with storage systems

Resilient Distributed Dataset (RDD)
Spark Core is built around RDDs (Resilient Distributed Datasets): immutable, fault-tolerant, distributed collections of objects that can be operated on in parallel. RDDs support two kinds of operations:
- Transformations: operations (such as map, filter, join, union) performed on an RDD that yield a new RDD containing the result
- Actions: operations (such as reduce, first, count) that return a value after running a computation on an RDD
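The key behavioral difference is that transformations are lazy (they only record what to do) while actions trigger the actual computation. A toy, stdlib-only sketch of that idea follows; the class and method names here are illustrative only and are not the real Spark API:

```python
# Toy sketch of the transformation/action split. NOT the Spark API --
# it only demonstrates laziness: transformations build up a plan, and
# nothing runs until an action asks for a concrete value.

class ToyRDD:
    def __init__(self, data, plan=None):
        self.data = list(data)
        self.plan = plan or []          # deferred transformations

    # --- Transformations: return a new ToyRDD, do no work yet ---
    def map(self, f):
        return ToyRDD(self.data, self.plan + [("map", f)])

    def filter(self, pred):
        return ToyRDD(self.data, self.plan + [("filter", pred)])

    # --- Actions: run the accumulated plan and return a value ---
    def _run(self):
        out = self.data
        for kind, f in self.plan:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

    def collect(self):
        return self._run()

    def count(self):
        return len(self._run())

rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
# No computation has happened yet; the count() action triggers it.
print(rdd.count())   # -> 5  (the even squares 0, 4, 16, 36, 64)
```

In real Spark the same shape appears as, e.g., `rdd.map(...).filter(...).count()`, with the plan optimized and executed across the cluster.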
Components of Spark – Spark SQL
Spark SQL is the framework component used for structured and semi-structured data processing.
Spark SQL architecture (top to bottom):
- DataFrame DSL and Spark SQL/HQL queries
- DataFrame API
- Data Source API (CSV, JSON, JDBC, ...)
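The mental model Spark SQL offers — structured rows queried declaratively with SQL — is the same as any SQL engine, just scaled out over a cluster. A single-machine preview using Python's stdlib sqlite3 (this is an analogy only; in PySpark you would register a DataFrame as a temp view and run `spark.sql(...)` on a SparkSession, none of which appears here):

```python
# Single-machine analogy for the Spark SQL model using stdlib sqlite3.
# The declarative query is the shared idea; Spark plans and executes the
# same kind of query across a cluster instead of one process.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("ana", 34), ("bo", 19), ("cy", 42)])

rows = conn.execute(
    "SELECT name FROM users WHERE age > 30 ORDER BY name"
).fetchall()
print(rows)   # -> [('ana',), ('cy',)]
```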
Components of Spark – Spark Streaming
Spark Streaming is a lightweight API that lets developers perform batch processing and real-time streaming of data with ease. It provides secure, reliable, and fast processing of live data streams.
The streaming engine receives an input data stream, divides it into batches of input data, and produces batches of processed data.
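The "divide the stream into batches" step above is the micro-batch model behind classic Spark Streaming. A stdlib-only sketch of that chopping step (not the Spark API; the function name and batch logic here are illustrative):

```python
# Stdlib sketch of the micro-batch idea: a continuous stream is chopped
# into fixed-size batches, and each batch is handed to ordinary batch
# processing. Real Spark Streaming batches by time interval, not count.
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from an iterator."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# A "live" input stream stands in for e.g. a socket or Kafka feed.
events = iter(range(7))
processed = [sum(batch) for batch in micro_batches(events, 3)]
print(processed)   # -> [3, 12, 6]  (batches [0,1,2], [3,4,5], [6])
```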
Components of Spark – Spark MLlib
MLlib is Spark's machine learning library: simple to use, scalable, and compatible with various programming languages. It eases the development and deployment of scalable machine learning algorithms, and contains implementations of algorithms for:
- Classification
- Clustering
- Collaborative filtering
Components of Spark – GraphX
GraphX is Spark's own graph computation engine and data store. It provides a uniform tool for:
- ETL
- Exploratory data analysis
- Interactive graph computations
Spark Architecture
Apache Spark uses a master-slave architecture: a driver runs on a master node, and multiple executors run across the worker nodes in the cluster.
- The master node hosts the driver program. The Spark code behaves as a driver program and creates a SparkContext, which is the gateway to all Spark functionality.
- Spark applications run as independent sets of processes on a cluster; the driver program and SparkContext take care of job execution within the cluster, with the help of the cluster manager.
- A job is split into multiple tasks, which are distributed over the worker nodes. When an RDD is created in the SparkContext, it can be distributed across various nodes.
- Worker nodes are the slaves that run the tasks. Each worker node runs an executor (with a cache) that is responsible for executing the tasks assigned by the cluster manager and returning the results to the SparkContext.
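The driver/executor division of labor — split a job into per-partition tasks, run them in parallel, gather results back at the driver — can be sketched on one machine with a stdlib thread pool. This is an analogy only: real Spark distributes tasks across machines via a cluster manager, which threads here merely stand in for:

```python
# Single-machine analogy for the driver/executor split. The "driver"
# partitions the data and submits tasks; a pool of "executors" runs
# them; results flow back to the driver, which combines them.
from concurrent.futures import ThreadPoolExecutor

def task(partition):
    # Each task processes one partition of the data.
    return sum(x * x for x in partition)

data = list(range(100))
partitions = [data[i:i + 25] for i in range(0, 100, 25)]    # driver splits the job

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(task, partitions))      # tasks run in parallel

total = sum(partial_results)                                # driver combines results
print(total)   # -> 328350 (sum of squares of 0..99)
```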
Spark Cluster Managers
1. Standalone mode: by default, applications submitted to a standalone cluster run in FIFO order, and each application tries to use all available nodes
2. Apache Mesos: an open-source project for managing computer clusters that can also run Hadoop applications
3. Apache YARN: the cluster resource manager of Hadoop 2; Spark can run on YARN
4. Kubernetes: an open-source system for automating the deployment, scaling, and management of containerized applications
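In practice, the cluster manager is selected with spark-submit's `--master` flag. Illustrative invocations for each of the four options; host names, ports, image name, and `app.py` are placeholders, not real endpoints:

```shell
# 1. Standalone mode
spark-submit --master spark://master-host:7077 app.py

# 2. Apache Mesos
spark-submit --master mesos://mesos-host:5050 app.py

# 3. Apache YARN (reads cluster config from HADOOP_CONF_DIR)
spark-submit --master yarn --deploy-mode cluster app.py

# 4. Kubernetes
spark-submit --master k8s://https://k8s-apiserver:6443 \
  --conf spark.kubernetes.container.image=my-spark-image app.py
```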
Applications of Spark
- Banking: JPMorgan uses Spark to detect fraudulent transactions, analyze individuals' business spending to suggest offers, and identify patterns to decide how much to invest and where
- E-commerce: Alibaba uses Spark jobs to analyze large data sets such as real-time transaction details and browsing history, and provides recommendations to its users
- Healthcare: IQVIA, a leading healthcare company, uses Spark to analyze patients' data, identify possible health issues, and diagnose them based on medical history
- Entertainment: entertainment and gaming companies like Netflix and Riot Games use Apache Spark to show users relevant advertisements based on the videos they watch, share, and like
Spark Use Case: Conviva
Conviva is one of the world's leading video streaming companies.
- Video streaming is a challenge, especially with the increasing demand for high-quality streaming experiences.
- Conviva collects data about video streaming quality to give its customers visibility into the end-user experience they are delivering.
- Using Apache Spark, Conviva delivers a better quality of service to its customers by removing screen buffering and learning in detail about network conditions in real time.
- This information is used by the video player to manage live video traffic coming from some 4 billion video feeds every month and to ensure maximum retention.
- Using Apache Spark, Conviva has also created an auto-diagnostics alert: it automatically detects anomalies along the video streaming pipeline and diagnoses the root cause of the issue. It reduces waiting time before the video starts, avoids buffering, and recovers the video from technical errors.
- The goal is to maximize viewer engagement.
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial | Simplilearn