APACHE SPARK PREPARING FOR THE NEXT WAVE OF REACTIVE BIG DATA
APACHE SPARK SURVEY 2015 - QUICK SNAPSHOT

RESPONDENTS
74% Developers
8% Data Scientists
7% C-level execs

TOP 3 INDUSTRIES
Telecoms, Banks, Retail

TOP 3 LANGUAGES USED WITH SPARK
88% Scala
44% Java
22% Python

31% are evaluating Spark now
20% are planning to use Spark in 2015
13% are running Spark in production

82% of users chose Spark to replace MapReduce
78% of users need faster processing of larger data sets
67% of users need Spark for event stream processing
62% of users load data into Spark with Hadoop DFS
54% of users run Spark standalone

JOB TYPE/ROLE
74% Developer
7.5% Data Scientist
6.5% C-Level Executive
3.5% Software Architect
3.5% Dev Ops
1% Business Analyst
6.5% Other

INDUSTRY FOCUS
16% Telecommunications / Networks
12% Banking / Finance
11% Retail
10% Software / Technology
9% Advertising
5% Consulting
4% Healthcare / Insurance
33% Other (including Biotechnology/Chemistry, Machinery, Education, Government and Utilities and other sectors)

INFRASTRUCTURE TECHNOLOGIES IN USE
53% Amazon EC2
34% Docker
22% Cloudera CDH
16% Ansible
14% Mesos
13% OpenStack
12% Apache.org Builds of Hadoop
10% Hortonworks HDP
10% Heroku
8% Google Compute Engine
7% CoreOS
7% MapR Hadoop Distribution
6% Microsoft Azure
5% Marathon
4% Kubernetes
2% Aurora
11% Other XaaS

CURRENT RELATIONSHIP WITH SPARK
31% Evaluating Spark now
20% Planning to use in 2015
13% Currently using in production
Remaining responses (28%, 6%, 2%): Evaluated, not planning to use; Evaluated, will use in 2016 or later; Um, what’s Spark?

BUSINESS GOALS IN MIND
78% Fast Batch Processing of Large Data Sets
60% Support for Event Stream Processing
56% Fast Data Queries in Real Time
55% Improved Programmer Productivity

SPARK FEATURES/MODULES IN DEMAND
82% Core API as a Replacement for MapReduce
Other modules (25%, 59%, 65%, 51%): Streaming Library (Spark Streaming); Machine Learning Library (MLlib); Integrated SQL (SparkSQL); Graph Algorithms Library (GraphX)

DATA PROCESSING WITH SPARK
67% Event Stream Processing
Tasks (39%, 41%, 46%, 46%, 59%, 61%): Read or Write Data to One or More Databases; Static Reports; SQL Queries and Business Intelligence; Write Data to Hadoop Distributed File System (HDFS); Ad-hoc Queries and Reporting; ETL Data from External Sources
Goals (71%, 65%, 40%): Use Spark as Part of a Larger Data Pipeline; Extract Information from Data Sooner Rather than Later; Automate Decision Making at Runtime

WHICH LANGUAGES ARE IMPORTANT TO YOUR SPARK INSTALLATION?
1st Scala 88%
2nd Java 44%
3rd Python 22%
Honorable mentions: R, Clojure, Groovy, Ruby & Go

HOW DO YOU LOAD DATA INTO SPARK?
62% Hadoop Distributed File System (HDFS)
46% Databases
41% Apache Kafka
29% Amazon S3
18% Other Services (e.g. over socket connection)
12% Other (including Apache Cassandra, Amazon Kinesis and Apache HBase)

Typesafe (Twitter: @Typesafe) is dedicated to helping developers build Reactive applications on the JVM. Backed by Greylock Partners, Shasta Ventures, Bain Capital Ventures and Juniper Networks, Typesafe is headquartered in San Francisco with offices in Switzerland and Sweden. To start building Reactive applications today, download Typesafe Activator.
Hello, Apache Spark! Typesafe Activator template for devs
Get the FULL report (PDF)
© 2015 Typesafe

[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
