This document provides an introduction to Apache Spark, including its core components, architecture, and programming model. Some key points:

- Spark uses Resilient Distributed Datasets (RDDs) as its fundamental data structure: immutable distributed collections that enable in-memory computing across a cluster.
- RDDs support transformations such as map and filter, and actions such as reduce and collect that return results to the driver. Transformations are lazy, while actions trigger computation (see the sketch after this list).
- Spark's execution model involves a driver program that builds a DAG of stages and uses an optimized scheduler to dispatch tasks to executors on worker nodes.
- Spark SQL, MLlib, GraphX, and Spark Streaming extend the core Spark API for structured data, machine learning, graph processing, and stream processing.
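
To make the transformation/action distinction concrete, here is a minimal Scala sketch assuming a local Spark setup; the object name, app name, and `local[*]` master setting are illustrative choices, not taken from the original text.

```scala
import org.apache.spark.sql.SparkSession

object RddSketch {
  def main(args: Array[String]): Unit = {
    // Local session purely for illustration (names/settings are assumptions).
    val spark = SparkSession.builder()
      .appName("rdd-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Build an RDD from a local collection.
    val numbers = sc.parallelize(1 to 10)

    // Transformations are lazy: nothing executes yet, Spark only records the lineage.
    val evensSquared = numbers
      .filter(_ % 2 == 0) // keep even numbers
      .map(n => n * n)    // square each one

    // Actions trigger the actual computation and return results to the driver.
    val collected = evensSquared.collect()    // Array(4, 16, 36, 64, 100)
    val summed    = evensSquared.reduce(_ + _) // 220

    println(s"collected = ${collected.mkString(", ")}, sum = $summed")

    spark.stop()
  }
}
```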