This document discusses scaling machine learning using Apache Spark. It covers several key topics: 1) parallelizing machine learning algorithms and neural networks to distribute computation across a cluster, including data parallelism, model parallelism, and parameter-server parallelism; 2) Apache Spark's Resilient Distributed Dataset (RDD) programming model, which distributes data and computation across a cluster in a fault-tolerant manner (a minimal sketch follows below); and 3) examples of very large neural networks trained on clusters, such as a Google face-detection model trained on 1,000 servers and an IBM brain-inspired chip model run on 262,144 CPUs.
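To make the RDD programming model concrete, here is a minimal Scala sketch of distributing a dataset and computing over it in parallel. The application name, the `local[*]` master setting, the sample data, and the partition count are illustrative assumptions, not details from the document; the point is only that transformations are distributed across partitions and tracked by lineage for fault tolerance.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: distribute a collection as an RDD and compute over it in parallel.
object RddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-sketch")
      .master("local[*]")        // assumption: local mode for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Distribute the data across the cluster; each partition is processed by a worker.
    val points = sc.parallelize(1 to 1000000, numSlices = 8)

    // Transformations are lazy and recorded as lineage, so lost partitions
    // can be recomputed after a failure (fault tolerance).
    val squared = points.map(x => x.toLong * x)

    // An action triggers distributed execution and aggregates the results.
    val sum = squared.reduce(_ + _)
    println(s"Sum of squares: $sum")

    spark.stop()
  }
}
```

The same pattern underlies data-parallel training: each partition holds a shard of the training data, workers compute updates locally, and an action (or a parameter server) aggregates them.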