hammerlab / spark-tests   2.4.0

Apache License 2.0

Utilities for writing tests that use Apache Spark.

Scala versions: 2.12 2.11

spark-tests

SparkSuite: a SparkContext for each test suite

Add configuration options in subclasses using sparkConf(…), cf. KryoSparkSuite:

sparkConf(
  // Register this class as its own KryoRegistrator
  "spark.kryo.registrator" -> getClass.getCanonicalName,
  "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.referenceTracking" -> referenceTracking.toString,
  "spark.kryo.registrationRequired" -> registrationRequired.toString
)
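
As a rough illustration of how options can layer across subclasses, here is a Spark-free sketch. `ConfSuite`, `KryoConfSuite`, `MySuite`, and `settings` are hypothetical stand-ins invented for this example, not the library's classes; only the `sparkConf(…)` call shape and the Kryo keys come from the snippet above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a suite base class; NOT the spark-tests implementation.
trait ConfSuite {
  // Insertion-ordered so the accumulated options are easy to inspect.
  private val conf = mutable.LinkedHashMap[String, String]()

  // Record extra configuration pairs; a later call overrides an earlier key.
  protected def sparkConf(confs: (String, String)*): Unit =
    confs.foreach { case (key, value) => conf(key) = value }

  // Immutable snapshot of everything registered so far.
  def settings: Map[String, String] = conf.toMap
}

// A Kryo-flavoured layer, analogous in spirit to KryoSparkSuite.
class KryoConfSuite(referenceTracking: Boolean, registrationRequired: Boolean)
  extends ConfSuite {
  sparkConf(
    "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.referenceTracking" -> referenceTracking.toString,
    "spark.kryo.registrationRequired" -> registrationRequired.toString
  )
}

// A concrete suite adds its own options on top of the inherited ones.
class MySuite
  extends KryoConfSuite(referenceTracking = false, registrationRequired = true) {
  sparkConf("spark.ui.enabled" -> "false")
}
```

In this sketch each layer of the class hierarchy contributes its pairs while it is constructed, so `new MySuite().settings` holds the Kryo options plus the suite's own.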

PerCaseSuite: SparkContext for each test case

KryoSparkSuite is a SparkSuite implementation that provides hooks for Kryo registration:

register(
  classOf[Foo],
  "org.foo.Bar",
  classOf[Bar] -> new BarSerializer
)

Also useful for subclassing once per project: fill in that project's default Kryo registrar, then have concrete tests subclass that. See hammerlab/guacamole and hammerlab/pageant for examples.

Miscellaneous RDD / Job / Stage utilities

  • rdd.Util: make an RDD with specific elements in specific partitions.
  • NumJobsUtil: verify the number of Spark jobs that have been run.
  • RDDSerialization: interface for verifying that a serialization and deserialization round-trip on an RDD yields the same RDD.
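
To make the round-trip idea concrete, here is a minimal sketch that checks it on a plain collection using Java serialization from the standard library. It does not use Spark or the RDDSerialization interface; `RoundTrip`, `roundTrip`, and `Point` are hypothetical names chosen for this example.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical element type; case classes are serializable and compare by value.
case class Point(x: Int, y: Int)

// Hypothetical helper; NOT part of spark-tests.
object RoundTrip {
  // Write a value to an in-memory byte array with Java serialization.
  def serialize(value: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.toByteArray
  }

  // Read a value back; the cast is unchecked, so callers must know the type.
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Serialize, then deserialize, returning the reconstructed copy.
  def roundTrip[T <: AnyRef](value: T): T = deserialize[T](serialize(value))
}
```

A test would then assert that `RoundTrip.roundTrip(points) == points` for some `List[Point]`. The same comparison applied to the collected contents of an RDD is the kind of check the bullet above describes.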