Utilities for writing tests that use Apache Spark.
## SparkSuite: a SparkContext for each test suite
Add configuration options in subclasses using `sparkConf(…)`; cf. `KryoSparkSuite`:
```scala
sparkConf(
  // Register this class as its own KryoRegistrator
  "spark.kryo.registrator" → getClass.getCanonicalName,
  "spark.serializer" → "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.referenceTracking" → referenceTracking.toString,
  "spark.kryo.registrationRequired" → registrationRequired.toString
)
```
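Suites that don't need extra configuration can simply extend `SparkSuite` and use the shared context directly. A minimal sketch, assuming `SparkSuite` is a ScalaTest base class exposing the suite-wide `SparkContext` as `sc` (both details are assumptions, not confirmed above):

```scala
class AdditionTest extends SparkSuite {
  test("parallelize and sum") {
    // `sc` is the SparkContext shared by every test case in this suite
    val rdd = sc.parallelize(1 to 10)
    assert(rdd.sum() == 55)
  }
}
```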
## PerCaseSuite: a SparkContext for each test case
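A minimal sketch, assuming the same surface as `SparkSuite` but with `sc` recreated before each test case (an assumption):

```scala
class IsolationTest extends PerCaseSuite {
  // Each test case below runs against its own, freshly created SparkContext
  test("first case")  { assert(sc.parallelize(1 to 3).count() == 3) }
  test("second case") { assert(sc.parallelize(1 to 4).count() == 4) }
}
```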
## KryoSparkSuite

`SparkSuite` implementation that provides hooks for Kryo registration:
```scala
register(
  classOf[Foo],
  "org.foo.Bar",
  classOf[Bar] → new BarSerializer
)
```
Also useful for subclassing once per project and filling in that project's default Kryo registrar, then having concrete tests subclass that; cf. hammerlab/guacamole and hammerlab/pageant for examples.
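A minimal sketch of that per-project pattern, reusing the placeholder types above; the class names and the exact `register` overloads used are assumptions:

```scala
// Project-wide base suite: fill in the project's default Kryo registrations once…
abstract class ProjectKryoSuite extends KryoSparkSuite {
  register(
    classOf[Foo],
    classOf[Bar] → new BarSerializer
  )
}

// …then concrete test suites just extend it.
class FooTest extends ProjectKryoSuite {
  test("Foos survive a Kryo-serialized shuffle") {
    val foos = sc.parallelize(Seq(Foo(1), Foo(2), Foo(3)))
    assert(foos.map(_ → 1).reduceByKey(_ + _).count() == 3)
  }
}
```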
## rdd.Util: make an RDD with specific elements in specific partitions
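To illustrate the idea (not necessarily `rdd.Util`'s actual API), here is one way to build such an RDD with plain Spark, placing the i-th sequence of elements into partition i:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper: partition i of the result contains exactly partitions(i).
def rddWithPartitions[T: ClassTag](sc: SparkContext, partitions: Seq[Seq[T]]): RDD[T] =
  sc
    .parallelize(partitions.indices, numSlices = partitions.size)
    .mapPartitionsWithIndex((idx, _) => partitions(idx).iterator)
```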
## NumJobsUtil: verify the number of Spark jobs that have been run
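The underlying idea can be sketched with a `SparkListener` that counts job starts; this illustrates the technique and is not `NumJobsUtil`'s actual interface:

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{ SparkListener, SparkListenerJobStart }

// Counts every Spark job submitted on the context it is registered with.
class JobCounter extends SparkListener {
  val numJobs = new AtomicInteger(0)
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    numJobs.incrementAndGet()
  }
}

// Usage sketch:
//   val counter = new JobCounter
//   sc.addSparkListener(counter)
//   rdd.count()                      // runs one job
//   assert(counter.numJobs.get == 1)
```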
## RDDSerialization

Interface for verifying that performing a serialization+deserialization round-trip on an RDD results in the same RDD.
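For reference, the shape of such a check can be sketched with plain Spark object files; the helper below is hypothetical and not `RDDSerialization`'s actual API:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical round-trip check: write the RDD out, read it back, compare element counts.
def assertRoundTrips[T: ClassTag](sc: SparkContext, rdd: RDD[T], path: String): Unit = {
  rdd.saveAsObjectFile(path)
  val restored = sc.objectFile[T](path)
  assert(restored.countByValue() == rdd.countByValue())
}
```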