Apache Spark integration testing
Apache Spark is becoming widely used, Spark code is growing more complex, and integration tests are becoming important for checking code quality. Integration testing approaches are described below, with code samples. Two languages are covered, Java and Scala, in separate sections.
Testing steps
- Resource allocation: create a SparkContext/SparkSession for the test. It can be created manually, or an existing framework can be used;
- Data preparation: the RDD/DataFrame is created in code or read from disk;
- Run the functionality under test. There are two types of functionality: a) reading data from storage, for which files have to be prepared; b) transformations, for which data can be created in code;
- Comparison of expected and actual results.
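The four steps above can be sketched in Scala roughly as follows. This is a minimal illustration, not code from the linked test projects: the `Apple` case class, the `heavyApples` transformation, and the 100 gram threshold are hypothetical names and values chosen for the example. It assumes Spark SQL is on the classpath and creates the SparkSession manually in local mode.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AppleTestSketch {
  // Hypothetical record type, used only for this illustration.
  final case class Apple(color: String, weightGrams: Int)

  // Hypothetical transformation under test:
  // keep apples whose weight is at least minWeightGrams.
  def heavyApples(apples: DataFrame, minWeightGrams: Int): DataFrame =
    apples.filter(col("weightGrams") >= minWeightGrams)

  def main(args: Array[String]): Unit = {
    // 1. Resource allocation: a SparkSession created manually, running locally.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("apple-integration-test")
      .getOrCreate()
    import spark.implicits._

    try {
      // 2. Data preparation: the DataFrame is created in code.
      val input = Seq(
        Apple("red", 150),
        Apple("green", 90),
        Apple("yellow", 120)
      ).toDF()

      // 3. Run the functionality under test (a transformation).
      val actual = heavyApples(input, 100).as[Apple].collect().toSet

      // 4. Compare expected and actual results.
      // Sets are compared because Spark does not guarantee row order here.
      val expected = Set(Apple("red", 150), Apple("yellow", 120))
      assert(actual == expected, s"expected $expected but got $actual")
    } finally {
      // Release the resources allocated in step 1.
      spark.stop()
    }
  }
}
```

In a real project the body of `main` would normally live in a test method of whichever test framework is in use, with session creation and shutdown moved into shared setup and teardown code.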
Examples
Manipulation of apple weight and color is used as the example. The sections contain links to test projects available for download.