Comparison

Check DataFrame structure

List<Apple> data = Lists.newArrayList(new Apple("Green", 85));
Dataset<Row> df = spark().createDataFrame(data, Apple.class);

assertEquals(Encoders.bean(Apple.class).schema(), df.schema());
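
The snippets in this section rely on an Apple bean that is not shown here. The class below is a minimal sketch of what it might look like, not the original: the field names color and weight, and the choice of Integer rather than int for weight, are assumptions inferred from the calls above and below. Encoders.bean expects a public class with a public no-argument constructor and a getter/setter pair per field.

```java
import java.io.Serializable;

// Hypothetical bean: the real Apple class is not shown in this section.
public class Apple implements Serializable {
    private String color;
    private Integer weight;

    // Public no-argument constructor, needed by the bean encoder.
    public Apple() {
    }

    public Apple(String color, Integer weight) {
        this.color = color;
        this.weight = weight;
    }

    public String getColor() {
        return color;
    }

    public void setColor(String color) {
        this.color = color;
    }

    // Returns Integer (assumed) so that it compares cleanly with a value read via getAs.
    public Integer getWeight() {
        return weight;
    }

    public void setWeight(Integer weight) {
        this.weight = weight;
    }
}
```

With a bean shaped like this, both Encoders.bean(Apple.class).schema() and createDataFrame(data, Apple.class) derive their columns from the getters, which is why the two schemas can be compared directly.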

Get one value

Apple apple = new Apple("Green", 85);
List<Apple> data = Lists.newArrayList(apple);
Dataset<Row> df = spark().createDataFrame(data, Apple.class);

Integer actual = df.first().getAs("weight");
assertEquals(apple.getWeight(), actual);

List of primitives, using "as"

List<String> data = Lists.newArrayList("green", "red");
Dataset<Row> df = spark().createDataset(data, Encoders.STRING()).toDF("color");
List<String> actual = df.select("color").as(Encoders.STRING()).collectAsList();

assertEquals(data, actual);
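
Comparing a collected list with assertEquals also compares element order. For a small local list passed through a simple select, the order is normally preserved, but after a shuffle (a join or groupBy, for example) row order is not guaranteed. The helper below is a sketch of an order-insensitive comparison that still counts duplicates. ListComparison and sameElements are hypothetical names, not part of Spark or JUnit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical test helper, not part of Spark or JUnit.
public final class ListComparison {
    private ListComparison() {
    }

    // Counts how many times each element occurs in the list.
    static <T> Map<T, Integer> countOccurrences(List<T> items) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // True when both lists hold the same elements the same number of times, in any order.
    public static <T> boolean sameElements(List<T> expected, List<T> actual) {
        return countOccurrences(expected).equals(countOccurrences(actual));
    }
}
```

In a test this could be used as assertTrue(ListComparison.sameElements(data, actual)).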

Compare DataFrames

List<Apple> data = Lists.newArrayList(new Apple("Green", 85));
Dataset<Row> expected = spark().createDataFrame(data, Apple.class);
Dataset<Row> actual = spark().createDataFrame(data, Apple.class);

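// except() is a one-way set difference, so check both directions.
// It also ignores duplicate rows; compare count() as well if duplicates matter.
assertEquals(0, expected.except(actual).count());
assertEquals(0, actual.except(expected).count());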

Example: ComparisonTest.java