DataFrame (Dataset<Row> in Java)
Empty with predefined structure
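// Requires: import static org.apache.spark.sql.functions.lit;
// No rows; the schema has a single string column "color".
final Dataset<Row> actual = spark().emptyDataFrame().withColumn("color", lit("green"));
actual.printSchema();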
Single primitive
// Encoders.LONG().schema() yields a one-column schema: "value" of LongType.
Long value = 12L;
List<Row> rows = Collections.singletonList(RowFactory.create(value));
final Dataset<Row> actual = spark().createDataFrame(rows, Encoders.LONG().schema());
Primitive list
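// Lists is Guava's com.google.common.collect.Lists.
// toDF renames the encoder's default "value" column to "color".
List<String> data = Lists.newArrayList("green", "red");
final Dataset<Row> actual = spark().createDataset(data, Encoders.STRING()).toDF("color");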
Row list with assigned schema
// Explicit schema: one non-nullable string column named "color".
List<Row> rows = Arrays.asList(RowFactory.create("green"), RowFactory.create("red"));
StructType schema = DataTypes.createStructType(
    new StructField[]{DataTypes.createStructField("color", DataTypes.StringType, false)});
final Dataset<Row> actual = spark().createDataFrame(rows, schema);
List of entities
// Apple must be a JavaBean: Spark infers the schema from its getters.
List<Apple> rows = Lists.newArrayList(new Apple("green", 70), new Apple("red", 110));
final Dataset<Row> actual = spark().createDataFrame(rows, Apple.class);
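The Apple class used above is not shown here. A minimal sketch of what it might look like, assuming it is a plain JavaBean with a string color and an integer second property. The property name weight is a guess; the snippet only shows the constructor arguments.

```java
import java.io.Serializable;

// Hypothetical bean for illustration; the property name "weight" is an assumption.
// When used with Spark, declare it public in its own Apple.java file,
// since bean reflection generally expects a public class.
class Apple implements Serializable {
    private String color;
    private int weight;

    // No-arg constructor, conventional for JavaBeans.
    public Apple() {}

    public Apple(String color, int weight) {
        this.color = color;
        this.weight = weight;
    }

    // Getter/setter pairs define the bean properties.
    public String getColor() { return color; }
    public void setColor(String color) { this.color = color; }

    public int getWeight() { return weight; }
    public void setWeight(int weight) { this.weight = weight; }
}
```

With a bean shaped like this, the resulting DataFrame would be expected to have one column per bean property, here color and weight.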
Example: DataFrameCreationTest.java