Unable to find encoder for type stored in a Dataset when trying to flatMap a DataFrame in Spark 2.0 [duplicate]

This question already has answers here:

Encoder error while trying to map dataframe row to updated row (2 answers)
Why is “Unable to find encoder for type stored in a Dataset” when creating a dataset of custom case class? (3 answers)

I keep getting the following compile-time error:

Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) 
are supported by importing spark.implicits._  
Support for serializing other types will be added in future releases.

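I just upgraded from Spark v1.6 to v2.0.2, and a whole bunch of code that uses DataFrame now fails with this error. The code it complains about looks like the following.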

def doSomething(data: DataFrame): Unit = {
 data.flatMap(row => {
  ...
 })
 .reduceByKey(_ + _)
 .sortByKey(ascending = false)
}
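One possible reading of the error, offered as a sketch rather than a confirmed fix: in Spark 1.6, DataFrame.flatMap returned an RDD, whereas in 2.x it returns a Dataset and therefore needs an implicit Encoder for the result type. reduceByKey and sortByKey exist only on pair RDDs, so dropping to data.rdd first keeps the 1.6 behaviour. The object name, the method name, and the flatMap body below are hypothetical, because the original body is elided.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object RddRoute {
  // Hypothetical stand-in for the elided body: emit (value, 1) for every
  // non-null String cell, then count per value and sort keys descending.
  def countValues(data: DataFrame): RDD[(String, Int)] =
    data.rdd // RDD[Row]: RDD.flatMap needs a ClassTag, not an Encoder
      .flatMap(row => row.toSeq.collect { case s: String => (s, 1) })
      .reduceByKey(_ + _)
      .sortByKey(ascending = false)
}
```

Staying on the RDD API avoids encoders entirely, at the cost of leaving the Dataset optimizer behind for this part of the pipeline.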

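Previous SO posts suggest fixes that involve case classes and importing the implicits.

However, I do not have any case classes, since I am using DataFrame, which is equivalent to Dataset[Row]. I have also already inlined the two implicit imports as follows, and that did nothing to get rid of this message.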

val sparkSession: SparkSession = ???
val sqlContext: SQLContext = ???

import sparkSession.implicits._
import sqlContext.implicits._

Note that I have looked at the documentation for Dataset and Encoder. The documentation says the following.

Scala

Encoders are generally created automatically through implicits from a 
SparkSession, or can be explicitly created by calling static methods on 
Encoders.

import spark.implicits._

val ds = Seq(1, 2, 3).toDS() // implicitly provided (spark.implicits.newIntEncoder)

However, my method has no access to a SparkSession. Also, when I try the line import spark.implicits._, IntelliJ cannot even find it. When I say my DataFrame is a Dataset[Row], I really do mean it.
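A note on the missing session, as a hedged sketch: every Dataset exposes the session that created it through its sparkSession member, so a method that only receives a DataFrame can still import the implicits from it. The name spark in the documentation is just a variable holding a SparkSession, which would explain why the import does not resolve when no such value is in scope. The object and method names below are hypothetical, and this covers only result types the implicits support (primitives, Strings, tuples, case classes), not Row.

```scala
import org.apache.spark.sql.{DataFrame, Dataset}

object ImplicitsFromData {
  // Hypothetical helper: returns the non-null values of the first column,
  // assuming that column holds strings.
  def firstColumnValues(data: DataFrame): Dataset[String] = {
    // `data` is a stable identifier, so its session's implicits can be imported.
    import data.sparkSession.implicits._ // brings Encoder[String] into scope
    data.flatMap(row => Option(row.getString(0)).toSeq)
  }
}
```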

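This question has been marked as a possible duplicate, but please let me clarify.

I do not have any case class or business object associated with the data.
I am using .flatMap, whereas the other question uses .map.
The implicit imports do not seem to help.
Passing a RowEncoder produces a compile-time error, e.g. data.flatMap(row => { ... }, RowEncoder(data.schema)) (too many arguments).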
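On the last point, a sketch under the assumption that the Spark 2.x Scala signature is in play: the Scala overload of flatMap takes the Encoder as an implicit parameter in a second parameter list, so an explicit encoder would go in its own parentheses rather than after a comma. RowEncoder(schema) is the Spark 2.x form; later releases may expose this differently. The object name, method name, and function body are hypothetical, and the output rows are assumed to keep the input schema.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.encoders.RowEncoder

object ExplicitRowEncoder {
  // Hypothetical example: emit every input row twice, keeping the same schema.
  def duplicateRows(data: DataFrame): DataFrame =
    data.flatMap(row => Seq(row, row))(RowEncoder(data.schema)) // second parameter list
}
```

If the function changes the shape of the rows, the schema passed to RowEncoder would have to describe the output rows instead of data.schema.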

I know how this new Spark 2.0 Datasets/DataFrame API is supposed to work. In the Spark shell, the code below works. Note that I start the Spark shell like this: $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

val schema = StructType(Array(
 StructField("x1", StringType, true),
 StructField("x2", StringType, true),
 StructField("x3", StringType, true),
 StructField("x4", StringType, true),
 StructField("x5", StringType, true)))

val df = sqlContext.read
 .format("com.databricks.spark.csv")
 .option("header", "true")
 .schema(schema)
 .load("/Users/jwayne/Downloads/mydata.csv")

df.columns.map(col => {
 df.groupBy(col)
   .count()
   .map(_.getString(0))
   .collect()
   .toList
 })
 .toList

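However, when I run it as part of a unit test, I get the same "Unable to find encoder" error. Why does this work in the shell but not in my unit test?

In the shell, I typed :imports and :implicits and copied what they listed into my Scala source file, but that did not help either.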
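A likely difference between the two environments, stated as an assumption: the Spark 2.x shell creates a session named spark and imports spark.implicits._ automatically, while a test has to do both itself, and the import must refer to an actual SparkSession value that is in scope. The sketch below reproduces the per-column logic against a small in-memory DataFrame instead of the CSV file; the object name, method name, and sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object LocalEncoderCheck {
  // Hypothetical test helper: distinct values per column of a tiny DataFrame.
  def distinctValuesPerColumn(spark: SparkSession): List[List[String]] = {
    import spark.implicits._ // what the shell imports for you; supplies Encoder[String]
    val df = Seq(("a", "b"), ("a", "c")).toDF("x1", "x2")
    df.columns.map { col =>
      df.groupBy(col)
        .count()
        .map(_.getString(0)) // compiles because Encoder[String] is in scope
        .collect()
        .toList
    }.toList
  }
}
```

In a test, the session could be built with SparkSession.builder().master("local[1]").getOrCreate() and passed in.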
