value subtract is not a member of org.apache.spark.sql.DataFrame in Spark Scala


I get the following error when trying to use a subtract method in Spark Scala:

<console>:29: error: value subtract is not a member of org.apache.spark.sql.DataFrame

But from the links below I can see that it exists in Python:

https://forums.databricks.com/questions/7505/comparing-two-dataframes.html
https://spark.apache.org/docs/1.3.0/api/python/pyspark.sql.html?highlight=dataframe#pyspark.sql.DataFrame.subtract

Do we have a subtract method in Spark Scala? If not, what is the alternative?

My sample code looks like this:

scala> val myDf1 = sc.parallelize(Seq(1,2,2)).toDF
myDf1: org.apache.spark.sql.DataFrame = [value: int]

scala> val myDf2 = sc.parallelize(Seq(1,2)).toDF
myDf2: org.apache.spark.sql.DataFrame = [value: int]

scala> val result = myDf1.subtract(myDf2)
<console>:28: error: value subtract is not a member of org.apache.spark.sql.DataFrame
       val result = myDf1.subtract(myDf2)

1 Answer

    That's because the Scala DataFrame API doesn't have a subtract method; the equivalent is except:

    scala> val df1 = sc.parallelize(Seq(1,2,2)).toDF
    df1: org.apache.spark.sql.DataFrame = [value: int]
    
    scala> val df2 = sc.parallelize(Seq(1,2)).toDF
    df2: org.apache.spark.sql.DataFrame = [value: int]
    
    scala> df1.except(df2).show
    +-----+                                                                         
    |value|
    +-----+
    +-----+
    

    except removes every row of df1 that also appears in df2 (working on distinct values), and both 1 and 2 appear in df2, so the result here is empty. But it seems you want to find the duplicates and keep them, rather than drop them.

    Off the top of my head:

    scala> val dupes = df1.groupBy("value").count.filter("count > 1").drop("count")
    dupes: org.apache.spark.sql.DataFrame = [value: int]
    
    scala> dupes.show()
    +-----+
    |value|
    +-----+
    |    2|
    +-----+
    
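    If you are on Spark 2.4 or later, the Dataset API also has exceptAll, which performs a multiset difference and keeps duplicate rows, so it answers the original question directly (a sketch, assuming Spark 2.4+; the df1/df2 defined above are reused):

```scala
// exceptAll (Spark 2.4+) is a multiset difference: unlike except,
// it keeps duplicates, so [1, 2, 2] exceptAll [1, 2] leaves [2].
scala> df1.exceptAll(df2).show
+-----+
|value|
+-----+
|    2|
+-----+
```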
