
How to convert an RDD of Maps to a DataFrame


I have an RDD of Maps and I want to convert it to a DataFrame. This is the input format of the RDD:

val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

Is there a way to convert it to a DataFrame, something like:

val df = mapRDD.toDF

df.show

empid,  empName,    depId
12      Rohan       201
13      Ross        201
14      Richard     401
15      Michale     501
16      John        701

1 Answer

  • 13

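    You can convert it to a Spark DataFrame by mapping each Map to a tuple and then calling toDF with the column names.

    Here is code that does this: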

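    // Needed for toDF; assumes a SparkSession named `spark` (as in spark-shell)
    import spark.implicits._

    val mapRDD = sc.parallelize(Seq(
       Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
       Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
       Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
       Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
       Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))
    
    // Take the column names from the first record; every Map is assumed to have the same three keys
    val columns = mapRDD.take(1).flatMap(a => a.keys)
    
    // Look each value up by its key instead of relying on Map iteration order
    val resultantDF = mapRDD.map { value =>
          (value(columns(0)), value(columns(1)), value(columns(2)))
          }.toDF(columns: _*)
    
    resultantDF.show()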
    

    The output is:

    +-----+-------+-----+
    |empid|empName|depId|
    +-----+-------+-----+
    |   12|  Rohan|  201|
    |   13|   Ross|  201|
    |   14|Richard|  401|
    |   15|Michale|  501|
    |   16|   John|  701|
    +-----+-------+-----+
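
    The tuple-based code above is tied to exactly three columns. For a different number of keys, or for records where a key may be missing, one alternative is to build Row objects against an explicit schema and pass them to createDataFrame. The following is only a sketch under some assumptions: the object name MapRddToDataFrame and the helper mapsToDF are made up for illustration, the caller supplies the column order, every column is treated as a nullable string, and a key that is absent from a record becomes null.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object MapRddToDataFrame {

  // Hypothetical helper (not part of Spark): one nullable string column per requested key.
  def mapsToDF(spark: SparkSession,
               rdd: RDD[Map[String, String]],
               columns: Seq[String]): DataFrame = {
    // Schema with the columns in the order the caller asked for
    val schema = StructType(columns.map(c => StructField(c, StringType, nullable = true)))

    // Look every value up by key; a missing key yields null instead of an exception
    val rows: RDD[Row] = rdd.map(m => Row.fromSeq(columns.map(c => m.get(c).orNull)))

    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MapRddToDataFrame")
      .getOrCreate()

    val mapRDD = spark.sparkContext.parallelize(Seq(
      Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
      Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
      Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401")))

    val df = mapsToDF(spark, mapRDD, Seq("empid", "empName", "depId"))
    df.show()

    spark.stop()
  }
}
```

    Passing the column list explicitly keeps the column order deterministic and avoids deriving the schema from whichever record happens to come first.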
    
