I want to select a specific element: select("File.columns.column._name")
|-- File: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _Description: string (nullable = true)
| | |-- _RowTag: string (nullable = true)
| | |-- _name: string (nullable = true)
| | |-- _type: string (nullable = true)
| | |-- columns: struct (nullable = true)
| | | |-- column: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- _Hive_Final_Table: string (nullable = true)
| | | | | |-- _Hive_Final_column: string (nullable = true)
| | | | | |-- _Hive_Table1: string (nullable = true)
| | | | | |-- _Hive_column1: string (nullable = true)
| | | | | |-- _Path: string (nullable = true)
| | | | | |-- _Type: string (nullable = true)
| | | | | |-- _VALUE: string (nullable = true)
| | | | | |-- _name: string (nullable = true)
I get this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'File.columns.column[_name]' due to data type mismatch: argument 2 requires an integral type, however, '_name' is of string type.
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:65)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:108)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:118)
    ...
Can you help me?
2 Answers
You need the explode function to get the column you want. The first explode gives you the column field; exploding a second time gives you the _name column. Hope this helps!
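The two explodes described above might look like this. This is a sketch, assuming df is the DataFrame already loaded with the schema shown in the question; the alias names "file" and "col" are arbitrary:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ExplodeDemo {
  // Minimal case classes mirroring the relevant part of the schema (illustrative only)
  case class Col(_name: String)
  case class Cols(column: Seq[Col])
  case class FileEntry(_name: String, columns: Cols)
  case class Root(File: Seq[FileEntry])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-demo").getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame parsed from the XML file
    val df = Seq(Root(Seq(FileEntry("f1", Cols(Seq(Col("a"), Col("b"))))))).toDF()

    val names = df
      .select(explode($"File").as("file"))               // first explode: one row per File element
      .select(explode($"file.columns.column").as("col")) // second explode: one row per column entry
      .select($"col._name")                              // finally pick the _name field

    names.show()
    spark.stop()
  }
}
```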
Looking at your schema, you can do the following to select _name from the nested structure of the dataframe:
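One way this can be done with a single explode: after exploding File, extracting a field from the inner array of structs yields an array of that field's values. This is a sketch, assuming df holds the parsed data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object SelectNameDemo {
  // Illustrative case classes matching the schema fragment in the question
  case class Col(_name: String)
  case class Cols(column: Seq[Col])
  case class FileEntry(_name: String, columns: Cols)
  case class Root(File: Seq[FileEntry])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("select-name").getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame parsed from the XML file
    val df = Seq(Root(Seq(FileEntry("f1", Cols(Seq(Col("a"), Col("b"))))))).toDF()

    val result = df
      .select(explode($"File").as("file"))
      // field extraction over array<struct> returns an array of _name values per File entry
      .select($"file.columns.column._name".as("names"))

    result.show(truncate = false)
    spark.stop()
  }
}
```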