from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType
# Create a dummy dataframe:
df1 = sqlContext.createDataFrame([("11/25/1991", "11/24/1991", "11/30/1991"),
                                  ("11/25/1391", "11/24/1992", "11/30/1992")],
                                 schema=['first', 'second', 'third'])
# Define a user-defined function (UDF)
# that converts a string cell into a date:
func = udf(lambda x: datetime.strptime(x, '%m/%d/%Y'), DateType())
df = df1.withColumn('test', func(col('first')))
df.show()
df.printSchema()
4 Answers
This can (preferably?) be done without a udf:
Update (1/10/2018):

For Spark 2.2+, the best way to do this is with the to_date or to_timestamp functions, which both support a format argument.
The strptime() method did not work for me. I have another, cleaner solution using cast:
Try this: