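The reference I originally used to create the line chart: https://matplotlib.org/gallery/text_labels_and_annotations/date.html

I have been trying to plot a two-column numpy array, post_records. I am working with social media data, so the first column is post_ids and the second is datetime_obj_col, which I managed to read from a csv file using some scripts.

I managed to create a line chart from this data in matplotlib, but I am not sure how to make a histogram.

Right now, nothing is displayed when I run the program.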

fig, ax = plt.subplots()
# thought that bins could be a sequence, wanted to create 31 bins for 31 total days in a month
hist, bins, patch_lst = ax.hist(post_records[:,1], bins=range(31))
ax.plot(hist, bins)
ax.set_xlabel('Days')
ax.set_ylabel('frequency')
ax.set_title(r'Histogram of Time')

plt.show() # shows nothing
  • What do I need to pass to ax.plot? I am not clear on how to pass my x dataset.

  • Why is no window displayed?
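One possible way to get a per-day histogram, sketched under the assumption that the second column holds values `pd.to_datetime` can parse: extract the day of month as plain integers first, so the x-axis never carries datetime units. The three sample timestamps and the `Agg` backend are assumptions for illustration only. Note also that `ax.hist` returns `len(bins) - 1` counts, so plotting the counts against `bins` directly fails on a length mismatch; `bins[:-1]` lines them up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for post_records[:, 1]
datetimes = pd.to_datetime([
    "2018-03-14 23:58:40", "2018-03-14 23:58:21", "2018-03-15 00:01:02",
])
days = np.asarray(datetimes.day)  # day of month as plain integers

fig, ax = plt.subplots()
# Edges 1, 2, ..., 32 give 31 bins, one per possible day of the month
counts, bins, patches = ax.hist(days, bins=np.arange(1, 33))
ax.plot(bins[:-1], counts)  # left edge of each bin against its count
ax.set_xlabel("Day of month")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of Time")
# With an interactive backend, plt.show() would display the figure here
```

Because the x values are ordinary integers, the date locator is never involved and the usual numeric ticks apply.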

Edit, and how to reproduce this:

import matplotlib.pyplot as plt
import pandas as pd

def create_dataframe_of_datetime_objects_and_visualize():
  datetime_lst = [1521071920000000000, 1521071901000000000, 1521071844000000000, 1521071741000000000, 1521071534000000000] # to get this variable  I loaded my original dataframe with 1980000, sliced the first 5 entries, then printed out the 'datetime_obj_col'. I can't exactly remember what this format is called, I think it's unix time.
  id_lst = [974013, 974072, 327212, 123890, 438201]

  for each in range(len(datetime_lst)):
    datetime_lst[each] = pd.to_datetime(datetime_lst[each], errors='coerce')
    datetime_lst[each] = datetime_lst[each].strftime("%d-%b-%y %H:%M:%S")
    datetime_lst[each] = pd.to_datetime(datetime_lst[each], errors='coerce', dayfirst=True, format="%d-%b-%y %H:%M:%S")
  datetime_lst = pd.Series(datetime_lst)

  df = pd.DataFrame({'tweet_id':id_lst, 'datetime_obj_col': datetime_lst})
  gb_var = df.groupby(df["datetime_obj_col"].dt.month)
  gb_var_count = gb_var.count()
  gb_var.plot(kind="bar")
  plt.show()

Note that I am no longer using a histogram. But two errors should appear, as follows:

Traceback (most recent call last):
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 918, in apply
    result = self._python_apply_general(f)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 936, in _python_apply_general
    self.axis)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 2273, in apply
    res = f(group)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 541, in f
    return self.plot(*args, **kwargs)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 2941, in __call__
    sort_columns=sort_columns, **kwds)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 1977, in plot_frame
    **kwds)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 1804, in _plot
    plot_obj.generate()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 266, in generate
    self._post_plot_logic_common(ax, self.data)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 405, in _post_plot_logic_common
    self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 478, in _apply_axis_properties
    labels = axis.get_majorticklabels() + axis.get_minorticklabels()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axis.py", line 1245, in get_majorticklabels
    ticks = self.get_major_ticks()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axis.py", line 1396, in get_major_ticks
    numticks = len(self.get_major_locator()())
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1249, in __call__
    self.refresh()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1269, in refresh
    dmin, dmax = self.viewlim_to_dt()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1026, in viewlim_to_dt
    .format(vmin))
ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
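One plausible reading of this traceback, offered as a guess: `gb_var.plot(kind="bar")` is applied to each group's full frame, datetime column included, instead of to the counts. A minimal sketch of plotting the counts themselves, assuming the goal is one bar per day of the month. It reuses the five values from the reproduction above, which pandas interprets as nanoseconds since the Unix epoch; the `Agg` backend and the axis labels are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

datetime_lst = [1521071920000000000, 1521071901000000000, 1521071844000000000,
                1521071741000000000, 1521071534000000000]
id_lst = [974013, 974072, 327212, 123890, 438201]

df = pd.DataFrame({
    "tweet_id": id_lst,
    # Plain integers are read as nanoseconds since 1970-01-01 by default
    "datetime_obj_col": pd.to_datetime(datetime_lst),
})

# Count posts per day of month; the result is a Series with an integer index
counts = df.groupby(df["datetime_obj_col"].dt.day)["tweet_id"].count()

ax = counts.plot(kind="bar")  # bars are drawn from the counts, not the datetimes
ax.set_xlabel("Day of month")
ax.set_ylabel("Number of posts")
```

Swapping `.dt.day` for `.dt.month` or `.dt.hour` would change the grouping without touching the plotting call.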

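Edit:

This is starting to look like a bug related to trying to plot a column of datetime objects with hist().

I took the data from post_records, a loaded numpy array that stores a 2D dataset of 198 post ids and datetime objects.

Here is the code for a function called create_datetime_objects. It opens a csv file, "tweet_time_info_preprocessed.csv", which has only three columns: "tweet_id", "tweet_created_at_date", and "tweet_created_at_hour". The code below uses the pandas to_datetime() method to combine the tweet_created_at_date and tweet_created_at_hour columns into formatted datetime objects.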

Sample of the csv file: [image not available]

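import csv

import numpy as np
import pandas as pd

# Assumption: the save path is the same file that visualize_objects_histogram() loads
np_save_path = "tweets_and_datetime_objects.npy"

def create_datetime_objects():
    post_and_datetime_lst = []  # one [post_id, datetime] pair per csv row
    with open("post_time_info_preprocessed.csv", 'r', encoding='utf8') as time_csv:
        mycsv = csv.reader(time_csv)
        progress = 0
        for row in mycsv:
            progress += 1
            if progress == 1:  # header row
                continue
            if progress % 10000 == 0:
                print(progress)  # progress indicator for large files
            each_post_datetime_lst = []
            each_post_datetime_lst.append(row[0])
            time_str = str(row[1]) + " " + str(row[2])
            a_date_object = pd.to_datetime(time_str, dayfirst=True, format="%d-%b-%y %H:%M:%S")
            each_post_datetime_lst.append(a_date_object)
            post_and_datetime_lst.append(each_post_datetime_lst)
    # dtype=object keeps the ids as strings and the datetimes as Timestamp objects
    numpy_arr_of_tweets_and_datetimes = np.array(post_and_datetime_lst, dtype=object)
    np.save(np_save_path, numpy_arr_of_tweets_and_datetimes)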
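The row-by-row loop could probably be replaced by a vectorized read. A rough sketch, assuming the three column names given above and date/hour strings that match the `%d-%b-%y %H:%M:%S` format. The two sample rows and the in-memory `io.StringIO` file are made up for illustration and stand in for the real csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real csv file
sample_csv = io.StringIO(
    "tweet_id,tweet_created_at_date,tweet_created_at_hour\n"
    "974072352958042112,14-Mar-18,23:58:40\n"
    "974072272578166784,14-Mar-18,23:58:21\n"
)

# Read ids as strings so the long integers are preserved exactly as written
df = pd.read_csv(sample_csv, dtype={"tweet_id": str})

# Join the date and hour strings, then parse the whole column in one call
df["datetime_obj_col"] = pd.to_datetime(
    df["tweet_created_at_date"] + " " + df["tweet_created_at_hour"],
    format="%d-%b-%y %H:%M:%S",
)
print(df[["tweet_id", "datetime_obj_col"]])
```

Keeping the result in a DataFrame with a proper datetime64 column also avoids the object-dtype numpy array, which is where the datetime type information tends to get lost.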

Then I have visualize_objects_histogram()

def visualize_objects_histogram():
    print("Visualizing timeplot as histogram")
    # allow_pickle=True is needed on NumPy 1.16.3+ to load an object array
    post_records = np.load("tweets_and_datetime_objects.npy", allow_pickle=True)
    df = pd.DataFrame(data=post_records, columns=['post_id', 'datetime_obj_col'])
    df_sliced = df[0:5]
    print(df_sliced)
    fig, ax = plt.subplots()
    hist, bins, patch_lst = ax.hist(df_sliced['datetime_obj_col'], bins=range(5))
    ax.plot(hist, bins)
    ax.set_xlabel('Days')
    ax.set_ylabel('frequency')
    ax.set_title('Histogram of Time')

    plt.show()

So I slice off 5 rows of the dataframe and store them in df_sliced. When I run this code, a blank white window appears. Printing df_sliced gives

             tweet_id     datetime_obj_col
0  974072352958042112  2018-03-14 23:58:40
1  974072272578166784  2018-03-14 23:58:21
2  974072032177598464  2018-03-14 23:57:24
3  974071601313533953  2018-03-14 23:55:41
4  974070732777914368  2018-03-14 23:52:14

There is also an error message that comes with the blank window. It is long.

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\biney\AppData\Local\Programs\Python\Python36-32\lib\tkinter\__init__.py", line 1699, in __call__
    return self.func(*args)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\backends\_backend_tk.py", line 227, in resize
    self.draw()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\backends\backend_tkagg.py", line 12, in draw
    super(FigureCanvasTkAgg, self).draw()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\backends\backend_agg.py", line 433, in draw
    self.figure.draw(self.renderer)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\figure.py", line 1475, in draw
    renderer, self, artists, self.suppressComposite)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\image.py", line 141, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axes\_base.py", line 2607, in draw
    mimage._draw_list_compositing_images(renderer, self, artists)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\image.py", line 141, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axis.py", line 1190, in draw
    ticks_to_draw = self._update_ticks(renderer)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axis.py", line 1028, in _update_ticks
    tick_tups = list(self.iter_ticks())  # iter_ticks calls the locator
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axis.py", line 971, in iter_ticks
    majorLocs = self.major.locator()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1249, in __call__
    self.refresh()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1269, in refresh
    dmin, dmax = self.viewlim_to_dt()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1026, in viewlim_to_dt
    .format(vmin))
ValueError: view limit minimum -0.19500000000000003 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units

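For my 5 records this error message is repeated 5 times, possibly with a slightly different "view limit" value each time. I think the error message is most closely related to the following online version of dates.py, though I could be wrong: https://fossies.org/linux/matplotlib/lib/matplotlib/dates.py (around line 1022; I will check the actual file on my computer soon).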
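A possible explanation, offered as a guess rather than a diagnosis: the datetime column puts the x-axis into date units, while `bins=range(5)` and the later `ax.plot(hist, bins)` supply small plain numbers, which the date locator then rejects as view limits. The sketch below keeps the data and the bin edges in the same Matplotlib date-number units via `matplotlib.dates.date2num`. The sample values are the five timestamps from the printout above; the `Agg` backend and the choice of one-minute bins are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

stamps = pd.to_datetime([
    "2018-03-14 23:58:40", "2018-03-14 23:58:21", "2018-03-14 23:57:24",
    "2018-03-14 23:55:41", "2018-03-14 23:52:14",
])

# Convert to floats (days) so the data and the bin edges share one unit
nums = mdates.date2num(stamps.to_pydatetime())

one_minute = 1.0 / (24 * 60)  # one minute expressed in days
start = np.floor(nums.min() / one_minute) * one_minute  # round down to the minute
edges = start + one_minute * np.arange(0, 9)  # 9 edges give 8 one-minute bins

fig, ax = plt.subplots()
counts, bins, patches = ax.hist(nums, bins=edges)
ax.xaxis_date()  # interpret x values as dates when drawing ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.set_xlabel("Time")
ax.set_ylabel("Frequency")
fig.canvas.draw()  # forces tick computation; raises if the view limits are invalid
```

The key point is that every number handed to the axis, data or bin edge, is a valid date number, so the locator never sees a view limit near zero.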

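I will try to see whether this post helps: Can Pandas plot a histogram of dates?

Edit 2: The earlier Stack Overflow post introduced me to two useful methods, but they did not work. I changed the visualize... function to the following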

def visualize_datetime_objects_with_pandas():
    # contains python datetime objects; allow_pickle=True is needed on NumPy 1.16.3+
    tweets_and_datetime_objects = np.load("tweets_and_datetime_objects.npy", allow_pickle=True)
    print("with pandas")
    print(tweets_and_datetime_objects.shape)
    df = pd.DataFrame(data=tweets_and_datetime_objects, columns=['tweet_id', 'datetimeobj'])
    pandas_freq_dict = df['datetimeobj'].value_counts().to_dict()
    #print(pandas_freq_dict)
    print(len(list(pandas_freq_dict.keys())))
    print(list(pandas_freq_dict.keys())[0])
    print(list(pandas_freq_dict.values())[1])

    plt.plot(pandas_freq_dict.keys(), pandas_freq_dict.values())

    #df = df.set_index('datetimeobj')
    # changing the index of this dataframe to a time index
    #df['datetimeobj'].plot(kind='line', style=['--'])

    plt.show()

It gives the following output / error message.

date-time temporal data visualization script

Visualizing timeplot as histogram
                  tweet_id  datetime_obj_col
datetime_obj_col
14                       5                 5
                  tweet_id  datetime_obj_col
datetime_obj_col
14                       5                 5
Traceback (most recent call last):
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 918, in apply
    result = self._python_apply_general(f)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 936, in _python_apply_general
    self.axis)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 2273, in apply
    res = f(group)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\groupby.py", line 541, in f
    return self.plot(*args, **kwargs)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 2941, in __call__
    sort_columns=sort_columns, **kwds)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 1977, in plot_frame
    **kwds)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 1804, in _plot
    plot_obj.generate()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 266, in generate
    self._post_plot_logic_common(ax, self.data)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 405, in _post_plot_logic_common
    self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize)
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\pandas\plotting\_core.py", line 478, in _apply_axis_properties
    labels = axis.get_majorticklabels() + axis.get_minorticklabels()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axis.py", line 1245, in get_majorticklabels
    ticks = self.get_major_ticks()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\axis.py", line 1396, in get_major_ticks
    numticks = len(self.get_major_locator()())
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1249, in __call__
    self.refresh()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1269, in refresh
    dmin, dmax = self.viewlim_to_dt()
  File "C:\Users\biney\AppData\Roaming\Python\Python36\site-packages\matplotlib\dates.py", line 1026, in viewlim_to_dt
    .format(vmin))
ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
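Regarding the `value_counts()` approach in Edit 2: counting raw timestamps mostly yields counts of 1, since posts rarely share the exact same second. A hedged sketch of bucketing first, here by minute with `Series.dt.floor`, and then plotting the sorted counts. The sample values are the five timestamps printed earlier in the post; the bucket size, the marker style and the `Agg` backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2018-03-14 23:58:40", "2018-03-14 23:58:21", "2018-03-14 23:57:24",
    "2018-03-14 23:55:41", "2018-03-14 23:52:14",
]))

# Round each timestamp down to the minute, then count posts per minute;
# sort_index() puts the buckets in chronological order for plotting
per_minute = stamps.dt.floor("min").value_counts().sort_index()

fig, ax = plt.subplots()
ax.plot(per_minute.index, per_minute.values, marker="o")
ax.set_xlabel("Time")
ax.set_ylabel("Posts per minute")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
fig.canvas.draw()
```

Passing the index and values of a sorted Series, rather than dictionary views, keeps the x values as real datetimes in chronological order.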