Plotting a large dataset as kind='bar' is ineffective

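I'm working with a moderately large dataset of about 100,000 records. When I plot a df column as a line with the code below, the plot takes about 2 seconds.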

import matplotlib.pyplot as plt

with plt.style.context('ggplot'):
    plt.figure(3, figsize=(16, 12))
    plt.subplot(411)
    df_pca_std['PC1_resid'].plot(title="PC1 Residual", color='r')

    # If I change the plot to a bar (no other change)
    df_X_std['PC1_resid'].plot(kind='bar', title="PC1 Residual", color='r')

The bar version takes 112 seconds, and the rendering changes as well: the x-axis becomes a cluttered mess.

[Image: the resulting bar plot]

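I suppressed the axes and changed the style, but neither helped. Does anyone have an idea how to render this better and in less time? I am checking the plotted data for mean reversion, which shows up better as a bar chart.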
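For context, here is a rough sketch of where the extra time probably goes. As far as I can tell, a pandas line plot produces a single line artist, while `kind='bar'` produces one rectangle per row and, by default, one categorical tick per bar, which would also explain the cluttered x-axis. The series `s` and the size `n` below are made-up stand-ins, not the data from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the residual column; kept small on purpose
n = 500
s = pd.Series(np.random.default_rng(0).standard_normal(n), name="PC1_resid")

fig, (ax_line, ax_bar) = plt.subplots(2, 1, figsize=(16, 6))
s.plot(ax=ax_line, color="r")             # line plot
s.plot(kind="bar", ax=ax_bar, color="r")  # bar plot of the same data

# Count the artists each plot created
print("line artists:", len(ax_line.lines))
print("bar rectangles:", len(ax_bar.patches))
print("bar x-ticks:", len(ax_bar.get_xticks()))
```

With 100,000 rows the bar plot would have to create and draw that many rectangles, so the cost grows with the row count in a way the line plot's does not.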

2 Answers

Not the best-looking chart, but at least it renders. It plotted 2.1 million bars in 14.2 seconds.

import pygal
bar_chart = pygal.Bar()
bar_chart.add('PC1_residuals', df_X_std['PC1_resid'])
bar_chart.render_to_file('bar_chart.svg')

Answered on 2024-05-16T14:17:52+08:00

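One possible solution: I don't actually need to draw bars. A line plot is very fast, and `fill_between` can fill the area from zero up to the line. The result looks similar to drawing all the bars, in a fraction of the time.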

Use the DatetimeIndex's `to_pydatetime` method to convert the dates (the df index) into an array of `datetime.datetime` objects that matplotlib can use, then change the plot call.
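```
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Convert the DatetimeIndex to matplotlib's numeric date format
plotDates = mdates.date2num(df.index.to_pydatetime())

plt.fill_between(plotDates, 0, df_pca_std['PC1_resid'], alpha=0.5)
```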
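For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The series `resid`, its minute-frequency index, and the output filename are assumptions made up for illustration; only `date2num`, `to_pydatetime`, and `fill_between` come from the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the residuals (assumption: a DatetimeIndex with 100,000 rows)
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=100_000, freq="min")
resid = pd.Series(rng.standard_normal(len(idx)), index=idx, name="PC1_resid")

# Convert the index to matplotlib's numeric date format
plot_dates = mdates.date2num(resid.index.to_pydatetime())

fig, ax = plt.subplots(figsize=(16, 4))
# Fill from zero to each value; this adds a single collection, not one patch per row
ax.fill_between(plot_dates, 0, resid.to_numpy(), color="r", alpha=0.5)
ax.xaxis_date()  # format the numeric x values as dates
ax.set_title("PC1 Residual")
fig.savefig("pc1_resid_fill.png")
```

Because the whole fill is one artist, the x-axis also keeps matplotlib's normal date ticks instead of one label per row.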
Answered on 2024-05-16T14:17:52+08:00
