
Scaling the fitted PDF of a log-normal distribution to a histogram in Python

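I have a set of log-normally distributed samples and want to fit a distribution to them. Then I want to plot the histogram of the samples and the fitted PDF in one figure, keeping the original (count) scaling of the histogram.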

My question: how can I scale the PDF directly so that it becomes visible on the histogram?

Here is the code:

import numpy as np
import scipy.stats

# generate log-normal distributed set of samples
samples   = np.random.lognormal( mean=1., sigma=.4, size=10000 )

# make a fit to the samples and generate the resulting PDF
shape, loc, scale = scipy.stats.lognorm.fit( samples, floc=0 )
x_fit       = np.linspace( samples.min(), samples.max(), 100 )
samples_fit = scipy.stats.lognorm.pdf( x_fit, shape, loc=loc, scale=scale )

And, to better illustrate what I mean, here is the figure:
[Figure: Left: samples. Middle: histogram and fitted PDF. Right: normalized histogram and fitted PDF.]

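So my question is: is there a parameter that simply scales the PDF to the histogram, so that the PDF becomes visible in the middle plot? (I didn't find one, but that doesn't mean much...)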
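To see why the fitted PDF all but disappears in the middle panel, it helps to compare magnitudes. The following is a minimal sketch that reuses the question's setup; the seeded `np.random.default_rng(42)` generator and the 50-bin `np.histogram` call are assumptions added for repeatability. The fitted PDF encloses an area of roughly 1, while the raw histogram bars are counts that add up to the sample size, so the tallest bar is orders of magnitude taller than the PDF's peak.

```python
import numpy as np
import scipy.stats

# Reproduce the question's setup (seeded here, as an assumption, for repeatability).
rng = np.random.default_rng(42)
samples = rng.lognormal(mean=1.0, sigma=0.4, size=10000)
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)
x_fit = np.linspace(samples.min(), samples.max(), 100)
samples_fit = scipy.stats.lognorm.pdf(x_fit, shape, loc=loc, scale=scale)

# Area under the fitted PDF (trapezoidal rule): close to 1.
pdf_area = np.sum(0.5 * (samples_fit[1:] + samples_fit[:-1]) * np.diff(x_fit))

# Raw histogram with 50 bins (assumed): bar heights are counts, their total is n.
counts, edges = np.histogram(samples, bins=50)

print(f"PDF peak: {samples_fit.max():.2f}, PDF area: {pdf_area:.3f}")
print(f"tallest bar: {counts.max()}, total count: {counts.sum()}")
```

A density of order 0.1 to 1 drawn on an axis that runs to hundreds or thousands of counts is essentially a flat line at zero, which matches the middle panel.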

1 Answer


    What you are asking for is a plot of the expected histogram.

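    Suppose [a, b] is one of the x intervals (bins) of the histogram. For a random sample of size n, the expected number of samples in that interval is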

    (cdf(b) - cdf(a))*n
    

    where cdf(x) is the cumulative distribution function. To plot the expected histogram, you compute this value for each bin.
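As a worked illustration of the formula, the sketch below evaluates it for a single bin and then for a whole set of bins at once with `np.diff`. The parameters are assumptions chosen to match the question's generator: `np.random.lognormal(mean=1., sigma=.4)` corresponds to `scipy.stats.lognorm` with `s=0.4`, `loc=0` and `scale=exp(1)`. The bin `[2, 3]`, the edge grid and the helper `lognorm_cdf` are hypothetical, for illustration only.

```python
import numpy as np
import scipy.stats

# Assumed parameters: the log-normal from the question, whose underlying
# normal has mean 1 and sigma 0.4  ->  s=0.4, loc=0, scale=exp(1).
shape, loc, scale = 0.4, 0.0, np.exp(1.0)
n = 10000  # sample size

def lognorm_cdf(x):
    """Hypothetical helper: CDF of the assumed log-normal distribution."""
    return scipy.stats.lognorm.cdf(x, shape, loc=loc, scale=scale)

# Expected number of samples in a single (arbitrary) bin [a, b].
a, b = 2.0, 3.0
expected_count = (lognorm_cdf(b) - lognorm_cdf(a)) * n

# The same formula for every bin at once: differences of the CDF at the edges.
edges = np.linspace(0.5, 10.0, 20)  # 20 edges -> 19 bins
expected_counts = n * np.diff(lognorm_cdf(edges))

print(f"expected count in [{a}, {b}]: {expected_count:.1f}")
print("expected counts per bin:", np.round(expected_counts, 1))
```

Because consecutive CDF differences telescope, the per-bin expected counts always sum to n times the probability of the whole range covered by the edges.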

    The script below shows one way to plot the expected histogram on top of a matplotlib histogram. It generates this plot:

    [Figure: histogram plot]

    import numpy as np
    import scipy.stats
    import matplotlib.pyplot as plt
    
    
    # Generate log-normal distributed set of samples
    np.random.seed(1234)
    samples = np.random.lognormal(mean=1., sigma=.4, size=10000)
    
    # Make a fit to the samples.
    shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)
    
    # Create the histogram plot using matplotlib.  The first two values in
    # the tuple returned by hist are the number of samples in each bin and
    # the values of the histogram's bin edges.  counts has length num_bins,
    # and edges has length num_bins + 1.
    num_bins = 50
    clr = '#FFE090'
    counts, edges, patches = plt.hist(samples, bins=num_bins, color=clr, label='Sample histogram')
    
    # Create an array of length num_bins containing the center of each bin.
    centers = 0.5*(edges[:-1] + edges[1:])
    
    # Compute the CDF at the edges. Then prob, the array of differences,
    # is the probability of a sample being in the corresponding bin.
    cdf = scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale)
    prob = np.diff(cdf)
    
    plt.plot(centers, samples.size*prob, 'k-', linewidth=2, label='Expected histogram')
    
    # prob can also be approximated using the PDF at the centers multiplied
    # by the width of the bin:
    # p = scipy.stats.lognorm.pdf(centers, shape, loc=loc, scale=scale)
    # prob = p*(edges[1] - edges[0])
    # plt.plot(centers, samples.size*prob, 'r')
    
    plt.legend()
    
    plt.show()
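
If you want to check the overlay numerically rather than visually, the same computation can be done without matplotlib. This sketch swaps `plt.hist` for `np.histogram`, which yields the same counts and bin edges for the same `bins` argument, and uses a seeded `np.random.default_rng` generator (an assumption, in place of the legacy `np.random.seed`). It then reports the largest gap between observed and expected counts as a fraction of the tallest bar; with 10000 samples that gap should be on the order of random sampling noise.

```python
import numpy as np
import scipy.stats

# Same setup as the script above, but without any plotting.
rng = np.random.default_rng(1234)  # assumed seed
samples = rng.lognormal(mean=1.0, sigma=0.4, size=10000)
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)

# Observed histogram: counts per bin and the bin edges.
num_bins = 50
counts, edges = np.histogram(samples, bins=num_bins)

# Expected histogram: n times the probability mass of each bin.
expected = samples.size * np.diff(
    scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale))

# Largest per-bin discrepancy relative to the tallest observed bin.
max_rel_dev = np.max(np.abs(counts - expected)) / counts.max()
print(f"max deviation: {max_rel_dev:.3f} of the tallest bin")
```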
    

    Note: since the PDF is the derivative of the CDF, you can approximate cdf(b) - cdf(a) as

    cdf(b) - cdf(a) ≈ pdf(m)*(b - a)
    

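    where m is, for example, the midpoint of the interval [a, b]. The answer to the exact question you asked, then, is to scale the PDF by multiplying it by the sample size and the histogram bin width. The script contains some commented-out code that shows how to plot the expected histogram using the scaled PDF. But since the CDF is also available for the log-normal distribution, you might as well use it: it gives the exact bin probabilities rather than an approximation.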
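Here is a small sketch of that scaling, assuming equal-width bins (as produced by `bins=50`) and a seeded generator for repeatability. It multiplies the fitted PDF at the bin centers by `n * bin_width`, which puts it in the same "samples per bin" units as the histogram bars, and compares the result with the exact CDF-based expected counts. The difference is the error of the midpoint approximation and should be small relative to the bar heights.

```python
import numpy as np
import scipy.stats

# Assumed setup: seeded generator, 50 equal-width bins.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=1.0, sigma=0.4, size=10000)
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)

counts, edges = np.histogram(samples, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])
bin_width = edges[1] - edges[0]  # valid because all bins have equal width

# Scaled PDF: density * n * bin width  ->  units of "samples per bin".
scaled_pdf = samples.size * bin_width * scipy.stats.lognorm.pdf(
    centers, shape, loc=loc, scale=scale)

# Exact expected counts from the CDF, for comparison.
exact = samples.size * np.diff(
    scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale))

max_abs_err = np.max(np.abs(scaled_pdf - exact))
print(f"largest difference: {max_abs_err:.2f} counts "
      f"(tallest expected bar: {exact.max():.1f})")
```

In the question's own script, the equivalent change would be to plot `samples.size * bin_width * samples_fit` against `x_fit`, taking `bin_width` from the edges returned by `plt.hist`. This only works as written when all bins have the same width; with unequal bins, each bin needs its own width.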
