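Edit 2: Additional information based on the comments: the function should compute the radial integral of the image (i.e. in polar coordinates) and then normalise by the radius. That is, in Mathematica notation: rotmean = Integrate[f[r] r, {r, 0, rmax}] / Integrate[r, {r, 0, rmax}]. In practice, when I flatten the image from a 2-D pixel map to a 1-D radial average, I want to interpolate bilinearly. If I do not interpolate and only use the nearest neighbour, the values near zero radius can be completely off.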
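For readers who prefer standard notation, the Mathematica expression can be written out as follows, together with one possible discrete form of the interpolated sum. The discrete form is a reading of the for loop in the original post, not something stated in the question; the symbols r_i, k_i, \delta_i, S_k and W_k are introduced here for illustration only.

```latex
\begin{align}
\mathrm{rotmean} &= \frac{\int_0^{r_{\max}} f(r)\, r \,\mathrm{d}r}{\int_0^{r_{\max}} r \,\mathrm{d}r} \\
% Assumed discrete form: pixel i has value f_i and radius r_i
k_i &= \lfloor r_i \rfloor, \qquad \delta_i = r_i - k_i \\
S_k &= \sum_{i:\,k_i = k} (1 - \delta_i)\, f_i \;+\; \sum_{i:\,k_i + 1 = k} \delta_i\, f_i \\
W_k &= \sum_{i:\,k_i = k} (1 - \delta_i) \;+\; \sum_{i:\,k_i + 1 = k} \delta_i \\
\mathrm{rmean}_k &= \frac{S_k}{W_k}
\end{align}
```

Each pixel therefore splits its value between the two integer radii that bracket it, and the weights W_k record how much of each pixel landed in bin k.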

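So the algorithm can be described as follows:

  • Compute the radius grid

  • Take the floor of the radius to use as a matrix index

  • From the floor, find the remainder used for the bilinear interpolation of each pixel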
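The three steps can be sketched in NumPy roughly as follows. This is a minimal illustration, assuming the centre sits at pixel (rows // 2, cols // 2); the helper name radial_index_and_remainder is hypothetical and does not appear in the post.

```python
import numpy as np


def radial_index_and_remainder(shape):
    """Return (rfloor, remain) for a 2-D image shape (hypothetical helper)."""
    rows, cols = shape
    # Step 1: radius grid, measured from the assumed centre pixel.
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    ymesh, xmesh = np.meshgrid(y, x, indexing="ij")
    rmesh = np.hypot(xmesh, ymesh)
    # Step 2: integer part of the radius, usable as a bin index.
    rfloor = np.floor(rmesh).astype(int)
    # Step 3: fractional part, i.e. the interpolation weight towards rfloor + 1.
    remain = rmesh - rfloor
    return rfloor, remain


rfloor, remain = radial_index_and_remainder((4, 4))
```

With indexing="ij" the grids have the same (rows, cols) orientation as the image, which matters once the image is not square.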

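Now, for the vectorised solution given at the bottom, I do some precomputation:

  • Find the image weighted by the bilinear interpolation (for the 0 and +1 pixels these are mage_p and mage_n respectively). This gives the radial sum (rather than the average), Integrate[f[r] r, {r, 0, rmax}]

  • Conceptually (for me) the hard step in vectorising the problem is collapsing the N**2-pixel image into a radial sum of length N*sqrt(2). This is done with lines such as "rmean[rfloor] = mage_p", noting that rfloor has length N**2 whereas rmean has length N*sqrt(2)

  • Also compute the weights, which give Integrate[r, {r, 0, rmax}]

  • Convert from the radial sum to the average.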
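One way to do the "collapse N**2 pixels into N*sqrt(2) bins" step so that every pixel sharing a radius is summed is np.bincount with its weights argument. The sketch below is an alternative to the fancy-index assignment quoted in the list, not the poster's code; the function name rotmean_bincount is hypothetical, and it follows the for-loop version of the post in giving the floor bin the weight (1 - remain).

```python
import numpy as np


def rotmean_bincount(mage):
    """Radial average with linear interpolation between integer radii (sketch)."""
    mage = np.asarray(mage, dtype=float)
    rows, cols = mage.shape
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    ymesh, xmesh = np.meshgrid(y, x, indexing="ij")
    rmesh = np.hypot(xmesh, ymesh).ravel()
    rfloor = np.floor(rmesh).astype(int)
    remain = rmesh - rfloor
    nbins = rfloor.max() + 2  # leave room for the rfloor + 1 bin

    flat = mage.ravel()
    # bincount adds up the weights of all entries that share an index.
    rsum = (np.bincount(rfloor, weights=flat * (1.0 - remain), minlength=nbins)
            + np.bincount(rfloor + 1, weights=flat * remain, minlength=nbins))
    weights = (np.bincount(rfloor, weights=1.0 - remain, minlength=nbins)
               + np.bincount(rfloor + 1, weights=remain, minlength=nbins))
    # Bins that received no weight stay at zero instead of dividing by zero.
    rmean = np.divide(rsum, weights, out=np.zeros_like(rsum), where=weights > 0)
    return rmean, np.arange(nbins)


rmean, raxis = rotmean_bincount(np.ones((8, 8)))
```

A constant image is a convenient sanity check: every bin that receives any weight should average to that constant.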

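ORIGINAL POST: I am trying to port a function from Matlab to Python (based on WinPython x64 2.7.6.4), and I am struggling with how slow it is. The code is meant to take the rotational average of an image about its centre. Typically the application is to find the rotational average of an image's power spectrum, but it can also be used in image space. Here is the code: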

import time

import numpy as np

def rotmean( mage ):
    # This is terrifyingly slow
    # Assumes even image dimensions, since range(-N, N) has 2*N entries
    t0 = time.time()

    N = int( np.floor( mage.shape[0]/2.0 ) )
    M = int( np.floor( mage.shape[1]/2.0 ) )

    # Cast to int so that rmax can be used as an array length
    rmax = int( np.ceil( np.sqrt( N**2 + M**2 ) ) ) + 1

    rmean = np.zeros( [rmax] )
    weights = np.zeros( [rmax] )

    # indexing='ij' gives the grids the same (rows, cols) orientation as mage
    [xmesh, ymesh] = np.meshgrid( range(-N, N), range(-M, M), indexing='ij' )
    rmesh = np.sqrt( xmesh**2 + ymesh**2 )
    rfloor = np.floor( rmesh )

    remain = rmesh - rfloor
    # Make rfloor into an index look-up table
    rfloor = rfloor.astype(int)

    t1 = time.time()
    # It takes 300 ms just to get to here in the function.
    print( "Init time = " + str(t1-t0) )
    print( "Max rfloor = " + str( rfloor.max() ) )
    print( "rmean.shape = " + str( rmean.shape ) )

    # Flatten so that a single integer index addresses one pixel
    mage = mage.ravel()
    rfloor = rfloor.ravel()
    remain = remain.ravel()

    # Certainly mage*(1-remain) can be precalculated as well if we want to use more memory

    for I in range( rfloor.size ):
        # Sum pixels
        rmean[rfloor[I]] += mage[I]*(1-remain[I])
        rmean[rfloor[I]+1] += mage[I]*remain[I]

        # Calculate the total area of each pixel so we can compute the average from the sum
        weights[rfloor[I]] += (1-remain[I])
        weights[rfloor[I]+1] += remain[I]

    t4 = time.time()
    print( "Total loop time = " + str(t4 - t1) )

    rmean /= weights # compute average from sum
    raxis = range(0,rmean.size)
    return [rmean, raxis]

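It is hard to understand how slow the function is compared with Matlab. In Matlab it takes about 450 ms to execute on my laptop, whereas the Python version needs 300 ms just to reach the for loop, and about 180 s to process a 2k x 2k image on my more powerful desktop. I realise that Matlab JIT-compiles for loops, but I tried compiling the code with Cython to speed up the for loop, and it was actually slower than the interpreted code. I suspect I do not understand how Python allocates memory and so on, but I cannot find anything notable on profiling. As far as I can tell from the limited documentation, I am performing in-place operations here by using the += operator?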
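On the profiling point, the standard library's cProfile and pstats modules are one way to see where the time goes. The sketch below profiles only a stand-in for the set-up part of the function; build_radius_grid is a hypothetical name, and the grid size is arbitrary.

```python
import cProfile
import io
import pstats

import numpy as np


def build_radius_grid(n):
    """Hypothetical stand-in for the set-up part of rotmean."""
    xmesh, ymesh = np.meshgrid(np.arange(-n, n), np.arange(-n, n))
    return np.sqrt(xmesh**2 + ymesh**2)


profiler = cProfile.Profile()
profiler.enable()
rmesh = build_radius_grid(256)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Wrapping the whole of rotmean the same way would show whether the time is spent in the grid set-up or in the per-pixel loop.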

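Vectorising with numpy seems like the obvious solution, but I cannot see how to vectorise the operation.

Edit: OK, I vectorised the code, and it is now within spitting distance, at about 0.5 s in Python (albeit on the desktop computer, which has a more powerful CPU: a Xeon versus an i5). I tried many ways to speed up the for loop, but the best I could manage was a little over 30 s.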

import time

import numpy as np

# mage is the 2-D input image, defined elsewhere
t0 = time.time()

N = int( np.floor( mage.shape[0]/2.0 ) )
M = int( np.floor( mage.shape[1]/2.0 ) )

# Cast to int so that rmax can be used as an array length
rmax = int( np.ceil( np.sqrt( N**2 + M**2 ) ) ) + 1

[xmesh, ymesh] = np.meshgrid( range(-N, N), range(-M, M) )
rmesh = np.sqrt( xmesh**2 + ymesh**2 )
rfloor = np.floor( rmesh )

remain = rmesh - rfloor
# Make rfloor into an index look-up table
rfloor = rfloor.astype(int)

# I can flatten remain and mage
mage = mage.ravel()
remain = remain.ravel()
# remain_n = np.ones( remain.shape ) - remain
remain_n = 1.0 - remain
rfloor = rfloor.ravel()
# Note: the for-loop version weights the floor bin by (1 - remain);
# here the floor bin receives remain instead.
mage_p = mage*remain
mage_n = mage*remain_n

# Somewhat better initialization time (~200 ms) but still slow...
t2 = time.time()
print( "Initialize time = " + str(t2-t0) )

rmean = np.zeros( [rmax] )
rmean_n = np.zeros( [rmax] )
weights = np.zeros( [rmax] )
weights_n = np.zeros( [rmax] )

# Find positive remainders
# Caution: fancy-index assignment (and += through a fancy index) does not
# accumulate; where rfloor repeats an index, only one value per bin survives.
rmean[rfloor] = mage_p
weights[rfloor] = remain

# Add one to indexing array and add negative remainders to sum
rfloor += 1
rmean_n[rfloor] += mage_n
weights_n[rfloor] += remain_n

# sum
rmean += rmean_n
weights += weights_n
# and normalize sum to average
rmean /= weights

raxis = range(0,rmean.size)
t1 = time.time()
print( "Time elapsed = " + str(t1-t0) )
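A point worth checking in the vectorised version is how NumPy treats repeated indices. The small sketch below (with made-up index and value arrays) compares fancy-index assignment, fancy-index +=, and np.add.at, which performs the addition unbuffered so that repeated indices accumulate.

```python
import numpy as np

idx = np.array([0, 0, 1, 1, 1])            # several "pixels" share each bin
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

assigned = np.zeros(2)
assigned[idx] = vals       # plain assignment: one value per bin survives

buffered = np.zeros(2)
buffered[idx] += vals      # same as buffered[idx] = buffered[idx] + vals

summed = np.zeros(2)
np.add.at(summed, idx, vals)   # accumulates: summed is [3., 12.]

print(assigned, buffered, summed)
```

If the first two forms are what the vectorised code relies on, its radial sums would contain a single pixel per bin rather than the sum over the ring, which could explain both the speed and any difference from the loop result.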