Speeding up Python code

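EDIT 2: Additional information based on the comments: the function should compute the radial integral of the image (i.e. in polar coordinates) and then normalize by the radius. That is, in Mathematica notation: rotmean = Integrate[f[r] r, {r, 0, rmax}] / Integrate[r, {r, 0, rmax}]. In practice, I want bilinear interpolation when I flatten the image from a 2-D pixel map to a 1-D radial average. If I don't interpolate and just use the nearest neighbor, the values near zero radius can be completely off.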
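For reference, the Mathematica expression can be written in standard notation, along with one possible discrete form of it. The symbols $I_p$ (value of pixel $p$), $r_p$ (its distance from the center) and the hat weight $w_k$ are introduced here purely for illustration and are assumptions, not notation from the post.

```latex
\bar{f} \;=\; \frac{\int_0^{r_{\max}} f(r)\, r \, dr}{\int_0^{r_{\max}} r \, dr},
\qquad
\bar{f}_k \;\approx\; \frac{\sum_p w_k(r_p)\, I_p}{\sum_p w_k(r_p)},
\qquad
w_k(r) \;=\; \max\bigl(0,\; 1 - |r - k|\bigr).
```

With this weight, a pixel at radius $r_p$ contributes $1 - (r_p - \lfloor r_p \rfloor)$ to bin $\lfloor r_p \rfloor$ and $r_p - \lfloor r_p \rfloor$ to bin $\lfloor r_p \rfloor + 1$. The factor $r$ in the continuous integrals appears implicitly, because the number of pixels at a given radius grows roughly in proportion to $r$.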

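So the algorithm can be described as follows:

Compute the radius grid
Take the floor of the radius to use as a matrix index
From the floor, find the remainder used for the bilinear interpolation of each pixel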
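The three steps above can be sketched roughly as follows. This is a minimal illustration written for Python 3 and a current NumPy, not the code from the post: the helper name `radius_grid` is hypothetical, and it assumes the center sits at index `size // 2` along each axis.

```python
import numpy as np

def radius_grid(shape):
    """Return (rfloor, remain) for an image of the given (rows, cols) shape."""
    rows, cols = shape
    # Pixel coordinates measured from the assumed center (size // 2).
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    xmesh, ymesh = np.meshgrid(x, y)        # both have shape (rows, cols)
    rmesh = np.hypot(xmesh, ymesh)          # step 1: radius of every pixel
    rfloor = np.floor(rmesh).astype(int)    # step 2: integer index into the profile
    remain = rmesh - rfloor                 # step 3: fractional part in [0, 1)
    return rfloor, remain

rfloor, remain = radius_grid((4, 4))
```

Building the coordinates with `np.arange` rather than `range` keeps everything in NumPy arrays from the start, and ordering the axes as `meshgrid(x, y)` gives grids with the same shape as the image even when it is not square.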

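Now, for the vectorized solution given at the bottom, I do some precomputation:

Find the image weighted by the bilinear interpolation (the 0 and 1 pixel offsets are mage_p and mage_n, respectively). This gives the radial sum (rather than the mean), Integrate[f[r] r, {r, 0, rmax}]
Conceptually, the hard step (for me) in vectorizing the problem is collapsing the N**2-pixel image into N*sqrt(2) radial sums. This is done by lines such as "rmean[rfloor] = mage_p", noting that rfloor has length N**2 whereas rmean has length N*sqrt(2)
Also compute the weights, which gives Integrate[r, {r, 0, rmax}]
Convert from the radial sum to the mean.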
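One way to express the collapse from N**2 pixels to roughly N*sqrt(2) radial bins is `np.bincount` with its `weights` argument, which sums every weight that shares an index. The sketch below is an assumption-laden illustration, not the code from the post: the function name `rotmean_bincount` is hypothetical, it targets Python 3, it places the center at `size // 2`, and it follows the loop version's convention that the floor bin receives `1 - remain` and the next bin receives `remain`.

```python
import numpy as np

def rotmean_bincount(image):
    """Radially averaged profile of a 2-D array, linearly interpolated in r."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    xmesh, ymesh = np.meshgrid(x, y)
    rmesh = np.hypot(xmesh, ymesh).ravel()
    rfloor = np.floor(rmesh).astype(int)
    remain = rmesh - rfloor
    flat = image.ravel()
    nbins = int(rfloor.max()) + 2           # leave room for the rfloor + 1 bin

    # Radial sums: each pixel is split between bin rfloor and bin rfloor + 1.
    sums = (np.bincount(rfloor, weights=flat * (1.0 - remain), minlength=nbins)
            + np.bincount(rfloor + 1, weights=flat * remain, minlength=nbins))
    # Matching weights, used to turn the sums into means.
    weights = (np.bincount(rfloor, weights=1.0 - remain, minlength=nbins)
               + np.bincount(rfloor + 1, weights=remain, minlength=nbins))

    # Bins that received no weight stay at zero instead of dividing by zero.
    rmean = np.divide(sums, weights, out=np.zeros(nbins), where=weights > 0)
    return rmean, np.arange(nbins)

rmean, raxis = rotmean_bincount(np.ones((8, 8)))
```

A constant image is a convenient sanity check: every bin that receives any weight should come out equal to that constant.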

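ORIGINAL POST: I'm trying to port a function from Matlab to Python (based on WinPython x64 2.7.6.4), and I'm struggling with how slow it is. The code is intended to take the rotational average of an image about its center. Typically the application is to find the rotational average of the power spectrum of an image, but it can also be used in image space. Here is the code: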

import time
import numpy as np

def rotmean( mage ):
    # This is terrifyingly slow
    t0 = time.time()

    N = int( np.floor( mage.shape[0]/2.0 ) )
    M = int( np.floor( mage.shape[1]/2.0 ) )

    rmax = np.ceil( np.sqrt( N**2 + M**2 ) ) + 1

    rmean = np.zeros( [rmax] )
    weights = np.zeros( [rmax] )

    [xmesh, ymesh] = np.meshgrid( range(-N, N), range(-M, M) )
    rmesh = np.sqrt( xmesh**2 + ymesh**2 )
    rfloor = np.floor( rmesh )

    remain = rmesh - rfloor
    # Make rfloor into an index look-up table
    rfloor = rfloor.astype(np.int)

    t1 = time.time()
    # It takes 300 ms just to get to here in the function.
    print "Init time = " + str(t1-t0)
    print "Max rfloor = " + str( rfloor.max() )
    print "rmean.shape = " + str( rmean.shape )

    # Certainly mage*(1-remain) can be precalculated as well if we want to use more memory

    for I in rfloor:
        # Sum pixels
        rmean[rfloor[I]] += mage[I]*(1-remain[I])
        rmean[rfloor[I]+1] += mage[I]*remain[I]

        # Calculate the total area of each pixel so we can compute the average from the sum
        weights[rfloor[I]] += (1-remain[I])
        weights[rfloor[I]+1] += remain[I]

    t4 = time.time()
    print "Total loop time = " + str(t4 - t1)

    rmean /= weights # compute average from sum
    raxis = range(0,rmean.size)
    return [rmean, raxis]

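It's hard to fathom how slow the function is compared with Matlab. In Matlab it takes about 450 ms to execute on my laptop, whereas in Python it takes 300 ms just to reach the for loop, and running it on a 2k x 2k image takes about 180 s on my more powerful desktop. I realize Matlab JIT-compiles for loops, but I tried compiling the code with Cython to speed up the for loop, and it was actually slower than the interpreted code. I suspect I don't understand how Python allocates memory and so on, but I can't find any notes on profiling. As far as I can tell from the limited documentation, am I using the += operator here to perform the operations in place?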
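On the profiling question, the standard library ships `cProfile` and `pstats`, which report where time is spent per function. The sketch below is only an illustration in Python 3: `build_radius_grid` is a hypothetical stand-in for the setup part of the function, and the grid size 256 is arbitrary.

```python
import cProfile
import io
import pstats

import numpy as np

def build_radius_grid(n):
    """Hypothetical stand-in for the initialization part of rotmean."""
    xmesh, ymesh = np.meshgrid(np.arange(-n, n), np.arange(-n, n))
    return np.sqrt(xmesh**2 + ymesh**2)

profiler = cProfile.Profile()
profiler.enable()
rmesh = build_radius_grid(256)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the outermost slow call first, which helps separate the cost of the initialization from the cost of the loop.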

Vectorizing with numpy seems like the obvious solution, but I can't see how to vectorize the operation.

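EDIT: OK, I vectorized the code and now it is within spitting distance, at about 0.5 s in Python (although the desktop machine has a more powerful CPU, a Xeon compared with an i5). I tried many ways to speed up the for loop, but the best I could manage was just over 30 s.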

t0 = time.time()

N = int( np.floor( mage.shape[0]/2.0 ) )
M = int( np.floor( mage.shape[1]/2.0 ) )

rmax = np.ceil( np.sqrt( N**2 + M**2 ) ) + 1

[xmesh, ymesh] = np.meshgrid( range(-N, N), range(-M, M) )
rmesh = np.sqrt( xmesh**2 + ymesh**2 )
rfloor = np.floor( rmesh )

remain = rmesh - rfloor
# Make rfloor into an index look-up table
rfloor = rfloor.astype(np.int)

# I can flatten remain and mage
mage = mage.ravel()
remain = remain.ravel()
# remain_n = np.ones( remain.shape ) - remain;
remain_n = 1.0 - remain
rfloor = rfloor.ravel()
mage_p = mage*remain
mage_n = mage*remain_n

# Somewhat better initialization time (~200 ms) but still slow...
t2 = time.time()
print "Initialize time = " + str(t2-t0)

rmean = np.zeros( [rmax] )
rmean_n = np.zeros( [rmax] )
weights = np.zeros( [rmax] )
weights_n = np.zeros( [rmax] )

# Find positive remainders
rmean[rfloor] = mage_p
weights[rfloor] = remain

# Add one to indexing array and add negative remainders to sum
rfloor += 1
rmean_n[rfloor] += mage_n
weights_n[rfloor] += remain_n

# sum
rmean += rmean_n
weights += weights_n
# and normalize sum to average
rmean /= weights

raxis = range(0,rmean.size)
t1 = time.time()
print "Time elapsed = " + str(t1-t0)
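One caveat worth checking with lines such as `rmean[rfloor] = mage_p`: NumPy fancy-index assignment does not accumulate when an index appears more than once, and neither does `+=` on a fancy-indexed array, so many pixels that share a radius bin may not all end up in the sum. The toy arrays `idx` and `vals` below are made up for illustration; `np.add.at` and `np.bincount` are two NumPy routines that do sum over repeated indices.

```python
import numpy as np

idx = np.array([0, 0, 1, 1, 1])               # repeated bin indices
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

assigned = np.zeros(3)
assigned[idx] = vals          # plain assignment: values overwrite each other

buffered = np.zeros(3)
buffered[idx] += vals         # buffered: each slot is written once, not summed

accumulated = np.zeros(3)
np.add.at(accumulated, idx, vals)             # unbuffered: every value is added

binned = np.bincount(idx, weights=vals, minlength=3)

print(accumulated)            # [ 3. 12.  0.]
print(binned)                 # [ 3. 12.  0.]
```

Comparing `buffered` against `accumulated` on a small case like this is a quick way to see whether a vectorized version still computes the same radial sums as the loop.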
