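I am currently optimizing the runtime of my code, and it is still not within the range I want. When performing Gaussian quadrature, I have reached the point where 80% of the time is spent on running lambdify() on my sympy Matrix expressions and evaluating the resulting lambda functions. All other aspects of the code have been thoroughly optimized, so I am hoping someone can help me reduce this substantial bottleneck in the code that lambdifies and evaluates my sympy expressions.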

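The code is written in Python 3.5.2 on a 64-bit Windows 7 machine (the examples below, which illustrate the code, were executed in the Jupyter QtConsole) with the following module versions: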

  • Sympy:1.0

  • Numpy:1.11.1

  • Numba:0.27

Lambdify()

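I believe the reason lambdify() takes so much time is the complexity of the sympy expressions (which involve products of sympy Piecewise expressions, as in the example below). Simplifying these expressions is not possible, since they are wavelet functions created from Legendre scaling functions using the standard Alpert algorithm. Here is an example with a smaller matrix, together with a timing comparison against lambdifying a "simple" matrix: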

from sympy import *
import numpy as np
import timeit

xi1 = symbols('xi1')
xi2 = symbols('xi2')
M = Matrix([[-0.0015625*(3.46410161513775*(0.00624999999999998*xi2 - 
           0.99375)*Piecewise((-1, 0.00624999999999998*xi2 - 0.99375 >= 0), 
           (1, 0.00624999999999998*xi2 - 0.99375 < 0)) + 
           1.73205080756888)*Piecewise((1, And(0.00624999999999998*xi2 - 
           0.99375 <= 1, 0.00624999999999998*xi2 - 
           0.99375 >= -1)), (0, True))], 
          [-0.00156249999999999*(0.0187499999999999*xi2 + 2.0*Piecewise((-1, 
           0.00624999999999998*xi2 - 0.99375 >= 0), (1, 
           0.00624999999999998*xi2 - 0.99375 < 0)) - 2.98125)*Piecewise((1, 
           And(0.00624999999999998*xi2 - 0.99375 <= 1, 
           0.00624999999999998*xi2 - 0.99375 >= -1)), (0, True))], 
          [-0.00270632938682636*xi1*(3.46410161513775*
           (0.00624999999999998*xi2 - 0.99375)*Piecewise((-1, 
           0.00624999999999998*xi2 - 0.99375 >= 0), (1, 
           0.00624999999999998*xi2 - 0.99375 < 0)) + 
           1.73205080756888)*Piecewise((1, And(0.00624999999999998*xi2 - 
           0.99375 <= 1, 0.00624999999999998*xi2 - 0.99375 >= -1)), (0, 
           True))]])
M_simpl = Matrix([(xi2**2),(xi2**2)*xi1,(xi2**2)*(xi1**2)])

The timing comparison yields:

import timeit

%timeit lambdify([xi1,xi2], M, 'numpy')
10 loops, best of 3: 23 ms per loop
%timeit lambdify([xi1,xi2], M_simpl, 'numpy')
100 loops, best of 3: 2.47 ms per loop
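
For anyone reproducing this outside IPython, where the %timeit magic is not available, the same kind of measurement can be sketched with the standard timeit module. This is only a rough sketch: time_lambdify is a hypothetical helper name, only the simple matrix is defined here to keep the block short, and M from above can be passed in the same way. The absolute numbers will differ between machines and sympy versions.

```python
import timeit

from sympy import Matrix, lambdify, symbols

xi1, xi2 = symbols('xi1 xi2')
# Same "simple" matrix as in the example above
M_simpl = Matrix([xi2**2, xi2**2 * xi1, xi2**2 * xi1**2])


def time_lambdify(expr, number=10, repeat=3):
    """Best average time in seconds of one lambdify() call (hypothetical helper)."""
    timings = timeit.repeat(
        lambda: lambdify([xi1, xi2], expr, 'numpy'),
        number=number,
        repeat=repeat,
    )
    # Each entry in timings covers `number` calls, so divide to get per-call time
    return min(timings) / number


best = time_lambdify(M_simpl)
print('lambdify(M_simpl): {:.3f} ms per call'.format(best * 1e3))
```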

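This shows that the more complex expression is processed nearly 10 times slower than the simple matrix, which contributes significantly to the runtime when lambdify() is applied to several matrices of this type. Researching the topic, I have learned of the faster ufuncify() function in sympy.utilities.autowrap, which seems to work best with the Fortran or C backends. However, this is not the best alternative in my case, since the function has not yet been extended to sympy Matrices, and I want the code to be general enough that other Windows users adapting it do not need to install a C compiler etc. So, is there any way of achieving a speed-up of the lambdify() function for these types of sympy expressions without using other compilers?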

Lambda function evaluation

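When it comes to evaluation at specific coordinates, the lambdified functions of the sympy matrices above also perform differently. The following simple 5-point quadrature example illustrates this: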

# Quadrature coordinates
xi_v = np.array([[-1,-1], [-0.5,-0.5], [0,0], [0.5,0.5], [1,1]])
# Quadrature weights
w = np.array([3, 2, 1, 2, 3])

# Quadrature
def quad_func(func, xi_v, w):
    G = np.zeros((3, 1))
    for i in range(0, len(w), 1):
        G += w[i]*func(*xi_v[i,:])
    return G

# Testing time usage
f = lambdify([xi1,xi2], M, 'numpy')
%timeit quad_func(f, xi_v, w)
1000 loops, best of 3: 852 µs per loop
f_simpl = lambdify([xi1,xi2], M_simpl, 'numpy')
%timeit quad_func(f_simpl, xi_v, w)
10000 loops, best of 3: 33.9 µs per loop
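
To make clear what quad_func computes, namely the weighted sum G = sum_i w[i]*func(xi_v[i]), here is a small check that needs only numpy. Note that f_simpl_by_hand is a hand-written stand-in for the lambdified M_simpl (an assumption for illustration, not the function that lambdify() generates), so the block runs without sympy.

```python
import numpy as np

# Quadrature coordinates and weights, as in the example above
xi_v = np.array([[-1, -1], [-0.5, -0.5], [0, 0], [0.5, 0.5], [1, 1]])
w = np.array([3, 2, 1, 2, 3])


def f_simpl_by_hand(xi1, xi2):
    """Hand-written stand-in for lambdify([xi1, xi2], M_simpl, 'numpy')."""
    return np.array([[xi2**2], [xi2**2 * xi1], [xi2**2 * xi1**2]])


def quad_func(func, xi_v, w):
    """Accumulate the weighted sum of func over all quadrature points."""
    G = np.zeros((3, 1))
    for i in range(len(w)):
        G += w[i] * func(*xi_v[i, :])
    return G


G = quad_func(f_simpl_by_hand, xi_v, w)
print(G)  # rows: 7.0, 0.0, 6.25
```

The second row vanishes because xi2**2*xi1 is odd along the diagonal points used here, while the weights are symmetric.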

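My first instinct was to bring in jit from the numba module to speed up the evaluation. However, this results in a pop-up window stating that Python has stopped working, and the kernel restarts (this happens for both f and f_simpl):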

import numba

quad_func_jit = numba.jit(quad_func)
quad_func_jit(f, xi_v, w)

Kernel died, restarting

So again, is there any way to speed up these lambda function evaluations in order to reduce the total runtime? Or possibly some way of avoiding the crash with numba.jit?