Eigen: simplifying an expression with Eigen intrinsics

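I'm trying to scale every column of a matrix by the corresponding value from a vector. Where that value is 0, I want to replace the column with the matching column from another matrix, scaled by a constant. It sounds complicated, but in Matlab it is quite simple (though probably not fully optimized):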

a(:,b ~= 0) = a(:,b ~= 0)./b(b ~= 0);
a(:,b == 0) = c(:,b == 0)*x;

Doing this with a for loop in C++ is also quite simple:

RowVectorXf b;
Matrix3Xf a, c;
float x;
for (int i = 0; i < b.size(); i++) {
    if (b(i) != 0) {
        a.col(i) = a.col(i) / b(i);
    } else {
        a.col(i) = c.col(i) * x;
    }
}

Is it possible to do this (faster) using Eigen intrinsics such as colwise or select?

P.S. I tried to shorten the if condition to the form

a.col(i) = (b(i) != 0) ? (a.col(i) / b(i)) : (c.col(i) * x);

but this fails to compile with: error: operands to ?: have different types ... (long listing of the types)

Edit: I added the code I used to test the answers; here it is:

#include <Eigen/Dense>
#include <stdlib.h>
#include <chrono>
#include <iostream>

using namespace std;
using namespace Eigen;

void flushCache()
{
    const int size = 20 * 1024 * 1024; // Allocate 20M. Set much larger than L2
    volatile char *c = (char *) malloc(size);
    volatile int i = 8;
    for (volatile int j = 0; j < size; j++)
        c[j] = i * j;

    free((void*) c);
}

int main()
{
    Matrix3Xf a(3, 1000000);
    RowVectorXf b(1000000);
    Matrix3Xf c(3, 1000000);
    float x = 0.4;

    a.setRandom();
    b.setRandom();
    c.setRandom();

    for (int testNumber = 0; testNumber < 4; testNumber++) {
        flushCache();
        chrono::high_resolution_clock::time_point t1 = chrono::high_resolution_clock::now();
        for (int repetition = 0; repetition < 1000; repetition++) {
            switch (testNumber) {
                case 0:
                    for (int i = 0; i < b.size(); i++) {
                        if (b(i) != 0) {
                            a.col(i) = a.col(i) / b(i);
                        } else {
                            a.col(i) = c.col(i) * x;
                        }
                    }
                    break;
                case 1:
                    for (int i = 0; i < b.size(); i++) {
                        a.col(i) = (b(i) != 0) ? (a.col(i) / b(i)).eval() : (c.col(i) * x).eval();
                    }
                    break;
                case 2:
                    for (int i = 0; i < b.size(); i++) {
                        a.col(i) = (b(i) != 0) ? (a.col(i) * (1.0f / b(i))) : (c.col(i) * x);
                    }
                    break;
                case 3:
                    a = b.cwiseEqual(0.0f).replicate< 3, 1 >().select(c * x, a.cwiseQuotient(b.replicate< 3, 1 >()));
                    break;
                default:
                    break;
            }
        }

        chrono::high_resolution_clock::time_point t2 = chrono::high_resolution_clock::now();
        auto duration = chrono::duration_cast< chrono::milliseconds >(t2 - t1).count();
        cout << "duration: " << duration << "ms" << endl;
    }

    return 0;
}

Example output:

duration: 14391ms
duration: 15219ms
duration: 9148ms
duration: 13513ms

By the way, when the variables are not initialized with setRandom, the output is completely different:

duration: 10255ms
duration: 11076ms
duration: 8250ms
duration: 5198ms

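@chtz suggested it's because of denormalized values, but I think it's because of branch prediction. The reason I think so is that initializing with b.setZero(); leads to the same timings as leaving the variables uninitialized.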
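One way to tell the two explanations apart is to look at what the buffers actually contain before timing. The helper below is only a sketch using the standard library; the names `ValueCensus` and `classifyValues` are made up for this illustration and are not part of Eigen or of the benchmark above.

```cpp
#include <cmath>
#include <cstddef>

// Tally of how the entries of a float buffer are classified.
struct ValueCensus {
    std::size_t zeros = 0;       // +0.0f and -0.0f
    std::size_t subnormals = 0;  // denormalized values
    std::size_t others = 0;      // normal values, infinities and NaNs
};

// Counts zeros and subnormals among the first n floats at data.
ValueCensus classifyValues(const float* data, std::size_t n) {
    ValueCensus census;
    for (std::size_t i = 0; i < n; ++i) {
        switch (std::fpclassify(data[i])) {
            case FP_ZERO:      ++census.zeros;      break;
            case FP_SUBNORMAL: ++census.subnormals; break;
            default:           ++census.others;     break;
        }
    }
    return census;
}
```

It could be called as classifyValues(b.data(), b.size()), and likewise for a and c. A buffer that is almost entirely zeros would mean the branch always goes the same way, while a large subnormal count would point towards the denormal explanation.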

Answers (1)

2 years ago

a.col(i) = (b(i) != 0) ? (a.col(i) * (1.0f/b(i))) : (c.col(i) * x);

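will work, but only because both expressions have the same type, and it will likely not save any time at all (the ?: expression is essentially translated to the same code as an if-else branch).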

If you prefer to write it in a single line, the following expression should work:

a = b.cwiseEqual(0.0f).replicate<3,1>().select(c*x, a.cwiseQuotient(b.replicate<3,1>()));

Again, I doubt it will make any significant performance difference.