Generate a matrix containing all combinations of elements taken from n vectors

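This question comes up quite often in one form or another (see for example here or here). So I thought I would present it in a general form and provide an answer that might serve for future reference.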

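Given an arbitrary number n of vectors of possibly different sizes, generate an n-column matrix whose rows describe all combinations of elements taken from those vectors (their Cartesian product).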

For example,

vectors = { [1 2], [3 6 9], [10 20] }

should give

combs = [ 1     3    10
          1     3    20
          1     6    10
          1     6    20
          1     9    10
          1     9    20
          2     3    10
          2     3    20
          2     6    10
          2     6    20
          2     9    10
          2     9    20 ]

4 Answers

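The ndgrid function almost gives the answer, with one caveat: it has to be called with n output variables defined explicitly. Since n is arbitrary, the best way is to use a comma-separated list (generated from a cell array with n cells) as the output. The resulting n arrays are then concatenated into the desired n-column matrix: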

vectors = { [1 2], [3 6 9], [10 20] }; %// input data: cell array of vectors

n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order 
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix
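
To see what the reversal in the two comma-separated lists buys, here is a minimal sketch that runs the same steps with and without it on the example input. The variable names `fwd` and `rev` are mine, not part of the answer above.

```matlab
vectors = { [1 2], [3 6 9], [10 20] };
n = numel(vectors);

% Forward order: the FIRST vector varies fastest down the rows
fwd = cell(1,n);
[fwd{:}] = ndgrid(vectors{:});
fwd = reshape(cat(n+1, fwd{:}), [], n);

% Reversed order (as in the answer): the LAST vector varies fastest
rev = cell(1,n);
[rev{end:-1:1}] = ndgrid(vectors{end:-1:1});
rev = reshape(cat(n+1, rev{:}), [], n);

disp(fwd(1:3,:))                    % rows [1 3 10], [2 3 10], [1 6 10]
disp(rev(1:3,:))                    % rows [1 3 10], [1 3 20], [1 6 10]
disp(isequal(sortrows(fwd), rev))   % 1: same rows, different order
```

Both variants contain the same 12 rows; only the row order differs.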

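Answered 2024-04-30T04:20:51+08:00

A little bit simpler: if you have the Neural Network Toolbox (called Deep Learning Toolbox in newer MATLAB releases), you can simply use combvec: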

vectors = {[1 2], [3 6 9], [10 20]};
combs = combvec(vectors{:}).' % Use cells as arguments

This returns the matrix in a slightly different order:

combs =

     1     3    10
     2     3    10
     1     6    10
     2     6    10
     1     9    10
     2     9    10
     1     3    20
     2     3    20
     1     6    20
     2     6    20
     1     9    20
     2     9    20

If you want the matrix exactly as it appears in the question, you can use sortrows:

combs = sortrows(combvec(vectors{:}).')
% Or equivalently as per @LuisMendo in the comments: 
% combs = fliplr(combvec(vectors{end:-1:1}).')

This gives

combs =

     1     3    10
     1     3    20
     1     6    10
     1     6    20
     1     9    10
     1     9    20
     2     3    10
     2     3    20
     2     6    10
     2     6    20
     2     9    10
     2     9    20
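
One caveat worth checking before relying on sortrows: it orders rows by value, so it reproduces the order asked for in the question only when every input vector is already sorted in ascending order. A minimal sketch with a deliberately unsorted first vector follows. It uses the ndgrid approach from the first answer so that it runs without the toolbox; the input values are made up for illustration.

```matlab
vectors = { [2 1], [3 6] };          % first vector deliberately unsorted
n = numel(vectors);

combs = cell(1,n);
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1});
combs = reshape(cat(n+1, combs{:}), [], n);   % rows follow input order

sorted = sortrows(combs);                     % rows follow value order

disp(combs)    % [2 3; 2 6; 1 3; 1 6]
disp(sorted)   % [1 3; 1 6; 2 3; 2 6]
```

If the order of the input elements has to be preserved, the fliplr variant in the comment above is the safer choice, since it only rearranges columns and argument order rather than sorting values.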

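If you look at the internals of combvec (type edit combvec in the command window), you'll see that it uses different code than @LuisMendo's answer. I can't say which is more efficient.

If you happen to have a matrix whose rows play the role of the vectors in the earlier cell array, you can use: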

vectors = [1 2;3 6;10 20];
vectors = num2cell(vectors,2);
combs = sortrows(combvec(vectors{:}).')
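
For reference, a small sketch of what the num2cell step produces. The names `M` and `c` are mine, and no toolbox is needed for this part.

```matlab
M = [1 2; 3 6; 10 20];
c = num2cell(M, 2);                 % 3x1 cell array; c{k} holds row M(k,:)

disp(size(c))                       % 3 1
disp(c{2})                          % 3 6
disp(prod(cellfun(@numel, c)))      % 8 combinations to expect
```

Because every row of a matrix has the same length, this route only covers the case where all vectors have equal size.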

Answered 2024-04-30T04:20:51+08:00

I have done some benchmarking of the two proposed solutions. The benchmarking code is based on the timeit function and is included at the end of this post.

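I consider two cases: three vectors of size n, and three vectors of sizes n/10, n and n*10 respectively (both cases give the same number of combinations). n is varied up to a maximum of 240 (I chose this value to avoid the use of virtual memory on my laptop).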
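
A quick sanity check of those two statements, as a sketch: the variable names are mine, and the memory figure counts only the final result matrix in double precision, not the intermediate arrays.

```matlab
nn = 20:20:240;                        % same sweep as the benchmark script
rowsEqual = nn.^3;                     % three vectors of size n
rowsMixed = (nn/10) .* nn .* (nn*10);  % sizes n/10, n and n*10

disp(isequal(rowsEqual, rowsMixed))    % 1: both cases yield n^3 rows

bytesAtMax = rowsEqual(end) * 3 * 8;   % 3 columns of 8-byte doubles
fprintf('%d rows, about %.0f MB\n', rowsEqual(end), bytesAtMax/1e6);
% prints: 13824000 rows, about 332 MB
```

The intermediate n-dimensional arrays take additional space on top of this, which is consistent with n = 240 being close to the limit on a laptop.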

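The results are shown in a figure (not included here). The ndgrid-based solution consistently takes less time than combvec. It is also worth noting that, in the different-size case, the time taken by combvec does not vary much.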

Benchmarking code

Function for the ndgrid-based solution:
```
function combs = f1(vectors)
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);
```
Function for the combvec solution:
```
function combs = f2(vectors)
combs = combvec(vectors{:}).';
```
Script that measures the time by calling timeit on these functions:
```
nn = 20:20:240;
t1 = [];
t2 = [];
for n = nn;
    %//vectors = {1:n, 1:n, 1:n};
    vectors = {1:n/10, 1:n, 1:n*10};
    t = timeit(@() f1(vectors));
    t1 = [t1; t];
    t = timeit(@() f2(vectors));
    t2 = [t2; t];
end
```
Answered 2024-04-30T04:20:51+08:00
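Here is a do-it-yourself method, using nchoosek, that made me giggle with delight, although it is not better than @Luis Mendo's accepted solution.

For the example given, averaged over 1,000 runs, this solution took 0.00065935 s on my machine, versus 0.00012877 s for the accepted solution. For larger vectors, following @Luis Mendo's benchmarking post, this solution is consistently slower than the accepted answer. Nevertheless, I decided to post it in the hope that you will find something useful in it: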

Code:
```
tic;
v = {[1 2], [3 6 9], [10 20]};

L = [0 cumsum(cellfun(@length,v))];
V = cell2mat(v);

J = nchoosek(1:L(end),length(v));
J(any(J>repmat(L(2:end),[size(J,1) 1]),2) | ...
  any(J<=repmat(L(1:end-1),[size(J,1) 1]),2),:)  = [];

V(J)
toc
```
gives
```
ans =

 1     3    10
 1     3    20
 1     6    10
 1     6    20
 1     9    10
 1     9    20
 2     3    10
 2     3    20
 2     6    10
 2     6    20
 2     9    10
 2     9    20

Elapsed time is 0.018434 seconds.
```
Explanation:

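L gets the length of each vector using cellfun. Although cellfun is basically a loop, that is acceptable here, since the number of vectors has to be relatively low for this problem to be practical at all.

V concatenates all the vectors for easy access later (this assumes you entered all your vectors as rows; for column vectors, v' would work in place of v).

nchoosek gets all the ways to pick n=length(v) elements from the total number of elements L(end). There will be more combinations here than we need.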
```
J =

 1     2     3
 1     2     4
 1     2     5
 1     2     6
 1     2     7
 1     3     4
 1     3     5
 1     3     6
 1     3     7
 1     4     5
 1     4     6
 1     4     7
 1     5     6
 1     5     7
 1     6     7
 2     3     4
 2     3     5
 2     3     6
 2     3     7
 2     4     5
 2     4     6
 2     4     7
 2     5     6
 2     5     7
 2     6     7
 3     4     5
 3     4     6
 3     4     7
 3     5     6
 3     5     7
 3     6     7
 4     5     6
 4     5     7
 4     6     7
 5     6     7
```
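Since there are only two elements in v{1}, we need to throw out any rows where J(:,1)>2. Similarly for J(:,2)<3, J(:,2)>5, and so on. Using L and repmat we can determine whether each element of J lies in its appropriate range, and then use any to discard the rows that contain any bad element.

Finally, these are not the actual values from v, just indices. V(J) returns the desired matrix.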
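
The filtering step can also be written as a keep-mask instead of a delete-mask, which some may find easier to read. This is a sketch of the same idea, not a faster method: the names `Jall`, `lo`, `hi` and `keep` are mine, and bsxfun stands in for repmat.

```matlab
v = {[1 2], [3 6 9], [10 20]};

L = [0 cumsum(cellfun(@numel, v))];     % [0 2 5 7]: block boundaries in V
V = cell2mat(v);                        % [1 2 3 6 9 10 20]

Jall = nchoosek(1:L(end), numel(v));    % all 35 index triples
lo = L(1:end-1);                        % column k must satisfy J > lo(k)
hi = L(2:end);                          % ... and J <= hi(k)
keep = all(bsxfun(@gt, Jall, lo) & bsxfun(@le, Jall, hi), 2);
J = Jall(keep, :);                      % 12 triples, one index per vector

combs = V(J);                           % indexing with a matrix keeps its shape
disp(size(Jall,1))                      % 35
disp(size(combs))                       % 12 3
```

Here 35 candidate rows are generated to keep 12, and that ratio gets worse as the vectors grow, which fits the timings reported above.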
Answered 2024-04-30T04:20:51+08:00
