
What is the most efficient way to find the number of possible unique fixed-length permutations?


I have this dictionary:

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
    }

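I need to find the maximum number of 3-long combinations of the first value of each of these tuples, where only the values listed under a key may follow that key.

The code I currently use to find all unique (3-long) combinations is: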

ans_set = set()
for x in num_dict:
    for y in num_dict[x]:
        for z in num_dict[y]:
            ans_set.add((x[0], y[0], z[0]))
print(len(ans_set))  # a bare `return` outside a function is a SyntaxError

This gives 10; ans_set ends up as:

{
 (2, 2, 2), (1, 2, 2), (1, 4, 4),
 (2, 2, 4), (1, 1, 2), (4, 4, 4),
 (1, 2, 4), (1, 1, 4), (1, 1, 1),
 (2, 4, 4)
}

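But I don't actually care what they are, only how many there are.

This approach is not particularly efficient, because it actually generates every possible combination and puts it into a set.

I don't need to know each unique combination; I only need to know how many there are.

I feel this could be done, perhaps using the lengths of the value lists? But I can't wrap my head around it.

Clarifying questions about what I need are welcome, since I realize I may not have explained it in the clearest way.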

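Final edit

After re-evaluating what I needed it to do, I found the best way to get the number of triples. This approach doesn't actually find the triples, it only counts them.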

def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total
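
As a sanity check on the counting idea above, here is a minimal sketch that compares the cached count against a brute-force enumeration. It assumes the goal is to count index triples i < j < k where each value divides the next, which is what the loop appears to do. The names count_divisible_triples, count_divisible_triples_brute, and the sample list are hypothetical, and the input is assumed to contain only positive integers (a zero would raise ZeroDivisionError).

```python
from itertools import combinations


def count_divisible_triples(values):
    """Count index triples i < j < k with values[i] | values[j] | values[k]."""
    # divisors_before[j] = number of earlier indices whose value divides values[j]
    divisors_before = [0] * len(values)
    total = 0
    for x in range(len(values)):
        for y in range(x + 1, len(values)):
            if values[y] % values[x] == 0:
                divisors_before[y] += 1
                # each earlier divisor of values[x] extends the pair (x, y) to a triple
                total += divisors_before[x]
    return total


def count_divisible_triples_brute(values):
    """Reference version: test every index triple explicitly (O(n^3))."""
    return sum(
        1
        for a, b, c in combinations(values, 3)  # tuples keep index order
        if b % a == 0 and c % b == 0
    )


sample = [1, 2, 3, 4, 6, 12]  # hypothetical input, positive integers only
print(count_divisible_triples(sample), count_divisible_triples_brute(sample))
```

The cached version does O(n^2) work because, by the time index x is used as the middle element, divisors_before[x] is already complete.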

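Here is a version of the function that explains the thought process (though it is not suitable for large lists because of the print spam):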

def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly divided by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)

1 Answer

  • 1

    If I understand you correctly:

    from itertools import combinations
    
    num_dict = {
        (2, 3): [(2, 2), (4, 4), (4, 5)],
        (2, 2): [(2, 3), (4, 4), (4, 5)],
        (4, 5): [(4, 4)],
        (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
        (4, 4): [(4, 5)],
        (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
        }
    set(combinations([k[0] for k in num_dict.keys()], 3))
    

    Output:

    {(1, 4, 1),
     (2, 1, 1),
     (2, 1, 4),
     (2, 2, 1),
     (2, 2, 4),
     (2, 4, 1),
     (2, 4, 4),
     (4, 1, 1),
     (4, 1, 4),
     (4, 4, 1)}
    

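    and len() of that set is 10.

    So basically what you do is build all length-3 combinations of the first elements of the dict keys with itertools.combinations, then take the set to eliminate duplicates.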

    UPDATE

    Since you updated the question with the desired output data,

    you can do the following:

    from itertools import combinations_with_replacement
    list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3))
    

    Output:

    [(1, 1, 1),
     (1, 1, 2),
     (1, 1, 4),
     (1, 2, 2),
     (1, 2, 4),
     (1, 4, 4),
     (2, 2, 2),
     (2, 2, 4),
     (2, 4, 4),
     (4, 4, 4)]
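
If only the count matters, the length of that list can be computed without building it. The sketch below relies on the standard identity that the number of size-k combinations with replacement from n items is C(n + k - 1, k). It assumes, as happens to hold for this data, that every such combination is actually reachable through num_dict; distinct_firsts, closed_form, and enumerated are illustrative names, and math.comb needs Python 3.8 or later.

```python
from itertools import combinations_with_replacement
from math import comb  # available from Python 3.8

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
}

# Distinct first elements of the keys: {1, 2, 4}
distinct_firsts = {key[0] for key in num_dict}
n, k = len(distinct_firsts), 3

# Closed form: multiset coefficient C(n + k - 1, k); nothing is enumerated
closed_form = comb(n + k - 1, k)

# Enumerated count, for comparison only
enumerated = len(list(combinations_with_replacement(sorted(distinct_firsts), k)))

print(closed_form, enumerated)
```

For data where some first values cannot follow others, the closed form would overcount, so it is only a shortcut for inputs shaped like this one.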
    

    UPD2

    So regarding time consumption, I ran:

    num_dict = {
        (2, 3): [(2, 2), (4, 4), (4, 5)],
        (2, 2): [(2, 3), (4, 4), (4, 5)],
        (4, 5): [(4, 4)],
        (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
        (4, 4): [(4, 5)],
        (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)]
        }
    def a(num_dict):
        ans_set = set()
        for x in num_dict:
            for y in num_dict[x]:
                for z in num_dict[y]:
                    ans_set.add((x[0], y[0], z[0]))
        return len(ans_set)
    def b(num_dict):
        from itertools import combinations_with_replacement
        return len(list(combinations_with_replacement(set([k[0] for k in num_dict.keys()]), 3)))
    %timeit a(num_dict)
    %timeit b(num_dict)
    

    The results are:

    The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 12.1 µs per loop
    
    The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 4.77 µs per loop
    

    So the solution I propose here is about 2.5 times faster on this input (4.77 µs versus 12.1 µs per loop).
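
The %timeit lines only work inside IPython. As a rough sketch of how the same comparison could be run from a plain Python script, the standard timeit module can be used instead. The function names count_by_walking and count_by_enumerating are hypothetical stand-ins for a and b, and the absolute numbers will differ from machine to machine.

```python
import timeit
from itertools import combinations_with_replacement

num_dict = {
    (2, 3): [(2, 2), (4, 4), (4, 5)],
    (2, 2): [(2, 3), (4, 4), (4, 5)],
    (4, 5): [(4, 4)],
    (1, 0): [(1, 1), (2, 2), (2, 3), (4, 4), (4, 5)],
    (4, 4): [(4, 5)],
    (1, 1): [(1, 0), (2, 2), (2, 3), (4, 4), (4, 5)],
}


def count_by_walking(d):
    """Follow key -> value -> value chains and collect unique first-value triples."""
    seen = set()
    for x in d:
        for y in d[x]:
            for z in d[y]:
                seen.add((x[0], y[0], z[0]))
    return len(seen)


def count_by_enumerating(d):
    """Ignore the chains; enumerate combinations of the distinct first values."""
    firsts = {key[0] for key in d}
    return len(list(combinations_with_replacement(firsts, 3)))


calls = 10_000
for fn in (count_by_walking, count_by_enumerating):
    # best of 3 repeats, reported per call in microseconds
    best = min(timeit.repeat(lambda: fn(num_dict), number=calls, repeat=3))
    print(f"{fn.__name__}: {best / calls * 1e6:.2f} us per call")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.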
