How can I count the occurrences of a list item?

Views: 1185

Given an item, how can I count its occurrences in a list in Python?

22 Answers

  • 205

    Another way to get the number of occurrences of each item, in a dictionary:

    dict((i, a.count(i)) for i in a)
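
    Note that the expression above calls a.count(i) once per element, duplicates included, so the work grows quadratically with the list length. A minimal sketch of the same idea that iterates over the distinct values only (the list a here is made-up sample data):

```python
# Hypothetical sample data; any list of hashable items works.
a = ["x", "y", "x", "z", "x", "y"]

# One list.count() pass per element, duplicates included.
counts_all = dict((i, a.count(i)) for i in a)

# One list.count() pass per distinct value.
counts_distinct = dict((i, a.count(i)) for i in set(a))

print(counts_all == counts_distinct)  # True
print(counts_distinct["x"])           # 3
```

    Both forms still rescan the list for every key, so they are best kept for short lists.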
    
  • 11
    # Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
    from collections import defaultdict
    def count_unsorted_list_items(items):
        """
        :param items: iterable of hashable items to count
        :type items: iterable
    
        :returns: dict of counts like Py2.7 Counter
        :rtype: dict
        """
        counts = defaultdict(int)
        for item in items:
            counts[item] += 1
        return dict(counts)
    
    
    # Python >= 2.2 (generators)
    def count_sorted_list_items(items):
        """
        :param items: sorted iterable of items to count
        :type items: sorted iterable
    
        :returns: generator of (item, count) tuples
        :rtype: generator
        """
        if not items:
            return
        elif len(items) == 1:
            yield (items[0], 1)
            return
        prev_item = items[0]
        count = 1
        for item in items[1:]:
            if prev_item == item:
                count += 1
            else:
                yield (prev_item, count)
                count = 1
                prev_item = item
        yield (item, count)
        return
    
    
    import unittest
    class TestListCounters(unittest.TestCase):
        def test_count_unsorted_list_items(self):
            D = (
                ([], []),
                ([2], [(2,1)]),
                ([2,2], [(2,2)]),
                ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
                )
            for inp, exp_outp in D:
                counts = count_unsorted_list_items(inp) 
                print(inp, exp_outp, counts)
                self.assertEqual(counts, dict( exp_outp ))
    
            inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
            self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )
    
    
        def test_count_sorted_list_items(self):
            D = (
                ([], []),
                ([2], [(2,1)]),
                ([2,2], [(2,2)]),
                ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
                )
            for inp, exp_outp in D:
                counts = list( count_sorted_list_items(inp) )
                print(inp, exp_outp, counts)
                self.assertEqual(counts, exp_outp)
    
            inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
            # Unsorted input yields one tuple per run: [(2,2), (4,1), (2,1)]
            self.assertNotEqual(exp_outp, list( count_sorted_list_items(inp) ))
    
  • 59

    I ran into this problem today and rolled my own solution before I thought to check. This:

    dict((i,a.count(i)) for i in a)
    

    is really, really slow for large lists. My solution

    def occurDict(items):
        d = {}
        for i in items:
            if i in d:
                d[i] = d[i]+1
            else:
                d[i] = 1
        return d
    

    is actually a bit faster than the Counter solution, at least for Python 2.7.
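
    The same single-pass idea can be written a little more compactly with dict.get. This is only a sketch: the function name occur_dict_get is made up for illustration, and no timing claim is implied.

```python
def occur_dict_get(items):
    """Count occurrences of each hashable item in a single pass."""
    d = {}
    for i in items:
        # dict.get returns 0 for items not seen yet.
        d[i] = d.get(i, 0) + 1
    return d


print(occur_dict_get([1, 2, 1, 3, 1]))  # {1: 3, 2: 1, 3: 1}
```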

  • 11

    Here are three solutions:

    Fastest is using a for loop and storing it in a Dict.

    import time
    from collections import Counter
    
    
    def countElement(a):
        g = {}
        for i in a:
            if i in g: 
                g[i] +=1
            else: 
                g[i] =1
        return g
    
    
    z = [1,1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4]
    
    
    #Solution 1 - Faster
    st = time.monotonic()
    for i in range(1000000):
        b = countElement(z)
    et = time.monotonic()
    print(b)
    print('Simple for loop and storing it in dict - Duration: {}'.format(et - st))
    
    #Solution 2 - Fast
    st = time.monotonic()
    for i in range(1000000):
        a = Counter(z)
    et = time.monotonic()
    print (a)
    print('Using collections.Counter - Duration: {}'.format(et - st))
    
    #Solution 3 - Slow
    st = time.monotonic()
    for i in range(1000000):
        g = dict([(i, z.count(i)) for i in set(z)])
    et = time.monotonic()
    print(g)
    print('Using list comprehension - Duration: {}'.format(et - st))
    

    Result

    #Solution 1 - Faster

    {1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 234: 3, 23: 10, 12: 2, 123: 1, 31: 1, 13: 1, 42: 5, 34: 4, 423: 3}
    Simple for loop and storing it in dict - Duration: 12.032000000000153
    

    #Solution 2 - Fast

    Counter({23: 10, 4: 6, 2: 5, 42: 5, 1: 4, 3: 4, 34: 4, 234: 3, 423: 3, 5: 2, 12: 2, 123: 1, 31: 1, 13: 1})
    Using collections.Counter - Duration: 15.889999999999418
    

    #Solution 3 - Slow

    {1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 34: 4, 423: 3, 234: 3, 42: 5, 12: 2, 13: 1, 23: 10, 123: 1, 31: 1}
    Using list comprehension - Duration: 33.0
    
  • 28

    You can also use the countOf method of the built-in operator module.

    >>> import operator
    >>> operator.countOf([1, 2, 3, 4, 1, 4, 1], 1)
    3
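
    As a small illustration, operator.countOf is not limited to lists: it accepts any iterable. The inputs below are made-up examples.

```python
import operator

# countOf accepts any iterable, not only lists.
in_string = operator.countOf("banana", "a")
in_tuple = operator.countOf((1, 2, 1), 1)
in_generator = operator.countOf((n % 3 for n in range(10)), 0)

print(in_string)     # 3
print(in_tuple)      # 2
print(in_generator)  # 4
```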
    
  • 1530

    list.count(x) returns the number of times x appears in a list

    See: http://docs.python.org/tutorial/datastructures.html#more-on-lists

  • 41
    sum([1 for elem in <yourlist> if elem==<your_value>])
    

    This returns the number of occurrences of your_value.
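
    A concrete version of the pattern above, with the hypothetical names values and target standing in for the placeholders. A generator expression is used here so that no intermediate list is built:

```python
values = [3, 1, 3, 2, 3]
target = 3

# Add 1 for every element equal to the target.
occurrences = sum(1 for elem in values if elem == target)
print(occurrences)  # 3
```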

  • 1

    Counting the occurrences of one item in a list

    To count the occurrences of just one list item, you can use count()

    >>> l = ["a","b","b"]
    >>> l.count("a")
    1
    >>> l.count("b")
    2
    

    Counting the occurrences of all items in a list is also known as "tallying" a list, or creating a tally counter.

    Counting all items with count()

    To count the occurrences of the items in l, you can simply use a list comprehension and the count() method

    [[x,l.count(x)] for x in set(l)]
    

    (or similarly with a dictionary: dict((x,l.count(x)) for x in set(l)))

    Example:

    >>> l = ["a","b","b"]
    >>> [[x,l.count(x)] for x in set(l)]
    [['a', 1], ['b', 2]]
    >>> dict((x,l.count(x)) for x in set(l))
    {'a': 1, 'b': 2}
    

    Counting all items with Counter()

    Alternatively, there is the faster Counter class from the collections library

    Counter(l)
    

    Example:

    >>> l = ["a","b","b"]
    >>> from collections import Counter
    >>> Counter(l)
    Counter({'b': 2, 'a': 1})
    

    How much faster is Counter?

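    I checked how much faster Counter is for tallying lists. I tried both methods with a few values of n, and it appears that Counter is faster by a constant factor of approximately 2.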

    Here is the script I used:

    from __future__ import print_function
    import timeit
    
    t1=timeit.Timer('Counter(l)', \
                    'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                    )
    
    t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
                    'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                    )
    
    print("Counter(): ", t1.repeat(repeat=3,number=10000))
    print("count():   ", t2.repeat(repeat=3,number=10000))
    

    And the output:

    Counter():  [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
    count():    [7.779430688009597, 7.962715800967999, 8.420845870045014]
    
  • 3

    If you only want one item's count, use the count method:

    >>> [1, 2, 3, 4, 1, 4, 1].count(1)
    3
    

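    Don't use this if you want to count multiple items. Calling count in a loop requires a separate pass over the list for every count call, which can be catastrophic for performance. If you want to count all items, or even just multiple items, use Counter, as explained in the other answers.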
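
    A minimal sketch of the difference, using made-up data and a hypothetical list wanted of items to look up:

```python
from collections import Counter

data = [1, 2, 3, 4, 1, 4, 1]
wanted = [1, 4, 7]  # 7 does not occur in data

# Slow pattern: one full pass over data per wanted item.
slow = {x: data.count(x) for x in wanted}

# Single pass to tally everything, then constant-time lookups.
tally = Counter(data)
fast = {x: tally[x] for x in wanted}  # missing keys count as 0

print(fast)          # {1: 3, 4: 2, 7: 0}
print(slow == fast)  # True
```

    Both give the same result; the Counter version reads the list once regardless of how many items are looked up.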

  • 16

    Why not use pandas?

    import pandas as pd
    
    l = ['a', 'b', 'c', 'd', 'a', 'd', 'a']
    
    # converting the list to a Series and counting the values
    my_count = pd.Series(l).value_counts()
    my_count
    

    Output:

    a    3
    d    2
    b    1
    c    1
    dtype: int64
    

    If you are looking for the count of a particular element, say a, try:

    my_count['a']
    

    Output:

    3
    
  • 24

    Count of all elements with itertools.groupby()

    Another possibility for getting the count of all elements in the list is itertools.groupby().

    With "duplicate" counts

    from itertools import groupby
    
    L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']  # Input list
    
    counts = [(i, len(list(c))) for i,c in groupby(L)]      # Create value-count pairs as list of tuples 
    print(counts)
    

    returns

    [('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]
    

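    Notice how it combined the first three a's into the first group, while the other groups of a appear further down the list. This happens because the input list L was not sorted. This can sometimes be a benefit, if the groups should in fact be kept separate.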

    With unique counts

    If unique group counts are desired, just sort the input list:

    counts = [(i, len(list(c))) for i,c in groupby(sorted(L))]
    print(counts)
    

    returns

    [('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]
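
    As a sketch of how the sorted groupby() result relates to the other approaches in this thread, the pairs can be collected into a dict and compared with collections.Counter. The variable name grouped is an assumption for illustration:

```python
from collections import Counter
from itertools import groupby

L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']

# groupby only merges adjacent equal items, hence the sort.
grouped = {key: sum(1 for _ in group) for key, group in groupby(sorted(L))}

print(grouped)                      # {'a': 5, 'c': 1, 'd': 2, 'q': 1, 't': 1}
print(grouped == dict(Counter(L)))  # True
```

    Sorting requires the items to be mutually comparable, while Counter only needs them to be hashable.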
    
  • 1
    def countfrequncyinarray(arr1):
        # Tallies only the integers 1..len(arr1); other values are ignored
        r=len(arr1)
        return {i:arr1.count(i) for i in range(1,r+1)}
    arr1=[4,4,4,4]
    a=countfrequncyinarray(arr1)
    print(a)  # {1: 0, 2: 0, 3: 0, 4: 4}
    
  • 27

    Given an item, how can I count its occurrences in a list in Python?

    Here's an example list:

    >>> l = list('aaaaabbbbcccdde')
    >>> l
    ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']
    

    list.count

    There's the list.count method

    >>> l.count('b')
    4
    

    This works fine for any list. Tuples have this method as well:

    >>> t = tuple('aabbbffffff')
    >>> t
    ('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
    >>> t.count('f')
    6
    

    collections.Counter

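    And then there's collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.

    Usage: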

    >>> from collections import Counter
    >>> c = Counter(l)
    >>> c['b']
    4
    

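    Counters are based on Python dictionaries; their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.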
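
    To illustrate the hashability requirement, here is a small sketch with made-up data. Lists cannot be used as keys, but converting each one to a tuple works:

```python
from collections import Counter

rows = [[1, 2], [3, 4], [1, 2]]  # lists are not hashable

try:
    Counter(rows)
except TypeError as exc:
    print("cannot count lists directly:", exc)

# Converting each row to a tuple makes it usable as a key.
counts = Counter(tuple(row) for row in rows)
print(counts[(1, 2)])  # 2
```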

    Further usage of collections.Counter

    You can add or subtract with iterables from your counter:

    >>> c.update(list('bbb'))
    >>> c['b']
    7
    >>> c.subtract(list('bbb'))
    >>> c['b']
    4
    

    And you can do multiset operations with the counter as well:

    >>> c2 = Counter(list('aabbxyz'))
    >>> c - c2                   # set difference
    Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
    >>> c + c2                   # addition of all elements
    Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
    >>> c | c2                   # set union
    Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
    >>> c & c2                   # set intersection
    Counter({'a': 2, 'b': 2})
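
    One detail worth a small sketch: the - operator and the subtract() method treat non-positive counts differently. The counters below are made-up examples, separate from c and c2 above:

```python
from collections import Counter

c = Counter('aab')    # a: 2, b: 1
d = Counter('abbb')   # a: 1, b: 3

# The - operator drops zero and negative results.
diff = c - d
print(diff)            # Counter({'a': 1})

# subtract() works in place and keeps negative counts.
e = c.copy()
e.subtract(d)
print(e['a'], e['b'])  # 1 -2
```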
    

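    Why not pandas?

    Another answer suggests:

    Why not use pandas?

    Pandas is a common library, but it's not in the standard library. Adding it as a requirement is non-trivial.

    There are built-in solutions for this use case in the list object itself as well as in the standard library.

    If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.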

  • 6

    If you want the number of occurrences of a particular element:

    >>> from collections import Counter
    >>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
    >>> single_occurrences = Counter(z)
    >>> print(single_occurrences.get("blue"))
    3
    >>> print(single_occurrences.values())
    dict_values([3, 2, 1])
    
  • 14

    If you want to count all values at once, you can do it very fast using numpy arrays and bincount, as follows

    import numpy as np
    a = np.array([1, 2, 3, 4, 1, 4, 1])
    np.bincount(a)
    

    which gives

    >>> array([0, 3, 1, 1, 2])
    
  • 2

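    numpy's bincount has been suggested, but it only works for 1d arrays of non-negative integers. Also, the resulting array can be confusing (it contains the counts of every integer from 0 to the max of the original list, with missing integers set to 0).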
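
    To make the shape of that result concrete, here is a pure-Python sketch of what a bincount-style tally computes: one slot per integer from index 0 up to the largest value. bincount_like is a made-up helper for illustration, not numpy's implementation.

```python
def bincount_like(values):
    """Return a list where index i holds the number of times i occurs."""
    if any(v < 0 for v in values):
        raise ValueError("only non-negative integers are supported")
    result = [0] * (max(values) + 1 if values else 0)
    for v in values:
        result[v] += 1
    return result


print(bincount_like([1, 2, 3, 4, 1, 4, 1]))  # [0, 3, 1, 1, 2]
print(bincount_like([5, 5]))                 # [0, 0, 0, 0, 0, 2]
```

    The second call shows why the output can be confusing: two values produce six slots, five of them zero.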

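    A better way to do it with numpy is to use the unique function with the return_counts argument set to True. It returns a tuple containing an array of the unique values and an array of the number of occurrences of each unique value.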

    import numpy as np
    a = [1, 1, 0, 2, 1, 0, 3, 3]
    a_uniq, counts = np.unique(a, return_counts=True)  # array([0, 1, 2, 3]), array([2, 3, 1, 2])
    

    and then we can pair them up as

    dict(zip(a_uniq, counts))  # {0: 2, 1: 3, 2: 1, 3: 2}
    

    It also works with other data types and "2d lists", e.g.

    >>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
    >>> dict(zip(*np.unique(a, return_counts=True)))
    {'a': 3, 'b': 3, 'c': 2}
    
  • 4

    To count the number of diverse elements having a common type:

    li = ['A0','c5','A8','A2','A5','c2','A3','A9']
    
    print(sum(1 for el in li if el[0]=='A' and el[1] in '01234'))
    

    This gives 3, not 6.
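
    The same pattern generalizes to any condition. A minimal sketch, where count_if is a hypothetical helper name and not a standard library function:

```python
def count_if(iterable, predicate):
    """Count the items for which predicate(item) is true."""
    return sum(1 for item in iterable if predicate(item))


li = ['A0', 'c5', 'A8', 'A2', 'A5', 'c2', 'A3', 'A9']

narrow = count_if(li, lambda el: el[0] == 'A' and el[1] in '01234')
broad = count_if(li, lambda el: el.startswith('A'))

print(narrow)  # 3
print(broad)   # 6
```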

  • 0

    I've compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).

    Counting one item

    For large enough arrays, it turns out that

    numpy.sum(numpy.array(a) == 1)
    

    is slightly faster than the other solutions.

    Counting all items

    As established before

    numpy.bincount(a)
    

    is what you want.


    Code to reproduce the plots:

    from collections import Counter
    from collections import defaultdict
    import numpy
    import operator
    import pandas
    import perfplot
    
    
    def counter(a):
        return Counter(a)
    
    
    def count(a):
        return dict((i, a.count(i)) for i in set(a))
    
    
    def bincount(a):
        return numpy.bincount(a)
    
    
    def pandas_value_counts(a):
        return pandas.Series(a).value_counts()
    
    
    def occur_dict(a):
        d = {}
        for i in a:
            if i in d:
                d[i] = d[i]+1
            else:
                d[i] = 1
        return d
    
    
    def count_unsorted_list_items(items):
        counts = defaultdict(int)
        for item in items:
            counts[item] += 1
        return dict(counts)
    
    
    def operator_countof(a):
        return dict((i, operator.countOf(a, i)) for i in set(a))
    
    
    perfplot.show(
        setup=lambda n: list(numpy.random.randint(0, 100, n)),
        n_range=[2**k for k in range(20)],
        kernels=[
            counter, count, bincount, pandas_value_counts, occur_dict,
            count_unsorted_list_items, operator_countof
            ],
        equality_check=None,
        logx=True,
        logy=True,
        )
    

  • 1427

    This may not be the most efficient approach; it requires an extra pass to remove duplicates.

    Functional implementation:

    import numpy as np
    arr = np.array(['a','a','b','b','b','c'])
    print(set(map(lambda x  : (x , list(arr).count(x)) , arr)))
    

    returns:

    {('c', 1), ('b', 3), ('a', 2)}
    

    or return it as a dict:

    print(dict(map(lambda x  : (x , list(arr).count(x)) , arr)))
    

    returns:

    {'b': 3, 'c': 1, 'a': 2}
    
  • 1

    If you can use pandas, then value_counts is there to the rescue.

    >>> import pandas as pd
    >>> a = [1, 2, 3, 4, 1, 4, 1]
    >>> pd.Series(a).value_counts()
    1    3
    4    2
    3    1
    2    1
    dtype: int64
    

    It also automatically sorts the result by frequency.

    If you want the result as a list of lists, do as follows

    >>> pd.Series(a).value_counts().reset_index().values.tolist()
    [[1, 3], [4, 2], [3, 1], [2, 1]]
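
    For comparison, a standard-library sketch that produces a similar frequency-sorted result without pandas, using Counter.most_common(). The variable names are illustrative:

```python
from collections import Counter

a = [1, 2, 3, 4, 1, 4, 1]

# most_common() returns (value, count) pairs, highest count first.
pairs = Counter(a).most_common()
print(pairs[:2])    # [(1, 3), (4, 2)]

# Convert the tuples to lists if a list of lists is needed.
as_lists = [list(pair) for pair in pairs]
print(as_lists[0])  # [1, 3]
```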
    
  • 0
    from collections import Counter
    country=['Uruguay', 'Mexico', 'Uruguay', 'France', 'Mexico']
    count_country = Counter(country)
    output_list= [] 
    
    for i in count_country:
        output_list.append([i,count_country[i]])
    print(output_list)
    

    Output list:

    [['Mexico', 2], ['France', 1], ['Uruguay', 2]]
    
  • 0

    If you are using Python 2.7 or 3 and you want the number of occurrences of each element:

    >>> from collections import Counter
    >>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
    >>> Counter(z)
    Counter({'blue': 3, 'red': 2, 'yellow': 1})
    
