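I have a list of arbitrary length, and I need to split it into equal-size chunks and operate on each one. There are some obvious ways to do this, such as keeping a counter and two lists: when the second list fills up, add it to the first list and empty the second list for the next round of data. But this is potentially extremely expensive.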

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

I looked for something useful in itertools, but I couldn't find anything obviously suitable. I may have missed it, though.
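For reference, the counter-and-two-lists approach described above might look like the following minimal sketch (Python 3). The name `chunk_with_buffer` is made up for illustration; it makes one pass over the input and builds every chunk in memory before returning.

```python
def chunk_with_buffer(items, size):
    """Collect items into lists of at most `size` elements (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    chunks = []   # the "first list": finished chunks
    current = []  # the "second list": the chunk currently being filled
    for item in items:
        current.append(item)
        if len(current) == size:
            chunks.append(current)
            current = []  # start a fresh buffer for the next round
    if current:
        chunks.append(current)  # keep the final, possibly shorter, chunk
    return chunks

print(chunk_with_buffer(range(7), 3))  # prints [[0, 1, 2], [3, 4, 5], [6]]
```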

  • 9

    At this point, I think we need the obligatory anonymous recursive function.

    Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
    chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
  • 7


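    None of these answers produce evenly sized chunks; they all leave a runt chunk at the end, so they are not completely balanced. If you use these functions to distribute work, you have built in the prospect of one worker finishing well before the others, so it sits idle while the others keep working hard.

    For example, a fixed-size split of 75 items into chunks of 10 ends with a half-sized chunk:

    [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
    [70, 71, 72, 73, 74]]

    Others, like list(grouper(3, xrange(7))) and chunk(xrange(7), 3), both return [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None values are just padding, and rather inelegant in my opinion. They are not evenly chunking the iterables.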



    Here's a balanced solution, adapted from a function I've used in production (note: in Python 3, replace xrange with range):

    def baskets_from(items, maxbaskets=25):
        baskets = [[] for _ in xrange(maxbaskets)] # in Python 3 use range
        for i, item in enumerate(items):
            baskets[i % maxbaskets].append(item)
        return filter(None, baskets)


    def iter_baskets_from(items, maxbaskets=3):
        '''generates evenly balanced baskets from indexable iterable'''
        item_count = len(items)
        baskets = min(item_count, maxbaskets)
        for x_i in xrange(baskets):
            yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]


    def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
        '''
        generates balanced baskets from iterable, contiguous contents
        provide item_count if providing a iterator that doesn't support len()
        '''
        item_count = item_count or len(items)
        baskets = min(item_count, maxbaskets)
        items = iter(items)
        floor = item_count // baskets 
        ceiling = floor + 1
        stepdown = item_count % baskets
        for x_i in xrange(baskets):
            length = ceiling if x_i < stepdown else floor
            yield [next(items) for _ in xrange(length)]  # next() works in Python 2.6+ and 3



    To test them:

    print(baskets_from(xrange(6), 8))
    print(list(iter_baskets_from(xrange(6), 8)))
    print(list(iter_baskets_contiguous(xrange(6), 8)))
    print(baskets_from(xrange(22), 8))
    print(list(iter_baskets_from(xrange(22), 8)))
    print(list(iter_baskets_contiguous(xrange(22), 8)))
    print(baskets_from('ABCDEFG', 3))
    print(list(iter_baskets_from('ABCDEFG', 3)))
    print(list(iter_baskets_contiguous('ABCDEFG', 3)))
    print(baskets_from(xrange(26), 5))
    print(list(iter_baskets_from(xrange(26), 5)))
    print(list(iter_baskets_contiguous(xrange(26), 5)))


    Which prints:

    [[0], [1], [2], [3], [4], [5]]
    [[0], [1], [2], [3], [4], [5]]
    [[0], [1], [2], [3], [4], [5]]
    [[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
    [[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
    [['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
    [['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
    [['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
    [[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
    [[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
    [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

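    Notice that the contiguous generator provides chunks in the same length pattern as the other two, but the items are all in order, and they are as evenly divided as a list of discrete elements can be.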
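As a hedged Python 3 sketch of the same contiguous balancing idea, `divmod` gives the base chunk length and the number of chunks that need one extra item. The name `balanced_chunks` is hypothetical and not part of the answer above; it copies the input into a list first, so it assumes the data fits in memory.

```python
def balanced_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal lists."""
    items = list(items)
    n_chunks = min(len(items), n_chunks)
    if n_chunks < 1:
        return []
    floor, extra = divmod(len(items), n_chunks)
    result = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        stop = start + floor + (1 if i < extra else 0)
        result.append(items[start:stop])
        start = stop
    return result

print(balanced_chunks("ABCDEFG", 3))
# prints [['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
```

The chunk lengths never differ by more than one, which matches the length pattern of the contiguous generator's output shown above.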

  • 5
    def chunk(input, size):
        return map(None, *([iter(input)] * size))
  • 5


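    def split_list(the_list, chunk_size):
        result_list = []
        while the_list:
            # take the next chunk, then drop it from the front of the list
            result_list.append(the_list[:chunk_size])
            the_list = the_list[chunk_size:]
        return result_list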
    a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print split_list(a_list, 3)


    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
  • 15


    l = range(1, 1000)
    print [l[x:x+10] for x in xrange(0, len(l), 10)]


    chunks = lambda l, n: [l[x: x+n] for x in xrange(0, len(l), n)]
    chunks(l, 10)
  • 10


    import itertools

    def split_seq(iterable, size):
        it = iter(iterable)
        item = list(itertools.islice(it, size))
        while item:
            yield item
            item = list(itertools.islice(it, size))


    >>> import pprint
    >>> pprint.pprint(list(split_seq(xrange(75), 10)))
    [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
     [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
     [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
     [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
     [70, 71, 72, 73, 74]]
  • 5

    I'm surprised nobody has thought of using iter's two-argument form:

    from itertools import islice
    def chunk(it, size):
        it = iter(it)
        return iter(lambda: tuple(islice(it, size)), ())


    >>> list(chunk(range(14), 3))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

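    This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice: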

    from itertools import islice, chain, repeat
    def chunk_pad(it, size, padval=None):
        it = chain(iter(it), repeat(padval))
        return iter(lambda: tuple(islice(it, size)), (padval,) * size)


    >>> list(chunk_pad(range(14), 3))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
    >>> list(chunk_pad(range(14), 3, 'a'))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

    Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

    _no_padding = object()
    def chunk(it, size, padval=_no_padding):
        if padval == _no_padding:
            it = iter(it)
            sentinel = ()
        else:
            it = chain(iter(it), repeat(padval))
            sentinel = (padval,) * size
        return iter(lambda: tuple(islice(it, size)), sentinel)


    >>> list(chunk(range(14), 3))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
    >>> list(chunk(range(14), 3, None))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
    >>> list(chunk(range(14), 3, 'a'))
    [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

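    I believe this is the shortest chunker proposed that offers optional padding.

    As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:

    _no_padding = object()
    def chunk(it, size, padval=_no_padding):
        it = iter(it)
        chunker = iter(lambda: tuple(islice(it, size)), ())
        if padval == _no_padding:
            yield from chunker
        else:
            for ch in chunker:
                yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))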


    >>> list(chunk([1, 2, (), (), 5], 2))
    [(1, 2), ((), ()), (5,)]
    >>> list(chunk([1, 2, None, None, 5], 2, None))
    [(1, 2), (None, None), (5, None)]
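To make the failure mode that Tomasz Gandor pointed out concrete, the sketch below re-creates the earlier padding chunker under the made-up name `chunk_pad_sentinel` and feeds it data containing a run of pad values. The two-argument `iter` stops as soon as a chunk equals the sentinel, so the later items are silently dropped.

```python
from itertools import chain, islice, repeat

def chunk_pad_sentinel(it, size, padval=None):
    # Same logic as chunk_pad: pad forever, stop at the first all-pad chunk.
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

data = [1, 2, None, None, 5]
print(list(chunk_pad_sentinel(data, 2)))  # prints [(1, 2)]; the 5 is lost
```

The final variant avoids this because its sentinel is the empty tuple, which `islice` only produces once the source is truly exhausted.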
  • 5


    def chunks(seq, n):
        return (seq[i:i+n] for i in xrange(0, len(seq), n))

    For example:

    print list(chunks(range(1, 1000), 10))
  • 114


    In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))
    In [49]: chunk(range(1,100), 10)
    [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
     [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
     [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
     [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
     [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
     [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
     [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
     [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
     [91, 92, 93, 94, 95, 96, 97, 98, 99]]
  • 6
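    def chunks(iterable,n):
        """assumes n is an integer>0"""
        iterable = iter(iterable)
        while True:
            result = []
            for i in range(n):
                try:
                    result.append(next(iterable))
                except StopIteration:
                    break  # source exhausted; keep what was collected
            if result:
                yield result
            else:
                break
    g1=(i*i for i in range(10))
    g2=chunks(g1,3)
    print g2
    '<generator object chunks at 0x0337B9B8>'
    print list(g2)
    '[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'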
  • 14

    I know this is kind of old, but I don't know why nobody mentioned numpy.array_split:

    import numpy as np
    lst = range(50)
    In [26]: np.array_split(lst,5)
    [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
     array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
     array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
     array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
     array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
  • 12


    from itertools import imap, islice, repeat, takewhile  # Python 2 (imap)

    def splitter(l, n):
        i = 0
        chunk = l[:n]
        while chunk:
            yield chunk
            i += n
            chunk = l[i:i+n]


    def isplitter(l, n):
        l = iter(l)
        chunk = list(islice(l, n))
        while chunk:
            yield chunk
            chunk = list(islice(l, n))


    def isplitter2(l, n):
        return takewhile(bool,
                         (tuple(islice(start, n))
                                for start in repeat(iter(l))))


    def chunks_gen_sentinel(n, seq):
        continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
        return iter(imap(tuple, continuous_slices).next,())


    def chunks_gen_filter(n, seq):
        continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
        return takewhile(bool,imap(tuple, continuous_slices))
  • 7


    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i:i + n]

    import pprint
    pprint.pprint(list(chunks(list(range(10, 75)), 10)))
    [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
     [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
     [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
     [70, 71, 72, 73, 74]]

    If you are using Python 2, you should use xrange() instead of range():

    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in xrange(0, len(l), n):
            yield l[i:i + n]

    You can also simply use a list comprehension instead of writing a function. Python 3:

    [l[i:i + n] for i in range(0, len(l), n)]

    Python 2 version:

    [l[i:i + n] for i in xrange(0, len(l), n)]
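The slicing versions above need a sequence that supports len(). For arbitrary iterables on newer interpreters, the standard library offers `itertools.batched`, added in Python 3.12, which yields tuples with a possibly shorter final tuple. The sketch below falls back to an `islice` loop on older versions; the name `batched_compat` is only for illustration.

```python
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    batched = None

def batched_compat(iterable, n):
    """Yield tuples of up to n items from any iterable (illustrative helper)."""
    if n < 1:
        raise ValueError("n must be at least one")
    if batched is not None:
        yield from batched(iterable, n)
        return
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:
            return  # source exhausted
        yield chunk

print(list(batched_compat(range(10, 18), 3)))
# prints [(10, 11, 12), (13, 14, 15), (16, 17)]
```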
  • 39


    def SplitList(list, chunk_size):
        return [list[offs:offs+chunk_size] for offs in range(0, len(list), chunk_size)]


    def IterChunks(sequence, chunk_size):
        res = []
        for item in sequence:
            res.append(item)
            if len(res) >= chunk_size:
                yield res
                res = []
        if res:
            yield res  # yield the last, incomplete, portion

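    In the latter case, it can be rephrased in a nicer way if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).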
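One possible reading of that nicer reformulation, assuming the length really is an exact multiple of the chunk size, is the shared-iterator `zip` idiom. The name `iter_exact_chunks` is hypothetical, and any leftover items would be dropped silently, which is why the assumption matters.

```python
def iter_exact_chunks(sequence, chunk_size):
    """Yield lists of exactly chunk_size items; a trailing remainder is discarded."""
    it = iter(sequence)
    # zip pulls chunk_size times from the same iterator for every output tuple.
    for group in zip(*[it] * chunk_size):
        yield list(group)

print(list(iter_exact_chunks([1, 2, 3, 4, 5, 6], 2)))
# prints [[1, 2], [3, 4], [5, 6]]
```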

  • 29
    def split_seq(seq, num_pieces):
        start = 0
        for i in xrange(num_pieces):
            stop = start + len(seq[i::num_pieces])
            yield seq[start:stop]
            start = stop


    seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for seq in split_seq(seq, 3):
        print seq
  • 489

    At this point, I think we need a recursive generator, just in case...

    In Python 2:

    def chunks(li, n):
        if li == []:
            return
        yield li[:n]
        for e in chunks(li[n:], n):
            yield e

    In Python 3:

    def chunks(li, n):
        if li == []:
            return
        yield li[:n]
        yield from chunks(li[n:], n)

    Also, in case of a massive alien invasion, a decorated recursive generator might come in handy:

    def dec(gen):
        def new_gen(li, n):
            for e in gen(li, n):
                if e == []:
                    return  # stop at the first empty chunk
                yield e
        return new_gen

    @dec
    def chunks(li, n):
        yield li[:n]
        for e in chunks(li[n:], n):
            yield e
  • 2312


    Tested on Python 3.5.1

    import time
    batch_size = 7
    arr_len = 298937
    start = time.time()
    arr = [i for i in range(0, arr_len)]
    while True:
        if not arr:
            break
        tmp = arr[0:batch_size]
        arr = arr[batch_size:]  # keep the tail; [batch_size:-1] would also drop the last item
    print(time.time() - start)
    arr = [i for i in range(0, arr_len)]
    start = time.time()
    for i in range(0, round(len(arr) / batch_size + 1)):
        tmp = arr[batch_size * i : batch_size * (i + 1)]
    print(time.time() - start)
    #----------batches 1------------
    def batch(iterable, n=1):
        l = len(iterable)
        for ndx in range(0, l, n):
            yield iterable[ndx:min(ndx + n, l)]
    print("\r\nbatches 1")
    arr = [i for i in range(0, arr_len)]
    start = time.time()
    for x in batch(arr, batch_size):
        tmp = x
    print(time.time() - start)
    #----------batches 2------------
    from itertools import islice, chain
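    def batch(iterable, size):
        sourceiter = iter(iterable)
        while True:
            batchiter = islice(sourceiter, size)
            try:
                first = next(batchiter)
            except StopIteration:
                return  # source exhausted; a bare StopIteration is an error from Python 3.7 on
            yield chain([first], batchiter)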
    print("\r\nbatches 2")
    arr = [i for i in range(0, arr_len)]
    start = time.time()
    for x in batch(arr, batch_size):
        tmp = x
    print(time.time() - start)
    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i:i + n]
    arr = [i for i in range(0, arr_len)]
    start = time.time()
    for x in chunks(arr, batch_size):
        tmp = x
    print(time.time() - start)
    from itertools import zip_longest # for Python 3.x
    #from six.moves import zip_longest # for both (uses the six compat library)
    def grouper(iterable, n, padvalue=None):
        "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
        return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
    arr = [i for i in range(0, arr_len)]
    start = time.time()
    for x in grouper(arr, batch_size):
        tmp = x
    print(time.time() - start)


    batches 1
    batches 2
  • 84


    from itertools import izip, chain, repeat
    def grouper(n, iterable, padvalue=None):
        "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
        return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)


    #from itertools import izip_longest as zip_longest # for Python 2.x
    from itertools import zip_longest # for Python 3.x
    #from six.moves import zip_longest # for both (uses the six compat library)
    def grouper(n, iterable, padvalue=None):
        "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
        return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

    I guess Guido's time machine works, worked, will work, will have worked, was working again.

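    These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin over "each" iterator; because they are all the same iterator, every such call advances it, so each zip round-robin produces one tuple of n items.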
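A small Python 3 sketch of that sharing, with variable names chosen only for illustration: multiplying a one-element list copies the reference, not the iterator, so every slot advances the same underlying stream.

```python
it = iter("abcdef")
same = [it] * 3  # three references to ONE iterator, not three iterators
assert same[0] is same[1] is same[2]

# Pulling from each "slot" in turn advances the single shared iterator,
# which is exactly what zip / zip_longest does for every output tuple.
first_group = tuple(next(slot) for slot in same)
second_group = tuple(next(slot) for slot in same)
print(first_group, second_group)
# prints ('a', 'b', 'c') ('d', 'e', 'f')
```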

  • 47



    import numpy as np
    import matplotlib.cbook as cbook
    segments = cbook.pieces(np.arange(20), 3)
    for s in segments:
        print s
  • 17

    The toolz library has a partition function for this:

    from toolz.itertoolz.core import partition
    list(partition(2, [1, 2, 3, 4]))
    [(1, 2), (3, 4)]
  • 5

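    I like the Python doc's version proposed by tzot and J.F.Sebastian, but it has two shortcomings:

    • it is not very explicit

    • I usually don't want a fill value in the last chunk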


    from itertools import islice
    def chunks(n, iterable):
        iterable = iter(iterable)
        while True:
            yield tuple(islice(iterable, n)) or iterable.next()


    from itertools import chain, islice
    def chunks(n, iterable):
       iterable = iter(iterable)
       while True:
           yield chain([next(iterable)], islice(iterable, n-1))
  • 7


    def make_chunks(data, chunk_size): 
        while data:
            chunk, data = data[:chunk_size], data[chunk_size:]
            yield chunk
    >>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
    ...     print chunk
    [1, 2]
    [3, 4]
    [5, 6]
    [7]
  • 10


    >>> from utilspie import iterutils
    >>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(iterutils.get_chunks(a, 5))
    [[1, 2, 3, 4, 5], [6, 7, 8, 9]]


    sudo pip install utilspie

    Disclaimer: I am the creator of the utilspie library.

  • 8


    from itertools import zip_longest
    a = range(1, 16)
    i = iter(a)
    r = list(zip_longest(i, i, i))
    >>> print(r)
    [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

    You can create n-tuples for any n. If a = range(1, 15), then the result will be:

    [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

    If the list divides evenly, you can replace zip_longest with zip; otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
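The difference between the two can be checked with a short sketch; the helper name `triples` and its `keep_remainder` flag are invented for this illustration.

```python
from itertools import zip_longest

def triples(values, keep_remainder=True):
    """Group values in threes; pad the remainder with None or drop it."""
    i = iter(values)
    grouper = zip_longest if keep_remainder else zip
    return list(grouper(i, i, i))

print(triples(range(1, 15))[-1])                        # prints (13, 14, None)
print(triples(range(1, 15), keep_remainder=False)[-1])  # prints (10, 11, 12)
```

With plain zip, the items 13 and 14 are consumed from the iterator but never emitted.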

  • 30


    zip(*[iterable[i::3] for i in range(3)])


    I use this when my chunk size is a fixed number I can type in, e.g. '3', that will never change.
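A quick Python 3 check of how this one-liner behaves, wrapped in a hypothetical helper `fixed_triples`: each strided slice holds the first, second or third member of every group, and because zip stops at the shortest slice, a trailing remainder is dropped.

```python
def fixed_triples(seq):
    """Group a sequence in threes using strided slices (illustrative helper)."""
    # seq[0::3], seq[1::3], seq[2::3] hold the 1st, 2nd and 3rd member of each group.
    return list(zip(*[seq[i::3] for i in range(3)]))

print(fixed_triples(list(range(6))))  # prints [(0, 1, 2), (3, 4, 5)]
print(fixed_triples(list(range(7))))  # prints [(0, 1, 2), (3, 4, 5)]; the 6 is dropped
```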

  • 76


    def chunks(l, n):
        n = max(1, n)
        return (l[i:i+n] for i in xrange(0, len(l), n))
  • 4

    Another, more explicit version.

    def chunkList(initialList, chunkSize):
        """
        This function chunks a list into sub lists
        that have a length equal to chunkSize.

        Example:
        lst = [3, 4, 9, 7, 1, 1, 2, 3]
        print(chunkList(lst, 3))
        returns
        [[3, 4, 9], [7, 1, 1], [2, 3]]
        """
        finalList = []
        for i in range(0, len(initialList), chunkSize):
            finalList.append(initialList[i:i+chunkSize])
        return finalList
  • 259

    Since everybody here is talking about iterators: boltons has a perfect method for that, called iterutils.chunked_iter.

    from boltons import iterutils
    list(iterutils.chunked_iter(list(range(50)), 11))


    [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
     [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
     [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
     [44, 45, 46, 47, 48, 49]]

    But if you don't want to be merciful on memory, you can do it the old way and store the full list up front with iterutils.chunked.

  • 18
    a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    CHUNK = 4
    [a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) // CHUNK)]
  • 15
    [AA[i:i+SS] for i in range(len(AA))[::SS]]

    AA is the array, SS is the chunk size. For example:

    >>> AA=range(10,21);SS=3
    >>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
    [[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
    # or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
