首页 文章

使用python按周汇总数据

提问于
浏览
2

我有一个这样的数据,并且我不会在一周内将它聚合为两个日期:我确实有这个元组列表,它从我的交易表中显示每天的销售额

Metadate = [(Day, 'totalSales By dollars')]
data=[('2013-06-21', 14),
 ('2013-06-20', 19),
 ('2013-06-23', 11),
 ('2013-06-22', 13),
 ('2013-06-25', 6),
 ('2013-06-24', 22),
 ('2013-06-27', 22),
 ('2013-06-26', 3),
 ('2013-06-29', 10),
 ('2013-06-28', 15),
 ('2013-04-05', 20),
 ('2013-04-04', 33),
 ('2013-04-07', 20),
 ('2013-04-06', 15),
 ('2013-04-01', 23),
 ('2013-04-03', 5),
 ('2013-04-02', 8),
 ('2013-04-09', 10),
 ('2013-04-08', 24),
 ('2013-05-08', 1),
 ('2013-05-09', 29),
 ('2013-05-02', 17),
 ('2013-05-03', 18),
 ('2013-05-01', 4),
 ('2013-05-06', 16),
 ('2013-05-07', 10),
 ('2013-05-04', 9),
 ('2013-05-05', 12),
 ('2013-05-19', 21),
 ('2013-05-18', 26),
 ('2013-05-11', 8),
 ('2013-05-10', 12),
 ('2013-05-13', 24),
 ('2013-05-12', 9),
 ('2013-05-15', 5),
 ('2013-05-14', 7),
 ('2013-05-17', 20),
 ('2013-05-16', 36),
 ('2013-05-20', 24),
 ('2013-05-21', 5),
 ('2013-05-22', 3),
 ('2013-05-23', 18),
 ('2013-05-24', 8),
 ('2013-05-25', 11),
 ('2013-05-26', 9),
 ('2013-05-27', 13),
 ('2013-05-28', 4),
 ('2013-05-29', 7),
 ('2013-06-18', 9),
 ('2013-06-19', 2),
 ('2013-06-10', 20),
 ('2013-06-11', 4),
 ('2013-06-12', 3),
 ('2013-06-13', 25),
 ('2013-06-14', 16),
 ('2013-06-15', 10),
 ('2013-06-16', 11),
 ('2013-06-17', 17),
 ('2013-04-30', 12),
 ('2013-05-31', 13),
 ('2013-05-30', 29),
 ('2013-06-09', 12),
 ('2013-06-08', 20),
 ('2013-06-07', 47),
 ('2013-06-06', 5),
 ('2013-06-05', 2),
 ('2013-06-04', 3),
 ('2013-06-03', 32),
 ('2013-06-02', 13),
 ('2013-06-01', 9),
 ('2013-04-23', 3),
 ('2013-04-22', 33),
 ('2013-04-21', 14),
 ('2013-04-20', 20),
 ('2013-04-27', 15),
 ('2013-04-26', 17),
 ('2013-04-25', 21),
 ('2013-04-24', 1),
 ('2013-04-29', 34),
 ('2013-04-28', 11),
 ('2013-06-30', 13),
 ('2013-04-16', 5),
 ('2013-04-17', 3),
 ('2013-04-14', 10),
 ('2013-04-15', 22),
 ('2013-04-12', 23),
 ('2013-04-13', 19),
 ('2013-04-10', 1),
 ('2013-04-11', 31),
 ('2013-04-18', 27),
 ('2013-04-19', 14)]

我希望按周计算开始和结束日期的两个日期:
输出示例:

[(2013-05-07, 900), [(2013-05-14, 1800),....., (2013-08-01, 1000)]

提前致谢!

2 回答

  • 1

    试试这个:

    from datetime import datetime
    import itertools
    
    sales = [
        ('2013-05-01', 100),
        ('2013-05-02', 200),
        ('2013-05-03', 150),
        ('2013-05-03', 120),
        ('2013-05-04', 200),
        ('2013-08-01', 250),
    ]
    
    def toWeek(sale):
        '''(date,volume) -> date of the Sunday of that week'''
        sunday = datetime.strptime(sale[0], '%Y-%m-%d').strftime('%Y-%U-0')
        return datetime.strptime(sunday, '%Y-%U-%w').strftime('%Y-%m-%d')
    
    grouped_sales = itertools.groupby(sales, toWeek)
    
    aggregate_sales = (
        (week, sum(day_sales for date, day_sales in week_sales))
        for week, week_sales in grouped_sales)
    
    print list(aggregate_sales)
    
  • 2

    您也可以使用ISO 8601 week calculationsdeque来执行此操作:

    import datetime
    from collections import deque
    
    data=[
        ('2013-05-01', 100),
        ('2013-05-02', 200),
        ('2013-05-03', 150),
        ('2013-05-03', 120),
        ('2013-05-04', 200),
        ('2013-08-01', 250),
    ]
    fmt='%Y-%m-%d'
    data=deque((datetime.datetime.strptime(x,fmt),y) 
                  for x,y in sorted(data,key=lambda t:t[0]))
    grouped={}
    while data:
        week=data[0][0]-datetime.timedelta(days=(data[0][0].isocalendar()[2]-1))
        temp=[data.popleft()]
        while data and week.isocalendar()[:2]==data[0][0].isocalendar()[:2]:
            temp.append(data.popleft())
    
        key='ISO week {}-W{:02}'.format(*week.isocalendar()[:2]) 
        key+=' ({} - {})'.format(week.strftime(fmt),
                            (week+datetime.timedelta(days=6)).strftime(fmt))     
    
        grouped[key]=sum(t[1] for t in temp)
    
    for k in sorted(grouped):
        print '{}: {}'.format(k, grouped[k])
    

    打印:

    ISO week 2013-W18 (2013-04-29 - 2013-05-05): 770
    ISO week 2013-W31 (2013-07-29 - 2013-08-04): 250
    

    编辑

    我看到了你发布的代码,你可以用另一个deque简化:

    D=deque()
    fmt='%Y-%m-%d'
    data=deque((datetime.datetime.strptime(x,fmt),y) 
                  for x,y in sorted(data,key=lambda t:t[0]))
    while data:
        week=data[0][0]-datetime.timedelta(days=(data[0][0].isocalendar()[2]-1))
        temp=[data.popleft()]
        while data and week.isocalendar()[:2]==data[0][0].isocalendar()[:2]:
            temp.append(data.popleft())
    
        D.appendleft((week.strftime(fmt), sum(t[1] for t in temp)))
    
    print list(D)
    

    打印

    [('2013-06-24', 91), ('2013-06-17', 85), ('2013-06-10', 89), ('2013-06-03', 121), ('2013-05-27', 88), ('2013-05-20', 78), ('2013-05-13', 139), ('2013-05-06', 85), ('2013-04-29', 106), ('2013-04-22', 101), ('2013-04-15', 105), ('2013-04-08', 118), ('2013-04-01', 124)]
    

相关问题