Is there a simple way to get peaks and lows in Python? [on hold]


Consider my data in the following format:

20180101,10
20180102,20
20180103,15
....

The first field is the date and the second is the number of products sold. Rather than inserting all of this into a database and running a `select max(...)` SQL statement to find the maximum quantity over a period, is there any shorthand or useful library that achieves the same thing? Thanks.

5 Answers

  • -1

    If this is the desired result:

    data = [{'date': 1, 'products_sold': 2},
            {'date': 2, 'products_sold': 5},
            {'date': 5, 'products_sold': 2}]
    start_date = 1
    end_date = 2
    max_value_in_period = max(x['products_sold'] for x in data
                              if start_date <= x['date'] <= end_date)
    print(max_value_in_period)  # 5
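The same generator approach works for the minimum; a small sketch (the `default=` argument, a standard feature of `min()`/`max()` since Python 3.4, is an addition here to guard against a period that matches no records):

```python
data = [{'date': 1, 'products_sold': 2},
        {'date': 2, 'products_sold': 5},
        {'date': 5, 'products_sold': 2}]
start_date, end_date = 1, 2

# min() accepts the same generator; default= avoids a ValueError
# when no record falls inside the requested period
min_value_in_period = min(
    (x['products_sold'] for x in data
     if start_date <= x['date'] <= end_date),
    default=None)
print(min_value_in_period)  # 2
```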
    
  • 1

    I tried @Patrick Artner's comment:

    a = (20180101, 10)
    b = (20180102, 20)
    c = (20180103, 15)
    d = (a, b, c)
    maximum = max(d, key=lambda x: x[1])
    minimum = min(d, key=lambda x: x[1])
    print(maximum)  # (20180102, 20)
    print(minimum)  # (20180101, 10)
    

    Maybe this gives some inspiration.
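Since the data in the question is CSV-style text rather than hard-coded tuples, the same idea can be applied after a small parsing step; a sketch (the sample string stands in for the real file):

```python
raw = """20180101,10
20180102,20
20180103,15"""

# parse each "date,count" line into an (int_date, int_count) tuple
rows = [tuple(int(field) for field in line.split(","))
        for line in raw.splitlines()]

# key=lambda x: x[1] compares by the sold-count field, not the date
maximum = max(rows, key=lambda x: x[1])
minimum = min(rows, key=lambda x: x[1])
print(maximum)  # (20180102, 20)
print(minimum)  # (20180101, 10)
```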

  • 0

    Pandas is the library you want.

    Let me show you an example:

    import numpy as np
    import pandas as pd
    
    # let's build a dummy dataset
    index = pd.date_range(start="1/1/2015", end="31/12/2018")
    df = pd.DataFrame(np.random.randint(100, size=len(index)),
                      columns=["sales"], index=index)
    
    >>> df.head()
                sales
    2015-01-01     32
    2015-01-02      0
    2015-01-03     12
    2015-01-04     77
    2015-01-05     86
    

    Now suppose you want to aggregate sales on a monthly basis:

    >>> df["sales"].groupby(pd.Grouper(freq="1M")).sum()
    
    2015-01-31    1441
    2015-02-28    1164
    2015-03-31    1624
    2015-04-30    1629
    2015-05-31    1427
    [...]
    

    Or on a semester (six-month) basis:

    df["sales"].groupby(pd.Grouper(freq="6M", closed="left", label="right")).sum()    
    2015-06-30    8921
    2015-12-31    9365
    2016-06-30    9820
    2016-12-31    8881
    2017-06-30    8773
    2017-12-31    8709
    2018-06-30    9481
    2018-12-31    9522
    2019-06-30      51
    

    For some reason pandas has some trouble with the six-month frequency and the 31/12 sales: it puts them into a new bin in 2019. I'm investigating and will let you know if I find anything... or if anyone wants to comment, please do.

    Or if you want to know which semester was the best:

    >>> df["sales"].groupby(pd.Grouper(freq="6M")).sum().idxmax()              
    Timestamp('2016-06-30 00:00:00', freq='6M')
    
  • 0

    You should use pandas.

    Assuming your date column is called 'date' and is of datetime dtype:

    import pandas as pd
    df = pd.DataFrame(data)
    df = df.set_index('date')
    df.groupby(pd.Grouper(freq='1M')).max()
    

    This will give you the maximum for each month; `freq` can be changed to any frequency you like.
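A runnable version of the above using the sample rows from the question (the `data` dict and the `pd.to_datetime` conversion are assumptions filled in for illustration, since the answer presupposes a datetime column):

```python
import pandas as pd

# the question's rows as a date column plus a sales column
data = {'date': ['20180101', '20180102', '20180103'],
        'sales': [10, 20, 15]}
df = pd.DataFrame(data)

# convert the YYYYMMDD strings to datetime and use them as the index
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

# maximum sales per calendar month
monthly_max = df.groupby(pd.Grouper(freq='1M')).max()
print(monthly_max)
```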

  • 1

    这可能是一个有偏见的答案,但大熊猫非常适合处理这样的数据 . 虽然您可以使用元组,列表等完成此类操作,但pandas提供了更多功能 . 例如:

    import pandas as pd
    data = [[20180101,15], [20180102,10], [20180103,12],[20180104,10]]
    df = pd.DataFrame(data=data, columns=['date', 'products'])
    # if your data is in csv, excel, database... whatever... you can easily pull
    # df = pd.read_csv('name') || pd.read_excel() || pd.read_sql()
    df
    Out[2]: 
           date  products
    0  20180101        15
    1  20180102        10
    2  20180103        12
    3  20180104        10
    
    # It helps to use datetime format to perform operations on the data
    # Operations make reference to an "index" in the dataframe
    df.index = pd.to_datetime(df['date'], format="%Y%m%d")  #strftime format
    df
    Out[3]: 
                    date  products
    date                          
    2018-01-01  20180101        15
    2018-01-02  20180102        10
    2018-01-03  20180103        12
    2018-01-04  20180104        10
    
    # Now we can drop that date column...
    df.drop(columns='date', inplace=True)
    df
    Out[4]: 
                products
    date                
    2018-01-01        15
    2018-01-02        10
    2018-01-03        12
    2018-01-04        10
    
    # Yes, there are ways to do the above in shorthand... lots of info on pandas on SO
    # I want you to see the individual steps we are taking to keep simple
    
    # Now is when the fun begins
    df.rolling(2).sum()  # prints a rolling 2-day sum
    Out[5]: 
                products
    date                
    2018-01-01       NaN
    2018-01-02      25.0
    2018-01-03      22.0
    2018-01-04      22.0
    
    df.rolling(3).mean()  # prints a rolling 3-day average
    Out[6]: 
                 products
    date                 
    2018-01-01        NaN
    2018-01-02        NaN
    2018-01-03  12.333333
    2018-01-04  10.666667
    
    df.resample('W').sum()  # Resamples the data so you can look on a weekly basis
    Out[7]: 
                products
    date                
    2018-01-07        47
    
    df.rolling(2).max() # max number of products over a rolling two-day period
    Out[9]: 
                products
    date                
    2018-01-01       NaN
    2018-01-02      15.0
    2018-01-03      12.0
    2018-01-04      12.0
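Coming back to the original question, the same DataFrame also answers "what was the peak/low, and when" directly; a short sketch rebuilding the `df` from above so it runs on its own:

```python
import pandas as pd

data = [[20180101, 15], [20180102, 10], [20180103, 12], [20180104, 10]]
df = pd.DataFrame(data=data, columns=['date', 'products'])
df.index = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.drop(columns='date')

# peak and low over the whole period, and the date the peak occurred on
print(df['products'].max())     # 15
print(df['products'].min())     # 10
print(df['products'].idxmax())  # the timestamp of the peak day
```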
    