Re: [Question] Cumulative sum with groupby()

Board: Python | Author: ccwang002 (亮) | Time: 2015/03/26 00:05 | Pushes/boos: 0 (0 push, 0 boo, 0 neutral)

0 comments, 0 participants, thread 2/2

Original post omitted. I handle this kind of data processing with pandas because it is convenient.

import pandas as pd

The data has three columns: zipcode, date, revenue

: mock=[[106,20150101,100],
:       [106,20150101,200],
:       [106,20150201,300],
:       [106,20150201,400],
:       [220,20150201,200],
:       [220,20150201,300],
:       [220,20150301,400],
:       [220,20150301,500]]

# Load the data
df = pd.DataFrame(mock, columns=['zipcode', 'date', 'revenue'])

# If needed, convert date to pandas datetime values
# df.date = pd.to_datetime(df.date, format='%Y%m%d')

# Run the groupby you want: total revenue per (zipcode, date)
merged_df = df.groupby(['zipcode', 'date']).sum()

# Convert the DataFrame to a list of tuples (which can then be turned into a list of lists)
list(merged_df.to_records())

Output:
[(106, 20150101, 300), (106, 20150201, 700), (220, 20150201, 500), (220, 20150301, 900)]

EDIT

Sorry, I missed the second part earlier. I am not sure I understood what you meant: do you want the revenue of each zipcode to accumulate month by month?

# Turn the result back into an ordinary DataFrame
df = merged_df.reset_index()

# Make sure rows are sorted by zipcode (ascending) and date (oldest to newest)
df.sort_values(['zipcode', 'date'], inplace=True)

# Group by zipcode, take the running sum of revenue only, and store it in a new cumsum column
grouped = df.groupby('zipcode', sort=False)
df['cumsum'] = grouped['revenue'].cumsum()

# Output
list(df.to_records(index=False))

Output:
[(106, 20150101, 300, 300), (106, 20150201, 700, 1000), (220, 20150201, 500, 500), (220, 20150301, 900, 1400)]

Hope this helps~

--
PyCon APAC 2015 call for proposals is open until Mar 31
Details: https://tw.pycon.org/2015apac/en/call-for-proposals/
(Volunteers are still wanted; feel free to write to organizers@pycon.tw)
--
※ Origin: PTT (ptt.cc), From: 114.42.225.184
※ Article URL: https://www.ptt.cc/bbs/Python/M.1427299541.A.C37.html
※ Edited: ccwang002 (114.42.225.184), 03/26/2015 01:17:19
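
As a cross-check on the two groupby steps in the reply above (sum per zipcode and date, then a running total per zipcode), here is a minimal standard-library sketch that does not need pandas. It is not part of the original reply; the helper names monthly_totals and with_running_total are hypothetical, and it assumes each row is a [zipcode, date, revenue] list like the mock data.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Same mock data as in the reply: [zipcode, date, revenue]
mock = [[106, 20150101, 100],
        [106, 20150101, 200],
        [106, 20150201, 300],
        [106, 20150201, 400],
        [220, 20150201, 200],
        [220, 20150201, 300],
        [220, 20150301, 400],
        [220, 20150301, 500]]


def monthly_totals(rows):
    """Sum revenue per (zipcode, date); rows come out sorted by zipcode, then date."""
    key = itemgetter(0, 1)
    # itertools.groupby only merges adjacent rows, so sort by the same key first.
    return [
        (zipcode, date, sum(r[2] for r in group))
        for (zipcode, date), group in groupby(sorted(rows, key=key), key=key)
    ]


def with_running_total(totals):
    """Append a per-zipcode running total of revenue to each (zipcode, date, revenue) row."""
    out = []
    for _, group in groupby(totals, key=itemgetter(0)):
        group = list(group)
        running = accumulate(r[2] for r in group)
        out.extend((*row, total) for row, total in zip(group, running))
    return out


result = with_running_total(monthly_totals(mock))
print(result)
```

The sort before each groupby matters here: unlike the pandas version, itertools.groupby starts a new group whenever the key changes, so unsorted input would split one zipcode into several groups.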