Pandas Window Functions: Worked Examples
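For working with numeric data, Pandas provides several kinds of window for computing window statistics: rolling, expanding, and exponentially weighted moving windows. The available statistics include sum, mean, median, variance, covariance, correlation, and others.
Below, we look at how to apply each of them to a DataFrame object.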
The rolling() function can be applied to a series of data. Specify the window=n parameter, then apply the appropriate statistical function on top of it.
import pandas as pd
import numpy as np

# 10 days of random data in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])

# Mean over a sliding window of 3 rows
print(df.rolling(window=3).mean())
The output looks like this (the data are random, so the exact values differ from run to run):
                    A          B          C          D
2000-01-01        NaN        NaN        NaN        NaN
2000-01-02        NaN        NaN        NaN        NaN
2000-01-03   0.434553  -0.667940  -1.051718  -0.826452
2000-01-04   0.628267  -0.047040  -0.287467  -0.161110
2000-01-05   0.398233   0.003517   0.099126  -0.405565
2000-01-06   0.641798   0.656184  -0.322728   0.428015
2000-01-07   0.188403   0.010913  -0.708645   0.160932
2000-01-08   0.188043  -0.253039  -0.818125  -0.108485
2000-01-09   0.682819  -0.606846  -0.178411  -0.404127
2000-01-10   0.688583   0.127786   0.513832  -1.067156
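Because the window size is 3, the first two rows are NaN. From the third row onward, each value is the mean of elements n, n-1, and n-2. The other statistics listed above can be applied in the same way.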
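To make the arithmetic concrete, here is a minimal sketch on a small hand-picked Series (the values and the names `s`, `rolled`, `manual`, and `stats` are illustrative assumptions, not part of the example above). It recomputes one rolling mean by hand and reuses the same window for a few other statistics.

```python
import numpy as np
import pandas as pd

# Small deterministic series so the arithmetic is easy to follow.
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])

# Rolling mean with a window of 3: the first two positions are NaN.
rolled = s.rolling(window=3).mean()

# Recompute position 2 by hand: the mean of elements 0, 1 and 2.
manual = (s.iloc[0] + s.iloc[1] + s.iloc[2]) / 3

# Other statistics can be computed from the same rolling window object.
window = s.rolling(window=3)
stats = pd.DataFrame({
    "sum": window.sum(),
    "median": window.median(),
    "var": window.var(),  # sample variance (ddof=1 by default)
})

print(rolled)
print(stats)
```

Only the statistic changes between the calls; the window definition stays the same.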
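The expanding() function can be applied to a series of data. Specify the min_periods=n parameter, then apply the appropriate statistical function on top of it. Unlike a rolling window, an expanding window grows to cover every row from the start up to the current one; min_periods is the number of observations required before a result is produced.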
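import pandas as pd
import numpy as np

# 10 days of random data in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])

# Mean over all rows so far, once at least 3 rows are available
print(df.expanding(min_periods=3).mean())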
The output looks like this:
                    A          B          C          D
2000-01-01        NaN        NaN        NaN        NaN
2000-01-02        NaN        NaN        NaN        NaN
2000-01-03   0.434553  -0.667940  -1.051718  -0.826452
2000-01-04   0.743328  -0.198015  -0.852462  -0.262547
2000-01-05   0.614776  -0.205649  -0.583641  -0.303254
2000-01-06   0.538175  -0.005878  -0.687223  -0.199219
2000-01-07   0.505503  -0.108475  -0.790826  -0.081056
2000-01-08   0.454751  -0.223420  -0.671572  -0.230215
2000-01-09   0.586390  -0.206201  -0.517619  -0.267521
2000-01-10   0.560427  -0.037597  -0.399429  -0.376886
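As a rough check on what an expanding mean computes, the sketch below compares it with a cumulative sum divided by a running count. The sample values and the names `s`, `expanded`, `counts`, and `by_hand` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Expanding mean: each value averages all elements from the start up to
# the current position, once at least 3 observations exist.
expanded = s.expanding(min_periods=3).mean()

# The same thing by hand: cumulative sum divided by the number of elements
# seen so far, masked until 3 observations are available.
counts = pd.Series(np.arange(1, len(s) + 1), index=s.index)
by_hand = (s.cumsum() / counts).where(counts >= 3)

print(expanded)
print(by_hand)
```

Here min_periods only controls where the NaN values stop; it does not limit how far back the window reaches.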
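The ewm() function is applied to a series of data. Specify any one of the com, span, or halflife parameters, then apply the appropriate statistical function on top of it. It assigns weights exponentially, so recent observations count for more than older ones.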
import pandas as pd
import numpy as np

# 10 days of random data in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])

# Exponentially weighted mean with center of mass 0.5
print(df.ewm(com=0.5).mean())
The output looks like this:
                    A          B          C          D
2000-01-01   1.088512  -0.650942  -2.547450  -0.566858
2000-01-02   0.865131  -0.453626  -1.137961   0.058747
2000-01-03  -0.132245  -0.807671  -0.308308  -1.491002
2000-01-04   1.084036   0.555444  -0.272119   0.480111
2000-01-05   0.425682   0.025511   0.239162  -0.153290
2000-01-06   0.245094   0.671373  -0.725025   0.163310
2000-01-07   0.288030  -0.259337  -1.183515   0.473191
2000-01-08   0.162317  -0.771884  -0.285564  -0.692001
2000-01-09   1.147156  -0.302900   0.380851  -0.607976
2000-01-10   0.600216   0.885614   0.569808  -1.110113
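The com and span parameters are different ways of specifying the same smoothing factor alpha: pandas documents alpha = 1 / (1 + com) and alpha = 2 / (span + 1). The sketch below, on a made-up four-element Series, checks that the parameterisations agree and reproduces the weighted mean by hand, assuming the default adjust=True behaviour. The helper `ewm_by_hand` is hypothetical, written only for this illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

com = 0.5
alpha = 1 / (1 + com)   # smoothing factor implied by com
span = 2 / alpha - 1    # the span that implies the same alpha

by_com = s.ewm(com=com).mean()
by_span = s.ewm(span=span).mean()
by_alpha = s.ewm(alpha=alpha).mean()


def ewm_by_hand(values, alpha):
    """Hypothetical helper: weighted mean with weights (1 - alpha) ** age."""
    out = []
    for t in range(len(values)):
        # Ages run from t (oldest element) down to 0 (newest element).
        weights = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(weights, values[: t + 1]) / weights.sum())
    return np.array(out)


manual = ewm_by_hand(s.to_numpy(), alpha)

print(by_com)
print(manual)
```

Because the first window holds a single element, the first result equals the first data value, which is why the ewm output has no leading NaN rows.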
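Window functions are mainly used to find trends in data graphically by smoothing the curve. When daily data fluctuate heavily and many data points are available, one approach is to take a sample and plot it; another is to apply a window calculation and plot the result. Either way, the curve or trend is smoothed.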
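The two approaches can be sketched on synthetic data. Everything here is an assumption chosen for illustration: the linear trend, the noise level, the window of 20, the sampling step, and the use of the mean absolute day-to-day change as a rough measure of how jagged a curve is. Plotting is left as a comment so the sketch runs without matplotlib.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
index = pd.date_range("2000-01-01", periods=200)

# Synthetic daily data: a slow upward trend buried in noise.
trend = np.linspace(0.0, 10.0, len(index))
noisy = pd.Series(trend + rng.normal(scale=2.0, size=len(index)), index=index)

# Approach 1: take a sample (here, every 20th day).
sampled = noisy.iloc[::20]

# Approach 2: apply a window calculation (here, a 20-day rolling mean).
smoothed = noisy.rolling(window=20).mean()

# Rough jaggedness measure: mean absolute change from one day to the next.
raw_jump = noisy.diff().abs().mean()
smooth_jump = smoothed.diff().abs().mean()

print(f"raw: {raw_jump:.3f}  smoothed: {smooth_jump:.3f}")

# With matplotlib installed, noisy.plot() and smoothed.plot() would draw
# both curves for a visual comparison.
```

Sampling discards most of the points, whereas the rolling mean uses all of them, at the cost of NaN values at the start of the series.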