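Pandas Concatenation Examples
Pandas provides various facilities for easily combining Series and DataFrame objects (and Panel objects in older versions; Panel has since been removed from pandas).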
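pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False)
objs - a sequence or mapping of Series, DataFrame, or Panel objects.
axis - {0, 1, ...}, default 0. The axis to concatenate along.
join - {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: outer takes the union, inner takes the intersection.
ignore_index - boolean, default False. If True, the index values on the concatenation axis are not used, and the resulting axis is labeled 0, ..., n - 1.
join_axes - a list of Index objects. Specific indexes to use for the other (n - 1) axes instead of performing the inner/outer set logic. This parameter was removed in pandas 1.0.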
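To make the join parameter concrete, here is a minimal sketch. The two small frames and their column names (a, b, c) are hypothetical and chosen only so that the inputs share a single column.

```python
import pandas as pd

# Hypothetical frames that share only the column "b".
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# join="outer" keeps the union of the columns; cells with no data become NaN.
outer = pd.concat([left, right], join="outer", ignore_index=True)

# join="inner" keeps only the columns present in every input.
inner = pd.concat([left, right], join="inner", ignore_index=True)

print(list(outer.columns))  # ['a', 'b', 'c']
print(list(inner.columns))  # ['b']
```

With the outer join, the rows that came from right have NaN in column a, and the rows from left have NaN in column c.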
The concat function does all of the heavy lifting of performing concatenation operations along an axis. Let us create different objects and concatenate them.
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Stack the two frames on top of each other (axis=0 is the default).
print(pd.concat([one, two]))
The output is as follows:
   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
Depending on the pandas version, the columns may instead appear in the order in which they were defined (Name, subject_id, Marks_scored).
Suppose we want to associate a specific key with each of the pieces of the concatenated DataFrame. We can do this with the keys argument:
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Label the rows from each input with its own key.
print(pd.concat([one, two], keys=['x', 'y']))
The output is as follows:
x 1            98    Alex        sub1
  2            90     Amy        sub2
  3            87   Allen        sub4
  4            69   Alice        sub6
  5            78  Ayoung        sub5
y 1            89   Billy        sub2
  2            80   Brian        sub4
  3            79    Bran        sub3
  4            97   Bryce        sub6
  5            88   Betty        sub5
The original index of the result is duplicated: each label from 1 to 5 appears once under each key.
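The keys become the outer level of a two-level index, which is what makes the repeated labels addressable. The following is a small sketch of selecting by key, using shortened two-row versions of the frames above (the shortening is only for brevity).

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]},
                   index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]},
                   index=[1, 2])

combined = pd.concat([one, two], keys=["x", "y"])

# The result has a two-level MultiIndex: (key, original label).
print(combined.index.nlevels)  # 2

# Selecting by key recovers the corresponding piece.
print(combined.loc["y"])

# A (key, label) tuple addresses a single row unambiguously.
print(combined.loc[("x", 2), "Name"])  # Amy
```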
If the resulting object should get its own fresh index, set ignore_index to True.
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# ignore_index=True discards the original labels and the keys.
print(pd.concat([one, two], keys=['x', 'y'], ignore_index=True))
The output is as follows:
   Marks_scored    Name  subject_id
0            98    Alex        sub1
1            90     Amy        sub2
2            87   Allen        sub4
3            69   Alice        sub6
4            78  Ayoung        sub5
5            89   Billy        sub2
6            80   Brian        sub4
7            79    Bran        sub3
8            97   Bryce        sub6
9            88   Betty        sub5
Note that the index is replaced completely and the keys are overridden as well.
If the two objects are concatenated along axis=1, their columns are placed side by side and new columns are added.
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Rows are aligned on the index; columns from both frames are kept.
print(pd.concat([one, two], axis=1))
The output is as follows:
   Marks_scored    Name  subject_id  Marks_scored   Name  subject_id
1            98    Alex        sub1            89  Billy        sub2
2            90     Amy        sub2            80  Brian        sub4
3            87   Allen        sub4            79   Bran        sub3
4            69   Alice        sub6            97  Bryce        sub6
5            78  Ayoung        sub5            88  Betty        sub5
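The side-by-side result above contains each column name twice. One way to keep the two halves apart, sketched here with shortened two-row frames and the hypothetical key names first and second, is to combine axis=1 with the keys argument.

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]},
                   index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]},
                   index=[1, 2])

# Without keys, selecting "Name" returns both columns of that name.
plain = pd.concat([one, two], axis=1)
print(plain["Name"].shape)  # (2, 2)

# With keys, the columns get an outer level that identifies the source.
side_by_side = pd.concat([one, two], axis=1, keys=["first", "second"])
print(side_by_side["second"]["Name"].tolist())           # ['Billy', 'Brian']
print(side_by_side[("first", "Marks_scored")].tolist())  # [98, 90]
```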
A useful shortcut to concat is the append instance method on Series and DataFrame. These methods actually predate concat. They concatenate along axis=0, that is, along the index:
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Note: DataFrame.append was removed in pandas 2.0;
# this call only runs on older versions.
print(one.append(two))
The output is as follows:
   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
The append method can take multiple objects as well:
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])

two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Append a list of frames in one call (older pandas versions only).
print(one.append([two, one, two]))
The output is as follows:
   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
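On pandas versions where append is no longer available (it was removed in pandas 2.0), the same results can be obtained with pd.concat. The sketch below uses shortened two-row frames for brevity.

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]},
                   index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]},
                   index=[1, 2])

# Counterpart of one.append(two)
stacked = pd.concat([one, two])

# Counterpart of one.append([two, one, two])
stacked_many = pd.concat([one, two, one, two])

print(stacked.index.tolist())  # [1, 2, 1, 2]
print(len(stacked_many))       # 8
```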
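Pandas provides a robust set of tools for working with time series data, especially in the financial sector. When working with time series data, we frequently need to do the following:
- Generate a sequence of times
- Convert a time series to a different frequency
Pandas provides a relatively compact and self-contained set of tools for performing these tasks.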
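As a rough illustration of the two tasks just listed, the sketch below generates a daily sequence and then converts it to a two-day frequency in two different ways. The dates and values are hypothetical.

```python
import pandas as pd

# Task 1: generate a sequence of six daily timestamps.
days = pd.date_range("2017-03-01", periods=6, freq="D")
series = pd.Series(range(6), index=days)

# Task 2a: convert to a two-day frequency by summing each two-day bin.
two_day_totals = series.resample("2D").sum()

# Task 2b: convert to a two-day frequency by keeping every second value.
every_second_day = series.asfreq("2D")

print(two_day_totals.tolist())    # [1, 5, 9]
print(every_second_day.tolist())  # [0, 2, 4]
```

resample aggregates all values that fall into a bin, while asfreq only picks the values located exactly at the new timestamps.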
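datetime.now() gives you the current date and time.
from datetime import datetime

# Older tutorials call pd.datetime.now(); the pd.datetime alias is
# deprecated and no longer available in recent pandas releases, so the
# standard-library class is used directly.
print(datetime.now())
The output looks like this (the value depends on when the code runs):
2017-05-11 06:10:13.393147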
Time-stamped data is the most basic type of time series data: it associates values with points in time. For pandas objects, this means using points in time. Let us take an example:
import pandas as pd

print(pd.Timestamp('2017-03-01'))
The output is as follows:
2017-03-01 00:00:00
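It is also possible to convert integer or float epoch times. The default unit for these is nanoseconds, since that is how Timestamps are stored. However, epochs are often stored in another unit, which can be specified with the unit argument. Here is another example:
import pandas as pd

# Interpret the integer as seconds since the Unix epoch.
print(pd.Timestamp(1587687255, unit='s'))
The output is as follows:
2020-04-24 00:14:15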
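The effect of the unit argument can be checked with a short sketch that reads the same epoch value in seconds, in milliseconds, and with no unit at all (the variable names are only illustrative).

```python
import pandas as pd

seconds = 1587687255

from_seconds = pd.Timestamp(seconds, unit="s")
from_millis = pd.Timestamp(seconds * 1000, unit="ms")

# Without a unit, the integer is read as nanoseconds since the epoch,
# which lands about 1.59 seconds after 1970-01-01.
from_nanos = pd.Timestamp(seconds)

print(from_seconds == from_millis)  # True
print(from_nanos.year)              # 1970
```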
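A range of times can be created with date_range:
import pandas as pd

# Every 30 minutes from 11:00 to 13:30; .time keeps only the time of day.
print(pd.date_range("11:00", "13:30", freq="30min").time)
The output is as follows:
[datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0) datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)]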
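The same range can be generated at an hourly frequency:
import pandas as pd

# "H" is the hourly alias; newer pandas versions prefer lowercase "h".
print(pd.date_range("11:00", "13:30", freq="H").time)
The output is as follows:
[datetime.time(11, 0) datetime.time(12, 0) datetime.time(13, 0)]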
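Instead of an end point, date_range also accepts a number of periods. A minimal sketch, with an arbitrary 45-minute step chosen for illustration:

```python
import pandas as pd

# Four timestamps starting at 11:00 today, 45 minutes apart.
times = pd.date_range("11:00", periods=4, freq="45min")

# Format only the time-of-day part for display.
labels = times.strftime("%H:%M").tolist()
print(labels)  # ['11:00', '11:45', '12:30', '13:15']
```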
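To convert a Series or a list-like object of date-like values (for example strings, epochs, or a mixture), use the to_datetime function. When passed a Series, it returns a Series with the same index, while a list-like object is converted to a DatetimeIndex. Take a look at the following example:
import pandas as pd

# Recent pandas versions may require format='mixed' here,
# since the strings use different date formats.
print(pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None])))
The output is as follows:
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]
NaT means Not a Time (the datetime equivalent of NaN).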
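Let us take another example.
import pandas as pd

# A plain list is converted to a DatetimeIndex. Recent pandas versions
# may require format='mixed', since the separators differ.
print(pd.to_datetime(['2005/11/23', '2010.12.31', None]))
The output is as follows:
DatetimeIndex(['2005-11-23', '2010-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)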
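When the input may contain values that are not dates at all, one option is to pass an explicit format together with errors="coerce", so that unparseable entries become NaT instead of raising. The sketch below uses made-up input values.

```python
import pandas as pd

# Hypothetical input: two valid dates, one junk string, one missing value.
raw = pd.Series(["2009-07-31", "2010-01-10", "not a date", None])

# An explicit format avoids format guessing; errors="coerce" maps
# anything that does not match to NaT.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed.isna().tolist())  # [False, False, True, True]

# A list-like input with a single consistent format gives a DatetimeIndex.
idx = pd.to_datetime(["2005/11/23", "2010/12/31"], format="%Y/%m/%d")
print(type(idx).__name__)  # DatetimeIndex
```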