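Pandas Join Operations by Example
Pandas provides full-featured, high-performance, in-memory join operations that closely resemble those of relational databases such as SQL.
Pandas exposes a single function, merge, as the entry point for all standard database join operations between DataFrame objects: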
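pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)
The parameters used here are:
left: A DataFrame object.
right: Another DataFrame object.
on: Column name(s) to join on. Must be found in both the left and right DataFrame objects.
left_on: Columns from the left DataFrame to use as keys. Can be column names or arrays with length equal to the length of the DataFrame.
right_on: Columns from the right DataFrame to use as keys. Can be column names or arrays with length equal to the length of the DataFrame.
left_index: If True, use the index (row labels) of the left DataFrame as its join key(s). If the DataFrame has a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
right_index: Same usage as left_index, for the right DataFrame.
how: One of 'left', 'right', 'outer', 'inner'. Defaults to 'inner'. Each method is described below.
sort: Sort the result DataFrame by the join keys in lexicographical order. Defaults to False in current pandas releases; sorting adds overhead, so leaving it off usually performs better.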
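The key-related parameters above can be illustrated with a small sketch. The data, the column names (student_id, sid, score) and the variable names are hypothetical and chosen only for illustration; the example shows left_on/right_on for differently named key columns, and right_index for a key stored in the index.

```python
import pandas as pd

# Hypothetical data: the key column is named differently on each side.
students = pd.DataFrame({
    'student_id': [1, 2, 3],
    'Name': ['Alex', 'Amy', 'Allen'],
})
scores = pd.DataFrame({
    'sid': [2, 3, 4],
    'score': [90, 85, 70],
})

# left_on / right_on name the key column on each side.
by_column = pd.merge(students, scores, left_on='student_id', right_on='sid')
print(by_column)

# right_index=True uses the right DataFrame's index as its key instead.
scores_indexed = scores.set_index('sid')
by_index = pd.merge(students, scores_indexed,
                    left_on='student_id', right_index=True)
print(by_index)
```

Both calls use the default inner join, so only the students with a matching score row (ids 2 and 3) appear in the result.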
Now let's create two DataFrames and perform merge operations on them.
# import the pandas library
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(left)
print(right)
The output is as follows:
     Name  id subject_id
0    Alex   1       sub1
1     Amy   2       sub2
2   Allen   3       sub4
3   Alice   4       sub6
4  Ayoung   5       sub5
    Name  id subject_id
0  Billy   1       sub2
1  Brian   2       sub4
2   Bran   3       sub3
3  Bryce   4       sub6
4  Betty   5       sub5
Column order in printed output can vary with the pandas version; recent releases keep the order in which the columns were defined (id, Name, subject_id).
Merge the two DataFrames on a single key, id:
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(pd.merge(left, right, on='id'))
The output is as follows:
   Name_x  id subject_id_x Name_y subject_id_y
0    Alex   1         sub1  Billy         sub2
1     Amy   2         sub2  Brian         sub4
2   Allen   3         sub4   Bran         sub3
3   Alice   4         sub6  Bryce         sub6
4  Ayoung   5         sub5  Betty         sub5
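The _x and _y suffixes in the output above are added by merge to non-key columns that exist in both DataFrames. A minimal sketch of how the suffixes parameter of pd.merge changes them, using a shortened, hypothetical version of the data above:

```python
import pandas as pd

# Shortened, hypothetical version of the tutorial data.
left = pd.DataFrame({'id': [1, 2], 'Name': ['Alex', 'Amy']})
right = pd.DataFrame({'id': [1, 2], 'Name': ['Billy', 'Brian']})

# Default suffixes: the overlapping Name column becomes Name_x and Name_y.
default = pd.merge(left, right, on='id')

# Custom suffixes make the origin of each column explicit.
labelled = pd.merge(left, right, on='id', suffixes=('_left', '_right'))

print(list(default.columns))
print(list(labelled.columns))
```

The key column id is shared by both sides and is therefore not suffixed.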
Merge on multiple keys by passing a list to on. Only rows where both id and subject_id match are kept:
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(pd.merge(left, right, on=['id', 'subject_id']))
The output is as follows:
   Name_x  id subject_id Name_y
0   Alice   4       sub6  Bryce
1  Ayoung   5       sub5  Betty
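The how argument to merge determines which keys are included in the resulting table. If a key combination is missing from one of the tables, the columns coming from that table are filled with NA in the joined result.
Here is a summary of the how options and their SQL equivalents:
Merge method | SQL equivalent | Description |
left | LEFT OUTER JOIN | Use keys from the left object |
right | RIGHT OUTER JOIN | Use keys from the right object |
outer | FULL OUTER JOIN | Use the union of keys |
inner | INNER JOIN | Use the intersection of keys |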
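The four options can be compared side by side. The sketch below uses small hypothetical DataFrames (the l_val and r_val columns are made up for illustration) and collects the set of keys each method keeps; it also uses the indicator parameter of pd.merge, which is not covered in the text above, to show where each row of an outer join came from.

```python
import pandas as pd

# Hypothetical data with partially overlapping keys.
left = pd.DataFrame({'subject_id': ['sub1', 'sub2', 'sub4'],
                     'l_val': [1, 2, 3]})
right = pd.DataFrame({'subject_id': ['sub2', 'sub4', 'sub3'],
                      'r_val': [10, 20, 30]})

# Collect the keys that survive each merge method.
keys = {}
for how in ['left', 'right', 'outer', 'inner']:
    merged = pd.merge(left, right, on='subject_id', how=how)
    keys[how] = set(merged['subject_id'])
    print(how, sorted(keys[how]))

# indicator=True adds a _merge column: left_only, right_only or both.
outer = pd.merge(left, right, on='subject_id', how='outer', indicator=True)
print(outer[['subject_id', '_merge']])
```

Rows marked left_only or right_only are the ones whose columns from the other side are filled with NA.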
A left join keeps every key from the left DataFrame:
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(pd.merge(left, right, on='subject_id', how='left'))
The output is as follows:
   Name_x  id_x subject_id Name_y  id_y
0    Alex     1       sub1    NaN   NaN
1     Amy     2       sub2  Billy   1.0
2   Allen     3       sub4  Brian   2.0
3   Alice     4       sub6  Bryce   4.0
4  Ayoung     5       sub5  Betty   5.0
A right join keeps every key from the right DataFrame:
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(pd.merge(left, right, on='subject_id', how='right'))
The output is as follows:
   Name_x  id_x subject_id Name_y  id_y
0     Amy   2.0       sub2  Billy     1
1   Allen   3.0       sub4  Brian     2
2   Alice   4.0       sub6  Bryce     4
3  Ayoung   5.0       sub5  Betty     5
4     NaN   NaN       sub3   Bran     3
An outer join keeps the union of keys from both DataFrames:
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(pd.merge(left, right, how='outer', on='subject_id'))
The output is as follows:
   Name_x  id_x subject_id Name_y  id_y
0    Alex   1.0       sub1    NaN   NaN
1     Amy   2.0       sub2  Billy   1.0
2   Allen   3.0       sub4  Brian   2.0
3   Alice   4.0       sub6  Bryce   4.0
4  Ayoung   5.0       sub5  Betty   5.0
5     NaN   NaN       sub3   Bran   3.0
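The DataFrame.join method performs the join on the index by default, and its result depends on the object it is called on, so a.join(b) is not the same as b.join(a).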
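The asymmetry of join can be seen in a minimal sketch. The DataFrames a and b, their columns and their index labels are hypothetical; the example relies on DataFrame.join defaulting to a left join aligned on the index of the calling object.

```python
import pandas as pd

# Hypothetical DataFrames with partially overlapping index labels.
a = pd.DataFrame({'x': [1, 2, 3]}, index=['k1', 'k2', 'k3'])
b = pd.DataFrame({'y': [10, 20]}, index=['k2', 'k4'])

# join keeps the index of the object it is called on (how='left' by default).
ab = a.join(b)   # rows k1, k2, k3; y is NaN where b has no match
ba = b.join(a)   # rows k2, k4; x is NaN where a has no match

print(ab)
print(ba)
```

The two results differ in both their rows and their column order, because each call keeps the caller's index and places the caller's columns first.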
An inner join with merge keeps only the keys present in both DataFrames:
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(pd.merge(left, right, on='subject_id', how='inner'))
The output is as follows:
   Name_x  id_x subject_id Name_y  id_y
0     Amy     2       sub2  Billy     1
1   Allen     3       sub4  Brian     2
2   Alice     4       sub6  Bryce     4
3  Ayoung     5       sub5  Betty     5