In pandas, you can merge (or join) two DataFrames on multiple columns by calling the merge() function and passing a list of column names to the on parameter.
Here’s an example of how to perform a merge (join) of two DataFrames based on multiple columns:
Example: Merge on Multiple Columns
import pandas as pd

# Sample DataFrame 1
df1 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'D'],
    'col2': [1, 2, 3, 4],
    'value': [100, 200, 300, 400]
})

# Sample DataFrame 2
df2 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'E'],
    'col2': [1, 2, 3, 5],
    'amount': [10, 20, 30, 50]
})

# Merging the DataFrames on multiple columns ('col1' and 'col2')
merged_df = pd.merge(df1, df2, on=['col1', 'col2'])
print(merged_df)
Explanation:
- df1 and df2 are two DataFrames.
- The merge() function joins them on the columns col1 and col2. These columns are specified as the list ['col1', 'col2'] in the on parameter.
- The merge combines rows where both the col1 and col2 values match in both DataFrames.
Result:
  col1  col2  value  amount
0    A     1    100      10
1    B     2    200      20
2    C     3    300      30
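To make the "both columns must match" rule concrete, here is a minimal sketch that compares merging on both keys with merging on only one. The DataFrames `left` and `right` and their values are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical data: col1 repeats, so col1 alone does not identify a row.
left = pd.DataFrame({
    'col1': ['A', 'A', 'B'],
    'col2': [1, 2, 1],
    'value': [100, 200, 300],
})
right = pd.DataFrame({
    'col1': ['A', 'B'],
    'col2': [1, 2],
    'amount': [10, 20],
})

# Both keys must match: only the ('A', 1) row exists on both sides.
both_keys = pd.merge(left, right, on=['col1', 'col2'])
print(len(both_keys))  # 1

# Merging on col1 alone pairs every 'A' with 'A' and every 'B' with 'B'.
one_key = pd.merge(left, right, on='col1')
print(len(one_key))  # 3
```

When only col1 is used as the key, the non-key col2 appears twice in the result, as col2_x and col2_y, because pandas adds suffixes to overlapping column names.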
Optional Parameters:
- how: Specifies the type of join (like SQL joins). The most common values are:
  - 'inner' (default): Keeps only rows that have a match in both DataFrames.
  - 'outer': Keeps all rows from both DataFrames, filling missing values with NaN.
  - 'left': Keeps all rows from the left DataFrame and fills the right DataFrame's columns with NaN where there is no match.
  - 'right': Keeps all rows from the right DataFrame and fills the left DataFrame's columns with NaN where there is no match.
Example of an outer join:
merged_df = pd.merge(df1, df2, on=['col1', 'col2'], how='outer')
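As a rough way to see how the join types differ, the sketch below rebuilds the sample df1 and df2 and counts the rows each value of how produces. The indicator=True argument is an addition not used in the text above; it asks pandas to add a _merge column showing which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'D'],
    'col2': [1, 2, 3, 4],
    'value': [100, 200, 300, 400],
})
df2 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'E'],
    'col2': [1, 2, 3, 5],
    'amount': [10, 20, 30, 50],
})

# Count the rows produced by each join type.
row_counts = {}
for how in ['inner', 'outer', 'left', 'right']:
    merged = pd.merge(df1, df2, on=['col1', 'col2'], how=how)
    row_counts[how] = len(merged)

print(row_counts)  # {'inner': 3, 'outer': 5, 'left': 4, 'right': 4}

# indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'.
outer = pd.merge(df1, df2, on=['col1', 'col2'], how='outer', indicator=True)
print(outer)
```

In the outer result, the ('D', 4) row has NaN in amount and the ('E', 5) row has NaN in value, since each exists in only one DataFrame.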
- left_on and right_on: If the key columns have different names in the two DataFrames, you can specify which columns to join on in each DataFrame.
Example:
# Assumes df2 names its key columns 'column_a' and 'column_b' instead of 'col1' and 'col2'
merged_df = pd.merge(df1, df2, left_on=['col1', 'col2'], right_on=['column_a', 'column_b'])
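The sample df2 defined earlier does not have columns named column_a and column_b, so here is a self-contained sketch with a hypothetical df2_renamed that does. It also shows one possible way to drop the duplicated key columns afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'D'],
    'col2': [1, 2, 3, 4],
    'value': [100, 200, 300, 400],
})

# Hypothetical variant of df2 whose key columns have different names.
df2_renamed = pd.DataFrame({
    'column_a': ['A', 'B', 'C', 'E'],
    'column_b': [1, 2, 3, 5],
    'amount': [10, 20, 30, 50],
})

# Keys are paired by position: col1 with column_a, col2 with column_b.
merged = pd.merge(
    df1,
    df2_renamed,
    left_on=['col1', 'col2'],
    right_on=['column_a', 'column_b'],
)
print(list(merged.columns))
# ['col1', 'col2', 'value', 'column_a', 'column_b', 'amount']

# Both sets of key columns are kept, so drop the redundant pair if desired.
tidy = merged.drop(columns=['column_a', 'column_b'])
print(tidy)
```

The two lists must have the same length, because each entry in left_on is matched with the entry at the same position in right_on.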
Conclusion:
The merge() function in pandas is a powerful tool for joining two DataFrames on multiple columns, and you can adjust the type of join based on your needs.