Friday, January 17, 2025

Pandas: merge (join) two data frames on multiple columns

In pandas, you can merge (or join) two DataFrames on multiple columns by calling the merge() function and passing a list of column names to the on parameter.

Here’s an example of how to perform a merge (join) of two DataFrames based on multiple columns:

Example: Merge on Multiple Columns

import pandas as pd

# Sample DataFrame 1
df1 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'D'],
    'col2': [1, 2, 3, 4],
    'value': [100, 200, 300, 400]
})

# Sample DataFrame 2
df2 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'E'],
    'col2': [1, 2, 3, 5],
    'amount': [10, 20, 30, 50]
})

# Merging the DataFrames on multiple columns ('col1' and 'col2')
merged_df = pd.merge(df1, df2, on=['col1', 'col2'])

print(merged_df)

Explanation:

  • df1 and df2 are two DataFrames.
  • The merge() function is used to join them based on the columns col1 and col2. These columns are specified as a list ['col1', 'col2'] in the on parameter.
  • Because the default join type is inner, the result contains only rows whose col1 and col2 values both match in the two DataFrames. The rows ('D', 4) from df1 and ('E', 5) from df2 have no partner, so they are dropped.

Result:

  col1  col2  value  amount
0    A     1    100      10
1    B     2    200      20
2    C     3    300      30
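
To make the "both columns must match" point concrete, here is a small sketch. The frames left and right are hypothetical and not part of the example above; they contain rows that share col1 but differ in col2. Under the default inner join, only rows where both keys agree should survive, while merging on col1 alone pairs every row that shares a col1 value.

```python
import pandas as pd

# Hypothetical data: 'A' and 'B' appear in both frames,
# but not always with the same col2 value
left = pd.DataFrame({
    'col1': ['A', 'A', 'B'],
    'col2': [1, 2, 3],
    'value': [100, 200, 300]
})
right = pd.DataFrame({
    'col1': ['A', 'B', 'B'],
    'col2': [1, 3, 4],
    'amount': [10, 30, 40]
})

# Both keys must match: ('A', 2) and ('B', 4) have no partner
both_keys = pd.merge(left, right, on=['col1', 'col2'])
print(both_keys)

# Only col1 must match: rows pair up even when col2 differs,
# and the two col2 columns get the suffixes _x and _y
one_key = pd.merge(left, right, on='col1')
print(one_key)
```

Comparing the two printed frames shows why listing every key column in on matters when a single column does not identify a row by itself.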

Optional Parameters:

  1. how: Specifies the type of join (like SQL joins). It can be one of:
    • 'inner' (default): Only keeps rows where there is a match in both DataFrames.
    • 'outer': Keeps all rows from both DataFrames, filling missing values with NaN.
    • 'left': Keeps all rows from the left DataFrame, and fills missing values from the right DataFrame with NaN.
    • 'right': Keeps all rows from the right DataFrame, and fills missing values from the left DataFrame with NaN.

    Example of an outer join:

    merged_df = pd.merge(df1, df2, on=['col1', 'col2'], how='outer')
    
  2. left_on and right_on: If the column names in the two DataFrames are different, you can specify which columns to join on in each DataFrame.

    Example (this assumes df2 names its key columns column_a and column_b instead of col1 and col2):

    merged_df = pd.merge(df1, df2, left_on=['col1', 'col2'], right_on=['column_a', 'column_b'])
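
The two options above can be sketched together. The block repeats df1 and df2 from the main example so that it runs on its own. The frame df2_renamed and its column names column_a and column_b are hypothetical, created with rename() purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'D'],
    'col2': [1, 2, 3, 4],
    'value': [100, 200, 300, 400]
})
df2 = pd.DataFrame({
    'col1': ['A', 'B', 'C', 'E'],
    'col2': [1, 2, 3, 5],
    'amount': [10, 20, 30, 50]
})

# Count the rows each join type produces for the same pair of keys
row_counts = {}
for how in ['inner', 'outer', 'left', 'right']:
    result = pd.merge(df1, df2, on=['col1', 'col2'], how=how)
    row_counts[how] = len(result)
print(row_counts)

# Hypothetical copy of df2 whose key columns have different names
df2_renamed = df2.rename(columns={'col1': 'column_a', 'col2': 'column_b'})

# Pair the key columns by position: col1 with column_a, col2 with column_b
merged_renamed = pd.merge(
    df1, df2_renamed,
    left_on=['col1', 'col2'],
    right_on=['column_a', 'column_b']
)
print(merged_renamed)
```

Note that with left_on and right_on the result keeps both sets of key columns, so you may want to drop column_a and column_b afterwards.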
    

Conclusion:

The merge() function in pandas is a powerful tool to join two DataFrames on multiple columns, and you can adjust the type of join based on your needs.
