Pandas - Join vs Merge - Data Science Parichay (2022)

In this tutorial, we’ll look at the difference between pandas join() and merge() functions and when exactly should you use them. If you’re looking for a refresher on the different types of joins, you can refer to Understanding Joins in Pandas

Difference between pandas join and merge

Both the functions are used to perform joins on pandas dataframes but they’re used in different scenarios. The join() function is generally used to join dataframes on index whereas the merge() function is a more versatile function that lets you join dataframes on indexes as well as columns.

You can, however, use the join() function to join the left dataframe on a column but the right dataframe must be joined on its index. So, if you want to perform database-style joins on dataframes use the more versatile merge() function.

To join dataframes using the join() function, use the following syntax:

df_merged = df_left.join(df_right, how='left')

Note that, the join() function by default performs a join on index. If you want to join the left dataframe on a column, pass the column name to the on parameter. But remember, the right dataframe must be joined on its index. Also note, that the join() function performs a left join by default.

To join dataframes using the merge() function, use the following syntax:

df_merged = df_left.merge(df_right, on='Col1', how='inner')

Here, the on parameter is used to pass the column to merge the dataframes on and the how parameter is used to determine the kind of join to perform. The merge() function performs an inner join by default.

Examples

Let’s look at some example use-cases to illustrate the difference between the two.

1. Merge dataframes on index

Let’s merge two dataframes on their indexes using join() and merge(). For this, we’ll create two dataframes “df_names” and “df_portfolio”. Both having a common column “Symbol”.

import pandas as pd# stock namesdf_names = pd.DataFrame({ 'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'], 'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']})# sample porfoliodf_portfolio = pd.DataFrame({ 'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'], 'Shares': [20, 50, 50, 100]})

The two dataframes look as follows:

Pandas - Join vs Merge - Data Science Parichay (1)

Now, let’s set the “Symbol” column as the index for both the dataframes.

df_names.set_index('Symbol', inplace=True)df_portfolio.set_index('Symbol', inplace=True)

This is how the dataframes look after setting the index:

Pandas - Join vs Merge - Data Science Parichay (2)

Left Join using pandas join()

Now, let’s perform a left join on the two dataframes by merging them on their indexes using the join() function.

df_merged = df_names.join(df_portfolio, how='left')print(df_merged)

Output:

 Name SharesSymbol APPL Apple 50.0MSFT Mircosoft NaNTSLA Tesla 20.0GOOG Google 50.0NFLX Netflix NaN

You can see the resulting dataframe with NaNs where a match was not found in the right dataframe, df_portfolio.

Left Join using pandas merge()

Let’s perform the same left join on index as above but this time using the merge() function.

df_merged = df_names.merge(df_portfolio, left_index=True, right_index=True, how='left')print(df_merged)

Output:

 Name SharesSymbol APPL Apple 50.0MSFT Mircosoft NaNTSLA Tesla 20.0GOOG Google 50.0NFLX Netflix NaN

You can see that we get the same result as we got with the join() function. Note that, the merge() function required passing values for parameters left_index and right_index to specify that we wanted to perform a merge on the indexes.

2. Merge the left dataframe on column and right on index

Now, let’s merge the left dataframe on the column “Symbol” and the right dataframe on its index. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the right dataframe.

import pandas as pd# stock namesdf_names = pd.DataFrame({ 'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'], 'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']})# sample porfoliodf_portfolio = pd.DataFrame({ 'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'], 'Shares': [20, 50, 50, 100]})# set Symbol as axis for df_namesdf_portfolio.set_index('Symbol', inplace=True)

This is how the two dataframes look:

Pandas - Join vs Merge - Data Science Parichay (3)

Left Join using join()

df_merged = df_names.join(df_portfolio, on='Symbol', how='left')print(df_merged)

Output:

 Name Symbol Shares0 Apple APPL 50.01 Mircosoft MSFT NaN2 Tesla TSLA 20.03 Google GOOG 50.04 Netflix NFLX NaN

You can see that we were able to merge the left dataframe on the column “Symbol” and the right datafrome on its index using the pandas join() function.

Left Join using merge()

df_merged = df_names.merge(df_portfolio, left_on='Symbol', right_index=True, how='left')print(df_merged)

Output:

 Name Symbol Shares0 Apple APPL 50.01 Mircosoft MSFT NaN2 Tesla TSLA 20.03 Google GOOG 50.04 Netflix NFLX NaN

The pandas merge() function was able to merge the left dataframe on the column “Symbol” and the right one on its index. Note that, we had to pass right_index=True to indicate that the right dataframe should be merged on its index.

So far so good.

3. Merge the left dataframe on index and right on column

Now, let’s merge the left dataframe on its index and the right dataframe on the column “Symbol”. First, we’ll recreate the two dataframes and then set the “Symbol” column as the index only for the left dataframe.

import pandas as pd# stock namesdf_names = pd.DataFrame({ 'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'], 'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']})# sample porfoliodf_portfolio = pd.DataFrame({ 'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'], 'Shares': [20, 50, 50, 100]})# set Symbol as axis for df_namesdf_names.set_index('Symbol', inplace=True)

This is how the two dataframes look:

Pandas - Join vs Merge - Data Science Parichay (4)

Left Join using join()

If you try to join the left dataframe on index and the right dataframe on a column using join(), it’ll result in an error.

df_merged = df_names.join(df_portfolio, on='Symbol', how='left')print(df_merged)

Output:

---------------------------------------------------------------------------ValueError Traceback (most recent call last) in ----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left') 2 print(df_merged)** lines skipped to keep the error message short **ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

This happened because the join() function requires the right dataframe to be merged on its index.

Left Join using merge()

The merge() function allows you the flexibility to merge the left dataframe on index and the right one on a column.

df_merged = df_names.merge(df_portfolio, left_index=True, right_on='Symbol', how='left')print(df_merged)

Output:

 Name Symbol Shares1.0 Apple APPL 50.0NaN Mircosoft MSFT NaN0.0 Tesla TSLA 20.02.0 Google GOOG 50.0NaN Netflix NFLX NaN

Note that, we had to pass left_index=True to indicate that the left dataframe should be merged on its index.

4. Merge dataframes on column

Now, let’s merge the two dataframes directly on the common column “Symbol”.

import pandas as pd# stock namesdf_names = pd.DataFrame({ 'Name': ['Apple', 'Mircosoft', 'Tesla', 'Google', 'Netflix'], 'Symbol': ['APPL', 'MSFT', 'TSLA', 'GOOG', 'NFLX']})# sample porfoliodf_portfolio = pd.DataFrame({ 'Symbol': ['TSLA', 'APPL', 'GOOG', 'AMZN'], 'Shares': [20, 50, 50, 100]})

This is how the dataframes look:

Pandas - Join vs Merge - Data Science Parichay (5)

Left join using join()

The pandas join() function results in an error if you try to merge the two dataframes on a common column.

df_merged = df_names.join(df_portfolio, on='Symbol', how='left')print(df_merged)

Output:

---------------------------------------------------------------------------ValueError Traceback (most recent call last) in ----> 1 df_merged = df_names.join(df_portfolio, on='Symbol', how='left') 2 print(df_merged)** lines skipped to keep the error message short **ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Left Join using pandas merge

If the column to merge on has the same name in the two dataframes, pass it to the on parameter. If they have different names, pass the respective names to left_on and right_on parameters.

df_merged = df_names.merge(df_portfolio, on='Symbol', how='left')print(df_merged)

Output:

 Name Symbol Shares0 Apple APPL 50.01 Mircosoft MSFT NaN2 Tesla TSLA 20.03 Google GOOG 50.04 Netflix NFLX NaN

Conclusion

The pandas merge() and join() functions are used to join dataframes but have different use-cases. If you want to perform database-style joins on dataframes prefer the merge() function as it gives you more flexibility on how to perform the merge. If you, instead, want to join dataframes specifically on their indexes use the join() function.

The following table compares the two functions on the different combinations of joining dataframe on column or index.

DataFrame Mergejoin()merge()
On index✔️✔️
Left on column, right on index✔️✔️
Left on index, right on column✔️
On column✔️

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in aJupyter Notebookwith a python (version 3.8.3) kernel having pandas version 1.0.5


Subscribe to our newsletter for more informative guides and tutorials.
We do not spam and you can opt out any time.

Tutorials on combining data in pandas –

  • Pandas – Join vs Merge
  • Pandas – Merge DataFrames on Multiple Columns
  • Understanding Joins in Pandas
  • Append Rows to a Pandas DataFrame
  • Concat DataFrames in Pandas

References

  1. Pandas merge() function documentation.
  2. Pandas join() function documentation.

You might also like

Latest Posts

Article information

Author: Fr. Dewey Fisher

Last Updated: 09/25/2022

Views: 6602

Rating: 4.1 / 5 (62 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Fr. Dewey Fisher

Birthday: 1993-03-26

Address: 917 Hyun Views, Rogahnmouth, KY 91013-8827

Phone: +5938540192553

Job: Administration Developer

Hobby: Embroidery, Horseback riding, Juggling, Urban exploration, Skiing, Cycling, Handball

Introduction: My name is Fr. Dewey Fisher, I am a powerful, open, faithful, combative, spotless, faithful, fair person who loves writing and wants to share my knowledge and understanding with you.