Okay, after our long chat in the comments, and now that you've sent me the CSVs, "I solved your problem" (do not repeat it, hahaha). The problem is that you need to compare the DataFrames row by row, but you are comparing entire columns. That cannot work, not least because the DataFrames have different sizes (hence the error). I ran tests with the CSVs you sent me: no row of the smaller CSV is fully identical to the row at the same index in the larger CSV. See the code:
import pandas as pd

# Load only the columns of interest from each CSV.
df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

all_equals = []
cp3_equal = []
cp4_equal = []

# Compare each row of the smaller frame (df1) with the row at the same index in df2.
for index, row in df1.iterrows():
    if str(row.CP4) == str(df2.CP4[index]) and str(row.CP3) == str(df2.CP3[index]):
        all_equals.append(row)
    if str(row.CP3) == str(df2.CP3[index]):
        cp3_equal.append(row)
    if str(row.CP4) == str(df2.CP4[index]):
        cp4_equal.append(row)

# The labels are in Portuguese: "Igualdades em ambos" means "matches in both".
print('Igualdades em ambos: ', len(all_equals))
print('Igualdades em CP3: ', len(cp3_equal))
print('Igualdades em CP4: ', len(cp4_equal))

Output:
Igualdades em ambos: 0
Igualdades em CP3: 9
Igualdades em CP4: 0
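For larger files, the same row-by-row check can probably be done without an explicit loop. The following is only a sketch under assumptions: the two small frames are made-up stand-ins for the real CSVs (the values are hypothetical), and it assumes every index label of the smaller frame also exists in the larger one.

```python
import pandas as pd

# Hypothetical toy data standing in for the two CSV files.
small = pd.DataFrame({"CP4": [1000, 2000, 3000], "CP3": [100, 200, 300]})
large = pd.DataFrame({"CP4": [1000, 2500, 3000, 4000], "CP3": [100, 200, 999, 400]})

# Keep only the rows of the larger frame whose index also exists in the
# smaller one, so both sides carry identical labels and == does not raise.
aligned = large.loc[small.index, ["CP4", "CP3"]]

# Compare as strings, mirroring the str(...) comparison used in the loop.
matches = small[["CP4", "CP3"]].astype(str) == aligned.astype(str)

cp4_equal = small[matches["CP4"]]         # rows where CP4 matches
cp3_equal = small[matches["CP3"]]         # rows where CP3 matches
all_equal = small[matches.all(axis=1)]    # rows where both match

print(len(all_equal), len(cp3_equal), len(cp4_equal))
```

The boolean frame `matches` plays the role of the three `if` tests in the loop; filtering `small` with it gives the same three groups of rows.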
I'll keep my initial explanation below, because it may be useful to other people in other contexts.
See this example:
df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']])
df1==df2
       0      1
0  False  False
1  False  False
Now the same example, but with an explicit index on df2 whose labels are in a different order:
df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']], index=[1,0])
df1==df2
....
raise ValueError('Can only compare identically-labeled '
...
Note that an exception was raised with the same error you reported; I trimmed the full message to keep things readable.
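To see in advance whether a comparison will hit this error, one option is to check the labels yourself first. This is a minimal sketch using the same two toy frames; the variable names `same_labels` and `raised` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([["A", "B"], ["C", "D"]])
df2 = pd.DataFrame([["C", "D"], ["A", "B"]], index=[1, 0])

# Index.equals is order-sensitive: [0, 1] and [1, 0] are not considered equal.
same_labels = df1.index.equals(df2.index) and df1.columns.equals(df2.columns)
print(same_labels)  # False

# With mismatched labels, == raises instead of returning a boolean frame.
try:
    df1 == df2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```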
Solution 1: Reset the indices, so that rows are compared by position:
df1.reset_index(drop=True) == df2.reset_index(drop=True)
       0      1
0  False  False
1  False  False
Solution 2: Sort both by index (axis=0), so that rows are compared by label:
df1.sort_index()==df2.sort_index()
      0     1
0  True  True
1  True  True
Note that == is also sensitive to the order of the columns: both DataFrames must have the same column labels in the same order.
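The column-order point can be illustrated with a small sketch. The column names "x" and "y" are made up for this example; the idea is simply to reorder one frame's columns to follow the other before comparing.

```python
import pandas as pd

df1 = pd.DataFrame({"x": ["A", "C"], "y": ["B", "D"]})
df2 = pd.DataFrame({"y": ["B", "D"], "x": ["A", "C"]})  # same data, columns swapped

# Same labels but in a different order: == refuses to compare.
try:
    df1 == df2
    raised = False
except ValueError:
    raised = True

# Reordering df2's columns to follow df1 makes the labels identical.
result = df1 == df2[df1.columns]
print(result.all().all())
```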
Applying this to the example in your question:
In your case, just try changing the if condition to:
if (df1.reset_index(drop=True)["CP4"] == df2.reset_index(drop=True)["CP4"] and
        df1.reset_index(drop=True)["CP3"] == df2.reset_index(drop=True)["CP3"]):
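One caveat about the condition above: comparing two columns with == yields a boolean Series, not a single True or False, so an `if` needs it reduced first, for example with `.all()`. A minimal sketch, assuming both frames have the same number of rows (the data and the `verdict` variable are hypothetical):

```python
import pandas as pd

# Hypothetical frames with the same length but different index labels.
df1 = pd.DataFrame({"CP4": [1000, 2000], "CP3": [100, 200]}, index=[5, 6])
df2 = pd.DataFrame({"CP4": [1000, 2000], "CP3": [100, 999]}, index=[8, 9])

# Reset once and reuse, instead of calling reset_index four times.
a = df1.reset_index(drop=True)
b = df2.reset_index(drop=True)

# .all() collapses each element-wise comparison into a single boolean.
cp4_same = (a["CP4"] == b["CP4"]).all()
cp3_same = (a["CP3"] == b["CP3"]).all()

if cp4_same and cp3_same:
    verdict = "CP4 and CP3 match in every row"
elif cp4_same:
    verdict = "only CP4 matches in every row"
elif cp3_same:
    verdict = "only CP3 matches in every row"
else:
    verdict = "neither column matches in every row"

print(verdict)
```

If the frames have different lengths, as with the two CSVs, this column-wise comparison still raises, which is why the row-by-row approach at the top is the one that fits your data.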