Error in an if statement in Python

I'm getting an error in an if statement and I don't know how to fix it. I'm using Python 3.6 and pandas for reading, writing and analysing data.

df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

if df1["CP4"] == df2["CP4"] and df1["CP3"] == df2["CP3"]:

I have this error:

Traceback (most recent call last):
  File "C:/Users/User01/Desktop/Normmm/Norm.py", line 11, in <module>
    if df1["CP4"] == df2["CP4"] and df1["CP3"] == df2["CP3"]:
  File "C:\anaconda\lib\site-packages\pandas\core\ops.py", line 818, in wrapper
    raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects
    
asked by anonymous 09.06.2017 / 11:05

1 answer

Okay, after our chat in the comments, and now that you have sent me the CSVs, I think I have solved your problem, so here we go. The problem is that you need to compare the dataframes row by row, but you are comparing entire columns, which does not work here, especially because the dataframes have different sizes (hence the error). I ran tests with the CSVs you sent me: no row of the smaller CSV is fully identical to the row with the same index in the larger CSV. See the code:

import pandas as pd
df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

all_equals=[]
cp3_equal=[]
cp4_equal=[]

for index, row in df1.iterrows():
    if str(row.CP4)==str(df2.CP4[index]) and str(row.CP3)==str(df2.CP3[index]):
        all_equals.append(row)

    if  str(row.CP3)==str(df2.CP3[index]):
        cp3_equal.append(row)

    if str(row.CP4)==str(df2.CP4[index]):
        cp4_equal.append(row)  


print('Matches in both: ', len(all_equals))
print('Matches in CP3: ', len(cp3_equal))
print('Matches in CP4: ', len(cp4_equal))

Matches in both:  0
Matches in CP3:  9
Matches in CP4:  0
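As a side note, the same row-by-row counts can probably be obtained without an explicit loop. The following is only a sketch under assumptions: the two small dataframes are hypothetical stand-ins for the CSVs (the real data is not shown here), and it compares the native values rather than their string forms as the loop above does.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; the values are made up.
small = pd.DataFrame({"CP4": [1000, 2000, 3000], "CP3": [1, 2, 3]})
full = pd.DataFrame({"CP4": [1000, 2500, 3000, 4000], "CP3": [1, 2, 9, 4]})

# Trim both frames to the length of the shorter one and drop the indexes,
# so they are identically labelled and can be compared element-wise.
n = min(len(small), len(full))
a = small.iloc[:n].reset_index(drop=True)
b = full.iloc[:n].reset_index(drop=True)

cp4_equal = a["CP4"] == b["CP4"]      # boolean Series, one value per row
cp3_equal = a["CP3"] == b["CP3"]
both_equal = cp4_equal & cp3_equal    # element-wise "and"

counts = {
    "both": int(both_equal.sum()),
    "CP3": int(cp3_equal.sum()),
    "CP4": int(cp4_equal.sum()),
}
print(counts)
```

Note the use of & instead of and: & combines the two boolean Series row by row, whereas and would try to reduce each Series to a single True/False.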
  

I'll keep my initial explanation below, since it may be useful to other people in other contexts.

See this example:

df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']])
df1==df2
       0      1
0  False  False
1  False  False

Now the same example, but with an explicit index on df2:

df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']], index=[1,0])
df1==df2
....
raise ValueError('Can only compare identically-labeled '
...

Note that an exception was raised with the same error you reported. I cut most of the message to make it easier to read.
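The same exception can be reproduced with two Series of different lengths, which is closer to the situation in the question. This is a minimal sketch with made-up values; the variable names are only for illustration. It also shows Series.eq, which aligns the two indexes instead of raising.

```python
import pandas as pd

s_short = pd.Series([1, 2, 3])
s_long = pd.Series([1, 2, 3, 4])

# The == operator refuses to compare Series whose indexes differ.
try:
    s_short == s_long
    raised = False
except ValueError:
    raised = True
print(raised)

# Series.eq aligns on the index first; the label missing from s_short
# becomes NaN after alignment and therefore compares as False.
aligned = s_short.eq(s_long).tolist()
print(aligned)
```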

  

Solution 1: Drop the indices:

df1.reset_index(drop=True) == df2.reset_index(drop=True)
       0      1
0  False  False
1  False  False
  

Solution 2: Sort along axis 0 (the index):

df1.sort_index()==df2.sort_index()
      0     1
0  True  True
1  True  True

Note that == is also sensitive to the order of the columns.

  

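Applying this to the example in your question.

In your case, try changing the if to the code below. Two caveats: this only works when both dataframes have the same number of rows, and each comparison needs .all() because a Series of booleans cannot be used directly as an if condition:

a = df1.reset_index(drop=True)
b = df2.reset_index(drop=True)
if (a["CP4"] == b["CP4"]).all() and (a["CP3"] == b["CP3"]).all():
    print("CP4 and CP3 match in every row")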
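Keep in mind that comparing two Series yields a boolean Series with one value per row, not a single True or False, so it has to be reduced before it can drive an if. The sketch below uses hypothetical data with the same column names as the question and assumes both dataframes have the same number of rows.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df1 = pd.DataFrame({"CP4": [4000, 4100], "CP3": [123, 456]})
df2 = pd.DataFrame({"CP4": [4000, 4100], "CP3": [123, 999]})

a = df1.reset_index(drop=True)
b = df2.reset_index(drop=True)

cp4_match = a["CP4"] == b["CP4"]   # boolean Series, one value per row
cp3_match = a["CP3"] == b["CP3"]

# Reduce the element-wise result to a single bool before using it in `if`:
# .all() asks "every row?", .any() asks "at least one row?".
every_row_matches = bool((cp4_match & cp3_match).all())
some_row_matches = bool((cp4_match & cp3_match).any())

if every_row_matches:
    print("CP4 and CP3 match in every row")
elif some_row_matches:
    print("CP4 and CP3 match in some rows")
```

Whether .all() or .any() is the right reduction depends on what the if is supposed to decide.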
    
09.06.2017 / 11:33