Find equal values between 2 different .csv columns and update another column X

1

Hello,

I have two .csv files as below:

.csv1

POS,ID
28000167,.
28000263,.
28000484,.
28000711,.
28000728,.
28000885,.
28089922,.
28089927,.
28090173,.
28090325,.
28090326,.
28090331,.
28090415,.
28090467,.
28096247,.
28096264,.
28096284,.

.csv2

POS,ID
28000167,rs75043266
28000263,rs151210258
28000484,rs7175266
28000627,.
28000711,.
28000728,rs140405700
28000885,.
28089732,.
28089847,.
28089876,.
28089898,.
28089922,rs12592271
28089927,rs113937352
28090008,.
28090173,rs12592307
28090325,rs187389297
28090326,rs74005248
28090331,rs113905655

I would like to update the values of the row [1] of the .csv1 file with the values of row [1] of the .csv2 files if row [0] of .csv1 is present in row [0] of .csv2. / p>

In this case my .csv1 file would become as:

.csv1

POS,ID
28000167,rs75043266

and so on for all the other iterations he finds ...

What I have so far of code is not much since I could not get any iteration ...

import csv

csv1 = open("arquivo1.csv")
reader1 = csv.reader(csv1, dialect='excel-tab')
csv2 = open("arquivo2.csv")
reader2 = csv.reader(csv2, dialect='excel-tab')

next(reader1, None)
for row1 in reader1:
    next(reader2, None)
    for row2 in reader2:

Any help would be welcome! Thank you.

    
asked by anonymous 02.03.2017 / 02:57

1 answer

5

Do as follows:

  • Read data from both files in a dictionary. The best way to do this is, in my opinion, using DictReader .
  • Process each created dictionary item from CSV2, and add / update the value on the same "key" (POS column) in CSV1.
  • Regrave the CSV1 file.
  • Code sample:

    import csv
    from collections import OrderedDict
    
    # Lê ambos os arquivos e armazena em um dicionário mapeando a chave (POS) ao conteúdo (ID)
    csv1 = OrderedDict()
    csv2 = OrderedDict()
    
    with open('csv1.csv', 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            csv1[row['POS']] = row['ID']
    
    with open('csv2.csv', 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            csv2[row['POS']] = row['ID']
    
    # Processa cada item de CSV2 e atualiza o conteúdo em CSV1
    for p, i in csv2.items():
        if p in csv1: # Remova essa verificação se quiser adicionar os itens de CSV2 inexistentes em CSV1
            csv1[p] = i
    
    # Regrava o arquivo CSV1
    with open('csv1.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['POS', 'ID'])
        writer.writeheader()
        for p, i in csv1.items():
            writer.writerow({'POS': p, 'ID': i})
    

    If you are using the fantastic library Pandas , a solution that I prefer and that is much more succinct and easy is this:

    import pandas as pd
    
    # Lê ambos os arquivos e armazena em tabelas do Pandas
    csv1 = pd.read_csv('csv1.csv')
    csv2 = pd.read_csv('csv2.csv')
    
    # Filtra os elementos de CSV2 cujo ID seja diferente de '.' e cuja POS exista em CSV1
    query = csv2[(csv2['ID'] != '.') & (csv2['POS'].isin(csv1['POS']))]
    
    # Copia apenas a coluna ID da view filtrada de CSV2 para a tabela CSV1
    csv1['ID'] = query['ID']
    
    # Regrava o arquivo CSV1
    csv1.to_csv('csv1.csv', index=False)
    
        
    02.03.2017 / 03:39