Python Pandas: rewriting a file read with pd.read_table() while keeping the original comments

2

I have a tab-separated file in which the first few lines are comments marked with '#'. The file looks something like:

#comment
#comment
#comment
#comment
#comment
Header1 Header2 Header3
a b c
d e f
g h i

I then use the code below to load it without the comments:

import pandas as pd
file_in = pd.read_table('arquivo.tsv', comment='#')

This gives:

Header1 Header2 Header3
a b c
d e f
g h i

After that, I make some changes to the Header1 column based on information from another file, and write file_in back to disk:

file_in.to_csv('arquivo.csv', index=False, sep='\t')

The problem is that I would like the comments to be restored as in the original, but the saved file starts with the header row and no longer contains the comments!

    
asked by anonymous 03.03.2017 / 14:36

1 answer

3

The problem is that the comments are simply ignored on read. Pandas does not represent comments internally, because they are specific to the storage format (CSV/TSV in this case; if you saved the table to a SQL database, for example, there would be no "comments"). So the most you can do is ask the read function to ignore the lines that start with the comment character.
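
To see that the comment lines really are dropped at parse time, here is a minimal sketch. It assumes a small in-memory string (the name raw is made up for this illustration) standing in for arquivo.tsv:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for arquivo.tsv (tab-separated).
raw = (
    "#comment\n"
    "#comment\n"
    "Header1\tHeader2\tHeader3\n"
    "a\tb\tc\n"
    "d\te\tf\n"
)

# Lines that start with the comment character are skipped entirely.
df = pd.read_table(io.StringIO(raw), comment="#")

# Only the header and the data rows end up in the DataFrame;
# the comment text is not kept anywhere on the object.
columns = list(df.columns)
n_rows = len(df)
print(columns, n_rows)
```

Once the DataFrame exists, nothing in it remembers the comment lines, so to_csv has nothing to write back.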

If you want to keep the comments, I suggest reading them separately from the table (in a separate code snippet), storing them in a list, and then writing them to the output file before writing the table.

Here is some example code:

import pandas as pd

commentChar = '#'

# First, read the comment lines from the original file
comments = []
with open('arquivo.tsv', 'r') as f:
    for line in f:
        if line.startswith(commentChar):
            comments.append(line)

# Now read the table, ignoring the comments
file_in = pd.read_table('arquivo.tsv', comment=commentChar)

# Open the destination file for writing, write the comments first,
# and only then write the table (note that instead of a file name,
# to_csv receives the handle of the already-open file, positioned where
# writing should begin).
with open('arquivo.csv', 'w') as f:
    for comment in comments:
        f.write(comment)

    file_in.to_csv(f, index=False, sep='\t')
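
If you need this in more than one place, the same idea can be wrapped in two small helpers. This is only a sketch: the function names read_with_comments and write_with_comments are made up for illustration, and the temporary files and the upper-casing of Header1 are assumptions used to show a round trip.

```python
import os
import tempfile

import pandas as pd


def read_with_comments(path, comment_char="#"):
    """Return (comment_lines, table) for a tab-separated file."""
    with open(path, "r") as f:
        # Keep only the lines that begin with the comment character.
        comments = [line for line in f if line.startswith(comment_char)]
    table = pd.read_table(path, comment=comment_char)
    return comments, table


def write_with_comments(path, comments, table):
    """Write the saved comment lines first, then the table."""
    # newline="" follows the pandas recommendation for file objects
    # passed to to_csv, so line endings are not translated twice.
    with open(path, "w", newline="") as f:
        f.writelines(comments)
        table.to_csv(f, index=False, sep="\t")


# Round trip on a throwaway file (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "arquivo.tsv")
    dst = os.path.join(tmp, "arquivo_out.tsv")
    with open(src, "w") as f:
        f.write("#comment 1\n#comment 2\n"
                "Header1\tHeader2\tHeader3\na\tb\tc\nd\te\tf\n")

    comments, table = read_with_comments(src)
    table["Header1"] = table["Header1"].str.upper()  # stand-in for the real edit
    write_with_comments(dst, comments, table)

    with open(dst) as f:
        result = f.read()

print(result)
```

Two limitations: only lines that begin with the comment character are preserved, and they are all written at the top of the output, even if they were scattered through the original. Also, with comment='#', pandas drops everything after a '#' that appears inside a data line, so this approach assumes the data itself never contains that character.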
    
03.03.2017 / 16:02