How to open a unicode file inside a zip?

5

I've tried

with zipfile.ZipFile("5.csv.zip", "r") as zfile:
    for name in zfile.namelist():
        with zfile.open(name, 'rU') as readFile:
                line = readFile.readline()
                print(line)
                split = line.split('\t')

but the result is:

b'$0.0\t1822\t1\t1\t1\n'
Traceback (most recent call last)
File "zip.py", line 6
    split = line.split('\t')
TypeError: Type str doesn't support the buffer API

How do I open this file as unicode instead of binary?

    
asked by anonymous 16.12.2013 / 01:45

2 answers

5
If you know the correct encoding of the file, just call the decode method on the file's contents (str in Python 2, bytes or bytearray in Python 3):

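# codificacao holds the name of the file's encoding, e.g. 'utf-8'
# ZipFile.open no longer accepts 'rU' since Python 3.6, so plain 'r' is used
with zfile.open(name, 'r') as readFile:
    conteudo = readFile.read().decode(codificacao)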

As mentioned in an answer to the same question on the English Stack Overflow, trying to split the content into lines before decoding is problematic, since different encodings represent line breaks differently. However, once you have read and decoded the entire content of the file (via read), you can split it into lines normally, since it is now a unicode string (unicode in Python 2, str in Python 3):

line = conteudo.split('\n')[0]

Or with a regular expression (to support \n, \r or \r\n):

line = re.split('\r?\n|\r', conteudo)[0]
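
Put together, the read-then-decode approach could look like the sketch below on Python 3. The archive is built in memory so the example runs on its own; the member name dados.tsv, its contents and the UTF-8 encoding are assumptions made for illustration. str.splitlines() is used in place of the regular expression, since it also handles \n, \r and \r\n (along with a few other line boundaries).

```python
import io
import zipfile

# Build a small zip archive in memory (hypothetical member name and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zfile:
    zfile.writestr(
        "dados.tsv",
        "$0.0\t1822\t1\t1\t1\r\ncafé\t2\t3\t4\t5\n".encode("utf-8"),
    )

codificacao = "utf-8"  # assumed encoding of the member

with zipfile.ZipFile(buffer, "r") as zfile:
    for name in zfile.namelist():
        with zfile.open(name, "r") as readFile:
            # Read all the bytes first, then decode them in one go.
            conteudo = readFile.read().decode(codificacao)
        # Split into lines only after decoding.
        lines = conteudo.splitlines()
        split = lines[0].split("\t")
        print(split)
```

Reading the whole member into memory is fine for small files; for very large ones a streaming approach is likely a better fit.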
    
16.12.2013 / 02:59
1

The answer given on the English Stack Overflow was:

You're seeing this error because you're mixing bytes with unicode. The argument to split must also be a byte string:

>>> line = b'$0.0\t1822\t1\t1\t1\n'
>>> line.split(b'\t')
[b'$0.0', b'1822', b'1', b'1', b'1\n']

To get a unicode string, use decode:

>>> line.decode('utf-8')
'$0.0\t1822\t1\t1\t1\n'

If you are iterating over the file you can use codecs.iterdecode, but that will not work with readline().

import codecs

with zfile.open(name, 'r') as readFile:
    # decode each chunk of bytes incrementally as the file is iterated
    for line in codecs.iterdecode(readFile, 'utf8'):
        print(line)
        # etc
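
Since codecs.iterdecode does not help with readline(), one alternative on Python 3 is to wrap the binary member in io.TextIOWrapper, which exposes the usual text-file methods. A minimal sketch, assuming UTF-8 and a hypothetical in-memory archive with a member named dados.tsv:

```python
import io
import zipfile

# Build a small zip archive in memory (hypothetical member name and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zfile:
    zfile.writestr("dados.tsv", "$0.0\t1822\t1\t1\t1\n".encode("utf-8"))

with zipfile.ZipFile(buffer, "r") as zfile:
    for name in zfile.namelist():
        with zfile.open(name, "r") as readFile:
            # TextIOWrapper decodes the binary stream on the fly,
            # so readline() returns str instead of bytes.
            text = io.TextIOWrapper(readFile, encoding="utf-8")
            line = text.readline()
            fields = line.rstrip("\n").split("\t")
            print(fields)
```

Because the wrapper decodes as it reads, the member does not have to be loaded into memory all at once.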
    
16.12.2013 / 02:49