UnicodeDecodeError: 'utf-8'

5

I'm getting a UnicodeDecodeError: 'utf-8' in a Python file and I can't solve it. This is the error:

Traceback (most recent call last):
  File "file.py", line 448, in <module>
    fileOriginal.sliceFile(url) #Separa os arquivos para evitar MemoryError
  File "file.py", line 188, in sliceFile
    line = fileOriginal.readline()

  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

It occurs when reading a txt file. The file is encoded as UTF-8 without BOM, and I do not understand why it gives this error. The error occurs on the line "line = fileOriginal.readline()" in the following code:

Code:

for (path, dirs, files) in os.walk(url):
        contDec = 0  # Counts the declarations
        contTempFiles = 0  # Counts the temporary files

        for file in files:                                                
            fileOriginal = open(os.path.join(url,file),encoding = "utf8")                                             

            endFile   = False
            contLines = 0
            contDec = 0
            cont    = 0
            line = ''
            while not 'ZZZZZ|' in line:                                     
                if cont == 0:
                    contTempFiles += 1                        

                    tempFile = open(os.path.join('separados',str(contTempFiles)+'_'+str(self.getFileName(file))+'.txt'),'w', encoding='utf-8')                                                
                line = fileOriginal.readline()  # Error on this line
                if line[0:5] == '99999':
                    tempFile.write(line)
                    contDec += 1                                                                        
                if contDec <= 200000:                                                
                    tempFile.write(line)                        
                    cont += 1
                else:
                    contDec = 0
                    cont = 0
                    tempFile.close()                             
            fileOriginal.close()

Python version: 3.4.0

Can anyone help me with this? Thanks!

    
asked by anonymous 23.05.2016 / 22:37

2 answers

1

Instead of this line:

open(os.path.join(url,file), encoding = "utf8")

Try the following:

path = os.path.join(url,file).decode("utf8")
open(path, encoding = "utf-8")

Also, do not forget to put this at the beginning of the file:

# -*- coding: utf-8 -*-
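
On Python 3, which the question says it is using (3.4.0), os.path.join returns a str, and str has no .decode method, so the .decode("utf8") call above would only apply to Python 2 byte strings. A minimal sketch of a Python 3 equivalent follows. The name read_first_line is a hypothetical helper, not something from the question, and errors="replace" is shown only as a way to get past undecodable bytes while investigating, not as a fix for the underlying file.

```python
import os


def read_first_line(directory, name, errors="strict"):
    """Hypothetical helper: read the first line of a file as UTF-8.

    In Python 3 the joined path is already a str, so it is passed to
    open() as-is; only the file *contents* are decoded.
    """
    path = os.path.join(directory, name)
    with open(path, encoding="utf-8", errors=errors) as handle:
        return handle.readline()
```

With the default errors="strict", a file containing bytes that are not valid UTF-8 still raises UnicodeDecodeError; with errors="replace", those bytes come back as the U+FFFD replacement character instead.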
    
24.05.2016 / 11:15
1

I managed to solve it, taking a cue from @Rui Lima's answer. I replaced the line:

fileOriginal = open(os.path.join(url,file), encoding = "utf8")

with:

fileOriginal = open(url+file,encoding = "utf-8")

I do not know why the join caused the error. Thank you all for your help! Best regards!
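
The thread does not establish why the change made the error go away. One thing that can be checked is whether every file found by os.walk really is valid UTF-8. The sketch below is an assumption-laden diagnostic, not the asker's code: find_non_utf8_files is a hypothetical name, it joins each file name with the directory that os.walk reports (which differs from joining with url only when there are subdirectories), and it reads in chunks to avoid loading large files into memory.

```python
import os


def find_non_utf8_files(root, chunk_chars=1024 * 1024):
    """Hypothetical diagnostic: list files under root that fail UTF-8 decoding.

    Returns a list of (path, reason) tuples, where reason is the message
    carried by the UnicodeDecodeError.
    """
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(dirpath, name)
            with open(full_path, encoding="utf-8") as handle:
                try:
                    # Read in chunks until EOF; decoding happens on read.
                    while handle.read(chunk_chars):
                        pass
                except UnicodeDecodeError as exc:
                    bad.append((full_path, exc.reason))
    return bad
```

Calling find_non_utf8_files(url) before the main loop would show whether the failure comes from one specific file in the directory rather than from the code that opens it.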

    
24.05.2016 / 15:13