Script to remove BOM signature from UTF-8 file

3

I have several problems with UTF-8 files that have a BOM: invisible characters are generated at the beginning of the pages, which breaks JSON file reads and disrupts the layout of HTML components. The cause is almost impossible to spot because the characters are invisible. I searched Google for a way to convert all files to UTF-8 without BOM and found a Perl script that removes the BOM signature, but it did not work. Could someone help? I need a script that will convert all of the project's files.

More information on the problem and the script can be found here

My solution for now is to clean each file and save it again as UTF-8 without BOM, but there are many files, so I thought of using a script. I just have no idea how to write one.

My temporary workaround for the JSON problem (a kludge):

1. Take the substring that starts at the first opening brace, since the stray characters are generated before it. That solves it for now, but it's a hack.

json = json.substring(json.indexOf("{"),json.length);
objeto = $.parseJSON(json);
    
asked by anonymous 26.05.2015 / 17:32

2 answers

3

A UTF-8 file with a BOM is simply a UTF-8 encoded file whose first 3 bytes are EF BB BF. Detecting the BOM is therefore a matter of reading the first 3 bytes and checking whether they match that sequence. To remove the BOM, copy the rest of the file to the output, leaving out those 3 bytes.

An example in Python 3, very simplified (disclaimer: I did not test it):

import os, sys

def tem_bom(arq):
    with open(arq, mode="rb") as f:
        bom = f.read(3)
        resto = f.read()
        if bom == b"\xef\xbb\xbf":
            return True, resto
        else:
            return False, bom + resto

def copiar_pasta(origem, destino, copiar_sempre=True):
    # Create the destination folder if it does not exist yet.
    os.makedirs(destino, exist_ok=True)
    for nome in os.listdir(origem):
        path1 = os.path.join(origem, nome)
        path2 = os.path.join(destino, nome)

        if os.path.isfile(path1):
            bom, resto = tem_bom(path1)
            # Copy files that had a BOM, or every file when copiar_sempre is set.
            if bom or copiar_sempre:
                with open(path2, "wb") as f:
                    f.write(resto)
                if bom:
                    print("Fixed file {}".format(path1))

        elif os.path.isdir(path1):
            # Recurse into subfolders; the recursive call creates path2.
            copiar_pasta(path1, path2, copiar_sempre)

if __name__ == "__main__":
    copiar_pasta(sys.argv[1], sys.argv[2])

This example takes a source folder and recursively copies all of its files to a destination folder. Each file that has a BOM is copied without it. I did it this way (without changing anything in the original folder) so as not to risk overwriting anything important; just make sure the destination folder is a new, empty folder. Adjust as necessary.
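Before copying anything, it may help to see which files are actually affected. The following is only a sketch of the same 3-byte check applied as a dry run; the function name `arquivos_com_bom` is made up for this illustration, and it assumes every file under the folder can be opened for reading.

```python
import codecs
import os


def arquivos_com_bom(pasta):
    """Yield the path of every file under `pasta` that starts with the UTF-8 BOM."""
    for raiz, _subpastas, nomes in os.walk(pasta):
        for nome in nomes:
            caminho = os.path.join(raiz, nome)
            with open(caminho, "rb") as f:
                # Only the first 3 bytes are needed to detect the signature.
                if f.read(3) == codecs.BOM_UTF8:
                    yield caminho
```

Calling `for caminho in arquivos_com_bom("."): print(caminho)` would list the affected files without modifying any of them.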

    
26.05.2015 / 21:00
0
#!/usr/bin/perl -pi
s/^(\xEF\xBB\xBF)//;  ## remove BOM !

This version modifies the files in place. Example usage, with the script saved as rmbom:

rmbom *.js

or

perl rmbom file1 file2 *.js dir/*
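For comparison, the same in-place idea can be sketched in Python. This is an illustration, not a translation of the Perl script: the function name `remover_bom` is hypothetical, the whole file is read into memory, and only a BOM at the very start of the file is removed. Since it overwrites the file, keep a backup or use version control.

```python
import codecs


def remover_bom(caminho):
    """Strip a leading UTF-8 BOM from the file in place; return True if one was removed."""
    with open(caminho, "rb") as f:
        dados = f.read()
    if not dados.startswith(codecs.BOM_UTF8):
        return False  # no BOM: the file is left untouched
    with open(caminho, "wb") as f:
        # Write back everything after the 3 signature bytes.
        f.write(dados[len(codecs.BOM_UTF8):])
    return True
```

A loop such as `for nome in sys.argv[1:]: remover_bom(nome)` (after `import sys`) would apply it to the files given on the command line.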

    
20.07.2015 / 16:37