How to read files with accents on Android?

Hello, I'm having trouble reading files with accents on Android.

I'm using the following method to read:

public String lerAquivo(File arquivo) {
    String texto;
    String linha;
    BufferedReader br;

    try {
        texto = "";
        br = new BufferedReader(new FileReader(arquivo));

        while ((linha = br.readLine()) != null) {
            if (!texto.equals("")) {
                texto += "\n";
            }

            texto += linha;
        }
    } catch (Exception e) {
        texto = "";
    }

    return texto;
}

It works, but with one problem: in some files that contain accents and special characters, those characters are not read correctly. Does anyone know the solution?

I know it probably has to do with encoding. I searched and saw that I would have to specify the file encoding when reading. But if that is really the case, how do I figure out the encoding of the file?

    
asked by anonymous 10.02.2018 / 21:14

1 answer

After a long search, I was able to solve the problem.

The problem was the file encoding. FileReader uses the platform's default encoding, which on Android is UTF-8. When a file was not saved as UTF-8, the accented characters were lost.
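To illustrate the mismatch, here is a minimal sketch that uses only the standard library. The class name CharsetMismatchDemo and the sample word are made up for the example. It decodes bytes written as ISO-8859-1 first as UTF-8 (the wrong charset) and then as ISO-8859-1 (the right one).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class CharsetMismatchDemo {
    // Decodes raw bytes with the given charset, much as a Reader does internally.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word "acao" with cedilla and tilde, as an editor saving in ISO-8859-1 would store it.
        byte[] latin1Bytes = "a\u00e7\u00e3o".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: the accented bytes are not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8));

        // Right charset: the accented characters survive.
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

The same substitution happens inside FileReader when the file's real encoding differs from the platform default, which is why the charset has to be passed explicitly.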

After a lot of searching I found a library, Juniversalchardet (which I believe originates from Mozilla). In most cases it can determine which encoding a file was saved in. I say "in most cases" because, from what I read, it cannot always identify the encoding, and even when it does, the result is not always right. I tested it with a dozen files created on different operating systems and in different programs, and it got every one of them right, so I'm quite happy with this "most of the time".

To add the library to an Android project, declare it in the Gradle dependencies (on older Gradle versions, use compile instead of implementation):

implementation group: 'com.googlecode.juniversalchardet', name: 'juniversalchardet', version: '1.0.3'

As an aside, Juniversalchardet is a plain Java library, so you can also download its JAR or add it from Maven.

The method that reads the file now decodes it using the encoding returned by the getEncoding() method:

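import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public String lerAquivo(File arquivo) {
    StringBuilder texto = new StringBuilder();
    String linha;

    // try-with-resources closes the reader (and the underlying stream) even on error
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(arquivo), getEncoding(arquivo)))) {
        while ((linha = br.readLine()) != null) {
            // Separate lines with "\n", without adding a trailing newline
            if (texto.length() > 0) {
                texto.append("\n");
            }

            texto.append(linha);
        }
    } catch (Exception e) {
        // On any failure, return an empty string as before
        return "";
    }

    return texto.toString();
}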

And finally the getEncoding() method, which identifies the encoding a file was written in:

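import java.io.File;
import org.mozilla.universalchardet.UniversalDetector;

private String getEncoding(File arquivo) {
    byte[] buf = new byte[4096];
    int nread;

    // try-with-resources closes the stream even if detection fails
    try (java.io.FileInputStream fis = new java.io.FileInputStream(arquivo)) {
        UniversalDetector detector = new UniversalDetector(null);

        // Feed the detector until it has decided or the file ends
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }

        detector.dataEnd();

        String encoding = detector.getDetectedCharset();
        detector.reset();

        // getDetectedCharset() returns null when no encoding was identified
        return (encoding != null) ? encoding : "UTF-8";
    } catch (Exception e) {
        // If the file cannot be read, fall back to UTF-8
        return "UTF-8";
    }
}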

Note that if the encoding cannot be identified I fall back to UTF-8, because in my case reading the text without its accents is better than reading nothing at all.
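One drawback of this fallback is that it is silent: if the file is not really UTF-8, the accents simply turn into replacement characters. The sketch below is not part of my solution. It is a standard-library check, with the hypothetical names Utf8Check and isValidUtf8, that reports whether a byte array is well-formed UTF-8, so that a caller can tell when the fallback guess was wrong.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class Utf8Check {
    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of inserting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A passing check does not prove that the file is UTF-8, since pure ASCII text is valid in many encodings. A failing check does show that decoding it as UTF-8 would lose characters.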

    
11.02.2018 / 03:14