Check the encoding of an XML file

2

I need to process a series of XML files. One of the requirements is that the encoding is UTF-8. Any other encoding should be rejected.

This is accepted:

<?xml version="1.0" encoding="UTF-8" ?>

This is not:

<?xml version="1.0" encoding="Anything other than UTF-8" ?>

I am using javax.xml to read, validate and process my files, but that is not mandatory. If you know another library or method that does this, great!

I've already searched the internet and could not find anything like it. Has anyone here ever had to do this? How did you solve it?

    
asked by anonymous 25.08.2014 / 22:51

1 answer

2

Just check the encoding with the getEncoding() method of XMLStreamReader.

Below is a complete example:

import java.io.InputStream;
import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LeitorXML {
    // Returns true only if the parser reports the document encoding as UTF-8.
    public boolean isUTF8(InputStream entrada) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader xmlReader = factory.createXMLStreamReader(entrada);
        try {
            // getEncoding() may return null when the encoding is unknown,
            // so compare from the constant side to avoid a NullPointerException.
            String encoding = xmlReader.getEncoding();
            System.out.println(encoding);
            return "UTF-8".equalsIgnoreCase(encoding);
        } finally {
            xmlReader.close();
        }
    }

    public static void main(String[] args) {
        LeitorXML reader = new LeitorXML();
        // Load the sample file from the classpath.
        URL url = LeitorXML.class.getClassLoader().getResource("example2.xml");
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream strm = url.openStream()) {
            if (reader.isUTF8(strm)) {
                System.out.println("The document is UTF-8");
            } else {
                System.out.println("The document is not UTF-8");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

example.xml

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <item date="2009">
    <mode>1</mode>
  </item>
  <item date="2010">
    <mode>2</mode>
  </item>
</config> 

example2.xml

<?xml version="1.0" encoding="UTF-16"?>
<config>
  <item date="2009">
    <mode>1</mode>
  </item>
  <item date="2010">
    <mode>2</mode>
  </item>
</config> 
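
A note on the choice of method: the Javadoc describes getEncoding() as the input encoding (if known), while getCharacterEncodingScheme() is described as the encoding declared in the XML declaration, or null if none was declared. Since the question is about the declaration itself, a variation along these lines may fit better. This is only a minimal sketch: the class name DeclaredEncodingCheck and its two methods are hypothetical names made up for the illustration, and treating a missing declaration as "not UTF-8" is an assumption you may want to change.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original example.
public class DeclaredEncodingCheck {

    // Returns the encoding named in the XML declaration, or null if none is declared.
    public static String declaredEncoding(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            // A new reader starts at START_DOCUMENT, where the declaration is available.
            return reader.getCharacterEncodingScheme();
        } finally {
            reader.close();
        }
    }

    // Assumption: a document with no declared encoding is rejected as well.
    public static boolean isDeclaredUtf8(InputStream in) throws XMLStreamException {
        return "UTF-8".equalsIgnoreCase(declaredEncoding(in));
    }

    public static void main(String[] args) throws XMLStreamException {
        String[] samples = {
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><config/>",
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><config/>"
        };
        for (String xml : samples) {
            // The samples are pure ASCII, so the bytes are valid in both declared encodings.
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.US_ASCII));
            System.out.println(isDeclaredUtf8(in) ? "accepted" : "rejected");
        }
    }
}
```

In the main program, isDeclaredUtf8(strm) could be called in place of reader.isUTF8(strm), and files for which it returns false skipped or reported.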
    
26.08.2014 / 00:07