What is the best way to do the verification? (try/except, multiple ifs, ...?)


I'm reading thousands of XML files with Python. The problem is that not every file contains every field.

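    import xml.etree.ElementTree as ET

    root = ET.parse("curriculo.xml").getroot()  # hypothetical file name
    # Each chained find() raises AttributeError if an intermediate element is missing
    resumo_cv = root.find("DADOS-GERAIS").find("RESUMO-CV").get("TEXTO-RESUMO-CV-RH")
    resumo_cv_ingles = root.find("DADOS-GERAIS").find("RESUMO-CV").get("TEXTO-RESUMO-CV-RH-EN")
    palavras_chave_mestrado = root.find("DADOS-GERAIS").find("FORMACAO-ACADEMICA-TITULACAO").find("MESTRADO").find("PALAVRAS-CHAVE")
    list_palavras_chave_mestrado = ""
    if palavras_chave_mestrado is not None:
        for palavra, valor in palavras_chave_mestrado.items():
            if valor is not None and valor != "":
                # assumed: append each non-empty keyword (the original line was cut off)
                list_palavras_chave_mestrado += valor + "; "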

With an explicit check at every step, the code above would look like this:

    dados_gerais = root.find("DADOS-GERAIS")
    if dados_gerais is not None:
        resumo_cv = dados_gerais.find("RESUMO-CV")
        if resumo_cv is not None:
            texto_resumo_cv = resumo_cv.get("TEXTO-RESUMO-CV-RH")
            if texto_resumo_cv is None:
                texto_resumo_cv = ''
            texto_resumo_cv_ingles = resumo_cv.get("TEXTO-RESUMO-CV-RH-EN")
            if texto_resumo_cv_ingles is None:
                texto_resumo_cv_ingles = ''

That is, every field needs one check for find and another for get. On top of that, some XML fields have to be iterated as lists... Is there a more efficient way, using try/except or something else? Haha, thanks.
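
For comparison, the try/except variant mentioned in the question might look like the sketch below. It relies on the fact that find() returns None for a missing element, so the next call in the chain raises AttributeError. The function name ler_resumo and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a CV that has RESUMO-CV but no English summary.
SAMPLE = """<CURRICULO-VITAE>
  <DADOS-GERAIS>
    <RESUMO-CV TEXTO-RESUMO-CV-RH="Resumo em portugues"/>
  </DADOS-GERAIS>
</CURRICULO-VITAE>"""


def ler_resumo(root):
    """Return (resumo, resumo_ingles), using '' for anything missing."""
    try:
        # If DADOS-GERAIS is missing, find() returns None and None.find(...)
        # raises AttributeError, which we treat as "field absent".
        resumo_cv = root.find("DADOS-GERAIS").find("RESUMO-CV")
        # get() accepts a default, so missing attributes need no extra if.
        texto = resumo_cv.get("TEXTO-RESUMO-CV-RH", "")
        texto_ingles = resumo_cv.get("TEXTO-RESUMO-CV-RH-EN", "")
    except AttributeError:
        texto, texto_ingles = "", ""
    return texto, texto_ingles


root = ET.fromstring(SAMPLE)
print(ler_resumo(root))  # ('Resumo em portugues', '')
```

One trade-off: a single try covers the whole group, so if any element on the path is missing, both values fall back to ''. Catching AttributeError broadly can also hide unrelated mistakes such as a misspelled method name, so it is worth keeping each try block small.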

    
asked by anonymous 14.12.2018 / 15:14

1 answer


The best way to validate XML files is to use a validation schema.

DTDs (Document Type Definitions) are commonly used for this. A basic tutorial on validating XML files with DTDs can be found here. More information on DTDs is available in this Wikipedia article, and other XML validation methods are covered in this other Wikipedia article.
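
Python's built-in xml.etree.ElementTree does not validate against a DTD or XML Schema; the third-party lxml package does, through lxml.etree.DTD and lxml.etree.XMLSchema. As a stdlib-only stand-in (this is not real schema validation), the sketch below reports which expected element paths are missing from a parsed file, so incomplete files can be logged before extraction. The path list and the function name caminhos_ausentes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of element paths the script expects; adjust to your files.
CAMINHOS_ESPERADOS = [
    "DADOS-GERAIS",
    "DADOS-GERAIS/RESUMO-CV",
    "DADOS-GERAIS/FORMACAO-ACADEMICA-TITULACAO/MESTRADO",
]


def caminhos_ausentes(root, caminhos=CAMINHOS_ESPERADOS):
    """Return the expected paths that have no matching element under root."""
    # Compare with None explicitly: an Element with no children is falsy.
    return [c for c in caminhos if root.find(c) is None]


root = ET.fromstring(
    "<CURRICULO-VITAE><DADOS-GERAIS><RESUMO-CV/></DADOS-GERAIS></CURRICULO-VITAE>"
)
print(caminhos_ausentes(root))
# ['DADOS-GERAIS/FORMACAO-ACADEMICA-TITULACAO/MESTRADO']
```

A check like this only reports what is missing; if those fields are legitimately optional, the extraction code still has to tolerate their absence.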

In the case above, you access each field of your XML individually. There are more general ways of parsing XML, such as the one implemented in this GitHub project. That last link may be enough to solve your problem in a much more elegant way.
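
Staying within the standard library, much of the nesting can also be avoided: ElementTree's find() accepts a slash-separated path (a limited XPath subset) and returns None if any step is missing, Element.get() takes a default value, and findall() returns an empty list when nothing matches, so loops over optional lists need no None check. A minimal sketch, assuming the element and attribute names from the question; the helpers atributo and palavras_chave and the sample XML are hypothetical, and this is not the approach of the GitHub project linked above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the English summary attribute is absent.
SAMPLE = """<CURRICULO-VITAE>
  <DADOS-GERAIS>
    <RESUMO-CV TEXTO-RESUMO-CV-RH="Resumo"/>
    <FORMACAO-ACADEMICA-TITULACAO>
      <MESTRADO>
        <PALAVRAS-CHAVE PALAVRA-CHAVE-1="xml" PALAVRA-CHAVE-2="" PALAVRA-CHAVE-3="python"/>
      </MESTRADO>
    </FORMACAO-ACADEMICA-TITULACAO>
  </DADOS-GERAIS>
</CURRICULO-VITAE>"""


def atributo(root, caminho, nome, padrao=""):
    """Return attribute `nome` of the element at `caminho`, or `padrao`
    if either the element or the attribute is missing."""
    elem = root.find(caminho)  # None if any step of the path is missing
    return padrao if elem is None else elem.get(nome, padrao)


def palavras_chave(root, caminho):
    """Collect the non-empty attribute values of every element matching
    `caminho`; findall() yields [] when nothing matches."""
    return [valor for elem in root.findall(caminho)
            for _, valor in elem.items() if valor]


root = ET.fromstring(SAMPLE)
resumo_cv = atributo(root, "DADOS-GERAIS/RESUMO-CV", "TEXTO-RESUMO-CV-RH")
resumo_cv_ingles = atributo(root, "DADOS-GERAIS/RESUMO-CV", "TEXTO-RESUMO-CV-RH-EN")
palavras = palavras_chave(
    root, "DADOS-GERAIS/FORMACAO-ACADEMICA-TITULACAO/MESTRADO/PALAVRAS-CHAVE")
print(resumo_cv, repr(resumo_cv_ingles), palavras)
# Resumo '' ['xml', 'python']
```

Each field then costs one line, whether or not it exists in a given file.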

    
14.12.2018 / 17:12