Parsing a .txt file with Pandas using layout rules from a JSON file


I have a dataset in .txt format with its own fixed-width layout, whose rules are described in a separate JSON file.

Is there a direct way to tell Pandas to use this JSON as the basis for decoding the .txt file?

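This is an excerpt of the .json file. It contains several objects like these, each with different values for its keys (codigo = code, inicio = start position, tamanho = width, descricao = description, rotulo = label, valores = values):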

[
    {
      "codigo": "V0101",
      "inicio": 1,
      "tamanho": 4,
      "descricao": "Ano de referência",
      "rotulo": "ano",
      "valores": "str"
    },
    {
      "codigo": "UF",
      "inicio": 5,
      "tamanho": 2,
      "descricao": "Unidade da Federação",
      "rotulo": "UF",
      "valores": {"11": "Rondônia", "12": "Acre", "13": "Amazonas", "14": "Roraima", "15": "Pará", "16": "Amapá", "17": "Tocantins", "21": "Maranhão", "22": "Piauí", "23": "Ceará", "24": "Rio Grande do Norte", "25": "Paraíba", "26": "Pernambuco", "27": "Alagoas", "28": "Sergipe", "29": "Bahia", "31": "Minas Gerais", "32": "Espírito Santo", "33": "Rio de Janeiro", "35": "São Paulo", "41": "Paraná", "42": "Santa Catarina", "43": "Rio Grande do Sul", "50": "Mato Grosso do Sul", "51": "Mato Grosso", "52": "Goiás", "53": "Distrito Federal"}
    },

Part of the .txt file was shown here as a screenshot; its lines are not regular.

Thank you very much if you can help!
asked by anonymous 16.10.2018 / 20:37

1 answer

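def extrai_txt(arquivo, layout):
    """Yield one dict per line, keyed by each field's label ('rotulo')."""
    for linha in arquivo:
        # 'inicio' is 1-based and 'tamanho' is the field width
        yield {c['rotulo']: linha[c['inicio']-1:c['inicio']-1+c['tamanho']]
               for c in layout}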
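The only subtle part of the slicing is the index arithmetic: inicio counts from 1, while Python slices count from 0. For the UF entry (inicio 5, tamanho 2) the field is therefore linha[4:6]. A quick check, using a made-up sample line that is not from the real file:

```python
# 'UF' entry from the layout in the question.
entry = {"codigo": "UF", "inicio": 5, "tamanho": 2}
linha = "201511"  # made-up sample line: year 2015, UF code 11

# Convert the 1-based start to a 0-based slice of width 'tamanho'.
start = entry["inicio"] - 1
end = start + entry["tamanho"]
print(start, end, linha[start:end])  # prints: 4 6 11
```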

How to use:

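import json
with open('seu_layout.json', encoding='utf-8') as f:  # hypothetical file name for the JSON layout
    seu_json = json.load(f)
with open('seu_arquivo.txt') as f:
    for reg in extrai_txt(f, seu_json):
        print(reg)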

The result is a sequence of dicts, one per line of the file:

{'ano': '2015', 'UF': '11', ...}
{'ano': '2016', 'UF': '13', ...}
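Since the question asks about Pandas specifically: pandas has a built-in reader for fixed-width files, pandas.read_fwf, whose colspecs parameter takes 0-based half-open (start, end) pairs, so each JSON entry converts to (inicio - 1, inicio - 1 + tamanho). A minimal sketch, assuming the layout is the list of dicts from the question (only the two entries shown, with the UF value table trimmed), that every field should be read as text, and that a dict-valued valores maps codes to labels. The helper name read_with_layout and the sample lines are made up for illustration:

```python
import io

import pandas as pd

# Two entries from the layout in the question (the UF value table is trimmed).
layout = [
    {"codigo": "V0101", "inicio": 1, "tamanho": 4, "rotulo": "ano", "valores": "str"},
    {"codigo": "UF", "inicio": 5, "tamanho": 2, "rotulo": "UF",
     "valores": {"11": "Rondônia", "12": "Acre", "13": "Amazonas"}},
]


def read_with_layout(source, layout):
    """Hypothetical helper: read a fixed-width file described by the JSON layout."""
    # 'inicio' is 1-based, so each field covers the 0-based half-open
    # range [inicio - 1, inicio - 1 + tamanho), which is what colspecs expects.
    colspecs = [(c["inicio"] - 1, c["inicio"] - 1 + c["tamanho"]) for c in layout]
    names = [c["rotulo"] for c in layout]
    # dtype=str keeps codes such as "11" as text instead of parsing them as numbers.
    df = pd.read_fwf(source, colspecs=colspecs, names=names, header=None, dtype=str)
    for c in layout:
        mapping = c["valores"]
        # Assumption: only dict-valued 'valores' carry a code -> label table
        # ("str" presumably just names the type). Unknown codes are kept unchanged.
        if isinstance(mapping, dict):
            df[c["rotulo"]] = df[c["rotulo"]].map(lambda v: mapping.get(v, v))
    return df


# Made-up sample lines standing in for the real .txt file.
sample = io.StringIO("201511\n201613\n")
df = read_with_layout(sample, layout)
print(df)
```

With the real files, source would be the path to the .txt file and layout would come from json.load; read_fwf also accepts an encoding argument if the .txt file is not UTF-8.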
    
answered 17.10.2018 / 19:06