Read more than one .xls file in Python

3

I asked a very similar question a few days ago, but this time I am trying to do something more generic. I want to take two rows from the input sheet and write them to rows 0 and 1 of the output. The code below is complete and handles the single file entrada.xls correctly.

import xlrd
import xlwt

# Open the input worksheet 
inputWorkbook = xlrd.open_workbook('entrada.xls', formatting_info=True)

# select the correct sheet for input
inputSheet = inputWorkbook.sheet_by_name('idade')

# Create the output worksheet
outputWorkbook = xlwt.Workbook(encoding="utf-8")

# Add a new sheet to the output worksheet
outputSheet = outputWorkbook.add_sheet('idade')

# Output sheet header's style
headerStyle = xlwt.easyxf('font: height 200, bold 1, color white;''pattern: pattern solid, fore_color black; ')

for coluna in range(inputSheet.ncols):
    outputSheet.write(0, coluna, inputSheet.cell_value(1, coluna), headerStyle)
    outputSheet.write(1, coluna, inputSheet.cell_value(2, coluna))

outputWorkbook.save("resultado.xls")

The problem: I need to read more than 10 .xls files at once. I want the information not only from entrada.xls, but also from entrada1.xls, entrada2.xls, entrada3.xls ...

    
asked by anonymous 25.09.2014 / 13:47

2 answers

3

I suggest you take a look at the zip function and, if you have not mastered the concept yet, at list comprehensions. Both will help you a lot in structuring your code.

If you have a ready code that deals with a single file:

from xlrd import open_workbook

workbook_r = open_workbook('arquivo.xls', formatting_info=True)
worksheet_r = workbook_r.sheet_by_index(0)
...

The natural way to extend this to other files is to wrap it in a function:

def ler_arquivo(nome):
    workbook_r = open_workbook(nome, formatting_info=True)
    worksheet_r = workbook_r.sheet_by_index(0)
    ...
    return resultados # Whatever you want to extract from the file

ler_arquivo('arquivo.xls')

If you have N files, you call this function N times, extracting the individual results from each one (if the files have different structures, you will of course need a different function for each "type" of file):

arquivos = ['arquivo1.xls', 'arquivo2.xls', 'arquivo3.xls']
resultados = [ler_arquivo(nome) for nome in arquivos]
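If the file names follow a pattern instead of being known up front, the list itself can be built from the directory contents. A minimal sketch, assuming the files sit in one folder and match a pattern such as entrada*.xls; the helper name listar_arquivos is hypothetical, not part of the original code:

```python
import glob
import os

def listar_arquivos(pasta, padrao="entrada*.xls"):
    """Return the paths inside `pasta` that match `padrao`, sorted by name."""
    return sorted(glob.glob(os.path.join(pasta, padrao)))
```

The result can be used in place of the hard-coded list, for example arquivos = listar_arquivos('.'). Note that the sort is alphabetical, so entrada10.xls comes before entrada2.xls.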

So far I have probably not said anything you did not already know. The question now is how to combine the results of processing each file. This is where the zip function comes in. Suppose some of your results are the values of a column in the file:

def ler_arquivo(nome):
    ...
    coluna_que_me_interessa = []
    ... # Extract the values of a column you care about and put them in the list
    return {
        "minha_coluna":coluna_que_me_interessa,
        ... # other data
    }

For example, say Excel file A contains weights and file B contains heights, and you want to use both in a calculation:

resultados1 = ler_arquivo('pessoas_pesos.xls')
resultados2 = ler_arquivo('pessoas_alturas.xls')

pesos = resultados1['minha_coluna']
alturas = resultados2['minha_coluna']

for i in range(len(pesos)):
    imc = pesos[i] / (alturas[i] * alturas[i])

Instead of writing this awkward index-based loop, you can use the zip function to pair up the elements of the pesos and alturas lists:

pesos = [50,70,60]
alturas = [160,180,165]

list(zip(pesos, alturas)) # [(50, 160), (70, 180), (60, 165)]

You can then use each pair in a list comprehension:

def calcular_imc(peso, altura): # Example; in practice your functions will be more complex
    ...

imcs = [calcular_imc(x,y) for (x,y) in zip(pesos, alturas)]
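To make the placeholder concrete, here is one possible body for calcular_imc. This is only a sketch: it assumes weights in kilograms and heights in centimeters (which matches the sample values above, but the units are my assumption), and it uses the usual body mass index formula of weight divided by height in meters squared.

```python
def calcular_imc(peso_kg, altura_cm):
    """Body mass index: weight in kg divided by the square of the height in meters."""
    altura_m = altura_cm / 100.0  # assumed unit conversion: cm -> m
    return peso_kg / (altura_m * altura_m)

pesos = [50, 70, 60]
alturas = [160, 180, 165]

# zip pairs the first weight with the first height, and so on.
imcs = [calcular_imc(p, a) for (p, a) in zip(pesos, alturas)]
```

If the two lists have different lengths, zip silently stops at the shorter one, so it is worth checking the lengths first when the data comes from separate files.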

Finally, answering your question: what if I have an arbitrary number of files? In that case (and assuming the files are homogeneous, i.e. with the same structure) you can unpack a list of arguments into zip with the * operator. Example:

arquivos = ['arquivo1.xls', 'arquivo2.xls', 'arquivo3.xls']
resultados = [ler_arquivo(nome) for nome in arquivos]

minha_coluna = [r['minha_coluna'] for r in resultados]
calculo = [meu_calculo(x) for x in zip(*minha_coluna)]

This makes meu_calculo receive a tuple with the values that each file's column holds for row 1, then a tuple for row 2, and so on.
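The effect of zip(*minha_coluna) is easier to see with plain lists. In this sketch the data is made up and meu_calculo is a stand-in that just sums the values; neither comes from real files:

```python
# Hypothetical data: the same column read from three files, one list per file.
minha_coluna = [
    [10, 20, 30],     # arquivo1.xls
    [1, 2, 3],        # arquivo2.xls
    [100, 200, 300],  # arquivo3.xls
]

def meu_calculo(valores):
    """Stand-in calculation: sum the values taken from the same row of every file."""
    return sum(valores)

# zip(*minha_coluna) yields (10, 1, 100), then (20, 2, 200), then (30, 3, 300).
calculo = [meu_calculo(x) for x in zip(*minha_coluna)]
print(calculo)  # [111, 222, 333]
```

The * operator spreads the outer list into separate arguments, so this works no matter how many files were read.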

Update: here is the technique adapted to your specific case (where zip turned out to be unnecessary):

def ler_arquivo(nome):
    inputWorkbook = xlrd.open_workbook(nome, formatting_info=True)
    inputSheet = inputWorkbook.sheet_by_name('idade')
    return {
        "ncols":inputSheet.ncols,
         # The two rows of interest
        "linha1":[inputSheet.cell_value(1, coluna) for coluna in range(inputSheet.ncols)],
        "linha2":[inputSheet.cell_value(2, coluna) for coluna in range(inputSheet.ncols)],
    }

arquivos = ["entrada{}.xls".format(i+1) for i in range(10)]
resultados = [ler_arquivo(nome) for nome in arquivos]

outputWorkbook = xlwt.Workbook(encoding="utf-8")
outputSheet = outputWorkbook.add_sheet('idade')
headerStyle = xlwt.easyxf('font: height 200, bold 1, color white;''pattern: pattern solid, fore_color black; ')

# Walk through the results of each file, advancing the output row by 2
linha = 0
for resultado in resultados:
    for coluna in range(resultado["ncols"]):
        outputSheet.write(linha+0, coluna, resultado["linha1"][coluna], headerStyle)
        outputSheet.write(linha+1, coluna, resultado["linha2"][coluna])
    linha = linha + 2

outputWorkbook.save("resultado.xls")
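The same structure can be turned around to copy whole columns instead of two rows. The sketch below is untested against real workbooks: it only assumes that the input sheet exposes nrows and cell_value(row, col), and that the output sheet exposes write(row, col, value), as in the code above. The function names ler_colunas and escrever_colunas are hypothetical.

```python
def ler_colunas(sheet, colunas):
    """Return one list per requested column index, holding the value of every row.

    `sheet` is assumed to expose `nrows` and `cell_value(row, col)`.
    """
    return [
        [sheet.cell_value(linha, coluna) for linha in range(sheet.nrows)]
        for coluna in colunas
    ]

def escrever_colunas(output_sheet, colunas_lidas, coluna_inicial=0):
    """Write each list as one output column, starting at `coluna_inicial`.

    `output_sheet` is assumed to expose `write(row, col, value)`.
    Returns the index of the next free column.
    """
    for deslocamento, valores in enumerate(colunas_lidas):
        for linha, valor in enumerate(valores):
            output_sheet.write(linha, coluna_inicial + deslocamento, valor)
    return coluna_inicial + len(colunas_lidas)
```

With several input files, the returned index can be passed as coluna_inicial for the next file so that the columns end up side by side. Keep in mind that the .xls format allows at most 256 columns per sheet.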
    
25.09.2014 / 16:15
-1

Hello, I would like to do something similar, but instead of taking two rows, I want to take two whole columns from the input file and write them to the output file. Could you help me?

    
26.06.2016 / 23:00