How to "clean" a csv file with Python?

0

Hello! I am extremely new to programming, so I apologize if I can not explain what I am trying to do or if my code is very wrong. I have a recurring task in my job which is to open a list of restaurants in Ifood, collect the prices of each of them and calculate the average. It's simple, but it takes considerable time. I then decided to create a program in Python that collects prices for me, so I would only have the job select everything in Excel and have the average calculated. Here is the program:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

print('Olá! Entre com o link do restaurante do iFood agora.')
site = input()

html = urlopen(site)
bs = BeautifulSoup(html, 'html.parser')
precosLista = bs.findAll('div',{'class':'result-actions'})
csvFile = open('Preços.csv', 'wt+')
writer = csv.writer(csvFile)

try:
    for precos in precosLista:
        print(precos.get_text())
        csvPreco = []
        csvPreco.append(precos.get_text())
        writer.writerow(csvPreco)

finally:
    csvFile.close()

The code works, but the returned data has a strange format. An example:

"


                                                        R$ 71,90


                                                        R$ 59,90














"

"


                                                        R$ 45,90


                                                        R$ 32,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

"


                                                        R$ 29,90


                                                        R$ 24,90














"

If I try to open this directly in Excel, it is extremely tedious and difficult to manipulate. What I have done then is to open the generated csv in Word and remove all spaces, blank lines, $ and quotation marks. That done, I copy the numbers to Excel and then I order the average. My idea then was to create another program to do this cleaning for me, using the replace function, the problem is that I can not make it work. Could someone help me?

    
asked by anonymous 18.10.2018 / 15:32

1 answer

1

You can create a function that cleans the text before saving the CSV. In the function I put it to replace ',' and $, but you can add more rules as you need.

def clean_up_text(value):

    value = value.replace("R$", "")
    value = value.replace("'", "")
    value = value.replace('"', '')
    value = value.strip() # remove espaços em branco e quebras de linha

    return value

Just call this function at the time of saving the value: clean_up_text(precos.get_text())

    
18.10.2018 / 16:10