Incorrect value conversion in Python

1

I have a crawler that picks up the string value R$ 560.000,00.

I need to convert this value to a float because I will use it in queries like this:

Select all cars with a value between 100000 and 560000

I am converting the value this way:

float(price[2:].replace(',', ''))

And it is converting the value R$ 560.000 to 560.0.

I would like the values converted like this:

  • R$ 17.000 to 17000
  • R$ 100.000 to 100000
  • R$ 560.000,00 to 560000
asked by anonymous 03.01.2017 / 15:08

2 answers

2

Wouldn't it be better to remove the period as well? Otherwise 100.000 will be read as 100 reais rather than 100 thousand reais.

float(price[2:].replace('.', '').replace(',', '.'))
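
A minimal sketch of the same idea as a reusable helper, assuming the crawler always returns prices in Brazilian format (period for thousands, comma for decimals). The name parse_brl is hypothetical and not part of any library:

```python
def parse_brl(price):
    """Convert a Brazilian-format price such as 'R$ 560.000,00' to a float."""
    number = price.replace('R$', '').strip()
    # Drop the thousands separator, then turn the decimal comma into a dot.
    number = number.replace('.', '').replace(',', '.')
    return float(number)

prices = [parse_brl(p) for p in ('R$ 17.000', 'R$ 100.000', 'R$ 560.000,00')]
# Range filter of the kind described in the question.
in_range = [p for p in prices if 100000 <= p <= 560000]
```

Removing the currency symbol by name instead of slicing with price[2:] avoids depending on the exact position of the symbol in the string.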
    
03.01.2017 / 15:15
2

@Priscilla's answer is sufficient and, in fact, the best choice in the vast majority of cases. However, if your crawler needs to handle monetary values in different formats, it may help to take the locale/language of the accessed page into account. One way to do this is with the locale module.

Here is some example code:

import re
import locale

#--------------------------------------------------
def extractMonetaryValue(text):

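    # Currency symbol of the currently configured locale (e.g. 'R$' or '$')
    cs = locale.localeconv()['currency_symbol']
    # re.escape avoids the invalid '\$' escape and handles any symbol safely
    expr = '{}[ ]*[0-9.,]+'.format(re.escape(cs))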

    m = re.search(expr, text)
    if m:
        s = m.group(0).replace(cs, '').replace(' ', '')
        return locale.atof(s)
    else:
        return 0.0
#--------------------------------------------------

s = 'Este teste testa um valor (por exemplo: R$ 560.200,40) expresso em Reais.'
locale.setlocale(locale.LC_ALL, 'ptb_bra') # 'pt_BR' if not on Windows
n = extractMonetaryValue(s)
print('Para "{}" o valor é: {}'.format(s, n))

s = 'This test tests a value (let us say U$ 482,128.33) given in US Dolars.'
locale.setlocale(locale.LC_ALL, 'enu_usa') # 'en_US' if not on Windows
n = extractMonetaryValue(s)
print('Para "{}" o valor é: {}'.format(s, n))

In this code, the key piece is the extractMonetaryValue function. It receives arbitrary text and searches it for a substring that must contain the currency symbol of the configured country/language (followed by zero or more spaces) and then a number made up of digits, periods, and commas. To do so, it uses a fairly permissive regular expression: it does not care whether the numeric format is correct, since that is checked later by locale.atof (which raises ValueError if the format is invalid for the configured country/language).
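
To illustrate the split between permissive matching and deferred validation, here is a small sketch. The helper find_amount is hypothetical, and the portable "C" locale is used only so that the snippet runs on any system (an assumption made for the example; a real crawler would set the locale of the page):

```python
import locale
import re

def find_amount(text, currency_symbol):
    """Return the raw numeric text that follows currency_symbol, or None."""
    pattern = re.escape(currency_symbol) + r'\s*([0-9.,]+)'
    match = re.search(pattern, text)
    return match.group(1) if match else None

# The permissive pattern accepts well-formed and malformed numbers alike.
good = find_amount('Price: R$ 560.200,40 today', 'R$')
bad = find_amount('Price: R$ 5..6,,7 today', 'R$')

# Validation is left to locale.atof, which raises ValueError on bad input.
locale.setlocale(locale.LC_NUMERIC, 'C')
try:
    locale.atof(bad)
    is_valid = True
except ValueError:
    is_valid = False
```

Because the function in the answer returns 0.0 when nothing matches but lets ValueError propagate on malformed numbers, callers may want to catch that exception around each page they process.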

The output of the above code is as follows:

Para "Este teste testa um valor (por exemplo: R$ 560.200,40) expresso em Reais." o valor é: 560200.4
Para "This test tests a value (let us say U$ 482,128.33) given in US Dolars." o valor é: 482128.33

Notice that both numbers printed at the end use a dot as the decimal separator (after all, they are represented internally as floats, regardless of the format of the source).

  

Notes:

  • To detect the default locale of the operating system, use locale.getdefaultlocale().
  • To detect the locale of a web page, check whether it declares this information in its lang attribute. If it does not, you will need to try to infer the language. Luckily for you (wow! hehe), there is a Python port of Google's language detector.
        
    03.01.2017 / 18:16
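
As a rough sketch of reading the language a page declares, assuming it is given in the lang attribute of the html element, the standard library's html.parser is enough. The names LangAttributeParser and page_language are hypothetical:

```python
from html.parser import HTMLParser

class LangAttributeParser(HTMLParser):
    """Remember the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == 'html' and self.lang is None:
            self.lang = dict(attrs).get('lang')

def page_language(html_text):
    """Return the declared page language (e.g. 'pt-BR') or None."""
    parser = LangAttributeParser()
    parser.feed(html_text)
    return parser.lang

sample = '<html lang="pt-BR"><body>R$ 560.200,40</body></html>'
detected = page_language(sample)
```

The returned value (for example pt-BR) still has to be mapped to a locale name that the operating system accepts before calling locale.setlocale.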