Regex for monetary values

Question

1

I would like to know how to write a regex that captures a monetary value using dots as thousands separators and a comma before the cents. Example: 7.300.250,00

python regex

asked by anonymous 12.11.2018 / 16:04

2 answers

3

You can use this expression:

 ^(([1-9]\d{0,2}(\.\d{3})*)|(([1-9]\.\d*)?\d))(\,\d\d)?
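One thing worth checking with this expression is that it has no $ at the end, so re.match can succeed on just a prefix of the string. A small sketch of that behavior (the sample strings are illustrative and not from the question):

```python
import re

# The expression from this answer, unchanged (note there is no trailing $)
pattern = r"^(([1-9]\d{0,2}(\.\d{3})*)|(([1-9]\.\d*)?\d))(\,\d\d)?"

# re.match only needs a match at the start of the string
prefix = re.match(pattern, "1234,00")
print(prefix.group(0))  # the matched prefix, not the whole string

# re.fullmatch requires the whole string to match the pattern
print(re.fullmatch(pattern, "1234,00"))
print(re.fullmatch(pattern, "7.300.250,00") is not None)
```

If the goal is validation rather than extraction, using re.fullmatch (or appending $) is probably the safer choice.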

12.11.2018 / 16:21

score 5 · Accepted Answer

To validate whether the string is in this format, you can use ^[1-9]\d{0,2}(\.\d{3})*,\d{2}$:

^ and $ are anchors for the beginning and end of the string. They ensure that the string contains only what is specified in the regex.
[1-9] is a character class. The square brackets mean "any one of the characters inside them". In this case, 1-9 is "any digit from 1 to 9".
\d is a shortcut for [0-9] (digits 0-9) and {0,2} is a quantifier (https://www.regular-expressions.info/repeat.html) which means "between zero and two occurrences".
Therefore, [1-9]\d{0,2} means a digit from 1 to 9, followed by zero, one or two digits from 0 to 9. This ensures that the string does not start with zero.
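To see this first fragment in isolation, here is a minimal sketch that tests only [1-9]\d{0,2} against a few made-up strings (the variable name leading_group is just for illustration):

```python
import re

# First part of the regex: one digit 1-9, then up to two more digits
leading_group = re.compile(r"[1-9]\d{0,2}")

for text in ["7", "73", "730", "7300", "073"]:
    # fullmatch checks the whole string against this fragment alone
    print(text, bool(leading_group.fullmatch(text)))
```

The first three strings are accepted; "7300" has too many digits and "073" starts with zero.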

Then we have (\.\d{3})* :

\. means the dot character (.). The dot has a special meaning in regex ("any character"), but with \ before it, it "loses its powers" and becomes an ordinary character.
\d{3} means 3 occurrences of any digit from 0 to 9.
The sequence "dot followed by 3 digits" is in parentheses, followed by *, which means "zero or more occurrences". So we can have several occurrences (or none) of "dot followed by 3 digits" (this is what matches the .300.250 sequence of your input). Since * also allows zero occurrences, values below 1000 are accepted too.
Finally, we have the comma followed by 2 digits (,\d{2}).

This ensures that the input is in the desired format.
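As a quick sanity check of the complete pattern, the sketch below runs it against a handful of sample strings. Only the first one comes from the question; the others are assumed edge cases added for illustration:

```python
import re

pattern = re.compile(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$")

# Sample input -> whether the pattern is expected to accept it
samples = {
    "7.300.250,00": True,
    "999,99": True,
    "1.000,00": True,
    "1000,00": False,    # missing the thousands separator
    "07,00": False,      # starts with zero
    "7.300.250": False,  # no cents
    "7.30,00": False,    # group after the dot does not have 3 digits
    "0,50": False,       # a leading zero is rejected even below 1,00
}

for text, expected in samples.items():
    matched = pattern.match(text) is not None
    print(f"{text!r}: matched={matched}, expected={expected}")
```

Note the last case: because the pattern requires the first digit to be 1-9, amounts such as 0,50 are not accepted. If those need to be valid, the pattern would have to be adjusted.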

To get the numeric value, you can simply remove everything that is not a digit and convert the result to int. For this, we use the regex \D (which is the opposite of \d, that is, anything other than a digit).

This gives you the total amount in cents. Below I convert the value to int, since it is best to use integer types when working with monetary values. If you want the value without the cents, use integer division by 100 (the // operator), and if you want only the cents, use the % operator:

import re

s = "7.300.250,00"
# check whether the string is in the desired format
if re.match(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$", s):
    # remove everything that is not a digit and convert to int
    valor = int(re.sub(r"\D", "", s))
    print("Value (total amount of cents): {}".format(valor))
    print("Value without the cents: {}".format(valor // 100))
    print("Cents value: {}".format(valor % 100))

The output is:

Value (total amount of cents): 730025000
Value without the cents: 7300250
Cents value: 0

One detail about \d: in Python 3 it can also match other characters that represent digits, such as ٠١٢٣٤٥٦٧٨٩.

Example:

s = "1٩,10"
if re.match(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$", s):
    valor = int(re.sub(r"\D", "", s))
    print("Value (total amount of cents): {}".format(valor))

I used the character ٩ (arabic-indic digit nine), which, despite looking similar to the digit 9, is a different character. The output is:

Value (total amount of cents): 1910

That's because \d also matches this character, and int also accepts it as a digit. If you want only the digits 0 to 9 to be considered, change \d to [0-9]:

if re.match(r"^[1-9][0-9]{0,2}(\.[0-9]{3})*,[0-9]{2}$", s):
    ...  # the rest is the same
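Another option, assuming Python 3 and a str pattern, is to keep \d and pass the re.ASCII flag, which restricts \d to [0-9]. A minimal sketch comparing both behaviors (the variable names are illustrative):

```python
import re

# "\u0669" is ARABIC-INDIC DIGIT NINE, the same character used in the example
s = "1\u0669,10"
pattern = r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$"

# Default for str patterns in Python 3: \d matches any Unicode decimal digit
unicode_match = re.match(pattern, s) is not None

# With re.ASCII, \d only matches the characters 0 through 9
ascii_match = re.match(pattern, s, re.ASCII) is not None

print(unicode_match, ascii_match)
```

Both approaches reject the string; which one to use is mostly a matter of whether you prefer the restriction to be visible in the pattern itself or in the flags.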