I would like to know how to write a regex to capture a monetary value that uses dots and a comma as separators. Ex: 7.300.250,00
To validate that the string is in this format, you can use ^[1-9]\d{0,2}(\.\d{3})*,\d{2}$. Piece by piece:

^ and $ are anchors for the beginning and end of the string. They ensure that the string contains only what is specified in the regex.
[1-9] is a character class. The square brackets mean "any one of the characters inside"; in this case, 1-9 is "any digit from 1 to 9".
\d is a shortcut for [0-9] (digits 0-9), and {0,2} is a quantifier (https://www.regular-expressions.info/repeat.html) which means "between zero and two occurrences".
[1-9]\d{0,2} therefore means a digit from 1 to 9 followed by zero, one or two digits from 0 to 9. This ensures that the string does not start with zero.

Then we have (\.\d{3})*:

\. means the dot character (.). The dot has a special meaning in regex ("any character"), but with \ before it, it "loses its powers" and becomes an ordinary character.
\d{3} is exactly 3 occurrences of any digit from 0 to 9.
The parentheses group \.\d{3}, and the group is followed by *, which means "zero or more occurrences". So we can have several occurrences (or none) of "dot followed by 3 digits" (this matches the .300.250 part of your input). Since * also allows zero occurrences, values below 1000 are accepted as well.

Finally, ,\d{2} is a comma followed by exactly two digits (the cents).

This ensures that the input is in the desired format.
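To make the breakdown above concrete, here is a minimal sketch that runs the same pattern against a few inputs. The helper name is_valid_amount and the sample strings are my own, chosen only for illustration.

```python
import re

# Pattern from the explanation above: 1-3 leading digits (no leading zero),
# optional groups of ".ddd", then a comma and exactly two digits.
PATTERN = re.compile(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$")


def is_valid_amount(text):
    """Hypothetical helper: True if text looks like 7.300.250,00."""
    return PATTERN.match(text) is not None


samples = [
    "7.300.250,00",  # valid
    "250,00",        # valid: zero ".ddd" groups
    "1.000,5",       # invalid: only one digit after the comma
    "0,50",          # invalid: starts with zero
    "1000,00",       # invalid: missing thousands separator
    "12.34,00",      # invalid: group with only two digits
]
for sample in samples:
    print(sample, is_valid_amount(sample))
```

Only the first two samples print True; the others fail for the reason given in their comment.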
To get the numeric value, you can simply remove everything that is not a digit and convert the result to int. For this, we use the regex \D (the opposite of \d, that is, anything other than a digit).
This gives you the total amount in cents. Below I convert the value to int, since it is best to use integer types to work with monetary values. If you want the value without the cents, use integer division by 100 (//), and if you want only the cents, use the % operator:
import re

s = "7.300.250,00"
# check that the string is in the desired format
if re.match(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$", s):
    # remove everything that is not a digit and convert to int
    valor = int(re.sub(r"\D", "", s))
    print("Value (total amount of cents): {}".format(valor))
    print("Value without the cents: {}".format(valor // 100))
    print("Cents value: {}".format(valor % 100))
The output is:
Value (total amount of cents): 730025000
Value without the cents: 7300250
Cents value: 0
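If you need this in more than one place, the validation and conversion steps can be wrapped in a function. This is only a sketch: the name parse_cents and the choice to raise ValueError on bad input are my assumptions, not something required by the approach.

```python
import re


def parse_cents(text):
    """Hypothetical helper: return the amount as an integer number of cents."""
    if not re.match(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$", text):
        raise ValueError("unexpected format: {!r}".format(text))
    # Drop the separators (anything that is not a digit) and convert.
    return int(re.sub(r"\D", "", text))


total = parse_cents("1.234,56")
# divmod returns the quotient and remainder in one step,
# equivalent to (total // 100, total % 100).
whole, cents = divmod(total, 100)
print(total, whole, cents)  # 123456 1234 56
```

Keeping the amount as a single integer of cents avoids floating-point rounding; the split into whole units and cents is only done when displaying.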
One detail about \d: it can also match other characters that represent digits, such as ٠١٢٣٤٥٦٧٨٩ (see this answer for more details).
Example:
s = "1٩,10"
if re.match(r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$", s):
    valor = int(re.sub(r"\D", "", s))
    print("Value (total amount of cents): {}".format(valor))
I used the character ٩ (arabic-indic digit nine), which looks like the digit 9 but is a different character. The output is:
Value (total amount of cents): 1910
That's because \d also matches this character. If you want only the digits 0 to 9 to be considered, replace \d with [0-9]:
if re.match(r"^[1-9][0-9]{0,2}(\.[0-9]{3})*,[0-9]{2}$", s):
    ...  # the rest is the same
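As an alternative to rewriting the pattern, Python's re module has the re.ASCII flag, which makes \d match only 0-9 in str patterns. The sketch below compares both behaviours on the same input; the variable names are mine, and the digit is written as the escape \u0669 only to make it visible in the source.

```python
import re

PATTERN = r"^[1-9]\d{0,2}(\.\d{3})*,\d{2}$"

# Second character is ARABIC-INDIC DIGIT NINE (U+0669), not ASCII "9".
s = "1\u0669,10"

# Default (Unicode) matching: \d accepts any Unicode decimal digit.
unicode_match = re.match(PATTERN, s)
# With re.ASCII, \d is restricted to [0-9].
ascii_match = re.match(PATTERN, s, flags=re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```

Whether to use the flag or spell out [0-9] is a matter of taste; the explicit class has the advantage of being visible in the pattern itself.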
You can use this expression:
^(([1-9]\d{0,2}(\.\d{3})*)|(([1-9]\.\d*)?\d))(\,\d\d)?
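A quick way to see what this expression accepts is to run it against a few inputs, as in the sketch below. Note that the expression has no $ at the end, so with re.match it only anchors the start of the string; using re.fullmatch to require the whole string is my choice here, not part of the answer, and the sample strings are illustrative.

```python
import re

# The expression exactly as posted above.
EXPR = r"^(([1-9]\d{0,2}(\.\d{3})*)|(([1-9]\.\d*)?\d))(\,\d\d)?"

# fullmatch requires the entire string to match the expression.
for sample in ["7.300.250,00", "7.300.250", "0,50", "1000,00"]:
    print(sample, re.fullmatch(EXPR, sample) is not None)

# Without a trailing $, re.match accepts a valid prefix followed by anything.
prefix = re.match(EXPR, "7.300.250,00abc")
print(prefix.group(0))  # 7.300.250,00
```

Compared with the first pattern, this one makes the cents optional and also accepts a single leading zero such as 0,50; "1000,00" is still rejected because the thousands separator is missing.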