Simply use (\s+)? to make the spaces optional. With optional spaces everywhere the regex can't be very short, but in your case something like this works:
(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?
Online example on RegExr: link
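A quick way to check the pattern locally, as a minimal sketch using Python's re.fullmatch (the sample strings are the ones from the input used further down; fullmatch requires the whole string to match):

```python
import re

# The full pattern from above: three "number [spaces] [cm]" parts separated by "x"
expressao = r'(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?'

# Sample formats taken from the input used in the "Multiple values" section
exemplos = [
    '23,6 x 34 x 17,1',
    '14,5 x 55 x 22',
    '14,5cm x 55 x 22cm',
    '14,5cmx55x22cm',
    '14,5 cmx55 cmx22 cm',
]

for exemplo in exemplos:
    # fullmatch: the pattern must consume the entire string
    print(exemplo, '->', bool(re.fullmatch(expressao, exemplo)))

# Only two measurements: the pattern requires three, so there is no match
print(re.fullmatch(expressao, '14,5 x 55'))
```

All five formats should print True, while the two-value string gives None.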
Explaining the regex
The first part of the regex would be this:
(\d+(,\d+)?)(\s+)?(cm)?
- (,\d+)? optionally matches the decimal part, i.e. a comma followed by one or more digits
- (\s+)? optionally matches one or more whitespace characters (note that \s also matches line breaks)
- (cm)? optionally matches the unit cm
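To see what each piece captures, here is a small sketch that runs only the first part against a few sample numbers (the sample strings are illustrative, not from the question):

```python
import re

# First part of the regex: a number with optional ",decimals",
# optional whitespace and an optional "cm" unit
parte = r'(\d+(,\d+)?)(\s+)?(cm)?'

for texto in ['23,6', '34', '14,5cm', '14,5 cm']:
    m = re.fullmatch(parte, texto)
    print(repr(texto), m.groups())
```

For '14,5 cm' this gives ('14,5', ',5', ' ', 'cm'), and for '34' it gives ('34', None, None, None): optional groups that do not take part in the match come back as None.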
After that, just repeat this expression three times, with (\s+)?x(\s+)? between the repetitions. Of course it can be done in other ways and the result would be almost the same; this version is repetitive, but easier to understand.
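Since the full regex is just the first part repeated, it can also be built in code instead of written out by hand. A sketch, where parte and separador are hypothetical names not used in the original answer:

```python
import re

# Hypothetical names for the two building blocks
parte = r'(\d+(,\d+)?)(\s+)?(cm)?'   # number, optional spaces, optional "cm"
separador = r'(\s+)?x(\s+)?'          # "x" with optional spaces around it

# Join three copies of the part, with the separator between them
expressao = separador.join([parte] * 3)

# The same pattern written out by hand at the top of this answer
original = r'(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?'

print(expressao == original)          # True
print(re.compile(expressao).groups)   # 16 capture groups (3 * 4 + 2 * 2)
```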
If the goal is to match one entry at a time, adding \b at the beginning and at the end should be enough, for example:
\b(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?\b
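A minimal sketch of the \b version applied to a single entry with re.search; the string 'Caixa 14,5cm x 55 x 22cm' is a made-up example:

```python
import re

# Pattern with \b at both ends, as above
expressao = r'\b(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?\b'

# Hypothetical single-entry text
texto = 'Caixa 14,5cm x 55 x 22cm'
m = re.search(expressao, texto)
print(m.group(0))  # 14,5cm x 55 x 22cm

# The number is glued to letters, so there is no word boundary before it
print(re.search(expressao, 'ref12x3x4'))  # None
```

Because \b needs a transition between a word character and a non-word character (or the start/end of the string), the match cannot begin or end in the middle of a word.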
Multiple values
If the input has multiple values, use re.finditer like this:
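```python
import re

# Three "number [spaces] [cm]" parts separated by "x" with optional spaces
expressao = r'(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?'

entrada = '''
23,6 x 34 x 17,1
14,5 x 55 x 22
14,5cm x 55 x 22cm
14,5cmx55x22cm
14,5 cmx55 cmx22 cm
'''

# finditer yields one match object per measurement found in the text
resultados = re.finditer(expressao, entrada)

for resultado in resultados:
    valores = resultado.groups()
    # The three numbers are at indexes 0, 6 and 12 of the groups tuple
    print("Primeiro:", valores[0])
    print("Segundo:", valores[6])
    print("Terceiro:", valores[12])
    print("\n")
```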
Note that the groups in the regex repeat every 6 positions, so the numbers between each x are at indexes 0, 6 and 12; that is, each match's groups() returns something like:
('23,6', ',6', ' ', None, None, ' ', '34', None, ' ', None, None, ' ', '17,1', ',1', '\n', None)
('14,5', ',5', ' ', None, None, ' ', '55', None, ' ', None, None, ' ', '22', None, '\n', None)
('14,5', ',5', None, 'cm', ' ', ' ', '55', None, ' ', None, None, ' ', '22', None, None, 'cm')
('14,5', ',5', None, 'cm', None, None, '55', None, None, None, None, None, '22', None, None, 'cm')
('14,5', ',5', ' ', 'cm', None, None, '55', None, ' ', 'cm', None, None, '22', None, ' ', 'cm')
So you only need valores[0], valores[6] and valores[12]. Example on repl.it: link
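The positions 0, 6 and 12 can be confirmed from the compiled pattern. A sketch using re.compile, whose .groups attribute holds the number of capture groups; keep in mind that groups() is 0-based while Match.group(n) is 1-based:

```python
import re

expressao = r'(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?(\s+)?x(\s+)?(\d+(,\d+)?)(\s+)?(cm)?'
padrao = re.compile(expressao)

# 4 groups per number part + 2 per separator, so consecutive numbers are 6 groups apart
print(padrao.groups)  # 16

m = padrao.search('14,5 cmx55 cmx22 cm')
valores = m.groups()
print(valores[0], valores[6], valores[12])  # 14,5 55 22
# Same values through the 1-based group() numbering
print(m.group(1), m.group(7), m.group(13))  # 14,5 55 22
```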
Using values for math operations
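Note that valores[0], valores[6] and valores[12] are strings, and because they use a comma as the decimal separator Python does not treat them as numbers: float('23,6') raises a ValueError. So, to use them in a mathematical operation, convert the , to . first, like this:
float('1000,00001'.replace(',', '.'))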
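A small sketch of why the replacement is needed; para_float is a hypothetical helper name, not part of the original answer:

```python
def para_float(texto):
    """Hypothetical helper: convert a comma-decimal string such as '23,6' to float."""
    # float() only accepts "." as the decimal separator
    return float(texto.replace(',', '.'))


try:
    float('1000,00001')  # the comma makes this raise ValueError
except ValueError as erro:
    print('Erro:', erro)

print(para_float('1000,00001'))  # 1000.00001
print(para_float('23,6'))        # 23.6
```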
It should look something like this:
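```python
# Replaces the previous loop; expressao and entrada are the same as above.
# finditer returns a one-shot iterator, so create it again before looping.
resultados = re.finditer(expressao, entrada)

for resultado in resultados:
    valores = resultado.groups()
    # Swap the decimal comma for a dot so float() can parse each value
    primeiro = float(valores[0].replace(',', '.'))
    segundo = float(valores[6].replace(',', '.'))
    terceiro = float(valores[12].replace(',', '.'))
    print("Primeiro:", primeiro)
    print("Segundo:", segundo)
    print("Terceiro:", terceiro)
    print("Resultado:", primeiro * segundo * terceiro)
    print("\n")
```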