Ignore any white space in the middle of a string

Question

score 0

I'm trying to create a regex that can find a phrase in a text even when there is stray whitespace in the middle of the words. For example, I search the text for these excerpts:

"conciliação prejudicada" (impaired conciliation) or even "inconciliados" (irreconcilable)

But the text is not always clean, so there may be stray spaces in the middle of words, for example:

"con ciliação prejudicada" or "i n c o nciliados"

Here is what I did:

import re

padrao = re.search(r'i\s*n\s*c\s*o\s*n\s*c\s*i\s*l\s*i\s*a\s*d\s*o\s*s|'
                     r'c\s*o\s*n\s*c\s*i\s*l\s*i\s*a\s*ç\s*ã\s*o\s*(p\s*r\s*e\s*j\s*u\s*d\s*i\s*c\s*a\s*d\s*a|r\s*e\s*j\s*e\s*i\s*t\s*a\s*d\s*a)', text)

My question is: is there a shorter, less ugly way to ignore these spaces?

python regex python-3.x

asked by anonymous 30.01.2018 / 16:29

2 answers

score 0

A less ugly way is to let a function generate the regex for you:

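import re

def to_regex(s):
    # Put an optional whitespace run (\s*) between every pair of characters;
    # re.escape keeps characters such as '.' or '(' literal
    return r'\s*'.join(re.escape(c) for c in s)

print(to_regex('teste'))  # t\s*e\s*s\s*t\s*e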
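As a sketch of how this could replace the hand-written pattern in the question, the block below builds the same alternation from plain words. It assumes the `to_regex` helper above (with `re.escape` added so characters like `.` stay literal); the name `padrao` is taken from the question, and the sample sentence is made up for illustration. Using a non-capturing group `(?:...)` is a choice of this sketch; the question used a plain capturing group.

```python
import re


def to_regex(s):
    # Allow an optional run of whitespace between every pair of characters
    return r'\s*'.join(re.escape(c) for c in s)


# Same alternation as the question, built from plain words
padrao = re.compile(
    to_regex('inconciliados')
    + '|'
    + to_regex('conciliação')
    + r'\s*(?:'
    + to_regex('prejudicada') + '|' + to_regex('rejeitada')
    + ')'
)

texto = 'A con ciliação preju dicada foi registrada.'  # made-up sample
m = padrao.search(texto)
print(m.group(0))  # prints: con ciliação preju dicada
```

Because `\s*` can also match zero characters, the same compiled pattern still matches text that has no stray spaces at all, such as `conciliaçãorejeitada`.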

30.01.2018 / 16:55


score 2 · Accepted Answer

Alternatively, you can remove all whitespace first and then perform the search:

import re
texto = re.sub(r"\s+", "", texto)
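A minimal sketch of this step on a made-up sentence. Note that `\s` covers tabs and newlines as well as spaces, and that the spaces that legitimately separate words are removed too, so whatever you search for afterwards has to be written without spaces; this can also let a match straddle two separate words. The raw string `r'\s+'` avoids the invalid-escape warning that a plain `"\s"` triggers in recent Python versions.

```python
import re

texto = 'A con ciliação  preju dicada\tfoi registrada.'  # made-up sample
# \s matches spaces, tabs and newlines; each run of them is replaced by ''
texto = re.sub(r'\s+', '', texto)
print(texto)  # prints: Aconciliaçãoprejudicadafoiregistrada.
```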

You can then search the text normally with your regular expression, written without spaces since all of them have been removed. Depending on your goals, you may want to put each part into capture groups:

resultado = re.search("(conciliação)(prejudicada)|(inconciliados)", texto)
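For example, on a made-up sentence that has been stripped of whitespace as this answer suggests, `re.search` returns a match object whose `groups()` show which alternative matched; the group belonging to the branch that did not participate comes back as `None`.

```python
import re

# Made-up sample, stripped of whitespace first
texto = re.sub(r'\s+', '', 'A con ciliação preju dicada foi registrada.')
resultado = re.search('(conciliação)(prejudicada)|(inconciliados)', texto)
if resultado:
    # groups() has one entry per group; groups of the branch that did
    # not match are None
    print(resultado.groups())  # prints: ('conciliação', 'prejudicada', None)
    print(resultado.group(0))  # prints: conciliaçãoprejudicada
```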

And if you want all the results, you can use re.findall:

resultados = re.findall("(conciliação)(prejudicada)|(inconciliados)", texto)
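A sketch with a made-up text containing both phrases. With capture groups, `re.findall` returns one tuple per match, using an empty string for each group that did not take part. If you only need the whole matched text, a pattern without groups together with `re.finditer` and `m.group(0)` is simpler; `encontrados` is just an illustrative name.

```python
import re

texto = 'con ciliação preju dicada; i n c o nciliados; conciliação prejudicada'
# Strip all whitespace first, as in the answer
texto = re.sub(r'\s+', '', texto)

# With groups: one tuple per match, '' for groups that did not take part
resultados = re.findall('(conciliação)(prejudicada)|(inconciliados)', texto)
print(resultados)
# [('conciliação', 'prejudicada', ''), ('', '', 'inconciliados'), ('conciliação', 'prejudicada', '')]

# Without groups: just the matched text of each match
encontrados = [m.group(0) for m in re.finditer('conciliaçãoprejudicada|inconciliados', texto)]
print(encontrados)
# ['conciliaçãoprejudicada', 'inconciliados', 'conciliaçãoprejudicada']
```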