Ignore any white space in the middle of a string


I'm trying to create a regex that can find a string in a text even when there is some whitespace in the middle of the words. For example, I'm searching the text for the following excerpts:

"conciliação prejudicada" (impaired conciliation) or even "inconciliados" (unreconciled)

But since the text isn't always clean, there may be stray spaces in the middle of words, for example:

"con ciliação prejudicada" or "i n c o nciliados"

I did the following:

import re

padrao = re.search(r'i\s*n\s*c\s*o\s*n\s*c\s*i\s*l\s*i\s*a\s*d\s*o\s*s|'
                   r'c\s*o\s*n\s*c\s*i\s*l\s*i\s*a\s*ç\s*ã\s*o\s*(p\s*r\s*e\s*j\s*u\s*d\s*i\s*c\s*a\s*d\s*a|r\s*e\s*j\s*e\s*i\s*t\s*a\s*d\s*a)', text)

My question is: is there a less ugly and less gigantic way to ignore these spaces?

    
asked by anonymous 30.01.2018 / 16:29

2 answers


Alternatively, you can remove all whitespace first and then perform the search:

import re

texto = re.sub(r"\s+", "", texto)

You can then search the text normally with a plain regular expression, since it no longer needs \s* between every character. Depending on your goals, you may want to put each part into capture groups:

resultado = re.search("(conciliação)(prejudicada)|(inconciliados)", texto)

And if you want all the results, you can use re.findall:

resultados = re.findall("(conciliação)(prejudicada)|(inconciliados)", texto)
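Putting the two steps together, here is a minimal sketch, assuming the input is an ordinary Python string (the sample `texto` below is made up for illustration). Removing *all* whitespace also glues neighbouring words together, which is why the pattern spells "conciliação" and "prejudicada" with nothing in between. Because the pattern has three capture groups, `re.findall` returns one 3-tuple per match, with empty strings for the groups that did not take part in that match.

```python
import re

# Hypothetical sample text with stray spaces inside the words.
texto = "A con ciliação prejudicada foi registrada. Os i n c o nciliados ficaram."

# Step 1: remove every whitespace character (raw string so "\s" reaches re intact).
sem_espacos = re.sub(r"\s+", "", texto)

# Step 2: search the compacted text with the plain pattern (no \s* needed).
padrao = r"(conciliação)(prejudicada)|(inconciliados)"

primeiro = re.search(padrao, sem_espacos)
print(primeiro.group(0))  # conciliaçãoprejudicada

# findall returns one tuple per match, one slot per capture group.
todos = re.findall(padrao, sem_espacos)
print(todos)  # [('conciliação', 'prejudicada', ''), ('', '', 'inconciliados')]
```

One trade-off of this approach: match positions (`primeiro.start()`, `primeiro.end()`) refer to the compacted string, not to the original text.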
    
31.01.2018 / 00:24

A less ugly way is to let a function generate the regex for you:

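import re

def to_regex(s):
    # Put optional whitespace (\s*) between every character of s.
    # re.escape keeps characters such as "." or "(" from acting as regex syntax.
    return r'\s*'.join(re.escape(c) for c in s)

print(to_regex('teste'))  # t\s*e\s*s\s*t\s*e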
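Applied to the terms from the question, the same idea can generate the whole alternation. This is a sketch: the helper `build_pattern` and the sample text are hypothetical. Spaces inside each phrase are dropped before joining, so, as in the hand-written pattern from the question, the words may be separated by any amount of whitespace or by none at all.

```python
import re

def to_regex(s):
    # Allow any amount of whitespace (including none) between characters.
    return r"\s*".join(re.escape(c) for c in s)

def build_pattern(frases):
    # Hypothetical helper: one alternative per phrase. Spaces are removed
    # first, so the gap between words is covered by the \s* from to_regex.
    return "|".join(to_regex(f.replace(" ", "")) for f in frases)

frases = ["inconciliados", "conciliação prejudicada", "conciliação rejeitada"]
padrao = re.compile(build_pattern(frases))

texto = "Houve con ciliação prejudicada e os i n c o nciliados foram listados."
encontrados = [m.group(0) for m in padrao.finditer(texto)]
print(encontrados)  # ['con ciliação prejudicada', 'i n c o nciliados']
```

Unlike removing whitespace first, this keeps the original text intact, so `m.start()` and `m.end()` point into `texto` itself.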
    
30.01.2018 / 16:55