Python - Split words delimited by whitespace or brackets

1

I have a string containing several words. Some are separated by spaces, but others are compound names enclosed in brackets.

Example:

string="Goiânia Vitória Brasília [Campo Grande] Fortaleza [São Paulo] Manaus"

I need to split the string and return a list in which each of these items is a separate element.

Example output:

"Goiânia"

"Vitória"

"Brasília"

"Campo Grande"

"Fortaleza"

"São Paulo"

"Manaus"

How do I create a regular expression that does this in Python?

    
asked by anonymous 10.12.2017 / 21:36

1 answer

2

The idea is to work with capturing groups.

The first step is to identify the pattern in the data so that an appropriate regex can be built.

Based on the information provided, I have identified the following pattern:

\[(.*?)\]|(\S+)

It matches either any text enclosed in [] (captured lazily, without the brackets), or (|) any run of non-whitespace characters.

You can test the regex in real time on Rubular.

For each match, group 1 holds the name found between [] and group 2 holds any other word. Only one of the two groups is filled per match.
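To make the two groups concrete, here is a minimal sketch that inspects what each match captures. The shortened `sample` string is only an illustration, not part of the original question.

```python
import re

# Same pattern as above, written as a raw string.
pattern = re.compile(r'\[(.*?)\]|(\S+)')

# Shortened sample, for illustration only.
sample = "Manaus [São Paulo]"

# With two capturing groups, findall returns one (group 1, group 2) tuple
# per match; the group that did not take part shows up as an empty string.
pairs = pattern.findall(sample)
print(pairs)  # [('', 'Manaus'), ('São Paulo', '')]
```

Note that `findall` reports the unused group as an empty string, whereas `match.group(n)` on a match object returns `None` for it.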

In Python 3, it would look something like this:

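import re

text = """Goiânia Vitória Brasília [Campo Grande] Fortaleza [São Paulo] Manaus [Santa Bárbara d'Oeste]"""
# Raw string, so the backslashes reach the regex engine unchanged
regex = re.compile(r'\[(.*?)\]|(\S+)')
for match in regex.finditer(text):
    if match.group(1) is None:
        # No bracketed name matched here: this is a plain word (group 2)
        print(match.group(2))
    else:
        # Text found between [ and ]
        print(match.group(1))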

See it working on Ideone.
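Since the question asks for a list rather than printed output, the same loop can be wrapped in a function. This is only a sketch: the names `split_words` and `_WORDS` are made up for illustration, and it assumes brackets are never nested and are always closed.

```python
import re

# Hypothetical helper names; the pattern is the one from the answer.
_WORDS = re.compile(r'\[(.*?)\]|(\S+)')

def split_words(text):
    """Return bracketed names (without brackets) and plain words, in order."""
    # group(1) is None when the match came from the \S+ alternative.
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in _WORDS.finditer(text)]

string = "Goiânia Vitória Brasília [Campo Grande] Fortaleza [São Paulo] Manaus"
print(split_words(string))
```

For the string from the question, this yields the seven names in their original order, with "Campo Grande" and "São Paulo" each kept as a single element.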

    
11.12.2017 / 19:03