I need to extract only the pieces of text like ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO B, for example. That is, I need to get only the course name, city, shift, the word SISU, and the group name from the following string:
string = '</li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=46A&id_grupo=70>ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=46A&id_grupo=71>ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO B</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=46A&id_grupo=72>'
The string is huge; this is only a fragment of it. I managed to write a regular expression, but it returns chopped-up results, and it also does not match accented letters, such as the "Ó" in HISTÓRIA. The expression I wrote was:
cursos = re.findall(r'([A-Z])\w+', string)
I need to get this output:
ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO A
But it returns this:
GEOGRAFIA - JUIZ DE FORA - DIURNO - SISU - GRUPO (it is not capturing which group it is)
and in HISTÓRIA, for example, it does not match the accented "Ó".
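For reference, here is a minimal sketch of one way this could be approached, not necessarily the best one. The character class [A-Z] covers only unaccented ASCII capitals, so letters such as Ç, Ã and Ó are skipped. Rather than listing the allowed letters, the sketch captures whatever text sits between the end of an opening <a ...> tag and the closing </a>. The variable names html and pattern are illustrative, and the HISTÓRIA entry with href=# is a hypothetical line added only to exercise the accented case; it is not part of the real page.

```python
import re

# Shortened sample based on the string in the question. The last <a ...> tag
# is cut off, as in the original fragment.
html = (
    '</li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/'
    'lista-de-espera-sisu-3/?id_curso=46A&id_grupo=70>'
    'ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO A</a></li>'
    '<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/'
    'lista-de-espera-sisu-3/?id_curso=46A&id_grupo=71>'
    'ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO B</a></li>'
    # Hypothetical entry (assumption, not from the real page) to test accents.
    '<li><a href=#>HISTÓRIA - JUIZ DE FORA - NOTURNO - SISU - GRUPO A</a></li>'
    '<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/'
    'lista-de-espera-sisu-3/?id_curso=46A&id_grupo=72>'
)

# <a\s[^>]*>  matches an opening anchor tag with any attributes
# ([^<]+)     captures the link text: every character up to the next '<',
#             accented or not
# </a>        requires the closing tag, so the truncated last link is skipped
pattern = re.compile(r'<a\s[^>]*>([^<]+)</a>')

cursos = pattern.findall(html)

for curso in cursos:
    print(curso)
```

Because the capture group excludes only '<', the trailing group letter and every accented character are kept. For a full page, an HTML parser would probably be more robust than a regular expression, but for a fragment with this regular shape the pattern above may be enough.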