I need to write a regular expression to extract the links from this string:
links = ('href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>')
The actual string is much longer; I have included only part of it because the rest follows the same pattern. Here's what I've tried:
import re
campus1 = re.findall("href", links)
campus2 = re.findall("http", links)
campus3 = re.findall("href=http", links)
campus4 = re.findall("hre", links)
campus5 = re.findall("a", links)
campus6 = re.findall("<a> <\a>", links)
When I print the results, I get either isolated letters or the links together with the names (later I will also need an expression to extract only those college names). Does anyone have any ideas?
For example, when I run campus1 = re.findall("href", links), the output is:
'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href'
That is, it returns every occurrence of href in the string. I would like to extract only the links, exactly as they appear in this string.
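To make the goal concrete, here is a minimal sketch of one possible approach. It assumes that every link starts right after href= and runs up to the next > character, and that each name sits between that > and the following </a>. This holds for the excerpt above, but it is only an assumption about the rest of the string. The variable names urls and names are illustrative.

```python
import re

# Shortened sample with the same shape as the excerpt above (assumption:
# the full string keeps repeating this href=URL>NAME</a> pattern).
links = (
    'href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/'
    '?id_curso=01GV&id_grupo=70>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li>'
    '<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/'
    '?id_curso=01GV&id_grupo=71>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>'
)

# The parentheses form a capture group: findall returns only what is inside
# them, i.e. everything after "href=" up to (not including) the next ">".
urls = re.findall(r'href=([^>]+)>', links)

# Same idea for the names: skip the URL, then capture the text up to "</a>".
names = re.findall(r'href=[^>]+>([^<]+)</a>', links)

for url, name in zip(urls, names):
    print(url, '|', name)
```

A literal pattern such as "href" can only ever return the text "href" itself, which is why the attempts above produce a list of identical strings; the capture group is what lets findall return the variable part. For arbitrary HTML, a real parser such as the standard library's html.parser module is usually more robust than a regular expression.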