I have an HTML document from which I need to retrieve the values of a set of <li> elements.
This is part of the HTML:
<ul id="minhas-tags">
<li><em>Tagged: </em></li>
<li><a href="/tags/tag1">tag1</a>, </li>
<li><a href="/tags/tag2">tag2</a>, </li>
<li><a href="/tags/tag3">tag3</a>, </li>
<li><a href="/tags/tag4">tag4</a>, </li>
I want to get the content of each <li>, such as tag1, tag2, etc.
After much reading here I came up with this regular expression:
tags/[a-zA-Z]+">[a-zA-Z]+<+
This isolates the HTML I want from everything else, but I do not know how to change the expression so that it returns only the contents of each <li>.
For example, the expression currently returns /tags/tag1">tag1< when I only want tag1.
How would I do this? And could you explain how the suggested expression would work, please?
Update
Sorry, I forgot to mention the language. I'm using C#, and my routine looks something like this:
public string retorna_Tags_HTML(string html)
{
    Regex ER = new Regex(@"tags?([\w]+)<\/a>", RegexOptions.None);
    Match m = ER.Match(html);
}
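One possible way to get only the tag text is a capturing group: parentheses around the part of the pattern you want, read back through `Match.Groups[1]` instead of `Match.Value`. The sketch below assumes every tag link looks like the ones in the snippet above (an `<a>` whose `href` starts with `/tags/`); the class name `TagExtractor` and the method `ExtractTags` are hypothetical names used for illustration, not part of the original routine.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Hypothetical helper: returns the link text of every <a> whose href
    // starts with /tags/. Assumes markup shaped like the snippet in the question.
    public static List<string> ExtractTags(string html)
    {
        // href="/tags/[^"]+"   the link attribute; matched but not captured
        // >                    end of the opening <a ...> tag
        // ([^<]+)              group 1: everything up to the next '<' (the link text)
        // </a>                 closing tag that must follow the link text
        var pattern = new Regex(@"href=""/tags/[^""]+"">([^<]+)</a>");

        var tags = new List<string>();
        foreach (Match m in pattern.Matches(html))
        {
            // m.Value would be the whole match; Groups[1] is only the
            // text matched inside the parentheses.
            tags.Add(m.Groups[1].Value);
        }
        return tags;
    }

    public static void Main()
    {
        string html =
            "<ul id=\"minhas-tags\">" +
            "<li><em>Tagged: </em></li>" +
            "<li><a href=\"/tags/tag1\">tag1</a>, </li>" +
            "<li><a href=\"/tags/tag2\">tag2</a>, </li>" +
            "</ul>";

        Console.WriteLine(string.Join(", ", ExtractTags(html)));
    }
}
```

`Regex.Matches` is used rather than `Regex.Match` because the list contains several tags and `Match` returns only the first one. If the markup varies more than this snippet does, an HTML parser is usually more robust than a regular expression.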