Regex to extract HTML information

I'm trying to extract information from the HTML body of an email. But when execution reaches the Match line, it throws the following error:

  

{"parsing \"(?si:(Tipo de Informação[^\d]+(?<Tipo de Informação>[\d]+)|Tipo de Informação(?<Tipo de Informação>[\d]+)))\" - Invalid group name: group names must start with an alphabetic character."}

I have already run several tests and could not identify the cause. If anyone has an idea, thank you.

string texto = @"<P CLASS=CS95E872D0><SPAN CLASS=CSE27513221><SPAN STYLE='FONT-SIZE:10.0PT'>&NBSP;</SPAN></SPAN><O:P></O:P></P>
<P CLASS='CS95E872D0'><SPAN CLASS='CSE27513221'><SPAN STYLE='FONT-SIZE:10.0PT'>TIPO DE INFORMAÇÃO: INFORMAÇÃO A SER RECUPERADA</SPAN></SPAN><O:P></O:P></P>
<P CLASS='CS95E872D0'><SPAN CLASS='CSE27513221'><SPAN STYLE='FONT-SIZE:10.0PT'>PERIODO: &NBSP;31/10/2013 A 31/10/2018</SPAN></SPAN><O:P></O:P></P>";

string pattern = @"(?si:({0}[^\d]+(?<Tipo de Informação>[\d]+)|{0}(?<Tipo de Informação>[\d]+)))";

pattern = string.Format(pattern, "Tipo de Informação");

Match match = new Regex(pattern).Match(texto);
    
asked by anonymous 03.01.2019 / 13:14

1 answer

Although Regex is not recommended for very large texts (and HTML is usually very large), it can be used as long as you do not create very complex expressions. Always anchoring the regex on fixed text helps a lot.

According to the MSDN documentation: Regular Expression Grouping Constructs

  

(?<name>subexpression)
  or:   (?'name'subexpression)
  where name is a valid group name and subexpression is any valid regular expression pattern. name must not contain any punctuation characters and cannot begin with a number.

A space is not allowed either: the group name "Tipo de Informação" contains spaces, and that is what triggers the error.
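A quick way to confirm this is to try constructing the Regex and catch the parse failure. The sketch below is only illustrative: the class name GroupNameCheck and the helper CanParse are made up for this example, and it assumes the parser reports an invalid pattern as an ArgumentException (or a subclass of it).

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNameCheck
{
    // Hypothetical helper: returns true when the pattern parses,
    // false when the Regex constructor rejects it.
    public static bool CanParse(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // Invalid patterns (such as a group name with spaces) end up here.
            return false;
        }
    }

    public static void Main()
    {
        // Group name with spaces, as in the question: rejected by the parser.
        Console.WriteLine(CanParse(@"(?<Tipo de Informação>\d+)"));

        // Group name without spaces: accepted.
        Console.WriteLine(CanParse(@"(?<TipoDeInformacao>\d+)"));
    }
}
```

Running it should print False for the first pattern and True for the second.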

I adjusted your regex so that it works:

string pattern = @"(?si:({0}[^\d]+(?<TipoDeInformacao>[\d]+)|{0}(?<TipoDeInformacao>[\d]+)))";
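To show how the named group is read back after the match, here is a minimal sketch built around the corrected pattern. The class ExtractDemo, the method ExtractNumber and the sample input "Tipo de Informação: 12345" are hypothetical and not taken from the question. Regex.Escape on the label is an extra precaution I added; the original code passes the label in unescaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtractDemo
{
    // Hypothetical helper: returns the digits captured after the label,
    // or null when the text does not match.
    public static string ExtractNumber(string label, string text)
    {
        string pattern = @"(?si:({0}[^\d]+(?<TipoDeInformacao>[\d]+)|{0}(?<TipoDeInformacao>[\d]+)))";

        // Escape the label so that any regex metacharacters in it are treated literally.
        pattern = string.Format(pattern, Regex.Escape(label));

        Match match = new Regex(pattern).Match(text);

        // Read the capture through the group name, which now has no spaces.
        return match.Success ? match.Groups["TipoDeInformacao"].Value : null;
    }

    public static void Main()
    {
        // Made-up input for illustration only.
        Console.WriteLine(ExtractNumber("Tipo de Informação", "Tipo de Informação: 12345"));
    }
}
```

Keep in mind that [^\d]+ stops at the first digit it finds after the label, so on real HTML the capture may come from an attribute value (for example a CLASS name that contains digits) rather than from the visible text.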

Good luck!

Tip: Use the Visual Studio extension for Regex, because it lets you configure several options in addition to generating example code ^^.

    
03.01.2019 / 17:08