Editing to include a disclaimer: at some point in your process you are obviously doing something called scraping on the Receita Federal page. As Maniero said in his answer and comment, this is not very reliable. My (incomplete) solution below looks for a CPF or CNPJ in any text, which may or may not have HTML mixed in. That consideration is the only reason I answered the way I did. In general, anyone who parses HTML with regular expressions either doesn't know what they are doing or is desperate. #ProntoFalei (there, I said it).
If all you want is to extract a CNPJ, a regular expression can work. Just note that the expression only helps because you are not parsing HTML; you are simply extracting a number from the text.
The expression you are looking for is something like:
[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+-[0-9]+
And to those who understand regex: yes, I know my expression is somewhat lazy. I'll upvote anyone who posts an answer with a more accurate expression.
Explanation:
- Each `[0-9]` block means "one numeric character here";
- The `+` means that the element to its left must occur at least once, but may occur multiple times. A more correct and efficient way to capture a CPF or CNPJ would be to repeat the numeric block, as in `[0-9][0-9][0-9]`. I leave that to you;
- The backslash escapes characters that have special meanings, so that their literal values are used (in this case, `.` and the slash itself).
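As a minimal sketch of the stricter variant suggested in the list, assuming the usual formatted layouts 00.000.000/0000-00 for a CNPJ and 000.000.000-00 for a CPF (digits-only numbers are not handled), the patterns below use the `{n}` quantifier. `{n}` means "exactly n occurrences of the preceding element", so `[0-9]{3}` is shorthand for `[0-9][0-9][0-9]`. The variable names and sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Lazy pattern from the answer: each block is "one or more digits".
var lazyCnpj = new Regex(@"[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+-[0-9]+");

// Stricter patterns (assumed formats): {n} means exactly n occurrences,
// so [0-9]{3} is the same as [0-9][0-9][0-9].
var strictCnpj = new Regex(@"[0-9]{2}\.[0-9]{3}\.[0-9]{3}/[0-9]{4}-[0-9]{2}");
var strictCpf = new Regex(@"[0-9]{3}\.[0-9]{3}\.[0-9]{3}-[0-9]{2}");

// Hypothetical samples: one well-formed CNPJ and one malformed "number".
string wellFormed = "<td>CNPJ: 12.345.678/0001-95</td>";
string malformed = "<td>1.2.3/4-5</td>";

Console.WriteLine(lazyCnpj.IsMatch(malformed));   // True: the lazy pattern accepts it
Console.WriteLine(strictCnpj.IsMatch(malformed)); // False: wrong digit counts
Console.WriteLine(strictCnpj.Match(wellFormed).Value);
Console.WriteLine(strictCpf.Match("CPF 123.456.789-09").Value);
```

Using `[0-9]` rather than `\d` is deliberate: in .NET, `\d` also matches non-ASCII Unicode digits. Note that these patterns are not anchored, so they can still match a CNPJ-shaped piece inside a longer run of digits; they check the format only, not the check digits.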
Note that because there are backslashes in the expression, you must also escape them when you put it in a C# string, or put an at sign (`@`) in front of the string literal to make it a verbatim string. You can use code similar to the one below:
```csharp
using System.Text.RegularExpressions;

string input = ""; // this should contain your input text
Regex foo = new Regex(@"[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+-[0-9]+");
Match m = foo.Match(input);
if (m.Success) {
    string resultado = m.Groups[0].Value; // Assumes a single CNPJ per input.
}
```
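The snippet above assumes a single CNPJ per input. If the text may contain several, `Regex.Matches` returns every non-overlapping match in order. The sketch below also checks the escaping point made earlier: a verbatim string (`@"..."`) and a regular string with doubled backslashes hold the same pattern. The sample input and variable names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// In a verbatim string (@"...") the backslash is literal; in a regular
// string it must itself be escaped. Both variables hold the same pattern.
string verbatim = @"[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+-[0-9]+";
string escaped = "[0-9]+\\.[0-9]+\\.[0-9]+\\/[0-9]+-[0-9]+";
Console.WriteLine(verbatim == escaped); // True

// Hypothetical input containing two CNPJs mixed with HTML.
string input = "<p>11.222.333/0001-81</p><p>44.555.666/0001-72</p>";

// Matches returns every non-overlapping match, in order of appearance.
string[] cnpjs = Regex.Matches(input, verbatim)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string cnpj in cnpjs)
    Console.WriteLine(cnpj);
```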
Good luck!