Error while validating with Regex

2

I'm trying to clean up this text with a regex, but I'm not getting the desired result:

From:

  

"ST STN, SET J, * STORES T-40 / T41, - TER-REO, SHOPPING &   BOULEVARD KM 28,5 VALUE 450.00 CENTRAL. "

To:

  

"ST STN SET J STORES T40 T41 TERREO SHOPPING AND BOULEVARD KM 28,5   450.00 CENTRAL "

My code:

String padrao = @"(?i)(,|.)?[^A-Za-z0-9]\s";
String padrao = @"(?i)[^0-9a-z]\s]";

Regex rg = new Regex(_texto, " ");

var arrayTexto = resultado.Normalize(NormalizationForm.FormD).ToCharArray();
foreach (char letter in arrayTexto)
{
    if (CharUnicodeInfo.GetUnicodeCategory(letter) != UnicodeCategory.NonSpacingMark)
        sb.Append(letter);
}

What's wrong?

    
asked by anonymous 20.04.2017 / 20:49

2 answers

0

I'd advise you to use Replace, as suggested by the user @Marconi.

It's worth keeping in mind that one thing makes your case quite difficult: you want to eliminate several special characters ( -,*/& ) while keeping some of them at specific points, such as the comma inside the digits after KM. That makes it very hard to write one general rule that solves your problem.

But if you want to keep using regex, you can combine several cases with OR ( | ), putting the rarest cases first.

(\d*\.\d*?|\d*,\d|\w*|\s)

The regex above is meant to capture everything you want to keep. It first checks for digits followed by . and more digits, to cover the 450.00 case. Then it checks for digits followed by , and a digit (the 28,5 case). Then it checks for runs of non-special characters (\w, which covers letters in either case, digits and underscore), and finally for whitespace.
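To try the "keep what matches" idea, here is a minimal sketch that joins all matches back together. It uses a slightly adjusted pattern, which is an assumption on my part and not the exact regex above: because `\w*` can match the empty string, the `\s` alternative after it is never tried at a position where the earlier alternatives fail, and `\d*\.\d*?` can match a lone dot. The names `TokenKeeper`, `KeepPattern` and `KeepMatches` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenKeeper
{
    // Hypothetical variant of the pattern above: every quantifier needs at
    // least one character, so the whitespace alternative can be reached.
    private const string KeepPattern = @"\d+\.\d+|\d+,\d+|\w+|\s";

    // Hypothetical helper: concatenates everything the pattern matches and
    // drops whatever falls between the matches (commas, *, -, /, & and so on).
    public static string KeepMatches(string input)
    {
        var kept = Regex.Matches(input, KeepPattern)
                        .Cast<Match>()
                        .Select(m => m.Value);
        string joined = string.Concat(kept);

        // Collapse the runs of spaces left behind by the removed symbols.
        return Regex.Replace(joined, @"\s+", " ");
    }

    public static void Main()
    {
        string input = "ST STN, SET J, * STORES T-40 / T41, - TER-REO, SHOPPING &   BOULEVARD KM 28,5 VALUE 450.00 CENTRAL. ";
        Console.WriteLine(KeepMatches(input));
    }
}
```

With the sample text, T-40 becomes T40 and TER-REO becomes TERREO, while 28,5 and 450.00 survive intact. Note that this sketch simply drops & (it does not turn it into AND) and keeps the word VALUE, so those two differences from the desired output would still need their own Replace calls.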

    
05.05.2017 / 00:46
0

Maybe this regex will work for you:

string pattern = @"(?i)(,|\.)?[^a-z0-9]\s|(\/|\-)";

Using Regex.Replace() you can remove the special characters:

private static string PreprocessingText(string input)
{
    string pattern = @"(?i)(,|\.)?[^a-z0-9]\s|(\/|\-)";
    return Regex.Replace(input, pattern, " ");
}

See it working on .NET Fiddle.
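For a quick local check, a sketch like the following calls the method on the sample text from the question. The `PreprocessAndCollapse` step is a hypothetical addition, not part of the answer: since every match is replaced by a space, sequences such as ", * " leave doubled spaces behind, and a second Replace on `\s+` squeezes them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreprocessingDemo
{
    // Copied unchanged from the answer above.
    private static string PreprocessingText(string input)
    {
        string pattern = @"(?i)(,|\.)?[^a-z0-9]\s|(\/|\-)";
        return Regex.Replace(input, pattern, " ");
    }

    // Hypothetical extra step: squeeze each run of whitespace into one space.
    public static string PreprocessAndCollapse(string input)
    {
        return Regex.Replace(PreprocessingText(input), @"\s+", " ");
    }

    public static void Main()
    {
        string input = "ST STN, SET J, * STORES T-40 / T41, - TER-REO, SHOPPING &   BOULEVARD KM 28,5 VALUE 450.00 CENTRAL. ";
        Console.WriteLine(PreprocessAndCollapse(input));
    }
}
```

Because hyphens are replaced by a space rather than removed, T-40 comes out as "T 40" and TER-REO as "TER REO". If they must be joined as in the desired output, the hyphen would need its own replacement with an empty string.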

    
05.05.2017 / 03:06