Duplicate Normalizer + Regex

Question

score 4

Could someone explain the code below?

```java
import java.text.Normalizer;

returnStr = Normalizer.normalize(returnStr, Normalizer.Form.NFD)
    .replaceAll("[^\\p{ASCII}]", "");
```

`returnStr` holds a URL as its initial value.

java regex

asked by anonymous 23.06.2016 / 21:46

2 answers

score 1

The regular expression is a negated character class: `replaceAll` replaces every non-ASCII character with an empty string, i.e. removes it.

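In short, the code:

1. Takes the URL and normalizes it to NFD (canonical decomposition), which splits an accented character such as `ç` into its base letter `c` plus a separate combining mark.
2. Removes every non-ASCII character, which deletes those combining marks and leaves the unaccented letters.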
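A minimal sketch of those two steps, assuming a made-up URL with Portuguese accents; the class `AccentStripper` and helper `stripAccents` are hypothetical names, not from the question:

```java
import java.text.Normalizer;

class AccentStripper {
    // Hypothetical helper that mirrors the two steps of the question's code
    static String stripAccents(String input) {
        // Step 1: NFD splits "ç" into "c" + U+0327 (combining cedilla) and "ã" into "a" + U+0303
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Step 2: delete every character outside ASCII (0..127), including those combining marks
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String url = "https://example.com/promoção/café"; // made-up example URL
        System.out.println(stripAccents(url)); // https://example.com/promocao/cafe
    }
}
```

Characters with no canonical decomposition, such as `ø`, are not turned into a base letter; the second step simply deletes them, so `stripAccents("søren")` yields `"sren"`.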

ASCII is a character table with 128 codes (0 to 127), covering the English letters, digits, punctuation, and control characters, each mapped to a numeric code.
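To make the 0 to 127 range concrete, here is a small sketch that prints the numeric code of a few characters and whether each falls inside ASCII; the class `AsciiCodes` and method `isAscii` are illustrative names only:

```java
class AsciiCodes {
    // True when the character falls in the 7-bit ASCII range 0..127
    static boolean isAscii(char c) {
        return c <= 127;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'z', '7', '/', 'é'}) {
            // e.g. "A -> 65 ascii=true", while 'é' has code 233 and is outside ASCII
            System.out.println(c + " -> " + (int) c + " ascii=" + isAscii(c));
        }
    }
}
```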

23.06.2016 / 22:00


score 2 · Accepted Answer

Analyzing the expression `[^\p{ASCII}]`

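The pattern is written inside a Java String literal, and `\` is the escape character in String literals: it makes the next character literal. The source code therefore has to contain `"[^\\p{ASCII}]"`, and after the compiler processes the escape, the regex engine receives `[^\p{ASCII}]`.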
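A quick way to see the escaping at work is to print the String the regex engine actually receives; `EscapeDemo` is a throwaway class name for illustration. Note that writing the literal with a single backslash, `"[^\p{ASCII}]"`, does not compile in Java, because `\p` is not a valid String escape.

```java
class EscapeDemo {
    // In source code the backslash is doubled; at runtime the String holds one backslash
    static final String PATTERN = "[^\\p{ASCII}]";

    public static void main(String[] args) {
        System.out.println(PATTERN);          // [^\p{ASCII}]
        System.out.println(PATTERN.length()); // 12: the doubled backslash counts as one character
    }
}
```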
`[ ]` is a character class: it matches any single character listed inside it. `[^ ]` is a negated character class: instead of matching the listed characters, it matches any single character that is not listed.
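The difference between a class and its negation is easiest to see side by side on the same input; the class and method names below are made up for this sketch:

```java
class CharClassDemo {
    // [an]: a character class that matches one 'a' or one 'n'
    static String removeListed(String s) {
        return s.replaceAll("[an]", "");
    }

    // [^an]: a negated class that matches one character that is neither 'a' nor 'n'
    static String removeUnlisted(String s) {
        return s.replaceAll("[^an]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeListed("banana"));   // b
        System.out.println(removeUnlisted("banana")); // anana
    }
}
```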
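`\p{...}`, like `\w` and `\d`, is a shorthand for a predefined character class (not an anchor); it saves typing, since `\w` stands for `[A-Za-z0-9_]`. `\x` is different: it is a hexadecimal escape for a single character, as in `\x41` for `A`.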
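As a sketch of the shorthand idea, the two patterns below accept exactly the same strings in Java's default mode (without the `UNICODE_CHARACTER_CLASS` flag); `ShorthandDemo` is an illustrative name:

```java
import java.util.regex.Pattern;

class ShorthandDemo {
    // \w is shorthand for [a-zA-Z_0-9] in Java's default (non-Unicode) mode
    static final Pattern SHORT = Pattern.compile("\\w+");
    static final Pattern LONG = Pattern.compile("[a-zA-Z_0-9]+");

    public static void main(String[] args) {
        for (String s : new String[] {"user_42", "a-b"}) {
            // Both patterns agree: "user_42" matches, "a-b" does not ('-' is not a word character)
            System.out.println(s + ": " + SHORT.matcher(s).matches() + " " + LONG.matcher(s).matches());
        }
    }
}
```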
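`{ASCII}` is the argument to `\p` and names the class to match. Which names are supported depends on the regex library of the language you are using; in Java, `\p{ASCII}` is the POSIX class for all ASCII characters, equivalent to `[\x00-\x7F]`.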
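To check the Java equivalence, this sketch runs both spellings over the same input, which is a made-up string with no NFD step applied; the class and method names are hypothetical:

```java
class AsciiClassDemo {
    // Named POSIX class
    static String viaProperty(String s) {
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    // Explicit code range 0x00..0x7F
    static String viaRange(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String s = "naïve ©2016 → ok";
        // Without normalization, the precomposed 'ï' is removed whole, giving "nave"
        System.out.println(viaProperty(s)); // nave 2016  ok
        System.out.println(viaRange(s));    // nave 2016  ok
    }
}
```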

Note

On regex101, under Quick reference > Meta sequences, `\p` is described as:

Matches a unicode character passed as a parameter.

That page lists the supported options; however, compared with the ASCII table, not all of them match. This is because only codes 0 to 127 form the ASCII standard; above that there is no single absolute standard, so the result depends on the language in use.

Addendum

`\p` also has a negated form, `\P`: `\P{ASCII}` matches any non-ASCII character. So you could write the pattern as `\P{ASCII}` (`"\\P{ASCII}"` in a Java String) without needing the negated character class `[^ ]`.
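A sketch comparing the two spellings on the same made-up input; both should remove the same characters. The names `bracketed` and `upperP` are illustrative only:

```java
class NegatedPropertyDemo {
    // Negated character class around \p{ASCII}
    static String bracketed(String s) {
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    // \P{ASCII} is the negation of \p{ASCII}, so no [^ ] is needed
    static String upperP(String s) {
        return s.replaceAll("\\P{ASCII}", "");
    }

    public static void main(String[] args) {
        String s = "café, naïve";
        System.out.println(bracketed(s)); // caf, nave
        System.out.println(upperP(s));    // caf, nave
    }
}
```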
