Question about Normalizer + Regex

4

Could someone explain the code below?

returnStr = Normalizer.normalize(returnStr, Normalizer.Form.NFD)
    .replaceAll("[^\\p{ASCII}]", "");

returnStr initially holds a URL.

    
asked by anonymous 23.06.2016 / 21:46

2 answers

2

Analyzing the expression "[^\\p{ASCII}]"

  • It is written as a String that will be compiled into a regex. In a String literal, \ is an escape character, so \\ yields a single literal \ and the pattern the regex engine receives is [^\p{ASCII}].
  • [] defines a character class, which matches any single character in the set. [^ ] is a negated class: instead of matching what is in the set, it matches every character that is not in it.
  • \p, like \w and \d, is a shorthand that saves typing: instead of writing [A-Za-z0-9_], you just write \w.
  • {ASCII} is the property name passed to \p. Which names are accepted depends on the regex library of the language you are using.
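The points above can be sketched in Java, the language of the question. This is only an illustration: the class name, the helper `stripNonAscii`, and the sample word are assumptions made for the example, not part of the original code.

```java
import java.util.regex.Pattern;

class NegatedAsciiClassDemo {
    // The Java literal "[^\\p{ASCII}]" reaches the regex engine as [^\p{ASCII}].
    static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // Hypothetical helper: deletes every character that is not ASCII.
    static String stripNonAscii(String input) {
        return NON_ASCII.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The sample word contains two precomposed accented letters (c-cedilla, a-tilde).
        // Without normalization both are removed outright.
        System.out.println(stripNonAscii("a\u00E7\u00E3o")); // prints "ao"
    }
}
```

Note that the accented letters disappear completely here, because no normalization was applied before the replacement.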

Note

Using REGEX101, under Quick reference > Meta sequences, we find \p:

  

Matches a Unicode character that has the property passed as a parameter.

That page lists some of the supported properties. However, when compared with the ASCII table, not all of them match. This is because the ASCII table is only standardized for codes 0-127; beyond that range there is no single standard, so the result depends on the language in use.

Addendum

\p has a negated counterpart, \P, so the pattern could simply be written as "\\P{ASCII}", with no need for the negated class [^ ].
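A small sketch comparing the two spellings in Java, assuming `java.util.regex` semantics. The class and method names and the sample text are made up for the illustration.

```java
class NegatedPropertyDemo {
    // Negated character class around the ASCII property.
    static String withNegatedClass(String s) {
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    // Negated property \P{ASCII}: matches any character that is NOT ASCII.
    static String withNegatedProperty(String s) {
        return s.replaceAll("\\P{ASCII}", "");
    }

    public static void main(String[] args) {
        String sample = "S\u00E3o Paulo"; // contains one non-ASCII letter (a-tilde)
        System.out.println(withNegatedClass(sample));    // prints "So Paulo"
        System.out.println(withNegatedProperty(sample)); // prints "So Paulo"
    }
}
```

Both forms should remove the same characters; the second is just shorter to read.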

    
24.06.2016 / 14:13
1

Looking at the regular expression, this is a negated character class. It replaces every non-ASCII character with an empty string (that is, it removes it).

  • Take the URL
  • Remove any non-ASCII characters.
  • ASCII is a character table containing letters, digits, and symbols, each mapped to a numeric code.
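The steps above, together with the `Normalizer` call from the question, can be sketched as follows. The class name, the helper `toAscii`, and the example.com URL are assumptions for the illustration. NFD splits an accented letter into its base letter plus a combining mark, and the combining mark is the non-ASCII part that the regex then removes.

```java
import java.text.Normalizer;

class AsciiFoldDemo {
    // Hypothetical helper mirroring the question's code:
    // 1) NFD decomposes accented letters into base letter + combining mark;
    // 2) the regex deletes everything that is not ASCII, including those marks.
    static String toAscii(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // The last letter of the path is an e with an acute accent.
        System.out.println(toAscii("http://example.com/caf\u00E9")); // prints "http://example.com/cafe"
    }
}
```

Characters that have no canonical decomposition (for example the letter o with stroke, U+00F8) are still removed entirely, since nothing ASCII is left over after normalization.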

        
    23.06.2016 / 22:00