Could someone explain the code below?
returnStr = Normalizer.normalize(returnStr, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
returnStr initially holds a URL.
[^\p{ASCII}]
[] is a character class: it matches any one of the characters listed inside it. [^ ] is a negated character class: instead of matching the characters listed, it matches every character that is not listed. \p, like \w and \d, is a shorthand that saves typing: instead of writing [A-Za-z0-9_] you can write just \w.
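To make the shorthand point concrete, here is a minimal sketch in Java (the language of the question). The class name WordClassDemo, the helper strip, and the sample string are hypothetical, chosen only for illustration. It assumes Java's default regex mode, in which \w covers only ASCII letters, digits, and the underscore.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class WordClassDemo {
    // Removes every character matched by the given regex from the input.
    static String strip(String input, String regex) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "user_42@example.com";
        // In Java's default mode both expressions match the same characters,
        // so both calls leave only the '@' and the '.' behind.
        System.out.println(strip(sample, "\\w"));           // prints "@."
        System.out.println(strip(sample, "[A-Za-z0-9_]"));  // prints "@."
    }
}
```

Note that inside a Java string literal the backslash has to be doubled, so the regex \w is written "\\w".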
{ASCII} is the argument to \p: it names the set of characters to match. Which names are accepted depends on the regex library of the language you are using. On regex101, under Quick reference > Meta sequences, \p is described as:
Matches a unicode character passed as a parameter.
That page lists some of the supported values. However, compared with the ASCII table, not all of them line up. This is because only codes 0-127 of the ASCII table are standard; beyond that range there is no single standard, so the result depends on the language you are using.
\p also has a negated form, \P, so you could simply change the expression to \P{ASCII} instead of wrapping \p{ASCII} in the negated class [^ ].
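As a quick check of that equivalence, here is a small sketch in Java. The class name NonAsciiDemo, the helper dropNonAscii, and the sample text are hypothetical. It removes non-ASCII characters once with the negated class and once with \P{ASCII}.

```java
// Hypothetical demo class, not part of the original question.
class NonAsciiDemo {
    // Deletes every character matched by the given regex.
    static String dropNonAscii(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // "naive cafe" written with a precomposed i-diaeresis and e-acute.
        String sample = "na\u00EFve caf\u00E9";
        // Negated character class around \p{ASCII}.
        System.out.println(dropNonAscii(sample, "[^\\p{ASCII}]")); // prints "nave caf"
        // Negated property \P{ASCII}: same match set, shorter to write.
        System.out.println(dropNonAscii(sample, "\\P{ASCII}"));    // prints "nave caf"
    }
}
```

Without a prior normalization step, the accented letters are removed entirely, because each one is a single non-ASCII character.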
Looking at the regular expression, this is a deny list: it replaces every non-ASCII character with the empty string, that is, it removes it.
ASCII is a table of characters (letters, digits, and symbols), each with a corresponding numeric code.
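Putting the two steps from the question together, here is a minimal sketch. The class name StripAccentsDemo, the method toAscii, and the sample URL are hypothetical. Normalizer.Form.NFD splits an accented letter into its base letter plus a separate combining mark; the combining mark is non-ASCII, so the replaceAll step removes it and the plain letter remains.

```java
import java.text.Normalizer;

// Hypothetical demo class wrapping the snippet from the question.
class StripAccentsDemo {
    static String toAscii(String s) {
        // Step 1: canonical decomposition, e.g. e-acute becomes 'e' + U+0301.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Step 2: remove everything outside ASCII, including the combining marks.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Sample URL with e-acute and a-tilde (illustrative value only).
        String url = "http://example.com/caf\u00E9-s\u00E3o-paulo";
        System.out.println(toAscii(url)); // prints "http://example.com/cafe-sao-paulo"
    }
}
```

Characters that have no canonical decomposition, such as ø (U+00F8), are not split into a base letter plus a mark, so they are removed entirely rather than replaced with an ASCII look-alike.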