Normally, the default \w (word) matches any letter, digit, or underscore (_). Thus \w* matches a string with any number of characters of that type (including the empty string), and \w+ does the same, but requires at least one character.
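To illustrate the difference, here is a minimal sketch (class and variable names are just for the example):

```java
public class WordDemo {
    public static void main(String[] args) {
        // \w+ requires at least one word character
        System.out.println("abc_1".matches("\\w+")); // true
        System.out.println("".matches("\\w+"));      // false
        // \w* also accepts the empty string
        System.out.println("".matches("\\w*"));      // true
    }
}
```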
This is the most complete solution, because defining letters with a range (e.g., [a-zA-Z]) would only cover ASCII characters, rejecting accented letters (á). To make Java match \w against Unicode letters, just prefix the pattern with (?U) [source]. If you are not interested in Unicode (i.e., you only want ASCII letters), just omit this prefix (or use the alternative solutions given in the other answers, which are also correct in that case).
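A quick sketch of the effect of the (?U) flag (which enables Pattern.UNICODE_CHARACTER_CLASS), using á as the test character:

```java
public class UnicodeDemo {
    public static void main(String[] args) {
        String a = "\u00e1"; // "á"
        // Without (?U), \w is ASCII-only and rejects accented letters
        System.out.println(a.matches("\\w"));     // false
        // With (?U), \w follows Unicode, so "á" counts as a word character
        System.out.println(a.matches("(?U)\\w")); // true
    }
}
```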
If the presence of the underscore is a problem, we can eliminate it through a "double negation":
[^\W_]
That is: "match everything that is not a 'non-word' character nor an underscore." Rubular example. (Note, in case it is not clear: \w - lowercase - denotes the "word" character class; \W - uppercase - negates it, matching everything that is not in that class; [...] matches one of a set of characters; [^...] negates it, matching anything that is not one of those characters.)
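The double negation can be checked with a small sketch like this (the test strings are arbitrary examples):

```java
public class NoUnderscoreDemo {
    public static void main(String[] args) {
        // [^\W_] = word characters minus the underscore
        System.out.println("abc".matches("[^\\W_]+")); // true
        System.out.println("a_b".matches("[^\\W_]+")); // false: underscore rejected
        System.out.println("a1".matches("[^\\W_]+"));  // true: digits still accepted
    }
}
```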
To use it, the simplest method is String.matches (which checks whether the entire string matches the expression passed as a parameter), or Pattern and Matcher for other behaviors:
"abc".matches("(?U)[^\\W_]*"); // true
Pattern p = Pattern.compile("(?U)[^\\W_]*");
p.matcher("abc").matches(); // ok, the entire string matches the pattern
p.matcher("$a$").find(); // ok, the pattern can be found in the string
p.matcher("ab$").lookingAt(); // ok, the string starts with the pattern