You can use this Regex:
(?<!\.)(?:\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)|\s(A|O|À)(?=\s|\.))
or
(?<!\.)(?:\s([A-ZÀ-Ý][A-ZÀ-Ý]*[a-zà-ÿ][a-zA-ZÀ-ÿ]*)|\s(A|O|À)(?=\s|\.))
A demo is available on Regex101.
However, there is a problem with proper names: a capitalized name in the middle of a sentence is matched as well. If your text does not contain them, this Regex captures what you want.
This Regex is meant to work on text in general, not just on the example sentence. In future questions, I suggest including more examples, and making them "error proof".
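To try the pattern outside Regex101, here is a minimal sketch in Python. The language choice, the helper name `capitalized_mid_sentence` and the sample sentence are my own assumptions for illustration, not part of the question.

```python
import re

# The full pattern from above, split over two lines for readability.
PATTERN = re.compile(
    r"(?<!\.)(?:\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)"
    r"|\s(A|O|À)(?=\s|\.))"
)

def capitalized_mid_sentence(text):
    """Return the capitalized words found in the middle of a sentence.

    Hypothetical helper: group 1 holds a mixed-case word,
    group 2 holds a standalone A, O or À.
    """
    return [m.group(1) or m.group(2) for m in PATTERN.finditer(text)]

# Illustrative sentence (an assumption, not the asker's data).
sample = "Hoje O Gato subiu na Árvore. Depois desceu. A ONU viu tudo."
print(capitalized_mid_sentence(sample))  # ['O', 'Gato', 'Árvore']
```

"Depois" and "A" are skipped because they follow a period, and "ONU" is skipped because it contains no lowercase letter.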
Explanation
1st Alternative
(?<!\.)\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)
-
(?<!\.)
- Negative Lookbehind - If the character . comes immediately before the whitespace (that is, the word starts a new sentence), the word is not matched.
-
\s
- Matches any whitespace character (equal to [\r\n\t\f\v ]).
-
([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)
- Capture group () - Captures words that start with an uppercase letter but are not written entirely in uppercase.
-
[A-Z\u00C0-\u00dd]
- First letter uppercase - Matches one letter in A-Z or in the Unicode range 192 to 221 (À to Ý).
-
[A-Z\u00C0-\u00dd]*
- The following letters may also be uppercase - Matches zero or more letters in A-Z or in the Unicode range 192 to 221.
-
[a-z\u00E0-\u00ff]
- Lowercase letter required in the word - Matches one letter in a-z or in the Unicode range 224 to 255 (à to ÿ).
-
[a-zA-Z\u00C0-\u00ffA-Z]*
- Lowercase or uppercase letters - Matches zero or more letters in a-z, in A-Z or in the Unicode range 192 to 255.
Words written entirely in uppercase are not captured, as they may be acronyms.
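The behaviour of the 1st alternative can be checked in isolation. This is only a sketch in Python, using the first alternative as it appears in the full Regex; the name `FIRST` and the test phrases are assumptions made up for the example.

```python
import re

# 1st alternative only: lookbehind, whitespace, then a word that
# starts uppercase and contains at least one lowercase letter.
FIRST = re.compile(
    r"(?<!\.)\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)"
)

# The Unicode indexes quoted above correspond to these letters.
assert (ord("À"), ord("Ý")) == (192, 221)
assert (ord("à"), ord("ÿ")) == (224, 255)

print(FIRST.findall("foi para Casa"))  # ['Casa']  capitalized word
print(FIRST.findall("foi para CAsa"))  # ['CAsa']  several uppercase, then lowercase
print(FIRST.findall("viu Élder"))      # ['Élder'] accented first letter
print(FIRST.findall("foi para CASA"))  # []        all uppercase, possible acronym
print(FIRST.findall("foi para casa"))  # []        no uppercase first letter
print(FIRST.findall("fim. Casa"))      # []        preceded by a period
```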
Or
|
2nd Alternative
For the pronouns o and a, or à (a with crase), written in uppercase. These are letters that stand "alone" as words.
\s(A|O|À)(?=\s|\.)
-
\s
- Matches any whitespace character (equal to [\r\n\t\f\v ]).
-
(A|O|À)
- Capture Group - Captures literally A or O or À.
-
(?=\s|\.)
- Positive Lookahead - After the capture group, either a whitespace character \s or a literal period \. is required.
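The 2nd alternative can be tried the same way. Again a hedged sketch in Python, with the sub-pattern exactly as written in this section (so without the lookbehind that the full Regex puts in front of it); `SECOND` and the phrases are illustrative assumptions.

```python
import re

# 2nd alternative only: whitespace, a standalone A, O or À,
# then a lookahead requiring whitespace or a period.
SECOND = re.compile(r"\s(A|O|À)(?=\s|\.)")

print(SECOND.findall("viu O menino"))    # ['O']  followed by whitespace
print(SECOND.findall("foi À praia"))     # ['À']  followed by whitespace
print(SECOND.findall("letra A."))        # ['A']  followed by a period
print(SECOND.findall("viu Os meninos"))  # []     O is part of a longer word
print(SECOND.findall("viu o menino"))    # []     lowercase is ignored
print(SECOND.findall("viu A, depois"))   # []     a comma is not accepted by the lookahead
```

The last case shows a limit of the lookahead: only whitespace or a period may follow, so a letter followed by a comma or placed at the very end of the text is not captured.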