REGEX - Uppercase words in the middle of the sentence

Question

score 7

Is there a regex/replace that converts capitalized words in the middle of a sentence to lowercase?

(Yes, I could just convert everything to lowercase.) But there is a catch: the rule should not apply to a word that comes right after a period (.).

Example:

User Not Authenticated. Contact ADM.

To:

User not authenticated. Contact ADM.

javascript regex intellij-idea

asked by anonymous 19.01.2018 / 16:30

3 answers

score 2

If you want a clean solution using regex, use this expression to capture:

(^.)|(ADM)|((?<!\. )[A-zÀ-ÿ ])

And this expression for replacement:

$1$2\L$3

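In JavaScript, replace() does not understand \L in the replacement string, so pass a function that lowercases only group 3, and add the g flag so that every match is replaced:

str.replace(/(^.)|(ADM)|((?<!\. )[A-zÀ-ÿ ])/g, (m, g1, g2, g3) => g1 || g2 || g3.toLowerCase())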

You can try these expressions on this link

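Capture explanation

(^.) - Captures the first character so that it is not lowercased.
| - OR
(ADM) - Captures exactly ADM, so the acronym is not lowercased.
| - OR
((?<!\. )[A-zÀ-ÿ ]) - Capture group 3:
(?<!\. ) - Negative lookbehind; prevents the following character from being captured if it is preceded by ". " (a period and a space).
[A-zÀ-ÿ ] - Captures any single letter, lowercase or uppercase, including accented letters, or a space. (The range A-z also includes the characters [ \ ] ^ _ `, which is harmless here because lowercasing does not change them.)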

Replacement explanation

$1 - capture group 1, (^.), inserted unchanged
$2 - capture group 2, (ADM), inserted unchanged
\L$3 - capture group 3, ((?<!\. )[A-zÀ-ÿ ]), converted to lowercase by \L
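JavaScript's replace() has no \L escape, but a replacer function can do the same job: return groups 1 and 2 unchanged and lowercase group 3. A minimal sketch, assuming an engine with lookbehind support (ES2018 or later); the name lowercaseWithCaptureGroups is hypothetical and the g flag is added to the answer's pattern:

```javascript
// Pattern from the answer, with the g flag so every match is replaced.
const ANSWER_PATTERN = /(^.)|(ADM)|((?<!\. )[A-zÀ-ÿ ])/g;

// Hypothetical helper emulating the replacement "$1$2\L$3":
// groups 1 and 2 are inserted unchanged, group 3 is lowercased.
function lowercaseWithCaptureGroups(str) {
  return str.replace(ANSWER_PATTERN, (match, first, acronym, other) => {
    if (first !== undefined) return first;     // (^.)
    if (acronym !== undefined) return acronym; // (ADM)
    return other.toLowerCase();                // \L$3
  });
}

console.log(lowercaseWithCaptureGroups("User Not Authenticated. Contact ADM."));
// User not authenticated. Contact ADM.
```

Characters that no alternative matches, such as the period and the C right after ". ", are skipped by replace() and stay as they are.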

19.01.2018 / 20:22

score 1

You can use this Regex:

(?<!\.)(?:\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)|\s(A|O|À)(?=\s|\.))

or

(?<!\.)(?:\s([A-ZÀ-Ý][A-ZÀ-Ý]*[a-zà-ÿ][a-zA-ZÀ-ÿ]*)|\s(A|O|À)(?=\s|\.))

And here is the demo on Regex101.

However, proper names are a problem, since they get lowercased as well; if your text does not contain any, this regex captures what you want.

This regex works on words and text in general, not just on the example sentence. In future questions, I suggest posting more examples of the text you need to handle, so that the regex can be made "error-proof".
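To apply this regex in JavaScript, each match (the leading whitespace plus the word) can simply be lowercased through a replacer function, because lowercasing the whitespace changes nothing. A minimal sketch, assuming an engine with lookbehind support (ES2018 or later); lowercaseMatches is a hypothetical name and the g flag is added:

```javascript
// Regex from the answer (literal-character version) with the g flag added.
const MID_SENTENCE_WORD =
  /(?<!\.)(?:\s([A-ZÀ-Ý][A-ZÀ-Ý]*[a-zà-ÿ][a-zA-ZÀ-ÿ]*)|\s(A|O|À)(?=\s|\.))/g;

// Hypothetical helper: lowercases every match; the whitespace captured
// by \s is not affected by toLowerCase().
function lowercaseMatches(text) {
  return text.replace(MID_SENTENCE_WORD, (match) => match.toLowerCase());
}

console.log(lowercaseMatches("Usuário Não Autenticado. Contate o ADM."));
// Usuário não autenticado. Contate o ADM.
console.log(lowercaseMatches("Vou À Bahia."));
// Vou à bahia.
```

The second call shows the proper-name problem: Bahia becomes bahia.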

Explanation

1st Alternative

(?<!\.)\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)

(?<!\.) - Negative Lookbehind - If the character right before the whitespace is a period (.), that is, the word starts a sentence, nothing is captured.
\s - Matches any whitespace character (equivalent to [\r\n\t\f\v ]).
([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*) - Capture group () - Captures words that are not entirely uppercase.
- [A-Z\u00C0-\u00dd] - The first letter is uppercase - matches one letter in A-Z or in the Unicode range 192 to 221.
- [A-Z\u00C0-\u00dd]* - The following letters may also be uppercase - matches zero or more letters in A-Z or in the Unicode range 192 to 221.
- [a-z\u00E0-\u00ff] - The word must contain a lowercase letter - matches one letter in a-z or in the Unicode range 224 to 255.
- [a-zA-Z\u00C0-\u00ffA-Z]* - The remaining letters, lowercase or uppercase - matches zero or more letters in a-z, A-Z, or the Unicode range 192 to 255.

It does not capture words written entirely in uppercase, since they may be acronyms.
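To see which words the first alternative actually captures, the sketch below lists group 1 for every match using matchAll (ES2020). capturedWords is a hypothetical helper, and the g flag is added because matchAll requires it. Acronyms such as ADM and TI are not captured, while the mixed-case AUtenticadO is:

```javascript
// Regex from the answer (literal-character version) with the g flag added.
const MID_SENTENCE_WORD =
  /(?<!\.)(?:\s([A-ZÀ-Ý][A-ZÀ-Ý]*[a-zà-ÿ][a-zA-ZÀ-ÿ]*)|\s(A|O|À)(?=\s|\.))/g;

// Hypothetical helper: returns the words captured by group 1 (1st alternative).
function capturedWords(text) {
  return [...text.matchAll(MID_SENTENCE_WORD)]
    .map((m) => m[1])
    .filter((word) => word !== undefined);
}

console.log(capturedWords("Usuário Não AUtenticadO. Contate o ADM e o TI.").join(", "));
// Não, AUtenticadO
```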

Or

|

2nd Alternative

For the pronouns O and A and the contraction À (a with crase) written in uppercase, that is, words made of a single letter.

\s(A|O|À)(?=\s|\.)

\s - Matches any whitespace character (equivalent to [\r\n\t\f\v ]).
(A|O|À) - Capture group - Captures literally A, O, or À.
(?=\s|\.) - Positive Lookahead - After the capture group, either whitespace (\s) or (|) a period (\.) is required.
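Likewise, listing group 2 shows which single-letter words the second alternative picks up. Again a sketch, with a hypothetical helper name, singleLetterWords, and the answer's regex plus the g flag:

```javascript
// Regex from the answer (literal-character version) with the g flag added.
const MID_SENTENCE_WORD =
  /(?<!\.)(?:\s([A-ZÀ-Ý][A-ZÀ-Ý]*[a-zà-ÿ][a-zA-ZÀ-ÿ]*)|\s(A|O|À)(?=\s|\.))/g;

// Hypothetical helper: returns the words captured by group 2 (2nd alternative).
function singleLetterWords(text) {
  return [...text.matchAll(MID_SENTENCE_WORD)]
    .map((m) => m[2])
    .filter((word) => word !== undefined);
}

console.log(singleLetterWords("Em Caso De Ou O A. Vou À Bahia.").join(", "));
// O, A, À
```

Ou is captured by the first alternative instead, because it contains a lowercase letter.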

19.01.2018 / 20:16


score 10 · Accepted Answer

You can use this regex:

/^[^]([^.]*)/

It captures the text from the second character up to the last character before the first period (.): the first character is consumed by [^] (which in JavaScript matches any character) and left out of group 1. Then you convert group 1 to lowercase with .toLowerCase() and put it back with replace:

var string = "UsuÁrio Não AUtenticadO. Contate o ADM.";

var res = string.match(/^[^]([^.]*)/)[1];
string = string.replace(res,res.toLowerCase());

console.log(string);
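The same idea can be written as a single replace() call with a replacer function, which avoids searching the string a second time for the matched text. A minimal sketch; lowercaseFirstSentence is a hypothetical name:

```javascript
// Hypothetical helper: keeps the first character (group 1) and lowercases
// everything up to the first period (group 2); the rest is left untouched.
function lowercaseFirstSentence(text) {
  return text.replace(/^([^])([^.]*)/, (match, first, rest) => first + rest.toLowerCase());
}

console.log(lowercaseFirstSentence("UsuÁrio Não AUtenticadO. Contate o ADM."));
// Usuário não autenticado. Contate o ADM.
```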

Or you can take everything up to the word "Contate":

/[^](.*?(?=.\sContate))/

var string = "UsuÁrio Não AUtenticadO. Contate o ADM.";

var res = string.match(/[^](.*?(?=.\sContate))/)[1];
string = string.replace(res,res.toLowerCase());

console.log(string);
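If the stop word varies, the pattern can be built with the RegExp constructor. This sketch is a variation, not the answer's code: it requires a literal period before the word (\. instead of .), and both lowercaseUntil and escapeRegExp are hypothetical helpers (the escaping pattern is the one commonly used to make arbitrary text safe inside a RegExp):

```javascript
// Escapes characters that have a special meaning in a RegExp pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: keeps the first character, lowercases everything up to
// ". <stopWord>", and leaves the rest untouched. Without a match, the text is
// returned unchanged.
function lowercaseUntil(text, stopWord) {
  const pattern = new RegExp("^([^])([^]*?)(?=\\.\\s" + escapeRegExp(stopWord) + ")");
  return text.replace(pattern, (match, first, rest) => first + rest.toLowerCase());
}

console.log(lowercaseUntil("UsuÁrio Não AUtenticadO. Contate o ADM.", "Contate"));
// Usuário não autenticado. Contate o ADM.
```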

EDIT

If there are periods in the middle of the string, this regex skips the first letter after each period. Since the result is an array with more than one match, you need to loop over the array, skipping the last match (Contate o ADM):

/([^.\sA-Z][^.]*)/g

var string = "Usuário Não Autenticado. Contate o ADM. Em Caso De Ou O A. Vou À Bahia. UsuÁrio Não AUtenTicadO. Ok Vamos testa. Contate o ADM.";

var regex = /([^.\sA-Z][^.]*)/g
var res = string.match(regex);

for(let x=0; x<res.length-1; x++){
   string = string.replace(res[x],res[x].toLowerCase());
}

console.log(string);
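Note that with the sample string above, the loop also turns the first ADM into adm: replace() with a string argument changes only the first occurrence of "ontate o ADM", which is the earlier one, so skipping the last array entry does not protect it. One way around this, sketched below under the assumption that a word written entirely in uppercase (two or more letters) is an acronym, is to decide word by word inside a replacer function. This is a different technique from the answer's regex, and lowercaseMidSentenceWords is a hypothetical name:

```javascript
// Letters a-z, A-Z and the Latin-1 accented letters (excluding × and ÷).
const WORD = /[A-Za-zÀ-ÖØ-öø-ÿ]+/g;

function lowercaseMidSentenceWords(text) {
  return text.replace(WORD, (word, offset) => {
    // Assumption: all-uppercase words of two or more letters are acronyms (e.g. ADM).
    if (word.length > 1 && word === word.toUpperCase()) return word;
    // A word at the start of the text or right after ". " starts a sentence:
    // keep its first letter and lowercase the rest.
    if (/(^|\.\s+)$/.test(text.slice(0, offset))) {
      return word[0] + word.slice(1).toLowerCase();
    }
    return word.toLowerCase();
  });
}

console.log(lowercaseMidSentenceWords(
  "Usuário Não Autenticado. Contate o ADM. Em Caso De Ou O A. Vou À Bahia. UsuÁrio Não AUtenTicadO. Ok Vamos testa. Contate o ADM."
));
// Usuário não autenticado. Contate o ADM. Em caso de ou o a. Vou à bahia. Usuário não autenticado. Ok vamos testa. Contate o ADM.
```

As with the other answers, proper names such as Bahia are still lowercased.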