Filter filenames in uppercase and with a certain snippet at the end

4

I have some files named after people. Some of the names are entirely uppercase, others entirely lowercase, and some are in mixed case.

I would like to set up a regex that keeps only the file names that are entirely uppercase and do not contain the " - Cópia" suffix before the extension.

I can already detect the snippet at the end with the regex from Guilherme's answer to another question I asked. Now I need to combine it with a regex that checks whether the file name is all uppercase, while negating the regex from the linked answer so that names containing that snippet are rejected.

To demonstrate what I want to do:

EDSON ARANTES DO NASCIMENTO.jpg -> passes
EDsON ARANTEs DO NASCIMENTO.jpg -> does not pass
EDSON ARANTES DO NASCIMENTO - Cópia.jpg -> does not pass
EDSON ARANTES DO NASCIMENTO. - Cópia - Cópia.jpg -> does not pass

The regex I made so far was:

^([A-Z]{2,}+).*( - C[oó]pia\.[^.]+)$

but this lets all of the cases above pass. I also found this other answer on SOen, but I do not know how to apply it. How do I adapt this regex so that only the first example passes?

    
asked by anonymous 25.07.2017 / 13:00

2 answers

2

The solution I found was this:

^([^a-z]{1,}[A-Z]{2,}+)(?:(?! - C[oó]pia\.[^.]+).)+$

It has two parts. The first group starts with one or more characters that are not lowercase letters, followed by a run of at least two uppercase letters (strangely, it only works with this minimum of two; if I remove it, the regex no longer works correctly). The second part negates the regex from Guilherme's answer.

The validation can be checked on regex101.
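To try the pattern outside regex101, here is a minimal Python sketch that runs it against the four file names from the question. It rests on two assumptions that are not part of the answer. First, the possessive `+` after `{2,}` is dropped, because Python's `re` module only accepts possessive quantifiers from version 3.11 onward; on these samples the result is the same. Second, `STRICT` is a hypothetical alternative added for comparison only. The names `ANSWER`, `STRICT`, `SAMPLES` and `passes` are illustrative.

```python
import re

# Pattern from the answer, with the possessive "+" after {2,} removed
# (assumption: kept portable for Python versions older than 3.11).
ANSWER = re.compile(r"^([^a-z]{1,}[A-Z]{2,})(?:(?! - C[oó]pia\.[^.]+).)+$")

# Hypothetical stricter variant, not from the thread: no a-z letter is
# allowed anywhere before the final extension. It rejects copies only
# because " - Cópia" itself contains lowercase letters, and it does not
# catch accented lowercase letters such as "é".
STRICT = re.compile(r"^[^a-z]+\.[^.]+$")

SAMPLES = [
    "EDSON ARANTES DO NASCIMENTO.jpg",
    "EDsON ARANTEs DO NASCIMENTO.jpg",
    "EDSON ARANTES DO NASCIMENTO - Cópia.jpg",
    "EDSON ARANTES DO NASCIMENTO. - Cópia - Cópia.jpg",
]


def passes(pattern, name):
    """Return True when the file name is accepted by the pattern."""
    return pattern.search(name) is not None


if __name__ == "__main__":
    for name in SAMPLES:
        print(name, "->", passes(ANSWER, name), passes(STRICT, name))
```

Both patterns accept only the first sample. Note that the answer's pattern checks just the leading run of characters: once the first group has matched, the trailing `.` accepts lowercase letters, so a name such as `EDSON ARANTEs.jpg` still passes it. That is probably why the minimum of two uppercase letters appears to matter for the second sample, where the lowercase `s` comes right after `ED`.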

    
25.07.2017 / 15:51

1

REGEX Expression

(([A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ])([A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ]{3,}))|([A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ])

Replacement Expression

\U$2\E\L$3\E\L$4\E

Explanation

Let's split the regex expression into its parts:

1 (
2    ([A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ])
3    ([A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ]{3,})
4 )|
5 ([A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ])
  • [A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ] : character set extended with the accented letters used in Portuguese, plus a few extra letters.
  • The first group, defined on lines 1 to 4 above, captures every word with 4 or more characters and splits it into two groups: $2 holds the first letter of the word (to be converted to uppercase) and $3 holds the rest of the word (to be converted to lowercase).
  • The group $4, defined on line 5 above, captures every character not already matched (these belong to words with 3 or fewer characters).
  • The replacement expression uses these groups together with case-conversion operators:

    • \U\E : indicates that what is between \U and \E must be converted to uppercase
    • \L\E : indicates that what is between \L and \E must be converted to lowercase
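The `\U`, `\L` and `\E` operators are only understood by some replacement engines. Python's `re` module does not interpret them in a replacement string, so the sketch below reproduces the same substitution with a replacement function instead. This is an adaptation under that assumption, not the answer's own code, and the names `LETTERS`, `PATTERN`, `recase` and `title_case_name` are illustrative.

```python
import re

# Same character set as in the answer's expression.
LETTERS = "A-Za-zÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑáàâãéèêíïóôõöúçñ"

# Groups 2 and 3: first letter and remainder of a word with 4+ letters.
# Group 4: any single letter left over (words with 3 or fewer letters).
PATTERN = re.compile(rf"(([{LETTERS}])([{LETTERS}]{{3,}}))|([{LETTERS}])")


def recase(match):
    """Equivalent of the replacement \\U$2\\E\\L$3\\E\\L$4\\E."""
    if match.group(1) is not None:
        # Long word: uppercase first letter, lowercase remainder.
        return match.group(2).upper() + match.group(3).lower()
    # Short word: every letter is lowercased.
    return match.group(4).lower()


def title_case_name(text):
    """Apply the substitution to every letter run in the text."""
    return PATTERN.sub(recase, text)


if __name__ == "__main__":
    print(title_case_name("EDSON ARANTES DO NASCIMENTO"))
```

Characters outside the set, such as spaces, are not matched and are therefore left untouched.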

25.07.2017 / 16:21