How to create a regex to filter and delete files with a certain substring in the name

3

I'm trying to figure out a way to delete the files that Windows creates when you make copies of copies. I got part of the way there with the code below:

import java.util.*;
import java.io.*;


public class FileCreator{

    public static void main(String[] args) throws Exception{

        File f = new File(".");
        File[] files = f.listFiles();

        for(File fl : files){
            String fileName = fl.getName();
            if(fileName.contains("Copia - Copia")){
                    System.out.println(fileName);
            }

        }
    }
}

I created some test files, and the result was:

C:\Users\diego\Desktop\Nova pasta>java FileCreator
File 0 - Copia - Copia.txt
File 10 - Copia - Copia.txt
File 12 - Copia - Copia.txt
File 14 - Copia - Copia.txt
File 16 - Copia - Copia.txt
File 18 - Copia - Copia.txt
File 2 - Copia - Copia.txt
File 4 - Copia - Copia.txt
File 6 - Copia - Copia.txt
File 8 - Copia - Copia.txt

This approach already works for me, since I could simply replace the print statement inside the loop with fl.delete(); but I would like more control over what gets deleted by using a regex.

I started on something like the code below, but I could not come up with a regex that detects "Copia - Copia" exactly at the end of the file name so that the file can then be deleted.

    Pattern p = Pattern.compile("");
    Matcher m;

    f = new File(".");
    File[] files = f.listFiles();

    for(File fl : files){
        String fileName = fl.getName();
        m = p.matcher(fileName);
        if(m.find()){
            //fl.delete();
            System.out.println(fileName + " deletado");
        }
    }

How do I make a regex that filters these files?

Note: detecting the extension is irrelevant. I only need to detect the "Copia - Copia" suffix, which is what Windows appends to the end of the file name when it duplicates a duplicate.

    

asked by anonymous 25.11.2016 / 22:57

4 answers

3

The regex can be: Copia - Copia\.[^.]+$

Explanation:

Copia - Copia\.[^.]+$
^                ^   ^
1                2   3
  • Copia - Copia\. is the literal text you want to find, including the dot before the extension (written as \. because an unescaped . matches any character)

  • [^.]+ matches the extension. A ^ at the start of [...] negates the class, so [^.] matches any single character except a dot, and + requires one or more of them. The extension can therefore be anything, as long as it contains no further dot.

  • $ anchors the match to the end of the file name (the String), so the name must end with Copia - Copia.[any extension]

    As an alternative you can use C[oó]pia - C[oó]pia\.[^.]+$ if the names may appear both with and without the accent. Note that the result can vary depending on how the accented character is encoded in Unicode.

    The usage would look something like final Pattern regex = Pattern.compile("C[oó]pia - C[oó]pia\\.[^.]+$");

    An example with List<String> to test:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    
    class Exemplo
    {
        public static void main(String[] args)
        {
            // "\\." in Java source is the regex \. (a literal dot)
            final Pattern regex = Pattern.compile("Copia - Copia\\.[^.]+$");
    
            List<String> files = new ArrayList<String>();
    
            files.add("File 123 - Copia.txt");
            files.add("File 10 - Copia - Copia.java");
            files.add("File 12 - Copia.java");
            files.add("File 14 - Copia - Copia.txt");
            files.add("File 16 - Copia.txt");
            files.add("File 18 - Copia - Copia.log");
            files.add("File 2 - Copia.txt");
            files.add("File 4 - Copia.log");
            files.add("File 6 - Copia - Copia.txt");
            files.add("File 8 - Copia.txt");
    
            for (String file : files)
            {
                if (regex.matcher(file).find())
                {
                    System.out.println("Encontrado: " + file);
                }
            }
        }
    }
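
To connect the List<String> test back to the question, here is a minimal sketch that applies the same pattern to a real directory listing. The class name DuplicateCleaner, the helper names, and the dryRun flag are assumptions made for illustration and are not part of the original answer. It only prints what it would delete until dryRun is set to false.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DuplicateCleaner
{
    // Same pattern as in the example above; "\\." in Java source is the regex \.
    static final Pattern DUPLICATE = Pattern.compile("Copia - Copia\\.[^.]+$");

    // True when the name ends in "Copia - Copia.<extension>"
    static boolean isDuplicate(String fileName)
    {
        return DUPLICATE.matcher(fileName).find();
    }

    // Collects the regular files in dir whose names match the pattern
    static List<File> findDuplicates(File dir)
    {
        List<File> result = new ArrayList<File>();
        File[] files = dir.listFiles(); // null when dir is not a readable directory
        if (files == null)
        {
            return result;
        }
        for (File f : files)
        {
            if (f.isFile() && isDuplicate(f.getName()))
            {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        boolean dryRun = true; // assumption: switch to false only after checking the output
        for (File f : findDuplicates(new File(".")))
        {
            if (dryRun)
            {
                System.out.println("Would delete: " + f.getName());
            }
            else if (!f.delete())
            {
                System.out.println("Could not delete: " + f.getName());
            }
        }
    }
}
```

Running it once in dry-run mode before deleting anything is a cheap way to check that the regex selects only the files you expect.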
    


    It is also possible to use String.matches, but then you need to add .* at the front, because matches only succeeds when the pattern matches the entire string (unlike Matcher.find, which looks for a match anywhere in it). The pattern becomes .*Copia - Copia\.[^.]+$ . However, as @VictorStafusa said, this might hurt performance a little, depending on how many times it runs (I have not been able to confirm this).
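
The difference between find and matches can be checked with a small sketch like the one below. The class and method names are hypothetical. As for the performance remark, one thing that is documented is that String.matches behaves like Pattern.matches(regex, str), which compiles the regex on every call, whereas a Pattern compiled once can be reused. Whether that matters in practice depends on how many files are processed.

```java
import java.util.regex.Pattern;

class FindVsMatches
{
    // Compiled once and reused for every file name
    static final Pattern SUFFIX = Pattern.compile("Copia - Copia\\.[^.]+$");

    // find() looks for the pattern anywhere in the name
    static boolean viaFind(String name)
    {
        return SUFFIX.matcher(name).find();
    }

    // matches() must consume the whole name, so this fails when text precedes "Copia"
    static boolean viaMatchesNoPrefix(String name)
    {
        return name.matches("Copia - Copia\\.[^.]+$");
    }

    // The leading .* absorbs whatever comes before the suffix
    static boolean viaMatchesWithPrefix(String name)
    {
        return name.matches(".*Copia - Copia\\.[^.]+$");
    }

    public static void main(String[] args)
    {
        String name = "File 10 - Copia - Copia.txt";
        System.out.println(viaFind(name));              // true
        System.out.println(viaMatchesNoPrefix(name));   // false
        System.out.println(viaMatchesWithPrefix(name)); // true
    }
}
```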

    Explanation:

    .*Copia - Copia\.[^.]+$
    ^  ^               ^   ^
    1  2               3   4

  • .* matches any sequence of characters that comes before the desired text

  • Copia - Copia\. is the part you want to find

  • [^.]+ matches the extension. A ^ at the start of [...] negates the class, so [^.] matches any single character except a dot, and the extension can be anything that contains no further dot.

  • $ anchors the match to the end of the file name (the String), so the name must end with [any characters]Copia - Copia.[any extension]

    It would look something like:

    for (String file : files)
    {
        // matches() must cover the whole name, hence the leading .*
        if (file.matches(".*Copia - Copia\\.[^.]+$"))
        {
            System.out.println("Encontrado: " + file);
        }
    }

    As an alternative you can use .*C[oó]pia - C[oó]pia\.[^.]+$ if the names may appear both with and without the accent. Note that the result can vary depending on how the accented character is encoded in Unicode.

    The usage would look something like if (file.matches(".*C[oó]pia - C[oó]pia\\.[^.]+$")) {
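
One possible reading of the Unicode caveat: an accented "ó" can be stored either as a single precomposed character (U+00F3) or as a plain "o" followed by a combining acute accent (U+0301). The class [oó] only covers the first form. The sketch below, with hypothetical names, normalizes the name to NFC with java.text.Normalizer before matching. This is an assumption about what the caveat refers to, not something the answer states.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class AccentedCopy
{
    // \u00F3 is the precomposed o-acute, written as an escape to avoid source-encoding issues
    static final Pattern SUFFIX = Pattern.compile("C[o\u00F3]pia - C[o\u00F3]pia\\.[^.]+$");

    // Matches the name exactly as it was received
    static boolean matchesRaw(String name)
    {
        return SUFFIX.matcher(name).find();
    }

    // Composes "o" + combining accent into a single character before matching
    static boolean matchesNormalized(String name)
    {
        String composed = Normalizer.normalize(name, Normalizer.Form.NFC);
        return SUFFIX.matcher(composed).find();
    }

    public static void main(String[] args)
    {
        String decomposed = "File 1 - Co\u0301pia - Co\u0301pia.txt"; // o + U+0301
        System.out.println(matchesRaw(decomposed));        // false
        System.out.println(matchesNormalized(decomposed)); // true
    }
}
```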

        
    26.11.2016 / 01:45
    2

    You can use the following regular expression:

    Pattern.compile("Copia - Copia\\.[a-zA-Z]{3,4}$");
    

    Where:

    • Copia - Copia is the text you are looking for;
    • \\. matches a literal . character. In a plain regex this would be just \. , but because the expression sits inside a Java string literal the backslash itself has to be escaped once more;
    • [a-zA-Z] requires the character to be a letter between a and z or A and Z;
    • {3,4} is the number of such characters, which must be 3 or 4;
    • $ means that all of this must come at the end of the string;

    That is:

    Search for the text Copia - Copia followed by a . and then 3 or 4 letters (a to z, in either case) at the end of the string.
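
To see what the 3-or-4-letter rule accepts and rejects, here is a small sketch that compares it with the [^.]+ variant from the other answer. The class name and the sample file names are made up for illustration.

```java
import java.util.regex.Pattern;

class ExtensionRule
{
    // Extension must be exactly 3 or 4 ASCII letters
    static final Pattern LETTERS_3_TO_4 = Pattern.compile("Copia - Copia\\.[a-zA-Z]{3,4}$");

    // Extension may be any run of characters that contains no dot
    static final Pattern ANY_EXTENSION = Pattern.compile("Copia - Copia\\.[^.]+$");

    static boolean strict(String name)
    {
        return LETTERS_3_TO_4.matcher(name).find();
    }

    static boolean loose(String name)
    {
        return ANY_EXTENSION.matcher(name).find();
    }

    public static void main(String[] args)
    {
        String[] names = {
            "a - Copia - Copia.txt",   // 3 letters
            "a - Copia - Copia.java",  // 4 letters
            "a - Copia - Copia.mp3",   // contains a digit
            "a - Copia - Copia.c",     // 1 letter
            "a - Copia - Copia.class"  // 5 letters
        };
        for (String name : names)
        {
            System.out.println(name + " strict=" + strict(name) + " loose=" + loose(name));
        }
    }
}
```

The stricter rule skips extensions such as .mp3, .c or .class, which may or may not be what you want. Since the question says the extension is irrelevant, the looser pattern is arguably the closer fit.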

        
    26.11.2016 / 02:55
    1

    Use this expression:

      

    (Copia - Copia)

    The parentheses define a group of characters to be 'captured' from the string.

    Open the linked site to see it working.
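
A caveat worth checking with this expression: it has no $ anchor, so with Matcher.find it also matches when "Copia - Copia" appears in the middle of a name, not only at the end. The sketch below uses hypothetical names, and the anchored pattern is borrowed from the other answer purely for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnanchoredGroup
{
    // The expression from this answer: one capturing group, no anchor
    static final Pattern UNANCHORED = Pattern.compile("(Copia - Copia)");

    // Anchored variant, shown only for comparison
    static final Pattern ANCHORED = Pattern.compile("Copia - Copia\\.[^.]+$");

    // Returns the text captured by group 1, or null when nothing matches
    static String captured(String name)
    {
        Matcher m = UNANCHORED.matcher(name);
        return m.find() ? m.group(1) : null;
    }

    static boolean endsWithSuffix(String name)
    {
        return ANCHORED.matcher(name).find();
    }

    public static void main(String[] args)
    {
        String middle = "Copia - Copia de seguranca.txt"; // made-up name with the text at the start
        System.out.println(captured(middle));       // Copia - Copia
        System.out.println(endsWithSuffix(middle)); // false
    }
}
```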

        
    25.11.2016 / 23:34
    0
    // String.replace takes literal text, not a regex, so no parentheses here
    String trechoParaRemover = "Copia - Copia";
    fileName = fileName.replace(trechoParaRemover, "");
    

    That would solve your case; I do not think you need a regular expression or delete().
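
Two things are worth verifying about this approach. String.replace treats its first argument as literal text, so parentheses in it are matched literally rather than as a regex group. It also only produces a new string: nothing on disk changes unless the file is renamed (for example with File.renameTo) or deleted. The sketch below uses hypothetical names and assumes the " - " separator should be removed along with the text.

```java
class ReplaceLiteral
{
    // Removes the literal suffix text, including the separator before it
    static String stripSuffix(String fileName)
    {
        return fileName.replace(" - Copia - Copia", "");
    }

    // Parentheses are literal for String.replace, so ordinary names are left unchanged
    static String stripWithParentheses(String fileName)
    {
        return fileName.replace("(Copia - Copia)", "");
    }

    public static void main(String[] args)
    {
        String name = "File 10 - Copia - Copia.txt";
        System.out.println(stripSuffix(name));          // File 10.txt
        System.out.println(stripWithParentheses(name)); // File 10 - Copia - Copia.txt
    }
}
```

Note that stripping the suffix yields the name of the original file, so this is closer to a rename than to the deletion the question asks for.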

        
    26.11.2016 / 00:37