How do I split a String that starts with blank spaces?

8

For the problem I am working on, I need to remove all special characters and spaces and count the resulting tokens. My plan is to separate the String with the split() method. Based on another expression I saw, I wrote this:

String[] d = s.split("[,.!?'@_] *| +");

It works. The problem is that if there are blanks before the first word, they are not discarded. I tried adding a space at the start of the expression, but that did not work. Can someone help me? Here is the complete code:

        String s = "           YES      leading spaces        are valid,    problemsetters are         evillllll";
        String[] d = s.split("[,.!?'@_] *| +");
        int i, c = d.length;
        System.out.println(c);
        for(i = 0; i < d.length; i++){
            System.out.println(d[i]);
        }

The output produced is this:

9

YES
leading
spaces
are
valid
problemsetters
are
evillllll

But it should be this:

8
YES
leading
spaces
are
valid
problemsetters
are
evillllll
    
asked by anonymous 20.08.2018 / 22:18

2 answers

10

This is a known behavior of split when the String starts with spaces, and it has already been discussed on SOen.

The simplest solution is to call trim(), which removes the leading and trailing spaces of the String, and then split:

String[] d = s.trim().split("[,.!?'@_] *| +");

This will give you the desired output (the array with 8 elements).
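As a quick check, here is a minimal sketch that applies trim() before split() to the string from the question. The class name TrimSplitDemo and the helper tokens are made up for this illustration.

```java
class TrimSplitDemo {
    // Hypothetical helper: strip leading/trailing whitespace, then split
    // with the same expression used in the question.
    static String[] tokens(String s) {
        return s.trim().split("[,.!?'@_] *| +");
    }

    public static void main(String[] args) {
        String s = "           YES      leading spaces        are valid,    problemsetters are         evillllll";
        String[] d = tokens(s);
        System.out.println(d.length); // prints 8
        for (String word : d) {
            System.out.println(word);
        }
    }
}
```

Note that an empty or all-blank input still produces an array with one empty element after trim(), because split returns the original string when the expression matches nothing.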

    
20.08.2018 / 23:01
6

Let's look at this simpler case:

class TesteRegex {
    public static void main(String[] args) {
        String s = " A B C ";
        String[] d = s.split(" ", 5);
        System.out.println(d.length);
        for (int i = 0; i < d.length; i++) {
            System.out.println("[" + d[i] + "]");
        } 
    }
}

It produces this output:

5
[]
[A]
[B]
[C]
[]

See this running on ideone.

The problem is that each space is treated as a separator. The first space separates the start of the String from A, the second separates A from B, the third separates B from C, and the fourth separates C from the end of the String. That gives 5 resulting pieces: the (empty) beginning, A, B, C and the (empty) end.

However, if you remove the ", 5" from the code above, only the first four come back. The reason is in the javadoc of the split method, which says:

  

Trailing empty strings are therefore not included in the resulting array.


However, there is no such rule for empty strings at the beginning (leading empty strings); the rule applies only to the strings at the end.
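The difference between leading and trailing empty strings can be seen by varying the limit argument. This is a small sketch; the class name SplitLimitDemo and the helper show are invented for the illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: prints the length followed by the pieces,
    // so that empty strings are visible in the output.
    static String show(String[] parts) {
        return parts.length + " " + Arrays.toString(parts);
    }

    public static void main(String[] args) {
        String s = " A B C ";
        // No limit (same as limit 0): trailing empty strings are dropped.
        System.out.println(show(s.split(" ")));     // 4 [, A, B, C]
        // Negative limit: every piece is kept, including the trailing one.
        System.out.println(show(s.split(" ", -1))); // 5 [, A, B, C, ]
        // Leading empty strings are kept even without a limit.
        System.out.println(show("  A".split(" "))); // 3 [, , A]
    }
}
```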

Looking at the source of the split(String, int) method, you can see that it takes care to remove empty strings at the end when limit (the second parameter of split) is zero:

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, length()));

The analogous method in class java.util.regex.Pattern does the same:

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

But it makes no such effort for empty strings at the beginning.
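To make the asymmetry concrete, the sketch below imitates the effect described above on a list of pieces. It is not the JDK source; dropTrailingEmpty and the class name are hypothetical, written only to show that trimming from the end leaves a leading empty string untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TrailingEmptyDemo {
    // Hypothetical helper: discards empty strings from the end of the list only.
    static List<String> dropTrailingEmpty(List<String> parts) {
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return new ArrayList<>(parts.subList(0, size));
    }

    public static void main(String[] args) {
        // A negative limit keeps every piece, including the empty ones.
        List<String> all = Arrays.asList(" A B C ".split(" ", -1));
        System.out.println(all);                    // [, A, B, C, ]
        // Only the trailing empty string goes away; the leading one stays.
        System.out.println(dropTrailingEmpty(all)); // [, A, B, C]
    }
}
```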

I'm not sure what the reason for this behavior is. I think it might have something to do with this, but I'm not sure. Whatever the motivation, the behavior is deliberate, not accidental. It also could not be changed now, for compatibility reasons.

So the solution is to use trim(), or else to check whether the first element is empty and, if so, ignore or remove it.
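The second option could look roughly like this. The class name SkipLeadingEmpty and the helper splitWords are invented for this sketch, which reuses the expression from the question.

```java
import java.util.Arrays;

class SkipLeadingEmpty {
    // Hypothetical helper: split, then drop the first element if it is empty.
    static String[] splitWords(String s) {
        String[] parts = s.split("[,.!?'@_] *| +");
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "           YES      leading spaces        are valid,    problemsetters are         evillllll";
        String[] d = splitWords(s);
        System.out.println(d.length); // prints 8
        for (String word : d) {
            System.out.println(word);
        }
    }
}
```

Unlike trim(), this check also covers a String that begins with one of the special characters, since that too produces an empty first element.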

    
20.08.2018 / 23:05