Why doesn't the regex work?

I'm following the book OOP Programming with PHP5 by Hasin Hayder (2007) and have reached the part on unit tests. In one exercise, it builds a countWords() method in a WordCount class and creates some tests for that method:

class WordCount
{
    public function countWords($sentence)
    {
        return count(split(" ",$sentence));
    }
}

It then adds a test for a sentence with extra whitespace: "my name is john " (notice the space after john), asserting $this->assertEquals(4, $wordcount);. When run, this test returns Failure.

To fix it, the book modifies the method by adding preg_replace with the regex "~\s+~", and its code works as expected, but when I used the same thing it did not work.

It then creates yet another test, where the variable is "my name is \n\r john", and the same regex is supposed to handle it.

class WordCount
{
    public function countWords($sentence)
    {
        $newsentence = preg_replace("~\s+~"," ",$sentence);
        return count(split(" ",$newsentence));
    }
}

I have already checked my code for syntax or structural errors, but everything looks fine; at least, it matches what the book shows in the exercise.

To make the first two tests pass, I found that preg_replace('/\s*$/','',$sentence); worked, but the test with "my name is \n\r john" did not pass.

So I would like to know:

  • Why didn't the regex the book uses work?
  • What regex would remove the extra spaces as well as the carriage return / new line ( \n\r )?
  • The complete code I used is here:

    asked by anonymous 11.03.2016 / 03:43

    1 answer

    When I ran your code, I got the following output:

    <?php
    
    function countWords($sentence)
    {
        $newsentence = preg_replace("/\s+/"," ",$sentence);
        return count(split(" ", $newsentence));
    }
    
    echo countWords("my name is john");
    echo countWords("my name \n\r is john");
    echo countWords("my name \n is john");
    echo countWords("my name \n\n\r is john");
    
      

    Deprecated: Function split() is deprecated in /in/MSmEj on line 6
    4
    Deprecated: Function split() is deprecated in /in/MSmEj on line 6
    4
    Deprecated: Function split() is deprecated in /in/MSmEj on line 6
    4
    Deprecated: Function split() is deprecated in /in/MSmEj on line 6
    4

    This is the first symptom that something is wrong: the split() function has been deprecated. That does not mean it will stop working right away. However, split() expects a regular expression to break the string on, and you are passing a literal space, which is not the same as the regex whitespace metacharacter \s.
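The difference between a literal space and \s can be seen in a minimal sketch. Note the assumptions: explode() stands in here for split(" ", ...), since split() is no longer available in PHP 7 and later, and the input string is only an illustration modelled on the question's test case.

```php
<?php
// Illustrative input: words separated by a mix of spaces, "\n" and "\r".
$sentence = "my name is \n\r john";

// A literal space only splits on the space character,
// so the "\n\r" between two spaces survives as its own element.
$byLiteralSpace = explode(" ", $sentence);

// \s+ matches any run of whitespace (space, tab, \n, \r, ...),
// so the whole " \n\r " run counts as a single separator.
$byWhitespace = preg_split('/\s+/', $sentence);

echo count($byLiteralSpace), "\n"; // 5
echo count($byWhitespace), "\n";   // 4
```

The first count is off by one because the line-break characters are treated as a "word".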

    The regex ~\s+~ itself is valid. PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as a regex delimiter; / is the usual choice. Note also that /\s+/ is a completely different regex from '/\s*$/', mainly because of the quantifiers + and * (and the $ anchor, which ties the match to the end of the string). Finally, prefer single quotes when writing a regex. Your example uses double quotes, which is not ideal because PHP processes escape sequences inside double-quoted strings before the regex engine sees the pattern.
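To make the contrast between the two patterns concrete, here is a small sketch. The input string is hypothetical and chosen only to show both behaviours at once.

```php
<?php
// Hypothetical input: a double space in the middle, whitespace at the end.
$sentence = "my  name is john \n";

// '/\s*$/' is anchored at the end of the string,
// so only the trailing whitespace is removed.
$trimmedEnd = preg_replace('/\s*$/', '', $sentence);

// '~\s+~' behaves exactly like '/\s+/'; only the delimiter differs.
// It matches every run of whitespace, wherever it occurs.
$collapsed = preg_replace('~\s+~', ' ', $sentence);

var_dump($trimmedEnd); // "my  name is john"  (inner double space untouched)
var_dump($collapsed);  // "my name is john "  (one space per whitespace run)
```

Neither pattern alone yields a clean word list: the first leaves inner runs intact and the second leaves one trailing space.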

    You can refactor this method to use preg_split():

    function countWords($sentence)
    {
        return count(preg_split('~\s+~', $sentence));
    }
    

    One more point deserves attention: try to study from up-to-date materials. The book in question is from 2007, and a lot has changed in PHP in those 9 years. There is a real chance that you are learning something that is no longer used today.

    Looking further, the problem with this test case lies in the implementation itself. For the extra-space case, preg_split() returns an array with 5 elements, the last one being an empty string.

    To get the expected result, call trim() on the string before splitting:

    return count(preg_split('~\s+~', trim($sentence)));
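The effect of the trailing space can be checked directly with a short sketch. The PREG_SPLIT_NO_EMPTY variant at the end is not part of the answer above; it is shown only as an alternative, assuming you would rather discard empty pieces than trim the input.

```php
<?php
// The trailing-space test case from the question.
$sentence = "my name is john ";

// Without trim(): the separator at the end produces a final empty string.
$raw = preg_split('~\s+~', $sentence);

// With trim(): leading and trailing whitespace is gone before splitting.
$trimmed = preg_split('~\s+~', trim($sentence));

// Alternative (an assumption, not from the answer above):
// ask preg_split() to drop empty pieces itself.
$noEmpty = preg_split('~\s+~', $sentence, -1, PREG_SPLIT_NO_EMPTY);

echo count($raw), "\n";     // 5
echo count($trimmed), "\n"; // 4
echo count($noEmpty), "\n"; // 4
```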
    
        
    11.03.2016 / 04:19