Split a string that contains punctuation

4

I'm trying to split the following string

Eu irei amanhã à casa. E tu vens?

To get the following result as an array in PHP:

array(
    [0] => eu
    [1] => irei
    [2] => amanhã
    [3] => à
    [4] => casa
    [5] => .
    [6] => E
    [7] => tu
    [8] => vens
    [9] => ?
)

Thanks for any help.

    
asked by anonymous 25.01.2017 / 19:38

3 answers

9

If they were just spaces, it would be a case of

$partes = explode( ' ', $todo );

One solution, depending on what you want, is to force a space before each character you want to treat as a separate token:

$todo = str_replace( array( '.', ',' ,'?' ), array( ' .', ' ,', ' ?'), $todo );
$partes = explode( ' ', $todo );

See it working on IDEONE.

Note that I hard-coded the separators directly in the str_replace call. If you need to do this for a larger set of characters, it is worth writing a more elaborate function.
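As a rough sketch of what such a function could look like: the helper name split_with_symbols and the list of symbols below are made up for illustration and are not part of the original answer. It builds the replacement list from the symbol list instead of typing both arrays by hand.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// puts a space before every symbol in $symbols, then splits on single spaces.
function split_with_symbols(string $text, array $symbols): array
{
    // Build the replacements (' .', ' ,', ...) from the symbol list
    $padded = array_map(function ($s) {
        return ' ' . $s;
    }, $symbols);

    $text = str_replace($symbols, $padded, $text);

    return explode(' ', $text);
}

$todo = 'Eu irei amanhã à casa. E tu vens?';
$partes = split_with_symbols($todo, ['.', ',', '?', '!', ';', ':']);
print_r($partes);
```

With this input the result has the ten elements asked for in the question. Because it still splits on single spaces, text that already has a space before a symbol (for example "casa .") would produce empty elements.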

If you would rather treat every run of alphanumeric characters as separate from the symbols, you can use a regex and solve it in one line:

preg_match_all('~\w+|[^\s\w]+~u', $todo, $partes ); 

See it working on IDEONE.
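A small usage sketch for the one-liner, using the sentence from the question: preg_match_all fills $partes with a nested array, and the tokens themselves are in $partes[0].

```php
<?php
$todo = 'Eu irei amanhã à casa. E tu vens?';

// \w+ matches a run of word characters; [^\s\w]+ matches a run of symbols.
// The "u" modifier makes the pattern treat the string as UTF-8.
preg_match_all('~\w+|[^\s\w]+~u', $todo, $partes);

// Index 0 holds the full-pattern matches, i.e. the tokens
$tokens = $partes[0];
print_r($tokens);
```

Note that consecutive symbols stay together with this pattern: "vens?!" would give "vens" and "?!" rather than three tokens.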

Beyond that, you might want to add spaces both before and after the symbols, collapse double spaces, and so on, depending on your criteria. This answer is only meant as a starting point.
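One possible way to do that, as a sketch only: the symbol list and the sample string (with punctuation glued to the next word) are assumptions chosen for illustration. It pads each symbol on both sides and then splits on runs of whitespace, discarding empty pieces.

```php
<?php
// Sample input (an assumption for this sketch): no space after "," or "."
$todo = 'Eu irei amanhã,à casa.E tu vens?';

// Surround each listed symbol with a space on both sides
$todo = preg_replace('~([.,?!;:])~u', ' $1 ', $todo);

// Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops the empty
// strings that leading/trailing or doubled spaces would otherwise produce
$partes = preg_split('~\s+~u', $todo, -1, PREG_SPLIT_NO_EMPTY);
print_r($partes);
```

Splitting on \s+ with PREG_SPLIT_NO_EMPTY removes the need for a separate step to clean up double spaces.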

    
25.01.2017 / 19:44
7

A more general approach would be to use regex to solve the problem.

$string = "Eu irei amanhã à casa. E tu vens?";

/*
    Adds a space at every word boundary in the string
    E.g.: start and end of words, punctuation, etc.
    The 'u' modifier treats the string as Unicode
*/
$resultado = preg_replace('/\b/u', ' ', $string);

// Builds an array, using a regex that matches any whitespace as the delimiter
$resultado = preg_split('/\s+/', trim($resultado));

var_dump($resultado);

Output:

array(10) {
  [0]=>
  string(2) "Eu"
  [1]=>
  string(4) "irei"
  [2]=>
  string(7) "amanhã"
  [3]=>
  string(2) "à"
  [4]=>
  string(4) "casa"
  [5]=>
  string(1) "."
  [6]=>
  string(1) "E"
  [7]=>
  string(2) "tu"
  [8]=>
  string(4) "vens"
  [9]=>
  string(1) "?"
}

I made this sample code to illustrate.

    
25.01.2017 / 20:57
3

Based 99.9% on @bacco's answer... shamelessly!

$t = 'Baseando-me 99.9% na resposta do @bacco... descaradamente!';
preg_match_all('~\b\w[\w\-.*#]*\w\b|\w|\.\.\.|[,.:;()[\]?!]|\S~u', $t, $ps);
print_r($ps);

Array
(
    [0] => Array
        (
            [0] => Baseando-me
            [1] => 99.9
            [2] => %
            [3] => na
            [4] => resposta
            [5] => do
            [6] => @
            [7] => bacco
            [8] => ...
            [9] => descaradamente
            [10] => !
        )

)

Update: Actually, tokenizing a text into its elements can be complex: text is not just simple words...

A somewhat more robust approach (using the x modifier for better readability):

  preg_match_all('~
           https?://\S+             ## url
         | \d+/\d+/\d+              ## date
         | \b\w [\w\-.*#]* \w\b     ## vou-me 12.2 f.html
         | \w
         | \.\.\.                   ## ...
         | [!?]+                    ## ???   ?!
         | [,.:;()[\]]
         | \S
         ~ux', $todo, $partes );
     print_r($partes);

Warning: (not tested...)
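Since the pattern is given untested, here is a quick check one could run. The sample sentence and the variable name $tokens are assumptions made for this sketch; the pattern itself is copied unchanged from the answer.

```php
<?php
// Sample text (an assumption): contains a URL, a date and mixed punctuation
$todo = 'Ver http://example.com em 25/01/2017, ok?!';

preg_match_all('~
         https?://\S+             ## url
       | \d+/\d+/\d+              ## date
       | \b\w [\w\-.*#]* \w\b     ## vou-me 12.2 f.html
       | \w
       | \.\.\.                   ## ...
       | [!?]+                    ## ???   ?!
       | [,.:;()[\]]
       | \S
       ~ux', $todo, $partes);

// The tokens are the full-pattern matches at index 0
$tokens = $partes[0];
print_r($tokens);
```

On this sample the URL and the date each come out as a single token and "?!" stays together. One limitation worth knowing: because the URL branch uses \S+, punctuation written directly after a URL (as in "http://example.com.") is absorbed into the URL token.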

    
28.01.2017 / 22:49