How to separate a string into pieces?


In other languages there is 'split', 'explode', or something similar that breaks a string into chunks according to some separator. Is there something ready-made in C, or do I have to do it by hand?

    
asked by anonymous 17.03.2017 / 12:20

2 answers


There is nothing quite that ready-made, but there is strtok(), which parses the string and replaces the specified delimiter with a null character. That way, what was a single string becomes several, since the null terminates the string at that point.

But note that it does not return an array of strings, as is common in other languages, so it does not handle all the delimiters at once. It handles only the first one it finds, so to get the second piece you need to call strtok() again (passing NULL as the first argument), and so on. Of course, every C programmer ends up writing some utility function(s) to make this easier and deliver what they want.

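#include <stdio.h>
#include <string.h>

int main(void) {
    char frutas[] = "banana,laranja,morango"; // must be a writable array: strtok() modifies it
    size_t tamanho = strlen(frutas); // this only works for a single-character delimiter
    // the first call replaces only the first ',' with '\0' and returns a pointer to "banana"
    char *token = strtok(frutas, ",");
    // print the whole buffer, skipping null characters: only the first comma is gone so far
    for (size_t i = 0; i < tamanho; i++) {
        if (token[i] != '\0')
            putchar(token[i]);
    }
    // each later call with NULL continues from where the previous call stopped
    while (token != NULL) {
        printf("\n%s", token);
        token = strtok(NULL, ",");
    }
    putchar('\n');
    return 0;
}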

See it running on ideone and on Coding Ground. I also put it on GitHub for future reference.

    
17.03.2017 / 12:20

In C++ there is no native 'split' function for strings.

Searching the subject turns up a huge variety of ways to split a string.

Here are some examples I find interesting.

Example 1

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
   // string to be split
   string tokenString { "aaa     bbb ccc" };

   // the split sub-strings will be stored in this vector
   vector<string> tokens;

   // input string stream initialized with the string to be split
   istringstream tokenizer { tokenString };

   // working variable
   string token;

   // split the string on whitespace and store the pieces in the destination vector
   while (tokenizer >> token)
     tokens.push_back(token);

   // print the split sub-strings
   for (const string& token: tokens)
       cout << "* [" << token << "]\n";
}

Result of example 1:

* [aaa]
* [bbb]
* [ccc]

Example 2

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
   // string to be split
   string tokenString { "aaa, bbb, ccc,,ddd   ,   eee" };

   // the split sub-strings will be stored in this vector
   vector<string> tokens;

   // input string stream initialized with the string to be split
   istringstream tokenizer { tokenString };

   // working variable
   string token;

   // split the sub-strings at each comma and store them in the destination vector
   while (getline(tokenizer, token, ','))
      tokens.push_back(token);

   // print the split sub-strings
   for (const string& token: tokens)
       cout << "* [" << token << "]\n";
}

Result of example 2:

* [aaa]
* [ bbb]
* [ ccc]
* []
* [ddd   ]
* [   eee]

Note that the spaces in the resulting sub-strings were kept, and so was the empty field between ",,". (This would be a case for another common string function, 'trim', which also does not exist in C++.)

Example 3

#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;

int main()
{
   // string to be split
   string tokenString { "aaa, bbb, ccc,,ddd   ,   eee" };

   // the split sub-strings will be stored in this vector
   vector<string> tokens;

   // regular expression with the delimiters: whitespace and comma
   // (the backslash must be doubled inside an ordinary string literal)
   regex delimiters { "[\\s,]+" };

   // create an iterator over the split sub-strings
   // note: this is a ready-made "recipe"; the '-1' argument makes the iterator
   // return the text between the matches (the fields) instead of the matches themselves
   sregex_token_iterator tokens_begin { tokenString.begin(), tokenString.end(), delimiters, -1 };

   // end-of-sequence iterator
   auto tokens_end = sregex_token_iterator {};

   // copy the split sub-strings into the destination vector
   for (auto token_it = tokens_begin; token_it != tokens_end; token_it++)
      tokens.push_back(*token_it);

   // print the split sub-strings
   for (const string& token: tokens)
       cout << "* [" << token << "]\n";
}

Result of example 3:

* [aaa]
* [bbb]
* [ccc]
* [ddd]
* [eee]

That's all for now folks.

    
19.03.2017 / 00:24