How to split a string in C++?

8

I was given this simple (or so I really thought!) challenge of creating a "tokenizer". I had to split the string " O rato roeu a roupa do rei de roma " on spaces. After a long time, I came up with the following algorithm using vectors, strings, and the <algorithm> header.

#ifndef STR_PARSE_HPP
#define STR_PARSE_HPP
#include <algorithm>
#include <string>
#include <vector>
using std::reverse;
using std::vector;
using std::string;
    vector<string> split(string str, char delimiter = ' ')
    {
        vector<string> ret;
        if((str.find(delimiter) == string::npos) && (str.find_first_not_of(delimiter) == string::npos)) throw nullptr;
        else if ((str.find(delimiter) == string::npos)) ret.push_back(str);
        else if(str.find_first_not_of(delimiter) == string::npos) ret.push_back(string(""));
        else
        {
            unsigned i = 0;
            string strstack;
            while(str[0] == delimiter) {str.erase(0,1);}
            reverse(str.begin(), str.end());
            while(str[0] == delimiter) {str.erase(0,1);}
            reverse(str.begin(), str.end());
            while(!str.empty())
            {
                ret.push_back(str.substr(i, str.find(delimiter)));
                str.erase(0,str.find(delimiter));
                while(str[0] == delimiter) {str.erase(0,1);}
            }
        }
        return ret;
    }
#endif // STR_PARSE_HPP

The test:

#include <iostream>
#include "str_parse.hpp"
using std::string;
using std::cout;


int main()
{
    string a = "        O    rato roeu a roupa do rei de roma             ";
    for(int i = 0; i < split(a).size(); i++)
    cout << split(a)[i];
}

The output was as expected:

O
rato
roeu
a
roupa
do
rei
de
roma

Since I had already spent a "bit" of time on this, I decided to test it with other delimiters. The crash is immediate, and my debugger is misbehaving (execution runs straight past the breakpoints). What is wrong with my code?

    
asked by anonymous 03.03.2014 / 20:31

4 answers

5

Looking at your code, the first thing I noticed is that your main function cannot be exactly as you posted it: as written, it prints the tokens with no separator between them, so it would not produce the output you show. I think something is missing from your post. Assuming you did something similar to this:

int main()
{
    string a = "        O    rato roeu a roupa do rei de roma             ";
    vector<string> split_vector = split(a);
    for(unsigned int i = 0; i < split_vector.size(); i++) {
        cout << split_vector[i] << '\n';
    }
}

The problem with your split function is in the following snippet:

while(!str.empty())
 {
    ret.push_back(str.substr(i, str.find(delimiter)));
    str.erase(0,str.find(delimiter));
     while(str[0] == delimiter) {str.erase(0,1);}
 }

This while loop

while(str[0] == delimiter) {str.erase(0,1);}

does not check whether the string has reached its end, which is why the crash occurs. The correct approach is to run it only while the string is not empty:

while(!(str.empty()) && (str[0] == delimiter)) {str.erase(0,1);}
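As a complement to the guarded loop above, here is a minimal sketch of a way to strip leading and trailing delimiters without indexing the string in a loop at all. The helper name trim_delims is hypothetical (it does not appear in the question); it relies only on std::string::find_first_not_of and find_last_not_of.

```cpp
#include <string>

// Hypothetical helper (not from the original post): removes leading and
// trailing delimiter characters without ever indexing an empty string.
std::string trim_delims(const std::string& str, char delimiter = ' ')
{
    // Position of the first character that is not the delimiter.
    const std::string::size_type first = str.find_first_not_of(delimiter);
    if (first == std::string::npos)
        return std::string();  // input is empty or consists only of delimiters

    // Position of the last character that is not the delimiter.
    const std::string::size_type last = str.find_last_not_of(delimiter);
    return str.substr(first, last - first + 1);
}
```

With this, the two reverse calls and both trimming loops could in principle be replaced by a single call such as str = trim_delims(str, delimiter).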

In fact, it is wise to add this check whenever you have similar code. I have revised your function into what I believe is correct, and I removed some tests that I find unnecessary:

vector<string> split(string str, char delimiter = ' ')
{
    vector<string> ret;
    if(str.empty()) 
    {
        ret.push_back(string(""));
        return ret;
    }

    unsigned i = 0;
    string strstack;
    while(!(str.empty()) && (str[0] == delimiter)) {str.erase(0,1);}
    reverse(str.begin(), str.end());
    while(!(str.empty()) && (str[0] == delimiter)) {str.erase(0,1);}
    reverse(str.begin(), str.end());
    while(!str.empty())
    {
        ret.push_back(str.substr(i, str.find(delimiter)));
        str.erase(0,str.find(delimiter));
        while(!(str.empty()) && (str[0] == delimiter)) {str.erase(0,1);}
    }

    return ret;
}
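For comparison, the same idea can be sketched without erase and reverse, by walking two positions through an unmodified string. This is only an illustration: the name split_by_index is hypothetical, and unlike the function above it returns an empty vector (not a vector holding one empty string) for empty input.

```cpp
#include <string>
#include <vector>

// Hypothetical alternative: tokenizes without modifying the input string.
std::vector<std::string> split_by_index(const std::string& str, char delimiter = ' ')
{
    const std::string::size_type npos = std::string::npos;
    std::vector<std::string> ret;

    // Skip any leading delimiters; npos means there is no token at all.
    std::string::size_type start = str.find_first_not_of(delimiter);
    while (start != npos)
    {
        // End of the current token: the next delimiter, or npos if none is left.
        const std::string::size_type end = str.find(delimiter, start);
        ret.push_back(end == npos ? str.substr(start)
                                  : str.substr(start, end - start));

        // Skip the run of delimiters that follows the token, if any.
        start = (end == npos) ? npos : str.find_first_not_of(delimiter, end);
    }
    return ret;
}
```

Because nothing is erased from the front of the string, each token is copied only once.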
    
04.03.2014 / 00:16
7

It is worth noting that the standard library often has very similar algorithms that you can use for your purposes. If the delimiter is always whitespace, you can rely on reading strings from streams. Here's how:

#include <iterator>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& str) {
    stringstream ss(str);
    vector<string> vec {istream_iterator<string>{ss}, istream_iterator<string>{}};
    return vec;
}

Or (as suggested by @pepper_chico):

vector<string> split(const string& str) {
    stringstream ss(str);
    return {istream_iterator<string>{ss}, istream_iterator<string>{}};
}

Example: coliru.
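One detail of the stream approach is that formatted extraction skips any whitespace, so tabs and line breaks separate tokens as well as spaces. The sketch below is a self-contained variant that makes this easy to try; the name split_ws is hypothetical.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical name: splits on any run of whitespace (spaces, tabs, newlines),
// because operator>> for std::string skips leading whitespace before each token.
std::vector<std::string> split_ws(const std::string& str)
{
    std::istringstream ss(str);
    std::istream_iterator<std::string> first(ss);  // reads tokens from ss
    std::istream_iterator<std::string> last;       // end-of-stream iterator
    return std::vector<std::string>(first, last);
}
```

For instance, split_ws("O\trato\nroeu  a") yields the four tokens O, rato, roeu and a, and an input made only of whitespace yields an empty vector.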

If you have other delimiters, you can use another ready-made function: getline. Despite its name, it does exactly what you want here: it breaks up a string. The detail is that its default delimiter is the line break, hence the name. Use it like this:

vector<string> split(const string& str, char delim=' ') {
    stringstream ss(str);
    string tok;
    vector<string> vec;
    while (getline(ss, tok, delim)) {
        if (!tok.empty())
            vec.push_back(tok);
    }
    return vec;
}

Example: coliru.
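If empty fields matter (for example in comma-separated data), the emptiness check can simply be dropped. The following is a sketch under that assumption, with the hypothetical name split_keep_empty. Note one quirk of getline: a delimiter at the very end of the input does not produce a final empty token.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant: keeps empty tokens between consecutive delimiters.
std::vector<std::string> split_keep_empty(const std::string& str, char delim = ' ')
{
    std::istringstream ss(str);
    std::string tok;
    std::vector<std::string> vec;
    // getline extracts up to (and discards) the next delim; the loop ends
    // when a call reaches the end of the stream without extracting anything.
    while (std::getline(ss, tok, delim))
        vec.push_back(tok);
    return vec;
}
```

With a comma as the delimiter, "a,,b" gives three tokens (the middle one empty), while "a,b," gives only a and b.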

    
04.03.2014 / 01:02
2

@Selma has already explained the problem with your code, so I will only share an alternative implementation.

#include <string>
#include <vector>

using namespace std;

vector<string> split(string str, char delimiter = ' ')
{
    vector<string> ret;

    string::size_type start = 0;

    for(string::size_type i = 0; i < str.length(); ++i) {
        if(str[i] == delimiter) {
            ret.push_back(str.substr(start, i-start));
            start = i+1;
        }
    }

    // Everything after the last delimiter (an empty string if nothing is left).
    ret.push_back(str.substr(start));

    return ret;
}

This implementation returns every "item", including empty ones (for example, the one between two consecutive delimiters).

#include <string>
#include <vector>

using namespace std;

vector<string> split(string str, char delimiter = ' ')
{
    vector<string> ret;

    string::size_type start = 0;

    for(string::size_type i = 0; i < str.length(); ++i) {
        if(str[i] == delimiter) {
            if(i - start != 0)
                ret.push_back(str.substr(start, i-start));
            start = i+1;
        }
    }

    // Keep the trailing item only if it is not empty.
    if(str.length() - start != 0)
        ret.push_back(str.substr(start));

    return ret;
}

In this version I added two ifs to ignore empty "items".
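The two versions differ only in whether empty items are kept, so they could be folded into one function with a flag. This is a minimal sketch of that idea; the name split_items and the skip_empty parameter are hypothetical, not part of the answer above.

```cpp
#include <string>
#include <vector>

// Hypothetical combination of the two versions: skip_empty selects whether
// empty items (e.g. between consecutive delimiters) are dropped.
std::vector<std::string> split_items(const std::string& str,
                                     char delimiter = ' ',
                                     bool skip_empty = false)
{
    std::vector<std::string> ret;
    std::string::size_type start = 0;

    // Iterating up to and including length() treats the end of the string
    // as one final delimiter, so the last item needs no special case.
    for (std::string::size_type i = 0; i <= str.length(); ++i) {
        if (i == str.length() || str[i] == delimiter) {
            if (!skip_empty || i > start)
                ret.push_back(str.substr(start, i - start));
            start = i + 1;
        }
    }
    return ret;
}
```

With skip_empty set to false, "a  b" gives a, an empty item, and b. With it set to true, the empty item is dropped.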

    
04.03.2014 / 00:44
0

I did not mean to revive this dead question, but I have refined the algorithm now that I know more about iterators.

#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> tokenize(std::string str, char delimiter = ' ')
{
    std::vector<char> string_ret;
    std::vector<std::string> ret;
    for(auto a : str)
    {
        if(a == delimiter)
        {
            std::cout << "Delimiter found >>" << a << "<<" <<  std::endl;
            if(!string_ret.empty())
            {
                std::cout << "Pushing string to ret!\n";
                std::string push;
                for(auto b : string_ret)
                {
                    push.push_back(b);
                }
                string_ret.clear();
                ret.push_back(push);
            }
            else std::cout << "Delimiter found, but string return is empty!" << std::endl;
        }
        else
        {
            std::cout << "char which is not delimiter found! >>" << a << "<<" << std::endl;
            string_ret.push_back(a);
        }
    }
    if(!string_ret.empty())
    {
        std::string push;
        for(auto b : string_ret)
        {
            push.push_back(b);
        }
        ret.push_back(push);
    }
    std::cout << "\n\n\n\n\n";
    return ret;
}

int main()
{
    for(auto a : tokenize("O rato roeu a roupa do rei de roma", 'r'))
    std::cout << a << std::endl;
}
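The same character-by-character approach can be sketched more compactly by accumulating directly into a std::string and leaving out the diagnostic output. The name tokenize_compact is hypothetical, and the sketch assumes C++11 for the range-based for loop, as the code above does.

```cpp
#include <string>
#include <vector>

// Hypothetical compact variant of tokenize: same results, no debug output.
std::vector<std::string> tokenize_compact(const std::string& str, char delimiter = ' ')
{
    std::vector<std::string> ret;
    std::string current;  // characters of the token being built

    for (char c : str) {
        if (c != delimiter) {
            current.push_back(c);
        } else if (!current.empty()) {
            // A delimiter ends the current token; empty tokens are skipped.
            ret.push_back(current);
            current.clear();
        }
    }

    // Flush the last token if the string did not end with a delimiter.
    if (!current.empty())
        ret.push_back(current);
    return ret;
}
```

Called with the same sentence and the delimiter 'r', it should give the same six tokens as tokenize.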
    
06.07.2014 / 16:28