Count how many columns there are in a CSV file with C++

I am working on a project for an electronic ballot box, and for this I need to read a CSV file that contains information about each candidate.

Since this CSV file has a lot of information that is not relevant, I decided to use only certain columns, for example: NM_CANDIDATO, NM_PARTIDO, ...

The solution I thought of was to keep a counter that records the "index" of the desired columns, but I cannot detect the end of the first row, so the index keeps being incremented across all the data.

"DT_GERACAO"; "SG_PARTIDO"  ;  "HH_GERACAO"
"03/09/2018"; "DC"          ;   "08:01:43"  
"03/09/2018"; "MDB"         ;   "08:01:43"
"03/09/2018"; "PODE"        ;   "08:01:43"

In this example, only the column SG_PARTIDO interests me. A counter i is initialized to 1 and incremented on each getline() call while reading the first row. When a desired column is found, its position is saved, so that when the counter is reset to 1 on the next line, some action can be performed on the values in that column.

The code I wrote is the one below:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    ifstream file("presidentes.csv");

    if(!file.is_open())
    {
        cout<<"Error! Could not open this file"<<'\n';
    }

    string buffer;
    int i = 1, p_SG_PARTIDO = 0;

    while(!file.eof())
    {
        getline(file, buffer, ';');

        if(buffer == "SG_PARTIDO" || i == p_SG_PARTIDO)
        {
            p_SG_PARTIDO = i;
            cout << buffer;
        }

        i++;

        if(buffer == "\n") i=1;
    }

    file.close();
    return 0;

}

The condition buffer == "SG_PARTIDO" is never true. I suspect the reason is the double quotation marks stored in the file, so the value actually read includes the quotes. When I remove the first and last character before the comparison, the condition becomes true, but I still have the problem of not knowing when the first line ends.

The code that removes the character is this below:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    ifstream file("presidentes.csv");

    if(!file.is_open())
    {
        cout<<"Error! Could not open this file"<<'\n';
    }

    string buffer;
    int i=1;
    int p_ds_cargo = 0;

    while(!file.eof())
    {
        getline(file, buffer, ';');

        if(buffer[0]=='"') // remove the surrounding quotes
        {
            buffer.erase(0,1);
            buffer.erase(buffer.size() - 1);
        }

        if(buffer == "DT_GERACAO")
        {
            p_ds_cargo = i;
            cout << buffer << endl;   
        }

        if(buffer == "\n") i = 1;

        i++;
    }



    file.close();
    return 0;
}

I would appreciate it if anyone knows an easier way to read only specific columns in a csv file.

The link to the csv I'm using is this: link

    
asked by anonymous 08.09.2018 / 05:55

1 answer

Problems

The CSV file has all of its contents inside double quotation marks ( " ), so the comparison in your code will never succeed:

if(buffer == "SG_PARTIDO")

Work around this problem by including the quotation marks in the comparison:

if(buffer == "\"SG_PARTIDO\"")

Or remove the quotation marks from the value read before comparing:

if(buffer.substr(1, buffer.size() - 2) == "SG_PARTIDO")

However, the line break test does not work either:

if(buffer == "\n") i=1;

Because getline with the ; delimiter always reads up to the next ; , the buffer never contains only the line break. Instead, the line break stays embedded in the value that spans the end of one row and the start of the next.
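To see this behaviour in isolation, here is a minimal sketch that reads tokens from an in-memory string instead of the file. The helper name splitOnSemicolon and the sample text are assumptions made only for illustration; they are not part of the original code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole text token by token using ';' as the
// only delimiter, the same way getline(file, buffer, ';') does in the question.
std::vector<std::string> splitOnSemicolon(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(in, token, ';')) {
        tokens.push_back(token);  // '\n' is treated as an ordinary character here
    }
    return tokens;
}
```

For the two-row input "A";"B" followed by "1";"2", the second token holds "B", the line break, and "1" all together, which is why comparing a token against "\n" alone can never match.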

Another Approach

I suggest another approach to the problem, which turns out to be more robust and also lets you get data from multiple columns, something that would get more complicated with the way you were doing it.

The idea is:

  • Read each line normally with getline, keeping the default delimiter \n
  • Get each column of each line with getline, changing the delimiter to ;
  • Store each row in a vector<string> and all rows in a vector<vector<string> >

Reading

There are many ways to read a CSV into an array, but I have chosen one that I consider simple and that lets you get the information you want.

Implementation:

int main()
{
    //...
    string buffer;
    vector<vector<string> > linhas; // vector of vectors holding all the rows

    while (getline(file, buffer)) // read each line; stops at end of file
    {
        stringstream ss(buffer); // put the line just read into a stringstream

        vector<string> linha; // start the vector for this row
        while (getline(ss, buffer, ';')) { // read each column
            linha.push_back(buffer); // add it to the row vector
        }

        linhas.push_back(linha);
    }
    //...
}

Note that I used both vector and stringstream, so I needed two additional includes:

#include <vector>
#include <sstream>

Another possibility is to store the text already stripped of its quotation marks, which makes later display and comparison easier, by changing linha.push_back(buffer); to linha.push_back(buffer.substr(1, buffer.size() - 2)); . In the rest of the answer, however, I assume the values were stored with the quotation marks.
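The substr call assumes that every field starts and ends with a quote and has no surrounding spaces, while the sample in the question shows spaces around the quoted values. A more defensive variant might look like the sketch below; the helper name unquote is an assumption introduced here, not something from the original code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: trims surrounding spaces, tabs and carriage returns,
// then removes one pair of enclosing double quotes if both are present.
std::string unquote(const std::string& field)
{
    const std::string blanks = " \t\r";
    const std::size_t first = field.find_first_not_of(blanks);
    if (first == std::string::npos) {
        return "";  // empty or blank-only field
    }
    const std::size_t last = field.find_last_not_of(blanks);
    const std::string trimmed = field.substr(first, last - first + 1);

    if (trimmed.size() >= 2 && trimmed.front() == '"' && trimmed.back() == '"') {
        return trimmed.substr(1, trimmed.size() - 2);
    }
    return trimmed;  // not quoted: return the trimmed text unchanged
}
```

With this helper, the row vector could be filled with linha.push_back(unquote(buffer)); and an empty field no longer risks an out-of-range substr call.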

Using a Column

A simple, though not very efficient, way to get the information for all rows in the SG_PARTIDO column is:

for (size_t i = 0; i < linhas.size(); ++i){
    for (size_t j = 0; j < linhas[i].size(); ++j){
        //if the first row of this column holds SG_PARTIDO
        if (linhas[0][j] == "\"SG_PARTIDO\""){ 
            cout << linhas[i][j].substr(1, linhas[i][j].size() - 2); //show without the "
        }
    }
    cout << endl;
}

Of course, this assumes that the first line contains the CSV headers. I used substr to show only the content, excluding the double quotes.
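The loop above compares the header cell again for every cell of every row. A possible refinement, sketched here under the same assumption that row 0 holds the headers and that values keep their quotes, is to locate the column index once and then read only that position. The helper names findColumn and columnValues are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the header cell equal to
// quotedName, or -1 when the header row has no such column.
int findColumn(const std::vector<std::string>& header, const std::string& quotedName)
{
    for (std::size_t j = 0; j < header.size(); ++j) {
        if (header[j] == quotedName) {
            return static_cast<int>(j);
        }
    }
    return -1;
}

// Hypothetical helper: collects the values of one column from all data rows
// (row 0 is assumed to be the header), skipping rows that are too short.
std::vector<std::string> columnValues(const std::vector<std::vector<std::string> >& linhas,
                                      const std::string& quotedName)
{
    std::vector<std::string> values;
    if (linhas.empty()) {
        return values;
    }
    const int col = findColumn(linhas[0], quotedName);
    if (col < 0) {
        return values;  // column not present in the header
    }
    for (std::size_t i = 1; i < linhas.size(); ++i) {
        if (static_cast<std::size_t>(col) < linhas[i].size()) {
            values.push_back(linhas[i][col]);
        }
    }
    return values;
}
```

Calling columnValues(linhas, "\"SG_PARTIDO\"") would then return the party values with their quotes still attached.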

Using multiple columns

If you are interested in several columns, you can build a vector with the indices of the columns you are interested in and then just iterate over those:

vector<int> colunasRelevantes;
for (size_t i = 0; i < linhas[0].size(); ++i){
    string nomeCol = linhas[0][i].substr(1, linhas[0][i].size() - 2);
    if (nomeCol == "SG_PARTIDO" || nomeCol == "NM_CANDIDATO" || nomeCol == "NM_PARTIDO"){
        colunasRelevantes.push_back(i);
    }
}

for (size_t i = 0; i < linhas.size(); ++i){
    for (size_t j = 0; j < colunasRelevantes.size(); ++j){
        int coluna = colunasRelevantes[j];
        string texto = linhas[i][coluna].substr(1, linhas[i][coluna].size() - 2);
        cout << texto << "\t";
    }
    cout << endl;
}

It is important to mention that I had to remove the last blank line from the file to avoid errors when accessing columns that do not exist.
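Instead of editing the file by hand, blank lines can also be skipped while reading. The following is only a sketch under that assumption; the function name readRows is hypothetical, and it takes any input stream so that it works with an ifstream as well as with an in-memory string.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads ';'-separated rows from any input stream and
// skips empty lines, so every stored row has at least one column.
std::vector<std::vector<std::string> > readRows(std::istream& in)
{
    std::vector<std::vector<std::string> > linhas;
    std::string line;
    while (std::getline(in, line)) {       // loop ends when no more lines can be read
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();               // tolerate Windows line endings
        }
        if (line.empty()) {
            continue;                      // ignore blank lines
        }
        std::istringstream ss(line);
        std::vector<std::string> linha;
        std::string campo;
        while (std::getline(ss, campo, ';')) {
            linha.push_back(campo);
        }
        linhas.push_back(linha);
    }
    return linhas;
}
```

With an open ifstream named file, the rows could be loaded with linhas = readRows(file); and the later loops would no longer meet an empty row.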

In C++11 these loops are simpler, but I did not use that initially so as not to show syntax that might be new to you. Still, I will leave it here as a reference:

//This part stays the same
vector<int> colunasRelevantes;
for (size_t i = 0; i < linhas[0].size(); ++i){
    string nomeCol = linhas[0][i].substr(1, linhas[0][i].size() - 2);
    if (nomeCol == "SG_PARTIDO" || nomeCol == "NM_CANDIDATO" || nomeCol == "NM_PARTIDO"){
        colunasRelevantes.push_back(i);
    }
}

//Here, the C++11 range-based for loop
for (const auto& linha : linhas){
    for (auto coluna: colunasRelevantes){
        string texto = linha[coluna].substr(1, linha[coluna].size() - 2);
        cout << texto << "\t\t\t";
    }
    cout << endl;
}

If you need many columns, it becomes easier to keep the names in a vector and build the indices with a nested for loop:

vector<string> nomesColunasRelevantes = {"SG_PARTIDO", "NM_CANDIDATO", "NM_PARTIDO"};
vector<int> colunasRelevantes;
for (size_t i = 0; i < linhas[0].size(); ++i){
    string nomeCol = linhas[0][i].substr(1, linhas[0][i].size() - 2);
    for (string nome : nomesColunasRelevantes){
        if (nome == nomeCol){
            colunasRelevantes.push_back(i);
        }
    }
}

Count columns

Now it is just as easy to answer the question in your title:

Count how many columns there are in a CSV file with C++

Just call size() on any of the rows:

cout << linhas[0].size();

This gives 58 for the file in question.
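If only the column count is needed, it is not necessary to store the whole file. A minimal sketch, assuming that no field contains a ; inside its quotes, is to read just the first line and count the separators. The function name countColumns is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the columns of the first line of the stream by
// counting ';' separators. Assumes no field contains a ';' inside quotes.
std::size_t countColumns(std::istream& in)
{
    std::string header;
    if (!std::getline(in, header) || header.empty()) {
        return 0;  // no header line available
    }
    std::size_t count = 1;  // n separators delimit n + 1 columns
    for (char c : header) {
        if (c == ';') {
            ++count;
        }
    }
    return count;
}
```

Note that a header ending in ; is counted here as having one extra, empty column, which differs from what splitting with getline would store.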

    
08.09.2018 / 13:31