Delete words from a file

1

I have a CSV file and this list of words: link

The CSV file has 3 columns (Text, user.name, and Class) and about 100K rows. I need to remove from the first column every word that appears in the list.

Can you help me?

    
asked by anonymous 02.04.2017 / 19:25

1 answer

1

Using Perl (sorry...), but it is easy to translate to awk.

$ cat stoplist.txt 
de
a
o ....

$ cat ex.csv
meu caro amigo;jjoao;classe a
eu ando a aprender weka;Thyago;classe b
mas a sua sintaxe dá-me algumas dores de cabeça;Thyago;classe a

Let rmstopwords be the following Perl file:

BEGIN {
  $patt = "que";                  ## build one regexp from the stop words
  open(G, "<", "stoplist.txt") or die "stoplist.txt: $!";
  while (<G>) {
    chomp;
    $patt .= "|\Q$_\E" if length; ## patt = "que|de|a|o|..."
  }
  close(G);
}

$F[0] =~ s/\b($patt)\b//g;        ## in the first field, replace patt with ""
print join(";", @F)

Applied to our file ex.csv, this gives:

$ perl -naC -F';' rmstopwords ex.csv
 caro amigo;jjoao;classe a
 ando  aprender weka;Thyago;classe b
    sintaxe dá- algumas dores  cabeça;Thyago;classe a
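
For readers who would rather not use Perl, the same idea (one alternation regexp with word boundaries, applied only to the first field) can be sketched in Python. This is a minimal sketch under stated assumptions, not the code from the answer: the file is assumed to be `;`-delimited like ex.csv, the `stoplist` values are an illustrative sample because the real list is elided above, and the names `build_pattern` and `remove_stopwords` are hypothetical.

```python
import csv
import io
import re


def build_pattern(stopwords):
    """Compile one regex that matches any stop word as a whole word."""
    # re.escape keeps characters such as "." literal; blank entries are dropped.
    words = [w.strip() for w in stopwords if w.strip()]
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")


def remove_stopwords(rows, pattern, column=0):
    """Yield each row with the stop words deleted from one column."""
    for row in rows:
        if row:  # blank lines come through as empty rows; leave them as-is
            row[column] = pattern.sub("", row[column])
        yield row


# Illustrative sample only: the real stop list would be read from stoplist.txt.
stoplist = ["que", "de", "a", "o", "eu", "meu", "mas", "sua", "me"]
pattern = build_pattern(stoplist)

# Two rows in the same layout as ex.csv (Text;user.name;Class).
sample = (
    "meu caro amigo;jjoao;classe a\n"
    "eu ando a aprender weka;Thyago;classe b\n"
)

reader = csv.reader(io.StringIO(sample), delimiter=";")
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerows(remove_stopwords(reader, pattern))

result = out.getvalue()
print(result, end="")
```

For a real 100K-row file, the two `io.StringIO` objects would be replaced by files opened with `encoding="utf-8"` and `newline=""`. As in the Perl version, the deleted words leave their surrounding spaces behind; `" ".join(text.split())` is one way to collapse them if that matters.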
    
06.04.2017 / 20:16