How to tokenize words in Portuguese using NLTK?


I'm having a hard time understanding this mechanism.

In English it would just be:

import nltk
tag_word = nltk.word_tokenize(text)

Where text is the English text that I would like to tokenize, which works very well, but for Portuguese I still cannot find any examples. I am disregarding the previous steps of stop_words and sent_tokenizer here, just to make it clear that my question is about tokenization.
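For reference, this is a minimal, self-contained version of the English case; the sample sentence is just an illustration, and the nltk.download('punkt') call is only needed once to fetch the Punkt models that word_tokenize relies on:

import nltk

# One-time download of the Punkt models used internally by word_tokenize
nltk.download('punkt')

text = "NLTK makes tokenization straightforward."  # illustrative sample text
tag_word = nltk.word_tokenize(text)
print(tag_word)  # ['NLTK', 'makes', 'tokenization', 'straightforward', '.']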

asked by anonymous 21.07.2017 / 00:40

1 answer

from nltk import tokenize

# word_tokenize accepts a 'language' parameter; 'portuguese' selects the Portuguese Punkt model
palavras_tokenize = tokenize.word_tokenize(text, language='portuguese')
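A minimal usage sketch of the line above; the Portuguese sample sentence and the one-time punkt download are illustrative additions, not part of the original answer:

import nltk
from nltk import tokenize

nltk.download('punkt')  # one-time download; the Punkt data includes a Portuguese model

texto = "O NLTK facilita a tokenização de textos em português."  # illustrative example
palavras_tokenize = tokenize.word_tokenize(texto, language='portuguese')
print(palavras_tokenize)
# ['O', 'NLTK', 'facilita', 'a', 'tokenização', 'de', 'textos', 'em', 'português', '.']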
15.10.2017 / 14:31