I'm having a hard time understanding this mechanism.
In English it would just be:
import nltk
tag_word = nltk.word_tokenize(text)  # splits the text into word tokens
where text is the English text I would like to "tokenize", which works very well; for Portuguese, however, I still cannot find any examples.
I am disregarding the earlier stop_words and sent_tokenizer steps here, just to make it clear that my question is about tokenization.
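To make this concrete, here is a minimal sketch of what I imagine the Portuguese version would look like, assuming word_tokenize's language parameter selects the matching Punkt model from the punkt data; the texto variable and the sample sentence are placeholders of my own:

import nltk
nltk.download('punkt')  # Punkt sentence models, which include a Portuguese one

texto = "O menino está correndo no parque."  # placeholder Portuguese text
tag_word = nltk.word_tokenize(texto, language='portuguese')
print(tag_word)
# expected: ['O', 'menino', 'está', 'correndo', 'no', 'parque', '.']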