I'm having a hard time understanding this mechanism.
In English it would just be:
import nltk
tag_word = nltk.word_tokenize(text)  # splits the text into word tokens
where text is the English text I would like to "tokenize", which works very well; for Portuguese, however, I still cannot find any examples.
I am disregarding the earlier stop_words and sent_tokenizer steps here, just to make it clear that my question is about tokenization.
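To make this concrete, here is a minimal sketch of what I imagine the Portuguese version would look like, assuming word_tokenize's language parameter selects the matching Punkt model from the punkt data; the texto variable and the sample sentence are placeholders of my own:

import nltk
nltk.download('punkt')  # Punkt sentence models, which include a Portuguese one

texto = "O menino está correndo no parque."  # placeholder Portuguese text
tag_word = nltk.word_tokenize(texto, language='portuguese')
print(tag_word)
# expected: ['O', 'menino', 'está', 'correndo', 'no', 'parque', '.']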