I'm trying to apply the NMF algorithm to the text in a CSV file and then extract the sentences associated with each topic.
import numpy as np
import pandas
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF


def display_topics(model, feature_names, no_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print "Topic %d:" % (topic_idx)
        print " ".join([feature_names[i]
                        for i in topic.argsort()[:-no_top_words - 1:-1]])


textos = pandas.read_csv('teste_nmf.csv', encoding='utf-8')
textos_limpos = textos['frase_limpa']
textos_bruts = textos['frase_brut']
textos_bruts_list = textos_bruts.values.tolist()
textos_limpos_list = textos_limpos.values.tolist()

tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(textos_limpos_list)
tfidf_feature_names = tfidf_vectorizer.get_feature_names()

# n_components: number of topics
nmf = NMF(n_components=2, random_state=1, alpha=.1, l1_ratio=.5, init='nndsvd').fit(tfidf)

# Number of words per topic
no_top_words = 2

# Display the topics with their words
print 'NMF'
topics = display_topics(nmf, tfidf_feature_names, no_top_words)
print topics

# Extract the sentences linked to each topic
for topic in range(len(topics)):  # TypeError: object of type 'NoneType' has no len()
    print "Topic {}:".format(topic)
    docs = np.argsort(document_topics[:, topic])[::-1]
    for text in docs[:3]:
        text_brut = " ".join(textos_bruts_list[text].split(",")[:2])
        print " ".join(textos_limpos_list[text].split(",")[:2]) + ',' + text_brut
A sample (rough) dataset:
frase_limpa,frase_brut
manga fruta gostosa,a manga é uma fruta gostosa
computador objeto importante,o computador é um objeto importante
banana fruto popular,a banana é um fruto popular
lapis coisa importante,o lapis é uma coisa importante
uva roxa,a uva é roxa
telefone objeto mundial,o telefone é um objeto mundial
My result:
NMF
Topic 0:
important object
Topic 1:
purple grape
None
Traceback (most recent call last):
  File "test_NMF.py", line 55, in <module>
TypeError: object of type 'NoneType' has no len()
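The `None` printed right before the traceback hints at where this comes from: `display_topics` only prints and has no `return` statement, so `topics` ends up as `None`, and `len(None)` raises this `TypeError`. A minimal, standalone sketch of that behaviour (the function names `show_words` and `collect_words` are made up for illustration and are not part of the script above):

```python
def show_words(words):
    # Prints only; with no return statement, the call evaluates to None.
    print(" ".join(words))


def collect_words(words):
    # Returns the data, so the caller can take len() of it or loop over it.
    return list(words)


printed = show_words(["objeto", "importante"])
collected = collect_words(["objeto", "importante"])

print(printed is None)   # the printing function handed back nothing
print(len(collected))    # the returning function handed back a list
```

Calling `len(printed)` here would fail the same way as the loop over `topics` does.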
What I expected, more or less:
Topic 0:
important object
Topic 1:
purple grape
Topic 0:
Important computer object, the computer is an important object
world object phone, the phone is a worldwide object
Lapis important thing, lapis is an important thing
Topic 1:
Purple grape, the grape is purple
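One way output of this shape might be produced, as a rough sketch rather than a definitive fix: return the top words instead of printing them, take the document-topic matrix from `fit_transform`, and rank the sentences per topic with `np.argsort`. The helper names `top_words` and `top_documents` are hypothetical, the sample rows are inlined instead of being read from `teste_nmf.csv`, the `alpha` and `l1_ratio` arguments are left out (an assumption made to keep the call portable across scikit-learn versions), and Python 3 is assumed.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample rows from the dataset, inlined (assumption: same column order as the CSV).
clean = [
    "manga fruta gostosa",
    "computador objeto importante",
    "banana fruto popular",
    "lapis coisa importante",
    "uva roxa",
    "telefone objeto mundial",
]
raw = [
    "a manga é uma fruta gostosa",
    "o computador é um objeto importante",
    "a banana é um fruto popular",
    "o lapis é uma coisa importante",
    "a uva é roxa",
    "o telefone é um objeto mundial",
]


def top_words(model, vocabulary, n_words):
    """Return (not print) the n_words highest-weighted terms of each topic."""
    index_to_term = {idx: term for term, idx in vocabulary.items()}
    return [
        [index_to_term[int(i)] for i in topic.argsort()[:-n_words - 1:-1]]
        for topic in model.components_
    ]


def top_documents(doc_topic, n_docs):
    """Return, per topic, the indices of the n_docs highest-scoring documents."""
    return [
        np.argsort(doc_topic[:, k])[::-1][:n_docs].tolist()
        for k in range(doc_topic.shape[1])
    ]


vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(clean)

nmf = NMF(n_components=2, random_state=1, init="nndsvd")
doc_topic = nmf.fit_transform(tfidf)  # shape: (n_documents, n_topics)

topics = top_words(nmf, vectorizer.vocabulary_, 2)
ranked = top_documents(doc_topic, 3)

for k, (words, docs) in enumerate(zip(topics, ranked)):
    print("Topic %d: %s" % (k, " ".join(words)))
    for d in docs:
        print("  %s, %s" % (clean[d], raw[d]))
```

This lists the three highest-scoring sentences under every topic. To show each sentence only under its strongest topic, `doc_topic.argmax(axis=1)` could be used to assign one topic per sentence instead.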