How to create a dictionary that maps each word in a string to the words adjacent to it?

0

I have the following string:

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"

This is the result I want: for every word in the text, a list of the words that come right after it.

Example:

retorno = {'we': ['are', 'should', 'are', 'need', 'are', 'used'], 'are': ['not', 'not']}
    
asked by anonymous 11.07.2018 / 00:08

3 answers

4

To get each word and the word next to it, we can split the string on whitespace and use the zip function to group the words in pairs:

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
palavras = texto.split()

for a, b in zip(palavras, palavras[1:]):
    ...
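
To see what the loop receives on each iteration, here is a minimal sketch with a shorter sentence (the names frase and pares are only illustrative and do not come from the question). Since zip stops at the shorter of its two arguments, the last word never appears as the first element of a pair:

```python
# Illustrative sentence, shorter than the one in the question
frase = "We are not what we should be"
palavras = frase.split()

# Pair each word with the word that follows it
pares = list(zip(palavras, palavras[1:]))
print(pares)
# [('We', 'are'), ('are', 'not'), ('not', 'what'), ('what', 'we'), ('we', 'should'), ('should', 'be')]
```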

Since you want to build a dictionary of lists, we can use collections.defaultdict to simplify the code:

from collections import defaultdict

resultado = defaultdict(list)

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
palavras = texto.split()

for a, b in zip(palavras, palavras[1:]):
    resultado[a.lower()].append(b.lower())

With that, resultado will be equivalent to:

{
    'we': ['are', 'should', 'are', 'need', 'are', 'used'], 
    'are': ['not', 'not', 'not'], 
    'not': ['what', 'what', 'what'], 
    'what': ['we', 'we', 'we'], 
    'should': ['be'], 
    'be': ['we', 'but'], 
    'need': ['to'], 
    'to': ['be', 'be'], 
    'but': ['at'], 
    'at': ['least'], 
    'least': ['we'], 
    'used': ['to']
}
    
11.07.2018 / 00:20
2

First you break the sentence into words:

words = texto.lower().split()

With this list of words, just iterate over it, appending the next word to each word's list. To save some work, you can use collections.defaultdict, which creates the dictionary of lists for you. The code looks like this:

import collections
adjacente = collections.defaultdict(list)

for (i, word) in enumerate(words[:-1]):
    next_word = words[i + 1]
    adjacente[word].append(next_word)

Note that we slice with [:-1] to iterate over only the first n - 1 words, since the last word has no word after it.

And the result:

adjacente
defaultdict(list,
        {'are': ['not', 'not', 'not'],
         'at': ['least'],
         'be': ['we', 'but'],
         'but': ['at'],
         'least': ['we'],
         'need': ['to'],
         'not': ['what', 'what', 'what'],
         'should': ['be'],
         'to': ['be', 'be'],
         'used': ['to'],
         'we': ['are', 'should', 'are', 'need', 'are', 'used'],
         'what': ['we', 'we', 'we']})

If you want the adjacent words to be unique, change the defaultdict from list to set and, instead of append, use update, passing a list containing next_word (or simply call add with next_word).
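
A minimal sketch of that variation, assuming the same texto as in the question (the name adjacente_unico is only illustrative). It uses set.add, which has the same effect as calling update with a one-element list:

```python
from collections import defaultdict

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
words = texto.lower().split()

# Each key maps to a set, so repeated neighbours are stored only once
adjacente_unico = defaultdict(set)

for i, word in enumerate(words[:-1]):
    adjacente_unico[word].add(words[i + 1])

# Sets are unordered, so sort before printing to get a stable display
print(sorted(adjacente_unico["we"]))  # ['are', 'need', 'should', 'used']
```

Keep in mind that a set does not preserve the order in which the words appeared, so this only fits if the order of the neighbours does not matter.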

    
11.07.2018 / 00:19
0

Here is my version as well, using only a plain dict:

texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"

lista = texto.lower().split()

dic = {}

for i in range(len(lista) - 1):
    current = lista[i]
    next_ = lista[i + 1]
    if current not in dic:
        dic[current] = []

    dic[current].append(next_)

print(dic)
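
As a possible shortening, the membership check can be folded into dict.setdefault, which returns the list already stored for a key or stores and returns a new empty one. A sketch of the same loop written that way (the name dic_alt is only illustrative):

```python
texto = "We are not what we should be We are not what we need to be But at least we are not what we used to be"
lista = texto.lower().split()

dic_alt = {}

for i in range(len(lista) - 1):
    # setdefault creates the empty list the first time a word is seen
    dic_alt.setdefault(lista[i], []).append(lista[i + 1])

print(dic_alt["we"])  # ['are', 'should', 'are', 'need', 'are', 'used']
```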
    
11.07.2018 / 20:15