How to measure the number of words between two specific words in a string in R

5

Hello, People!

I'm working on a function in R that counts the number of words between two specific words. I'm calling it worDistance. It takes two arguments, palavra1 and palavra2, and, given a string t, returns the number of words between palavra1 and palavra2. For example, given:

t <- "bom dia posso ajudar nao viu zunkz sabe tava pagar"

worDistance("bom","ajudar") # it returns the number 2

Note that the function reads the string t from left to right. When I reverse the word order to

worDistance("ajudar","bom")

it returns 0 instead of 2. How can I fix this?

I'll put the structure of the function below:

worDistance <- function( palavra1, palavra2 , direcao ) {#

### Legend
# The function returns -1 when one of the input words does not exist in the string t
# The function returns -2 when neither of the input words exists in the string t



 if( direcao == 1 ) {##

    # 1 = left to right

    total_palavras <- sapply(strsplit(transcricao, " "), length) 

    a <- gsub( paste0('^.*',palavra1,'\\s*|\\s*',palavra2,'.*$'), '', 
    transcricao)

    b <- sapply(strsplit(a, " "), length)

    if( b == total_palavras ) {

      return(-2)

    }else if( b == (total_palavras) - 1) {

      return(-1)

    }else if( b != total_palavras ){

      return(b)

    }

  }##

}#
    
asked by anonymous 01.11.2018 / 13:34

1 answer

4

One possibility is to split the string into words, use the %in% operator together with which() to find the positions of palavra1 and palavra2, and then calculate the distance between the two:

t <- "bom dia posso ajudar nao viu zunkz sabe nao tava pagar"
frase <- unlist(strsplit(t, " "))
palavras <- c('dia', 'zunkz')

# positions of the words in the sentence
pos <- which(frase %in% palavras)
pos
# [1] 2 7

# compute the distance
diff(pos) - 1
# [1] 4

Note that which() returns the positions in sentence order, so even if the two words are given in the opposite order the positions do not change, and the distance is calculated the same way:

palavras <- c('zunkz', 'dia')
which(frase %in% palavras) # same positions as before
# [1] 2 7

You will have to adjust the function to deal with possible repeated words, but this is the subject of another question.
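
Putting the pieces above together, a minimal sketch of worDistance might look like the following. It is only one possible arrangement: it assumes the string is passed in as a third argument (called texto here, a name that is not in the original function), that the two words are different, and that only the first occurrence of each word matters, since it uses match(). The -1 and -2 return codes follow the legend in the question.

```r
# Hypothetical rewrite of worDistance; `texto` is an assumed extra argument.
worDistance <- function(palavra1, palavra2, texto) {
  # split the string into words on single spaces
  frase <- unlist(strsplit(texto, " ", fixed = TRUE))

  # which of the two input words exist in the sentence?
  achou <- c(palavra1, palavra2) %in% frase
  if (!any(achou)) return(-2)  # neither word is in the string
  if (!all(achou)) return(-1)  # only one of the words is in the string

  # match() gives the position of the FIRST occurrence of each word
  pos <- match(c(palavra1, palavra2), frase)

  # abs() makes the result independent of the argument order
  abs(diff(pos)) - 1
}

t <- "bom dia posso ajudar nao viu zunkz sabe tava pagar"
worDistance("bom", "ajudar", t)  # 2
worDistance("ajudar", "bom", t)  # 2
```

Taking the absolute value of the difference is what makes the argument order irrelevant here, so no direcao argument is needed in this sketch.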

    
01.11.2018 / 14:32