Identify string between two known strings in a string

3

I would like you to help me with the following:

Given this sequence:

CGC UUC GCU UUG GAA AAU UUG UGU GUU UUU UGU GGC UGC UCG CUG CUC AAA UUG UUC GCU GCU UUU UGU GUC CUG GCU GCU UUU AUU AUU UUA CGC UGC UUG GCG CUG CUY UUA CGC UGC UUG GGC UUG UUG UGG CUU UGG UUG UUU GUU UAU UAY GCU GCU CUU GUU GUU GUU GCU UGU UGU GCC UAU GGC 

I need to write a program that reads this sequence and, whenever it finds a UAG, saves all the letters from there until it finds a UAA.

For example: UAG UGG GAU UUA UAA.

How do I do this?

    
asked by anonymous 31.10.2017 / 21:46

2 answers

4

You can build a finite state machine with only two states to solve your problem:

def pesquisar(seq, inicio, fim):
    estado = 0  # 0: waiting for inicio, 1: collecting until fim
    ret = []    # all completed runs
    aux = []    # run currently being collected

    for x in seq:
        if estado == 0:
            if x == inicio:
                aux = [x]
                estado = 1
        elif estado == 1:
            aux.append(x)
            if x == fim:
                ret.append(aux)
                estado = 0

    return ret


sequencia = ['CGC','UUC','GCU','UUG','GAA','AAU','UUG','UGU','GUU','UUU','UGU','GGC','UGC','UCG','CUG','CUC','AAA','UUG','UUC','GCU','GCU','UUU','UGU','GUC','CUG','GCU','GCU','UUU','AUU','AUU','UUA','CGC','UGC','UUG','GCG','CUG','CUY','UUA','CGC','UGC','UUG','GGC','UUG','UUG','UGG','CUU','UGG','UUG','UUU','GUU','UAU','UAY','GCU','GCU','CUU','GUU','GUU','GUU','GCU','UGU','UGU','GCC','UAU','GGC']

print(pesquisar( sequencia, inicio = 'UGU', fim = 'UGC' ))

Output:

[['UGU', 'GUU', 'UUU', 'UGU', 'GGC', 'UGC'],
 ['UGU', 'GUC', 'CUG', 'GCU', 'GCU', 'UUU', 'AUU', 'AUU', 'UUA', 'CGC', 'UGC']]

EDIT:

State machines can also be written in Python as generators, using yield. This gives an alternative, even more compact solution:

def pesquisar(seq, inicio, fim):
    ret = []  # empty while waiting for inicio, non-empty while collecting
    for i in seq:
        if i == inicio or ret:
            ret.append(i)
        if i == fim and ret:
            yield ret
            ret = []

sequencia = ['CGC','UUC','GCU','UUG','GAA','AAU','UUG','UGU','GUU','UUU','UGU','GGC','UGC','UCG','CUG','CUC','AAA','UUG','UUC','GCU','GCU','UUU','UGU','GUC','CUG','GCU','GCU','UUU','AUU','AUU','UUA','CGC','UGC','UUG','GCG','CUG','CUY','UUA','CGC','UGC','UUG','GGC','UUG','UUG','UGG','CUU','UGG','UUG','UUU','GUU','UAU','UAY','GCU','GCU','CUU','GUU','GUU','GUU','GCU','UGU','UGU','GCC','UAU','GGC']

print(list(pesquisar( sequencia, inicio = 'UGU', fim = 'UGC')))

Output:

[['UGU', 'GUU', 'UUU', 'UGU', 'GGC', 'UGC'],
 ['UGU', 'GUC', 'CUG', 'GCU', 'GCU', 'UUU', 'AUU', 'AUU', 'UUA', 'CGC', 'UGC']]
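
Both versions above take a list of codons and are demonstrated with UGU and UGC, whereas the question gives one space-separated string and asks about UAG and UAA. The following is a minimal sketch of the same generator idea applied directly to such a string. The function name codons_between and the short example string are assumptions made for illustration (the example is the one from the question with one extra codon added on each side).

```python
def codons_between(text, start="UAG", end="UAA"):
    """Yield each run of codons from `start` through `end`, inclusive."""
    current = []  # empty while waiting for `start`
    for codon in text.split():
        if current or codon == start:
            current.append(codon)
            if codon == end:
                yield current
                current = []


# Hypothetical input: the question's example with a codon added on each side.
example = "GCU UAG UGG GAU UUA UAA CGC"
for run in codons_between(example):
    print(" ".join(run))
# prints: UAG UGG GAU UUA UAA
```

Note that the long sequence in the question contains neither UAG nor UAA, so on that input the generator yields nothing.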
    
31.10.2017 / 22:24
2

I think this might help you.

comeco = "CGC"
fim = "AAA"
string = "CGC UUC GCU UUG GAA AAU UUG UGU GUU UUU UGU GGC UGC UCG CUG CUC AAA UUG UUC GCU GCU UUU UGU GUC CUG GCU GCU UUU AUU AUU UUA CGC UGC UUG GCG CUG CUY UUA CGC UGC UUG GGC UUG UUG UGG CUU UGG UUG UUU GUU UAU UAY GCU GCU CUU GUU GUU GUU GCU UGU UGU GCC UAU GGC"
seqs = string.split()
resp = []
# Collect codons from the first comeco up to and including the next fim
# (or up to the end of the sequence if fim never appears).
for codon in seqs:
    if resp or codon == comeco:
        resp.append(codon)
        if codon == fim:
            break

print(" ".join(resp))
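
The loop above stops after the first match. As an alternative sketch that is not part of this answer, a regular expression can return every match straight from the string, assuming the codons are separated by whitespace. The function name find_between and the example string are hypothetical.

```python
import re


def find_between(text, start, end):
    """Return every substring running from `start` to the nearest following `end`."""
    # \b keeps matches aligned to whole codons; .*? is non-greedy so each
    # match stops at the first `end` after `start`.
    pattern = r"\b{}\b.*?\b{}\b".format(re.escape(start), re.escape(end))
    # re.DOTALL lets a match continue across line breaks in the input.
    return re.findall(pattern, text, flags=re.DOTALL)


# Hypothetical input built around the example from the question.
print(find_between("CGC UAG UGG GAU UUA UAA GGC", "UAG", "UAA"))
# prints: ['UAG UGG GAU UUA UAA']
```

Each result is a substring of the input rather than a list of codons. Call split() on it if a list is needed.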
    
31.10.2017 / 22:22