Implementation of a Lexical Analyzer

0

Hello, guys! I'm doing the implementation of a lex analyzer of a grammar I created (using Gals). The validation of the grammar was done using the concept of finite automata, according to the image:

AtthemomentIhavethefollowingcodes:

lexico.pyfile:

withopen("transicao.txt", "r") as programa:
    arquivo = programa.readlines()
    lista = []
    for linha in arquivo:
        line = linha.strip('\n').replace(' ', '')
        for i in line:
            lista.append(i)

    with open("tokens/transicao.txt", "r") as arquivo:
        linhas = arquivo.readlines()

    def concatenar(list):
        return ''.join(list)

    def ftransicao(s, l):
        for v in linhas:  # Navega pelas linhas do arquivo de transição
            linha = v.split(',')  # Divide a string em 3 partes (estado atual, letra, próximo estado)
            if linha[0] == s:  # Se o primeiro valor da string dividida for igual o estado atual
                for z in linha[1]:  # É navegado apenas pelos valores das letras da função transição
                    if z == l:  # Se a letra lida no arquivo for igual a letra lida da palavra então é retornado o próximo estado
                        # print('{} contém {} então vai para {}'.format(s, l, linha[2]))
                        return linha[2].strip()
        return None


    atual = 'q0'
    efinal = ['q3', 'q7', 'q28', 'q9', 'q15', 'q18', 'q25', 'q33', 'q34', 'q35',
              'q36', 'q37', 'q39', 'q41', 'q42', 'q43', 'q44', 'q45', 'q46', 'q47', 'q51', 'q53']
    lexema = []

    for i in range(len(lista)):

        atual = ftransicao(atual, lista[i].strip())
        lexema += lista[i]

        if atual in efinal:
            print('Caractere identificado [ {} ]'.format(concatenar(lexema)))
            atual = 'q0'
            lexema.clear()
            continue
        elif atual is None:
            atual = 'q0'
            print('Caractere não identificado [ {} ]'.format(lista[i]))
            lexema.clear()
            continue

Transicao.txt file containing an example of the grammar:

run
integer f = 22;
integer g = 3;
integer n;
display();
input(n);
while(g <= numero){
f = f * i;
g = g + 1;
}
display(f);
exit

And the transition function described by the image:

q0,r,q1
q1,u,q2
q2,n,q3

q0,e,q4
q4,x,q5
q5,i,q6
q6,t,q7

q4,l,q26
q26,s,q27
q27,e,q28

q0,i,q8
q8,f,q9
q8,n,q10
q10,t,q11
q11,e,q12
q12,g,q13
q13,e,q14
q14,r,q15

q10,p,q16
q16,u,q17
q17,t,q18

q0,d,q19
q19,i,q20
q20,s,q21
q21,p,q22
q22,l,q23
q23,a,q24
q24,y,q25

q0,w,q29
q29,h,q30
q30,i,q31
q31,l,q32
q32,e,q33

q0,-,q34
q0,+,q35
q0,*,q36
q0,/,q37

q0,<,q38
q38,=,q39

q0,>,q40
q40,=,q41

q0,=,q42
q42,=,q43

q0,(,q44
q0,),q45
q0,{,q46
q0,},q47
q0,;,q53

The problem is as follows. How can I validate the String, digits, identifiers according to the image shown? The others are working perfectly and being recognized in the listing.

Thanks to anyone who can help. Good night!

    
asked by anonymous 28.09.2018 / 08:06

1 answer

1

Your language is too complex to be defined in a regular grammar. You should use a context-free grammar , so you can even define the origin of mathematical operators and check parentheses.

Once you've created your GLC, create a parser LL to validate the syntax and generate the <

05.11.2018 / 12:56