What is the difference between automata and grammars?


I'm studying compilers and related programming language topics. What is the difference between automata and grammars?

    
asked by anonymous 08.08.2014 / 17:14

1 answer


Formal Languages

In the study of formal languages, a language is a set of strings over an alphabet, and a string is a sequence of letters drawn from that alphabet.

note: string is normally translated as "word", but perhaps "phrase" would be more accurate, since spaces are a letter like any other from the point of view of formal languages and "A B" is a single string.

Grammars

A grammar describes the structure of a formal language. It consists of a series of rules that describe how to form the words of the language. For example, one of the grammars that describe the language of well-balanced parenthesis strings is the following:

start rule: S
production rules:
S -> ()     // A pair of parentheses is well balanced
S -> ( S )  // A balanced string surrounded by parentheses is well balanced
S -> S S    // Two balanced strings in sequence are well balanced
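
To make the idea of generation concrete, here is a minimal Python sketch (with made-up names, purely for illustration) that produces strings of this language by repeatedly expanding S with one of the three rules above:

import random

def derive(depth: int = 3) -> str:
    """Expand S with a randomly chosen rule; 'depth' only bounds the recursion."""
    if depth == 0:
        return '()'                               # fall back to the base rule S -> ()
    rule = random.choice(['()', '(S)', 'S S'])
    if rule == '()':
        return '()'                               # S -> ()
    if rule == '(S)':
        return '(' + derive(depth - 1) + ')'      # S -> ( S )
    return derive(depth - 1) + derive(depth - 1)  # S -> S S

print(derive())  # e.g. '(()())' -- always a well-balanced string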

Grammars can be used to generate strings, but the most interesting operation is, given a string, to obtain a syntax tree that describes the structure of that string in relation to the grammar. For example, in the language of arithmetic expressions, the string "1 + 2 * 3" is turned into the following tree:

    sum
   /   \
  1   product
       /   \
      2     3

It is on these syntax trees that the compiler of our programming language, or the expression evaluator of our calculator, will operate. They are much more structured than flat text.
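
As an illustration, a minimal sketch of such a tree as a data structure, with a small evaluator walking it (the names Num, BinOp and evaluate are hypothetical, not taken from any particular compiler):

from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str            # '+' or '*'
    left: 'Expr'
    right: 'Expr'

Expr = Union[Num, BinOp]

def evaluate(node: Expr) -> int:
    """Walk the tree bottom-up, as a calculator's evaluator would."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == '+' else left * right

# The tree drawn above for "1 + 2 * 3":
tree = BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))
print(evaluate(tree))  # 7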

There are several questions we can ask about grammars:

  • Is the grammar ambiguous? Is there some string that can be derived using two different derivation trees, or does every string have a unique derivation? For example, if we do not use precedence rules for the operators, it is ambiguous whether the addition or the multiplication in 1 + 2 * 3 comes first (see the sketch after this list).

  • Is the grammar regular? Is it context-free? Is it left-recursive? The more general the grammar, the more languages it can describe, but at the same time the more complex and expensive the parsing algorithm will be.
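
To see the ambiguity concretely, the sketch below reuses the hypothetical Num/BinOp/evaluate from the earlier example. Under a grammar with no precedence, such as E -> E + E | E * E | num, the same string has two derivation trees with different meanings:

# Two possible trees for "1 + 2 * 3" when the grammar imposes no precedence:
tree_a = BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))   # 1 + (2 * 3)
tree_b = BinOp('*', BinOp('+', Num(1), Num(2)), Num(3))   # (1 + 2) * 3
print(evaluate(tree_a), evaluate(tree_b))                  # 7 9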

Automata

Automata are state machines that receive a word and answer YES or NO at the end. In the context of formal languages, each automaton is associated with the language formed by the strings it accepts with a YES.
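
For example, here is a minimal sketch of a deterministic finite automaton, assuming a made-up language: binary strings containing an even number of 1s.

TRANSITIONS = {
    ('even', '0'): 'even', ('even', '1'): 'odd',
    ('odd',  '0'): 'odd',  ('odd',  '1'): 'even',
}

def accepts(word: str) -> bool:
    """Run the machine over the word and answer YES/NO at the end."""
    state = 'even'                       # initial state
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state == 'even'               # 'even' is the only accepting state

print(accepts('1001'))  # True  (two 1s)
print(accepts('1011'))  # False (three 1s)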

Automata are classified by the computational power available to them: is the memory finite, a stack, unlimited? Is the processing deterministic or non-deterministic (parallel)?
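
As an illustration of why the memory matters: the balanced-parenthesis language from the grammar section cannot be recognized with finite memory, but a counter acting as a degenerate stack is enough. A minimal sketch:

def accepts_balanced(word: str) -> bool:
    depth = 0
    for symbol in word:
        if symbol == '(':
            depth += 1                    # "push"
        elif symbol == ')':
            if depth == 0:
                return False              # closing with nothing to match
            depth -= 1                    # "pop"
        else:
            return False                  # symbol outside the alphabet
    return depth == 0 and len(word) > 0   # non-empty and fully matched

print(accepts_balanced('(()())'))  # True
print(accepts_balanced('(()'))     # False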

Some examples of questions we can ask about automata:

  • Do two automata recognize the same language?
  • Can we reduce the number of states of an automaton without changing the recognized language?
  • What happens if we decide to combine two automata?
  • What is the difference, in terms of recognizable languages, between deterministic and non-deterministic automata, or between automata with more or less memory? And in terms of execution cost and number of states required?

Classification of grammars vs automata. Where do compilers fit in?

A very interesting classification is the Chomsky hierarchy, which associates some important classes of formal languages with the types of grammars that can generate them and the automata that can recognize them:

       | Languages               | Automaton                    | Grammar production rules
Type 0 | Recursively enumerable  | Turing machine               | (no restrictions)
Type 1 | Context-sensitive       | Linearly bounded TM          | a X b -> c Y d
Type 2 | Context-free            | Non-deterministic pushdown   | X -> a Y b
Type 3 | Regular                 | Finite                       | X -> a Y

From this list, the most important for a compiler are context-free grammars and finite automata.

Context-free grammars are the ones you will typically use to describe your programming language, since programming languages need to be processed efficiently and the absence of context-sensitive effects leads to more predictable syntax trees. Besides avoiding ambiguities, you will also have to pay attention to whether your grammar is compatible with the type of parser you are going to use. If you are going to write a top-down parser (recursive descent, handwritten), your grammar will need to be an LL grammar, with no left recursion. If you are going to use a bottom-up parser, such as one generated by yacc or bison, your grammar can be an LR grammar, which has fewer restrictions.
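
A minimal sketch of a handwritten top-down parser, assuming the tiny hypothetical grammar in the comments below (already written without left recursion, with '*' binding tighter than '+'); it evaluates as it parses instead of building a tree:

#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> number

def parse_expr(tokens, pos=0):
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':
        right, pos = parse_term(tokens, pos + 1)
        value += right
    return value, pos

def parse_term(tokens, pos):
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '*':
        right, pos = parse_factor(tokens, pos + 1)
        value *= right
    return value, pos

def parse_factor(tokens, pos):
    return int(tokens[pos]), pos + 1      # expects a number token

print(parse_expr(['1', '+', '2', '*', '3'])[0])  # 7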

As for automata, the ones you will spend the most time studying are finite automata. They are used to recognize regular expressions, which are used extensively in the lexer, the first step of compilation.
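
A minimal lexer sketch using Python's re module, with hypothetical token names for the arithmetic language used above; each regular expression corresponds to a finite automaton under the hood:

import re

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('PLUS',   r'\+'),
    ('TIMES',  r'\*'),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('SKIP',   r'\s+'),   # whitespace, discarded below
]
MASTER_RE = re.compile('|'.join(f'(?P<{name}>{rx})' for name, rx in TOKEN_SPEC))

def tokenize(text):
    for match in MASTER_RE.finditer(text):
        if match.lastgroup != 'SKIP':
            yield match.lastgroup, match.group()

print(list(tokenize('1 + 2 * 3')))
# [('NUMBER', '1'), ('PLUS', '+'), ('NUMBER', '2'), ('TIMES', '*'), ('NUMBER', '3')]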

    
answered 08.08.2014 / 20:14