Regular expression to recognize language: words that do not contain "bbab"

18

@LINQ challenged me to write a regular expression to recognize the following language L :

Languageonthedesiredalphabet{a,b,c,d},anywordthatdoesnotcontainthesubexpression"bbab"

I was able to build on this premise the following finite state machine:

Machineoffinitestatestorecognizethewordthatdoesnotcontain"bbab"

  

Miguel Angelo has detected a typing error in the AFD originally described. The qa ->q0 transition occurs in [^b] , according to new figure; in the old figure with [^a] was wrong

And also the following regular grammar (tail recursion):

I already have this information in hand, but I can not write a regular expression to detect L . So I have the following questions:

  • What is the regular expression (mathematically speaking) that recognizes L ?
  • Is it true that regex is computationally equivalent to regular grammars? Or regular grammars is a super-set itself?
  • What algorithm can I use to transform my AFD into a regex?
  • What algorithm can I use to transform my regular deterministic grammar into regex?
asked by anonymous 28.09.2017 / 03:20

1 answer

13

Well let's see that Language L accepts by following the premise:

  • {a,b,c,d}-sowehavethattheonlycharacterspresentinthelanguagearea,b,c,d
  • *-allpossiblewordsthatcanbeformedbythesecharacters

Butthereisonerule:

  • pdoesnotcontainssubsequencebbab-Thatis

Bothstatemachinesandregulargrammararecorrect,howeverwhenyou'resettingupREGEXyoumaykeepinmindthatjustgivingmatchtobbabalreadymakesInvalidsequence.

Usingthe JFlap I mounted the finite automaton on which we can analyze better

Regular Grammar

If the language you are using already has the alternatives look-ahead , it becomes simple mount a validation by REGEX.

(?=^[abcd]*$)(?!^.*bbab.*$).*
  • (?=^[abcd]*$) - will only accept the characters involved
  • (?!^.*bbab.*$) - is found in any part of the current sentence bbab will refuse the match .

See more in debug

Now that you are mounting on pure REGEX, which does not have look-ahead it becomes a bit complex to mount the regex, since the process is based on acceptance rather than denial ie your sentence should start and go beating all possible states until arriving at the end of the sentence, if any state is not fulfilled then does not generate match .

Following the REGEX proposal by @Miguel :

^($|[^b]|b($|[^b]|b(b)*($|[^ab]|a($|[^b]))))*$ 

This is based on positive match based on start ( ^ ) and end ( $ ) of sentence.

Questions

  
    Is it true that regex is computationally equivalent to regular grammars? Or regular grammars is a super-set itself?   

In fact they are related because they can be generated by an A D eterministic) utterance, however, can see in the image below, they do not have a direct relation.

Source: Regular Expressions and Regular Grammar

If you have an ER or GR and convert to AFε - which is not very complicated since ε (empty motion) lets you jump from one transition to another without problems - does not mean that you will be able to complete the other conversions of AFε -> AFN -> AFD since at each step you are restricting rules and depending on the complexity you will not be able to perform the conversion.

In some sources I found it is said that REGEX is based on RLG ( G R right R ight)) or LLG G L branch to the Left ( L eft)), usually RLG. link
But it's actually implicit, because such conversions will be possible.

  
  • What algorithm can I use to transform my AFD into a regex?
  •   

First convert it to an RLG , and after applying this rule:

Original: Steps to convert regular expressions directly to regular grammars and vice versa

  
  • What is the regular expression (mathematically speaking) that recognizes L ?
  •   

I'm going to have to.

Addendum

Deterministic and Non-Deterministic Finite Automata

    
02.10.2017 / 20:31