How is the find function implemented?


Unlike the match() that looks for an instance of a language y at the start of the string, find() strong> search for y within that string. I could use the Knuth-Morris-Pratt to search for a first occurrence n0, n1, ..., n of X (string) in S (string ) in O(n+k) complexity,% being the size of X and n the size of S , but this is impracticable in the implementation of the search automaton because of the state transitions (I believe, as the algorithm only looks at the beginning of the string, I need to implement it so that a given language can be found both at the beginning / middle / end).

Can anyone bring me a light on this?

asked by anonymous 09.08.2014 / 20:49

2 answers


Build an LR parser from productions:

s -> .*minha_regex
minha_regex -> sua regex original

When minha_regex production is reduced for the first time, you will have found it.

Converting parts of the regex into productions is relatively simple:

  • a -> x* is the same as:

    a -> ε
    a -> a x
  • a -> x | y is the same as:

    a -> x
    a -> y
  • a -> (x | y) z* is the same as:

    a -> a0 a1
    a0 -> x
    a0 -> y
    a1 -> ε
    a1 -> a1 z
11.11.2014 / 16:59

Regex - EM JAVA (you did not say the language, I hope it helps!)

Regular expressions (regex), is a resource where we can be doing search with just simple characters (sequence of letters, digits, etc.) or with metacharacters. Therefore, to use such advantages we obviously need to indicate a source and search characters.

Matcher methods: find (), start (), group (), end ();

We have the following font: blue color mouse . So I want to find where the word "color" is in our sentence (in bold). And now!?

import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

public class Regex { 

public static void main(String[] args) { 

String fonte = “mouse da cor azul”; 

String queremosIsso = “cor”; 

* Nesse momento obteremos uma instância de Pattern, através do * método 
* static compile(String regex), o qual recebe uma String que é a 
* expressção regular 
Pattern p = Pattern.compile(queremosIsso); 

* Aqui, através da instancia de Pattern, chamando o método * matcher() e passando a fonte de busca 

Matcher m = p.matcher(fonte); 

//Ligando o motor 
while (m.find()) { 

//Obtendo o inicio do que foi encontrado 




When we execute this code, we will get 9 (nine) as answer. But, why 9?

First let's see what happens.

After we have obtained the Pattern and Matcher instances, in a loop (while) we call the m.find () method. When we call this face, the regex engine is fired, that is, the search task is effectively started. Vale recalls that the regex engine performs the search from left to right and while it encounters an occurrence, it is no longer participating in the search process. Just below the code, I called the m.start () method. This method returns an integer, where it represents the initial index of each sequence found, which in this case would be 9 (nine). I remember starting from scratch. Below is the letters and their indexes (where there are two quotes without content, they represent spaces).

0-m 1-o 2-u 3-s 4-e 5 '' 6-d 7-a 8-

blue color mouse

There is also as I said at the beginning, the group () method. This is responsible for bringing the group of characters that was found in the search.

Ex: // Starting the engine

while (m.find()) { 

//Obtendo o inicio do que foi encontrado 

System.out.println(m.start() +” “+ +” “+ m.end()); 


Output: 9 color 12

Then the end () method, where this is the start of start (), bringing the index of the last character found, but this index starts at one (1).

Source: link

11.11.2014 / 16:54