How to find the position of each occurrence of a String in a file in Java?


I have a college assignment in which I need to read a text file word by word, store the words in a hash table and then, for each word read from a second file, report its occurrences. So far so good!

The problem is that I also need to store the start position of each occurrence, and I do not know how to read the file word by word in a way that lets me record it. The only approach I can think of is RandomAccessFile, but how would I use it to read word by word?

I am currently reading the words as follows:

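String palavra;
File arq = new File("teste.txt");
try (Scanner in = new Scanner(arq)) { // try-with-resources closes the file
    while (in.hasNext()) {
        palavra = in.next().toLowerCase();
    }
} catch (IOException e) {
    e.printStackTrace(); // do not silently swallow the error
}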

I omitted the rest of the code, because only the word-reading part matters here.

    
asked by anonymous 18.08.2017 / 00:00

1 answer


I do not know if I completely understand what you want, but here's an example:

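First you need to read the whole content of the text file, so create the following method in the main class (the snippets in this answer use classes from java.io, java.util and java.util.regex, which must be imported):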

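private static String getTexto(String nomeArquivo) throws IOException {
    StringBuilder conteudo = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader(nomeArquivo))) {
        String linha;
        while ((linha = reader.readLine()) != null) { // null marks the end of the file
            // readLine() drops the line break; put one back so that words on
            // adjacent lines are not glued together
            conteudo.append(linha).append('\n');
        }
    }

    return conteudo.toString();
}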

Then create the method that returns a LinkedHashMap containing each distinct word of the text (in lowercase) and the start position of its first occurrence:

private static Map<String, Integer> getPalavrasDoTexto(String conteudoTexto) {
    Map<String, Integer> listaDePalavrasDoTexto = new LinkedHashMap<>();
    StringBuilder palavra = new StringBuilder();
    int posicaoDeInicio = 0;
    // go one position past the end so that the last word is stored too
    for (int i = 0; i <= conteudoTexto.length(); i++) {
        if (i < conteudoTexto.length() && Character.isAlphabetic(conteudoTexto.charAt(i))) { // keep letters only
            if (palavra.length() == 0) {
                posicaoDeInicio = i; // a new word starts at this index
            }
            palavra.append(conteudoTexto.charAt(i));
        } else if (palavra.length() > 0) {
            // putIfAbsent keeps the position of the first occurrence only
            listaDePalavrasDoTexto.putIfAbsent(palavra.toString().toLowerCase(), posicaoDeInicio);
            palavra.setLength(0);
        }
    }

    return listaDePalavrasDoTexto;
}
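
Since the question mentions RandomAccessFile, it may be worth noting that positions computed this way are char indexes into the String, not byte offsets in the file. The following is only a sketch of how one could convert between the two; the class OffsetEmBytes and the method offsetEmBytes are hypothetical names, and it assumes that the file is encoded in UTF-8 and that the String holds exactly the decoded contents of the file (same line breaks, no byte order mark).

```java
import java.nio.charset.StandardCharsets;

public class OffsetEmBytes {

    // Hypothetical helper: converts a char index in the text into a byte offset.
    // Assumption: the file is UTF-8 and 'texto' is its exact decoded content.
    static long offsetEmBytes(String texto, int posicaoEmChars) {
        // count the bytes taken by everything that comes before the position
        return texto.substring(0, posicaoEmChars).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "ação e reação", written with Unicode escapes to avoid source-encoding issues
        String texto = "a\u00e7\u00e3o e rea\u00e7\u00e3o";
        int posicao = texto.indexOf("rea\u00e7\u00e3o");
        System.out.println(posicao);                       // prints 7 (char index)
        System.out.println(offsetEmBytes(texto, posicao)); // prints 9 (byte offset in UTF-8)
    }
}
```

The two numbers differ because "ç" and "ã" take two bytes each in UTF-8. For plain ASCII text the char index and the byte offset are the same.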

Also create the method that returns the search words, as follows:

private static List<String> getPalavrasParaBuscar(String nomeArquivo) throws IOException {
    List<String> listaDePalavrasBusca = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new FileReader(nomeArquivo))) {
        String palavra;
        while ((palavra = reader.readLine()) != null) { // null marks the end of the file
            palavra = palavra.trim().toLowerCase();
            if (!palavra.isEmpty()) { // skip blank lines
                listaDePalavrasBusca.add(palavra);
            }
        }
    }

    return listaDePalavrasBusca;
}

Create a file named texto.txt containing the whole text and another named palavras.txt with one word per line, both in the project's root directory (relative paths are resolved against the directory the program is run from).

Inside the main method, put the following:

String texto = getTexto("texto.txt");
List<String> palavrasParaBuscar = getPalavrasParaBuscar("palavras.txt");
Map<String, Integer> palavrasDoTexto = getPalavrasDoTexto(texto);

for (String palavraParaBuscar : palavrasParaBuscar) {
    if (palavrasDoTexto.containsKey(palavraParaBuscar)) { // only search for words that exist in the text

        System.out.println("-----");
        System.out.println("SEARCHING FOR WORD: " + palavraParaBuscar + ", FIRST START POS IN TEXT: " + palavrasDoTexto.get(palavraParaBuscar));

        // quote() treats the word literally; the lookarounds reject matches inside longer words
        Pattern pattern = Pattern.compile(
                "(?<!\\p{IsAlphabetic})" + Pattern.quote(palavraParaBuscar) + "(?!\\p{IsAlphabetic})",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(texto);

        while (matcher.find()) { // for as long as another occurrence is found
            String palavraEncontrada = matcher.group();
            int posicaoDeInicio = matcher.start();
            int posicaoFinal = matcher.end(); // index just after the last character

            System.out.println();
            System.out.println("WORD FOUND: " + palavraEncontrada);
            System.out.println("START POS: " + posicaoDeInicio);
            System.out.println("END POS: " + posicaoFinal);
        }
    }
}
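
If the goal is to keep every start position in the hash table itself, as the question describes, an alternative is to build the whole index in a single pass and skip the regex search. This is only a sketch under that assumption: the class IndiceDePalavras and the method indexar are hypothetical names, and a "word" is taken to be a run of letters, as in the rest of this answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndiceDePalavras {

    // Hypothetical helper: maps each lowercased word to the start positions
    // (char indexes) of all of its occurrences, scanning the text only once.
    static Map<String, List<Integer>> indexar(String texto) {
        Map<String, List<Integer>> indice = new LinkedHashMap<>();
        int inicio = -1; // start of the word being read, or -1 when between words
        // go one position past the end so that the last word is stored too
        for (int i = 0; i <= texto.length(); i++) {
            boolean letra = i < texto.length() && Character.isAlphabetic(texto.charAt(i));
            if (letra && inicio < 0) {
                inicio = i; // a new word starts here
            } else if (!letra && inicio >= 0) {
                String palavra = texto.substring(inicio, i).toLowerCase();
                indice.computeIfAbsent(palavra, k -> new ArrayList<>()).add(inicio);
                inicio = -1;
            }
        }
        return indice;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> indice = indexar("O rato roeu a roupa. O RATO fugiu.");
        System.out.println(indice.get("rato")); // prints [2, 23]
        // words that never occur are simply absent from the map
        System.out.println(indice.getOrDefault("gato", Collections.emptyList())); // prints []
    }
}
```

With an index like this, reporting the occurrences of each search word is a single map lookup per word.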
    
18.08.2017 / 02:41