How do I build a word index in Java and generate it as an HTML file?

I have to create an index for a book that is passed in as a parameter (File, BufferedReader).

So far I have not had good results: all I have is code that builds a TreeSet with every word of the text passed in. I have been trying for 3 weeks to write code that takes those words, records the lines where each one appears, and generates the index as an HTML file.

In the code below, read is a LineNumberReader and palavras (the words) is a TreeSet.

I ran into problems when iterating over the array returned by the split method and comparing it with the text word by word (this is the code I cannot get to compile).

    while((line = read.readLine()) != null){
        line = line.replaceAll("[^a-zA-Z]", " ").toLowerCase();
        split = line.split(" ");

        for(String s : split){
            if(s.length() >= 1 && !palavras.contains(s)){
                palavras.add(s);
            }
        }           
    }

    path.close();
    read.close();

    }catch(FileNotFoundException e){
        // printStackTrace() actually prints the trace; getStackTrace() only returns it
        e.printStackTrace();
        System.out.println("Invalid file path!");

    }catch(IOException ex){
        ex.printStackTrace();
    }

    return palavras;  
}
    
asked by anonymous 29.06.2014 / 01:18

1 answer

Your code is almost there. To get what you want, I think it would help a lot to change the data structure palavras from a java.util.Set to a java.util.Map. The point is that you do not want to store only the words, but the relationship between each word and the lines where it appears (that is, a set of integers). So I redefined palavras as follows:

    Map<String, Set<Integer>> palavras = new HashMap<String, Set<Integer>>();

With this structure you can save relationships like:

  • "bla" -> [1, 3]
  • "ble" -> [2]
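
To make the structure concrete, here is a small sketch that stores exactly those two relationships and prints them. The class name IndexStructureDemo and the helper record are made up for this illustration. I also used a TreeMap here instead of the HashMap above, on the assumption that you want the words in alphabetical order, as your original TreeSet gave you; a HashMap works the same way but iterates in no particular order. It needs Java 8 or later because of computeIfAbsent.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original code.
class IndexStructureDemo {

    // Records that `word` occurs on `lineNumber`; the set is created on first use.
    static void record(Map<String, Set<Integer>> index, String word, int lineNumber) {
        index.computeIfAbsent(word, k -> new TreeSet<Integer>()).add(lineNumber);
    }

    public static void main(String[] args) {
        // TreeMap keeps the words sorted; TreeSet keeps each word's lines sorted.
        Map<String, Set<Integer>> index = new TreeMap<String, Set<Integer>>();
        record(index, "bla", 1);
        record(index, "ble", 2);
        record(index, "bla", 3);
        record(index, "bla", 3); // same line again: the Set ignores the duplicate

        System.out.println(index); // prints {bla=[1, 3], ble=[2]}
    }
}
```

Using a Set for the line numbers means a word that appears twice on the same line is listed only once for that line.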

That is, the word "bla" was found on lines 1 and 3, while the word "ble" was found on line 2. With that in place, I changed your for loop to add a new entry to the map if the word is not there yet, and to just add the line number if it already is:

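    for (String s : split) {
        if (s.isEmpty()) {
            continue; // split(" ") yields empty strings between consecutive spaces
        }
        if (!palavras.containsKey(s)) {
            // first occurrence: create the set with the current line number
            Set<Integer> linhas = new TreeSet<Integer>();
            linhas.add(read.getLineNumber());
            palavras.put(s, linhas);
        } else {
            // word already indexed: just record one more line
            palavras.get(s).add(read.getLineNumber());
        }
    }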
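
For the HTML part of your question, here is one way the pieces could fit together. Treat it as a sketch under my own assumptions, not the only way to do it: the class name HtmlIndexSketch, the methods buildIndex and toHtml, and the HTML layout (a plain list) are all invented for this example. It uses a TreeMap so the index comes out alphabetically, splits on "\\s+" instead of " ", and requires Java 8 or later.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class; names and HTML layout are assumptions for illustration.
class HtmlIndexSketch {

    // Maps each word to the sorted set of line numbers on which it appears.
    static Map<String, Set<Integer>> buildIndex(Reader source) throws IOException {
        Map<String, Set<Integer>> index = new TreeMap<>();
        try (LineNumberReader read = new LineNumberReader(source)) {
            String line;
            while ((line = read.readLine()) != null) {
                // Same cleanup as in the question: keep letters only, lower-case.
                String cleaned = line.replaceAll("[^a-zA-Z]", " ").toLowerCase();
                for (String s : cleaned.split("\\s+")) {
                    if (s.isEmpty()) {
                        continue; // a leading separator produces one empty token
                    }
                    // After readLine(), getLineNumber() is the number of the line just read.
                    index.computeIfAbsent(s, k -> new TreeSet<>()).add(read.getLineNumber());
                }
            }
        }
        return index;
    }

    // Renders the index as a minimal HTML list, one <li> per word.
    // Words contain only a-z here, so no HTML escaping is needed.
    static String toHtml(Map<String, Set<Integer>> index) {
        StringBuilder html = new StringBuilder("<html><body><ul>\n");
        for (Map.Entry<String, Set<Integer>> entry : index.entrySet()) {
            html.append("<li>").append(entry.getKey()).append(": ");
            String separator = "";
            for (int lineNumber : entry.getValue()) {
                html.append(separator).append(lineNumber);
                separator = ", ";
            }
            html.append("</li>\n");
        }
        return html.append("</ul></body></html>\n").toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "Bla ble\nble.\nbla, bla";
        System.out.print(toHtml(buildIndex(new StringReader(text))));
    }
}
```

To index a real book you would pass a FileReader (or your BufferedReader) to buildIndex instead of the StringReader, and write the returned string to a file, for example with java.nio.file.Files.write and a path of your choosing.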

Does it help? If you need further clarification, just ask in the comments.

    
02.04.2015 / 01:38