How do I structure the logic to read two files, compare them, and extract the missing values?

3

I have two .txt files. One is a correct list of cities (it contains every city in the country, spelled correctly). The other is also a list of cities, but with some bad data (it was filled in by users, so it has spelling mistakes and similar errors).

To speed up the correction of the second list, I thought of checking whether each of its cities appears in the first list. If it does, the city is typed correctly; if it does not, I keep that city because it is presumably wrong.

My problem is the logic. I wrote the code below, but it only seems to process the first line of file 2 (the one with the bad data). I am also unsure how to do the comparison, since I need to look at every value in file 1 to know whether the city in the current iteration is in that file or not.

import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;

public class Aehoo {
    public static void main(String[] args) throws IOException {
        // Backslashes in Java string literals must be escaped
        Scanner biCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BI.txt"));
        Scanner billCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BILL_ADDR.txt"));
        ArrayList<String> array = new ArrayList<String>();

        // Walk through the list of cities with bad data
        while (billCities.hasNextLine()) {
            String cityBill = billCities.nextLine();

            // Walk through the correct list for each line of the other list,
            // to check whether cityBill is in it
            while (biCities.hasNextLine()) {
                String cityBi = biCities.nextLine();

                // Comparison logic problem here
            }
        }

        for (String s : array) {
            System.out.println(s);
        }

        biCities.close();
        billCities.close();
    }
}

Cities are stored in the CIDADE;ESTADO (city;state) format, as shown below.

LISTA_CIDADES_BILL_ADDR                 LISTA_CIDADES_BI
(LIST WITH WRONG DATA)                  (LIST WITH CORRECT DATA)
=- LAURO DE FREITAS;BA                  ABADIA DE GOIAS;GO
; VILAS DO ATLANTICO;BA                 ABADIA DOS DOURADOS;MG
ABADIA DE GOIAS;GO                      ABADIANIA;GO
ABADIA DOS DOURADOS;MG                  ABAETE;MG
ABADIANIA;GO                            ABAETETUBA;PA
ABAETE;MG                               ABAIARA;CE
ABAETE DOS MENDES;MG                    ABAIRA;BA
ABAETETUBA;PA                           ABARE;BA
ABAIARA;CE                              ABATIA;PR
ABAIBA;MG                               ABDON BATISTA;SC

For the record, I did manage to put together logic that works: instead of re-reading the files on every iteration of my while loop, I loaded both into lists and used the condition below:

ArrayList<Cidade> cidadesDiferentes = new ArrayList<Cidade>();

for (Cidade cidadeIncorreta : listaCidadesIncorretas) {
    int encontrou = 0;

    for (Cidade cidadeCorreta : listaCidadesCorretas) {
        if ((cidadeIncorreta.getCidade().equalsIgnoreCase(cidadeCorreta.getCidade())) && (cidadeIncorreta.getEstado().equalsIgnoreCase(cidadeCorreta.getEstado()))) {
            encontrou = 1;
        }
    }

    if (encontrou == 0) {
        cidadesDiferentes.add(cidadeIncorreta);
    }
}
    
asked by anonymous 16.01.2015 / 13:57

2 answers

2

As requested, here is a minimal sketch of an in-memory solution using Set.

Reading a file into a Set:

public Set<String> leMunicipios(Path path, int linhasParaPular, Charset charset) 
        throws IOException {
    final List<String> contents = Files.readAllLines(path, charset);
    return new LinkedHashSet<>(contents.subList(linhasParaPular, contents.size()));
}

Listing the entries that are not in the correct list:

final Charset charset = StandardCharsets.UTF_8;
try {
    final Path pathMunicipios = Paths.get("C:\\LISTA_CIDADES_BILL_ADDR.txt");
    final Path pathGabarito = Paths.get("C:\\LISTA_CIDADES_BI.txt");
    final Set<String> municipios = leMunicipios(pathMunicipios, 4, charset);
    final Set<String> gabarito = leMunicipios(pathGabarito, 2, charset);
    municipios.removeAll(gabarito);
    municipios.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}
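
For readers who want to try the idea end to end, the following is a self-contained sketch of the same set difference. The class name MissingCities and the method missingFrom are hypothetical, and the temporary files with a few rows from the question stand in for the real C:\ files, so none of this should be read as the answer's exact code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class MissingCities {

    // Returns the entries of 'candidates' that do not appear in 'reference',
    // in order of first appearance. The comparison is exact (case-sensitive).
    public static Set<String> missingFrom(Collection<String> candidates,
                                          Collection<String> reference) {
        Set<String> result = new LinkedHashSet<>(candidates);
        result.removeAll(new HashSet<>(reference));
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: temporary files replace the real lists so the sketch runs anywhere.
        Path wrong = Files.createTempFile("bill_addr", ".txt");
        Path correct = Files.createTempFile("bi", ".txt");
        Files.write(wrong,
                Arrays.asList("ABAETE;MG", "ABAETE DOS MENDES;MG", "ABAIBA;MG"),
                StandardCharsets.UTF_8);
        Files.write(correct,
                Arrays.asList("ABAETE;MG", "ABAETETUBA;PA"),
                StandardCharsets.UTF_8);

        Set<String> missing = missingFrom(
                Files.readAllLines(wrong, StandardCharsets.UTF_8),
                Files.readAllLines(correct, StandardCharsets.UTF_8));
        missing.forEach(System.out::println);

        Files.delete(wrong);
        Files.delete(correct);
    }
}
```

With the sample rows above, the two lines printed are ABAETE DOS MENDES;MG and ABAIBA;MG, since only ABAETE;MG exists in the reference data.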

If order is not important, you can slightly improve performance by replacing LinkedHashSet with HashSet (not that this will be very relevant in this case).
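
One difference from the loop in the question is worth keeping in mind: that loop compares with equalsIgnoreCase, while a Set of raw strings matches exactly. If case-insensitive matching is wanted, one possible approach is to normalize each line before the lookup. The sketch below assumes that trimming and upper-casing is a sufficient normalization; the names NormalizedCompare, normalize and notInReference are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NormalizedCompare {

    // Assumed normalization: strip outer whitespace and upper-case with a fixed locale.
    static String normalize(String line) {
        return line.trim().toUpperCase(Locale.ROOT);
    }

    // Returns the original candidate lines whose normalized form is not in 'reference'.
    // Duplicates in 'candidates' are kept as they are.
    public static List<String> notInReference(Collection<String> candidates,
                                              Collection<String> reference) {
        Set<String> keys = new HashSet<>();
        for (String r : reference) {
            keys.add(normalize(r));
        }
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            if (!keys.contains(normalize(c))) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> wrong = Arrays.asList("abaete;mg", "ABAIBA;MG");
        List<String> correct = Arrays.asList("ABAETE;MG", "ABAETETUBA;PA");
        System.out.println(notInReference(wrong, correct));
    }
}
```

Here "abaete;mg" is treated as a match for "ABAETE;MG", so only ABAIBA;MG is reported.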

    
16.01.2015 / 16:58
3

The inner loop only runs for the first line because, after that first pass, the Scanner over the correct cities (biCities) has already reached the end of its file. You have to restart it.

Do something like this inside the first (outer) while, right after the second (inner) one:

<second while>
biCities.close();
biCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BI.txt"));
<end of first while>
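
Put together, the restart could look like the sketch below. To keep it runnable without the real files, the two lists are passed as in-memory strings and read with new Scanner(String); with files, the inner Scanner would be created from a new FileReader instead. The class RescanExample, the method missing and the found flag are illustrative names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class RescanExample {

    // Returns the lines of 'wrongText' that match no line of 'correctText',
    // ignoring case, by re-scanning the correct list for every candidate line.
    public static List<String> missing(String wrongText, String correctText) {
        List<String> notFound = new ArrayList<>();
        try (Scanner billCities = new Scanner(wrongText)) {
            while (billCities.hasNextLine()) {
                String cityBill = billCities.nextLine();
                boolean found = false;

                // A fresh Scanner per outer iteration, so the inner loop
                // starts again from the first line of the correct list.
                try (Scanner biCities = new Scanner(correctText)) {
                    while (biCities.hasNextLine()) {
                        if (cityBill.equalsIgnoreCase(biCities.nextLine())) {
                            found = true;
                            break;
                        }
                    }
                }

                if (!found) {
                    notFound.add(cityBill);
                }
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        String wrong = "ABAETE;MG\nABAETE DOS MENDES;MG\nABAIARA;CE\nABAIBA;MG";
        String correct = "ABAETE;MG\nABAETETUBA;PA\nABAIARA;CE";
        System.out.println(missing(wrong, correct));
    }
}
```

This re-reads the whole correct list once per candidate line, so the work grows with the product of the two list sizes; loading the correct list into memory once avoids that.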

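Note that the reset() method of class Scanner is not an alternative here: it only restores the Scanner's default delimiter, locale and radix, and it does not move the read position back to the start of the file:

biCities.reset(); // Only discards settings made with Scanner.useDelimiter(), Scanner.useLocale()
                  // or Scanner.useRadix(); the Scanner stays at the end of the input.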
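
The Scanner documentation describes reset() as discarding the state changed by useDelimiter(), useLocale() and useRadix(), and says nothing about rewinding the input. A quick way to check the practical effect is the small experiment below, which reads a string to the end, calls reset(), and asks whether any line is available again. The class ResetCheck and the method hasMoreAfterReset are hypothetical names for this experiment only.

```java
import java.util.Scanner;

public class ResetCheck {

    // Consumes every line of 'text', calls reset(), and reports whether
    // the Scanner can read anything afterwards.
    public static boolean hasMoreAfterReset(String text) {
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                sc.nextLine();
            }
            sc.reset(); // restores default delimiter, locale and radix only
            return sc.hasNextLine();
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMoreAfterReset("ABAETE;MG\nABAIARA;CE")); // prints false
    }
}
```

Because the result is false, a Scanner that has reached the end of the input has to be recreated in order to read the list again.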
    
16.01.2015 / 14:06