How can I filter a List of vacancies using words or phrases of interest?

11

In my example I have two classes, SetorInteresse and Vaga. Their structure is shown below:

SetorInteresse class:

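import java.util.ArrayList;
import java.util.List;

public class SetorInteresse {

    private List<String> setores;

    public SetorInteresse(List<String> setores) {
        this.setores = setores;
    }

    // Start with an empty list so addPalavra does not throw a NullPointerException.
    public SetorInteresse() { this.setores = new ArrayList<>(); }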

    public void addPalavra(String palavra) {  setores.add(palavra); }

    public void removePalavra(String palavra) { setores.remove(palavra); }

    public List<String> getSetores() { return setores; }
}

Vaga class:

public class Vaga {
    private String tituloVaga;
    private String setor;
    private String funcao;    

    public Vaga(String tituloVaga, String setor, String funcao) {
        this.tituloVaga = tituloVaga;
        this.setor = setor;
        this.funcao = funcao;        
    }

    public Vaga() { }   

    public String getDescricaoVaga() {
        return tituloVaga;
    }

    public void setDescricaoVaga(String tituloVaga) { this.tituloVaga = tituloVaga; }

    public String getSetor() { return setor; }

    public void setSetor(String setor) { this.setor = setor; }

    public String getFuncao() { return funcao; }

    public void setFuncao(String funcao) { this.funcao = funcao; }   
}

The two methods below populate the vagas variable, of type List<Vaga>, and the setores attribute of the SetorInteresse object:

Method that populates the vagas variable:

List<Vaga> vagas = criaVagas();
...
static List<Vaga> criaVagas() { 
    List<Vaga> vagas = new ArrayList<>();
    vagas.add(new Vaga("Desenvolvedor Java", "Tecnologia da Informação", "Desenvolvedor"));       
    vagas.add(new Vaga("Desenvolvedor C# e Web", "Tecnologia da Informação", "Desenvolvedor"));
    vagas.add(new Vaga("Motorista Carreteiro", "Logistica", "Motorista"));       
    vagas.add(new Vaga("Gerente de Sistemas", "Tecnologia da Informação", "Desenvolvedor"));
    vagas.add(new Vaga("Estágiario Tecnologia da Informação", "Tecnologia da Informação", "Estágiario"));       
    vagas.add(new Vaga("Analista de Sistemas", "Tecnologia da Informação", "Analista"));              
    vagas.add(new Vaga("Suporte Técnico", "Suporte", "Suporte"));              
    vagas.add(new Vaga("Gerente Comercial", "Departamento Administrativo", "Gerente"));       
    vagas.add(new Vaga("Assistente de Recursos Humanos", "Recursos Humanos RH", "Assistente"));
    return vagas;
}

Method that populates the setores attribute:

SetorInteresse setorInteresse = criaSetorInteresse();
...
static SetorInteresse criaSetorInteresse() { 
    SetorInteresse setorInteresse = new SetorInteresse();
    setorInteresse.addPalavra("Desenvolvimento de programas");        
    setorInteresse.addPalavra("Tecnologia da informação e serviços");
    setorInteresse.addPalavra("Análise de sistemas");        
    return setorInteresse;
}

Given the data stored in the vagas and setorInteresse variables, I would like to know whether I can create a filter that returns only the objects of the vagas list whose setor attribute is related to any of the words or phrases in the setores attribute. If not, is there an alternative?

For example, if my setores attribute contains the following value:

Desenvolvimento de programas

I would get all objects of type Vaga whose setor attribute is related to Desenvolvimento de programas. In this case, the vacancies returned would be:

  

Desenvolvedor Java
Desenvolvedor C# e Web
Gerente de Sistemas
Estágiario Tecnologia da Informação
Analista de Sistemas
Suporte Técnico

In other words, the vacancies displayed would match the interests defined in the setores attribute.

Is there a way to create a filter that returns those results, or is there a library that does this for me? I would also like to know which criteria I should use to relate the words or phrases, and how to define them, if that turns out to be necessary.

    
asked by anonymous 12.09.2016 / 03:09

3 answers

1
The big problem is that the relationship between a vacancy and a sector is essentially semantic, which is something computers are not very good at. However, you can get satisfactory results with purely symbolic manipulation, which is something computers excel at.

Assuming that the description of a SetorInteresse is at least symbolically similar to the sector of a vacancy, we can use the Levenshtein distance algorithm to measure the similarity between the vacancy's sector and the sector of interest. We can then define a threshold of acceptable similarity and filter the vacancies by that criterion.

A practical example:

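// Requires org.apache.commons.lang3.StringUtils and java.util.stream.Collectors.
String department = "Desenvolvimento de programas";
final int DISTANCE_THRESHOLD = 20;

List<Vaga> vagas = criaVagas();

List<Vaga> filtered = vagas.stream().filter(
        v -> StringUtils.getLevenshteinDistance(department, v.getSetor()) <= DISTANCE_THRESHOLD
).collect(Collectors.toList());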

If the sector of interest is "Desenvolvimento de programas", the following vacancies are returned with a tolerance threshold of 20:

1. Desenvolvedor Java
2. Desenvolvedor C# e Web
3. Gerente de Sistemas
4. Estagiário Tecnologia da Informação 
5. Analista de Sistemas 
6. Gerente Comercial

The drawback of solving this only at the syntactic level is that some strings may have a small distance even when their meaning has nothing to do with what was expected. That is what happened with Gerente Comercial, which is unrelated to all the other results.

In addition, a single universal tolerance will not work very well. Most likely you will have to set an individual tolerance for each department. One possible heuristic is to make the tolerance depend on the length of the strings being compared. For example, the Levenshtein distance should stay below 80 or 90 percent of the length of the longer string.
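The length-relative heuristic could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name RelativeLevenshteinFilter, the helper names, and the maxRatio parameter are hypothetical, and the distance is implemented by hand here just to keep the example free of dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
class RelativeLevenshteinFilter {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the distance stays below maxRatio times the length of the longer string.
    static boolean isSimilar(String interest, String sector, double maxRatio) {
        int longest = Math.max(interest.length(), sector.length());
        if (longest == 0) {
            return true;
        }
        int distance = levenshtein(interest.toLowerCase(Locale.ROOT), sector.toLowerCase(Locale.ROOT));
        return distance < maxRatio * longest;
    }

    // Keeps only the sectors that are similar enough to the sector of interest.
    static List<String> filterSectors(List<String> sectors, String interest, double maxRatio) {
        List<String> result = new ArrayList<>();
        for (String sector : sectors) {
            if (isSimilar(interest, sector, maxRatio)) {
                result.add(sector);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sectors = Arrays.asList("Tecnologia da Informação", "Logistica", "Suporte");
        System.out.println(filterSectors(sectors, "Desenvolvimento de programas", 0.8));
    }
}
```

The ratio still has to be tuned against real data; a relative limit only removes the dependence on string length, not the semantic false positives described above.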

The method that calculates the Levenshtein distance, as well as other string distance methods, is available in the Apache Commons Lang library.

compile 'org.apache.commons:commons-lang3:3.4'
    
22.09.2016 / 04:53
6

Since you are using Java 8, you can use lambda expressions. For example:

List<Vaga> vagas = criaVagas();
List<Vaga> vagasFiltradas = vagas.stream().filter(vaga -> vaga.getSetor().contains("Tecnologia")).collect(Collectors.toList());

The snippet above applies a filter, which you can define however you like, to every element of the list. The filter I chose as an example takes each vaga object from the vagas collection and checks whether its sector matches the sector you are filtering by. The check tests whether the vacancy's setor field contains the word "Tecnologia" anywhere in the String. I used "Tecnologia" for illustration, so the following vacancies would be returned:

Desenvolvedor Java
Desenvolvedor C# e Web
Gerente de Sistemas
Estágiario Tecnologia da Informação
Analista de Sistemas

This filter can be more sophisticated, considering for example other job information to further limit the results.
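One way such a more sophisticated filter might look is sketched below: it matches a vacancy when its sector contains any significant word taken from the phrases of interest. The class KeywordFilterSketch, its nested stand-in for Vaga, and the rule of ignoring words of two letters or fewer (to skip connectives such as "de" or "da") are assumptions made for this illustration, not something from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch class, not part of the original answer.
class KeywordFilterSketch {

    // Minimal stand-in for the Vaga class from the question.
    static class Vaga {
        final String titulo;
        final String setor;

        Vaga(String titulo, String setor) {
            this.titulo = titulo;
            this.setor = setor;
        }
    }

    // Splits each phrase into lower-case words and drops words of two letters or fewer.
    static List<String> keywords(List<String> setoresDeInteresse) {
        return setoresDeInteresse.stream()
                .flatMap(phrase -> Arrays.stream(phrase.toLowerCase(Locale.ROOT).split("\\s+")))
                .filter(word -> word.length() > 2)
                .distinct()
                .collect(Collectors.toList());
    }

    // Keeps the vacancies whose sector contains at least one keyword, ignoring case.
    static List<Vaga> filtrar(List<Vaga> vagas, List<String> setoresDeInteresse) {
        List<String> words = keywords(setoresDeInteresse);
        return vagas.stream()
                .filter(v -> {
                    String setor = v.setor.toLowerCase(Locale.ROOT);
                    return words.stream().anyMatch(setor::contains);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Vaga> vagas = Arrays.asList(
                new Vaga("Desenvolvedor Java", "Tecnologia da Informação"),
                new Vaga("Motorista Carreteiro", "Logistica"),
                new Vaga("Suporte Técnico", "Suporte"));
        List<String> interesses = Arrays.asList(
                "Desenvolvimento de programas",
                "Tecnologia da informação e serviços",
                "Análise de sistemas");
        filtrar(vagas, interesses).forEach(v -> System.out.println(v.titulo));
    }
}
```

Plain word matching like this is still purely textual: a sector only matches when it literally shares a word with one of the phrases of interest.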

    
17.09.2016 / 03:14
3

My suggestion is to split your code further by creating three classes:

public class Setor {
    private int codigoSetor;
    private String descricaoSetor;


    // ... getters and setters
}

public class Funcao {
    private int codigoFuncao;
    private String descricaoFuncao;


    // ... getters and setters
}

public class Vaga {
    private String tituloVaga;
    private int codigoSetor;
    private int codigoFuncao;

    public Vaga(String tituloVaga, int setor, int funcao) {
        this.tituloVaga = tituloVaga;
        this.codigoSetor = setor;
        this.codigoFuncao = funcao;        
    }

    // ... getters and setters
}

Create the sectors, functions, and vacancies with identifier codes:

// Assumes SetorInteresse offers save(int, String) and getSetores() returning List<Setor>.
SetorInteresse setorInteresse = criaSetorInteresse();

static SetorInteresse criaSetorInteresse() {
    SetorInteresse setorInteresse = new SetorInteresse();
    setorInteresse.save(1, "Desenvolvimento de programas");
    setorInteresse.save(2, "Tecnologia da informação e serviços");
    setorInteresse.save(3, "Logistica");
    ...

    return setorInteresse;
}

static String getSetorInteresse(int codigo) {
    SetorInteresse setorInteresse = criaSetorInteresse();
    for (Setor setor : setorInteresse.getSetores()) {
        if (setor.getCodigoSetor() == codigo) {
            return setor.getDescricaoSetor();
        }
    }
    return null; // no sector with this code
}


// Assumes a FuncaoInteresse class offering save(int, String) and getFuncoes() returning List<Funcao>.
FuncaoInteresse funcaoInteresse = criaFuncaoInteresse();

static FuncaoInteresse criaFuncaoInteresse() {
    FuncaoInteresse funcaoInteresse = new FuncaoInteresse();
    funcaoInteresse.save(1, "Desenvolvedor");
    funcaoInteresse.save(2, "Estágiario");
    funcaoInteresse.save(3, "Motorista");
    ...

    return funcaoInteresse;
}

static String getFuncao(int codigo) {
    FuncaoInteresse funcaoInteresse = criaFuncaoInteresse();
    for (Funcao funcao : funcaoInteresse.getFuncoes()) {
        if (funcao.getCodigoFuncao() == codigo) {
            return funcao.getDescricaoFuncao();
        }
    }
    return null; // no function with this code
}

List<Vaga> vagas = criaVagas();
...
static List<Vaga> criaVagas() { 
    List<Vaga> vagas = new ArrayList<>();
    vagas.add(new Vaga("Desenvolvedor Java", 2, 1));       
    vagas.add(new Vaga("Desenvolvedor C# e Web", 2, 1));
    vagas.add(new Vaga("Motorista Carreteiro", 3, 3));
    ...

    return vagas;
}

When you need the descriptions, call the getSetorInteresse and getFuncao methods, passing the codes stored in the vacancies.
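With codes in place, filtering becomes an exact comparison instead of a text match. The sketch below shows one way this might look. It swaps the save-based classes above for a plain Map from code to description, and the names CodigoFilterSketch and filtrarPorSetor are hypothetical, chosen only for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch class, not part of the original answer.
class CodigoFilterSketch {

    // Minimal stand-in for the code-based Vaga class.
    static class Vaga {
        final String tituloVaga;
        final int codigoSetor;
        final int codigoFuncao;

        Vaga(String tituloVaga, int codigoSetor, int codigoFuncao) {
            this.tituloVaga = tituloVaga;
            this.codigoSetor = codigoSetor;
            this.codigoFuncao = codigoFuncao;
        }
    }

    // Sector descriptions indexed by code (a Map replaces the linear search).
    static Map<Integer, String> criaSetores() {
        Map<Integer, String> setores = new LinkedHashMap<>();
        setores.put(1, "Desenvolvimento de programas");
        setores.put(2, "Tecnologia da informação e serviços");
        setores.put(3, "Logistica");
        return setores;
    }

    static List<Vaga> criaVagas() {
        return Arrays.asList(
                new Vaga("Desenvolvedor Java", 2, 1),
                new Vaga("Desenvolvedor C# e Web", 2, 1),
                new Vaga("Motorista Carreteiro", 3, 3));
    }

    // Keeps the vacancies whose sector code is one of the codes of interest.
    static List<Vaga> filtrarPorSetor(List<Vaga> vagas, Set<Integer> codigosDeInteresse) {
        return vagas.stream()
                .filter(v -> codigosDeInteresse.contains(v.codigoSetor))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, String> setores = criaSetores();
        for (Vaga v : filtrarPorSetor(criaVagas(), Collections.singleton(2))) {
            System.out.println(v.tituloVaga + " / " + setores.get(v.codigoSetor));
        }
    }
}
```

This only works when every vacancy is registered against a known sector code; free-text sectors would still need one of the textual approaches.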

    
14.09.2016 / 18:50