Select with WHERE condition with multiple values in the same column

10

I have two tables in my bank and I do the JOIN with them. One table is a list of people and the other list that person's characteristics. By logic there is only one person and each person can have several characteristics, so two tables. So when I do a join it appears the same person a few times only with its side feature. EX:

+-----+---------+----------------+
|  ID |  PESSOA | CARACTERÍSTICA |
+-----+---------+----------------+
|  1  | Carlos  | Alto           |
+-----+---------+----------------+
|  1  | Carlos  | Divertido      |
+-----+---------+----------------+
|  1  | Carlos  | Inteligente    |
+-----+---------+----------------+
|  2  | Iago    | Baixo          |
+-----+---------+----------------+
|  2  | Iago    | Divertido      |
+-----+---------+----------------+
|  2  | Iago    | Esperto        |
+-----+---------+----------------+
|  3  | Artur   | Divertido      |
+-----+---------+----------------+
|  3  | Artur   | Inteligente    |
+-----+---------+----------------+

If I do the following select I get an empty result:

SELECT 
    p.*, c.* 
FROM 
    pessoas AS p LEFT JOIN perfil AS c ON p.pid = c.perfil_pessoa 
WHERE 
    c.caracteristica = 'Divertido' AND c.caracteristica = 'Inteligente'

When actually I would like the following result:

+-----+---------+----------------+
|  ID |  PESSOA | CARACTERÍSTICA |
+-----+---------+----------------+
|  1  | Carlos  | Alto           |
+-----+---------+----------------+
|  1  | Carlos  | Divertido      |
+-----+---------+----------------+
|  1  | Carlos  | Inteligente    |
+-----+---------+----------------+
|  3  | Artur   | Divertido      |
+-----+---------+----------------+
|  3  | Artur   | Inteligente    |
+-----+---------+----------------+

That is, to result in every person being 'Fun' and 'Smart'.

If in the WHERE clause of select I use IN or OR the result is also not what I expect, for the given example, the select would return all the results, since all people have " Funny ":

+-----+---------+----------------+
|  ID |  PESSOA | CARACTERÍSTICA |
+-----+---------+----------------+
|  1  | Carlos  | Alto           |
+-----+---------+----------------+
|  1  | Carlos  | Divertido      |
+-----+---------+----------------+
|  1  | Carlos  | Inteligente    |
+-----+---------+----------------+
|  2  | Iago    | Baixo          |
+-----+---------+----------------+
|  2  | Iago    | Divertido      |
+-----+---------+----------------+
|  2  | Iago    | Esperto        |
+-----+---------+----------------+
|  3  | Artur   | Divertido      |
+-----+---------+----------------+
|  3  | Artur   | Inteligente    |
+-----+---------+----------------+
  

The intent is to build filters using the characteristics of people.

    
asked by anonymous 01.05.2018 / 17:02

3 answers

12

You want to bring the person and all their characteristics, if they meet the filter (which will be cumulative).

If you are using MySql version 8 , this will be like a walk in the park.

According to the documentation , this release supports Commom Table Expressions .

Free translation with highlights

  

A common table expression (CTE) is a resultset named temporary that only exists within the scope of a single sql statement , so it can be referenced only in it.

The applications and variations of CTEs are numerous. Your case is certainly one of them.

Note - Tables and column names have been simplified in the following examples

Here's how we could do it:

WITH candidatos AS 
 ( 
     SELECT DISTINCT c.id_pessoa 
     FROM caracteristica c
     WHERE c.caracteristica in ('Alto','Magro') // Insere aqui a lista de qualidades procuradas
     GROUP BY c.id_pessoa
     HAVING count(Distinct c.caracteristica) >= 2 // Aqui a quantidade de qualidades filtradas
 )
SELECT p.nome, c.caracteristica
FROM candidatos filtro 
    JOIN pessoa p ON o.id = filtro.id_pessoa
    JOIN característica c ON c.id_pessoa = p.id

Unfortunately I did not find a MySQL 8 fiddle online to enrich the answer with a verifiable example

It seems like a clean enough consultation for me and I really hope you can come up with a solution like this in your scenario for the good of all and general happiness of the nation.

On the other hand ...

If you are using a version of MySql < 8.0

If you're using previous versions, I've come up with some alternatives that can solve your problem, but continuing to look for better alternatives is recommended.

Option 1 - Using some nested selects:

SELECT P.nome, C.caracteristica
FROM PESSOA P
    JOIN CARACTERISTICA C on C.id_pessoa = P.id
WHERE Exists ( SELECT 1
               FROM CARACTERISTICA C2
                    RIGHT JOIN ( SELECT distinct cr.caracteristica
                                 FROM caracteristica cr
                                 WHERE Cr.CARACTERISTICA in ('Alto', 'Magro') ) c1 ON c2.caracteristica = c1.caracteristica
              WHERE c2.id_pessoa = p.id 
              GROUP BY C2.id_pessoa
              HAVING count(c2.caracteristica) >= ( SELECT Count(distinct cr.caracteristica)
                                                   FROM caracteristica cr
                                                   WHERE Cr.CARACTERISTICA in ('Alto', 'Magro') ) )

* There is still a problem here in the subselects I use a Distinct in the characteristics to get a subset with the values Alto and Magro . If you included as a filter a characteristic that no one had (% with%, for example), that condition would simply be 'ignored' in the result. I think this would not be the expected / ideal behavior.

* This example is available in SQL Fiddle

Option 2 - -Using temporary table:

Another way would be to use a temporary table and insert the desired characteristics into it. The query would look less ugly (but still far from ideal), like this:

SELECT P.nome, C.caracteristica
FROM PESSOA P
  JOIN CARACTERISTICA C on C.id_pessoa = P.id
  JOIN ( SELECT c2.id_pessoa, count(c2.caracteristica)
         FROM filtro f
             LEFT JOIN caracteristica c2 on c2.caracteristica = f.caracteristica
         GROUP BY c2.id_pessoa
         HAVING COUNT(c2.caracteristica) = (SELECT COUNT(*) FROM filtro) ) pf on pf.id_pessoa = p.id

* This example is also available in SQL Fiddle

About the results you had in your attempt

When you declare in the way you did (In% with% of query itself Feio ) you were receiving the result empty because each characteristic had to be equal to 'Smart' E same to 'Fun', which is conceptually impossible because the feature will always be one or other .

Using Where or c.caracteristica = 'Divertido' AND c.caracteristica = 'Inteligente' (also in% with% of query itself, as suggested) you would end up with just the filtered features and anyone who possessed at least one of them .

I hope I have helped.

    
01.05.2018 / 17:09
7

It has ways and ways to do, which is best, depends on how the queries will be generated.

For your case I would probably prefer to have a table of characteristics, one of people, and a third to make the relationships. Regardless of this, here are some alternatives for your current model:

Using JOIN :

This is a relatively simple and objective way - to make a JOIN for each characteristic:

SELECT 
    a.caracteristica,
    b.caracteristica,
    p.pessoa
FROM 
    perfil AS a
    LEFT JOIN perfil AS b ON a.perfil_pessoa = b.perfil_pessoa
    LEFT JOIN pessoa AS p ON a.perfil_pessoa = p.pid
WHERE 
    a.caracteristica = 'Divertido' AND
    b.caracteristica = 'Inteligente'
;

See working in SQL Fiddle .

Although you need a% number of% s equivalent to the number of features, using JOIN already allows an efficient planner to cascade items that do not meet any of the features.


Using LEFT , SUM and IF to filter item count:

The "secret" here is to compare the HAVING with the number of items sought (half that inspired by the solution of colleague @bruno):

SELECT
  GROUP_CONCAT(caracteristica ORDER BY caracteristica) AS caracteristicas,
  pessoa
FROM
  pessoa
  JOIN perfil ON perfil_pessoa = pid
GROUP BY
  pid
HAVING
  SUM(IF(caracteristica IN ('Divertido','Inteligente'),1,0))=2
;

See working on SQL Fiddle .

But this option has a danger: If there are two 'Fun' entries for the same person, it will not work. Be careful to ensure the uniqueness of options in this case (or use a DISTINCT). See the advanced solution below:


"Advanced" version of SUM :

This version uses the same technique as above, but with bit math, ensuring that all terms are found regardless of position, search order, or number of occurrences.

By using HAVING , we start to detect the position of the list in the list by setting a bit using the aggregation function FIND_IN_SET :

SELECT
  GROUP_CONCAT(caracteristica ORDER BY caracteristica) AS caracteristicas,
  pessoa
FROM
  pessoa
  JOIN perfil ON perfil_pessoa = pid
GROUP BY
  pid
HAVING
  BIT_OR(POW(2,FIND_IN_SET(caracteristica, 'Divertido,Inteligente')))&6=6
;

See working on SQL Fiddle .

The most complex part is to determine & 6 = 6 at the end of query . The BIT_AND returns values in the range of 0 to the number of SET items passed in string . Zero means "not found".

To set the bits we use FIND_IN_SET . That is, if nothing is found, the bit POW(2,...) will be set. If the first term is found, the bit 0b00000001 , and so on will be set.

We happen to want the first and second terms, which gives the binary value 0b00000010 , which is just 6. We have to ignore the first bit, since it will be set if a term other than the one searched for is returned. first bit will not be set if query finds exactly the searched terms).

So, by doing 0b00000110 , we get the value 6 with or not the last bit set. Recalling: word not found is bit zero (value 1). Term 1 is bit 2 (2 in decimal). Term 2 is bit 3 (4 in decimal). Therefore 2 + 4 = 6.

What if it were 3 terms? Same logic, 2 + 4 + 8 = 14, then the expression of &6 would be HAVING . If it were 4 terms? 2 + 4 + 8 + 16 = 30, getting &14=14 not &30=30 .

Do I have to calculate "on hand"? No, just use the following formula:

bitfield = ( 2 ^ num_termos - 1 ) * 2

This can be useful in the client language to generate query .

Using HAVING :

This query is versatile when generating filters in order not to need% s of% s extra (however, in practice having multiple% s of% s should not be a real problem in this context):

SELECT
  pessoa, caracteristicas
FROM
  (
    SELECT
      GROUP_CONCAT(caracteristica) AS caracteristicas,
      pessoa
    FROM
      pessoa JOIN perfil ON perfil_pessoa = pid
    GROUP BY
      pid
  ) c
WHERE
  CONCAT(',',caracteristicas,',') LIKE "%,Divertido,%" AND
  CONCAT(',',caracteristicas,',') LIKE "%,Inteligente,%"
;

Note that query would be much simpler without GROUP_CONCAT , but this is a good habit when you search for multiple strings , to prevent a JOIN get JOIN by mistake.

The "advantage" here is that you only touch WHERE, regardless of the number of features.

The disadvantage is that the search is done with CONCAT(',' and does not take advantage of indexes.

See working on SQL FIDDLE .

    
08.05.2018 / 22:37
2

This is a (dis) known problem called "Relational Division" (I'm trying to find sources in Portuguese but it's complicated) where the goal is to find records that meet a set of criteria. There are already some similar questions here in SOPt, for example

In this particular case, it is intended to identify the people who have ( among others ) the "Lean" and "High" attributes.

Based on the response of Diego Rafael Souza, here is a version with a slightly different behavior.

SELECT C.ID_PESSOA, 
       P.NOME,
       C.CARACTERISTICA
  FROM PESSOA P
 INNER JOIN CARACTERISTICA C
    ON C.ID_PESSOA = P.ID
 WHERE C.ID_PESSOA IN
 (
    SELECT ID_PESSOA
      FROM CARACTERISTICA C
     WHERE CARACTERISTICA IN ('Magro', 'Alto')
     GROUP BY ID_PESSOA
    HAVING COUNT(DISTINCT CARACTERISTICA) = 2
 );

In this case note that, unlike the first two proposed alternatives, only people who are both "Skinny" and "High" will be returned. The main difference is in this statement:

HAVING count(c2.caracteristica) >= ( SELECT Count(distinct cr.caracteristica)
                                       FROM caracteristica cr
                                      WHERE Cr.CARACTERISTICA in ('Alto', 'Magro') ) )

If there are no people with the "Thin" attribute in your database, all the people who are "High" and "Fat" will be included in the result set.

You can check out the fiddle .

In your case, you even need to force the number of features to be 2, of the form HAVING COUNT(DISTINCT CARACTERISTICA) = 2 or, as Diego did in the third option, using a temporary table and counting the number of existing records.

If you need a more general solution, the use of the temporary table or table variable is a good idea and at the same time should have a good performance:

CREATE TABLE FILTRO(f varchar(30));

INSERT INTO FILTRO(f) VALUES('Alto'),('Magro');

SELECT C.ID_PESSOA, 
       P.NOME,
       C.CARACTERISTICA
  FROM PESSOAS P
 INNER JOIN perfil C
    ON C.ID_PESSOA = P.ID
 WHERE C.ID_PESSOA IN
 (
    SELECT ID_PESSOA
      FROM PERFIL 
     WHERE CARACTERISTICA IN ('Magro', 'Alto')
     GROUP BY ID_PESSOA
    HAVING COUNT(DISTINCT CARACTERISTICA) = (SELECT COUNT(1) FROM FILTRO)
 );
    
04.05.2018 / 17:34