Normalize comma-separated values for new table

6

The idea is to stop having the column with the comma-separated values and pass them to an intermediate table:

Source table

Assuming a table named press with the following fields:

id, tag_id

Containing records type:

┌───────────┬──────────────┐
│ press_id  │  tag_id      │
├───────────┼──────────────┤
│  1        │  1,2,3       │
├───────────┼──────────────┤
│  2        │  2,6,5       │
├───────────┼──────────────┤
│  3        │  10,450      │
└───────────┴──────────────┘

Destination Table

We want to make a query to read them and write the values in a new press_tags table that will establish the relation between press and tag :

┌──────┬────────────┬──────────────┐
│  id  |  press_id  │  tag_id      │
├──────┼────────────┼──────────────┤
│  1   │  1         │  1           │
├──────┼────────────┼──────────────┤
│  2   │  1         │  2           │
├──────┼────────────┼──────────────┤
│  3   │  1         │  3           │
├──────┼────────────┼──────────────┤
│  4   │  2         │  2           │
├──────┼────────────┼──────────────┤
│  5   │  2         │  6           │
├──────┼────────────┼──────────────┤
│  6   │  2         │  5           │
├──────┼────────────┼──────────────┤
│  7   │  3         │  10          │
├──────┼────────────┼──────────────┤
│  8   │  3         │  450         │
└──────┴────────────┴──────────────┘

Question

How to select all records in the source table and for every% of% that exists by comma, insert a record in the target table?

    
asked by anonymous 07.04.2015 / 17:28

2 answers

6

MySQL does not have any function that allows us to split a string into multiple lines, so the work gets a bit complex:

SQL Fiddle

INSERT INTO press_tags (press_id, tag_id)
SELECT
    press.press_id,
    SUBSTRING_INDEX(SUBSTRING_INDEX(press.tag_id, ',', n.n), ',', -1) tag_id
FROM press
CROSS JOIN 
(
    SELECT a.N + b.N * 10 + 1 n
    FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a,
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(press.tag_id) - LENGTH(REPLACE(press.tag_id, ',', '')))
 ORDER BY press_id, tag_id

Explanation

  • A sub-query with an pseudonym of n will generate in real time a sequence of numbers from 1 to 100, in this particular case , using [UNION ALL][4] and CROSS JOIN .

  • In the SELECT experiment, no SUBSTRING_INDEX() interior, we will get everything up to the nth element in a list.

    The SUBSTRING_INDEX() exterior will extract the portion more directly after the last delimiter, thus receiving the information of the nth element.

  • >
  • The condition in the CROSS JOIN clause will filter all unnecessary rows in the result set.

  • This query will split up to 100 press for each record in the source table. For the case in hand is sufficient, but if necessary, you can adjust the sub-queries.

    Common Scenario

    A common scenario with values separated by a delimiter in a column is the insertion of the combination WHERE + tag_id , resulting in something like:

    valor;valor;valor;
    

    Where the presence of the last delimiter, with the above solution, will generate a blank record in the target table:

    ┌──────┬────────────┬──────────────┐
    │  id  |  press_id  │  tag_id      │
    ├──────┼────────────┼──────────────┤
    │  1   │  1         │              │
    ├──────┼────────────┼──────────────┤
    │  2   │  1         │  2           │
    ├──────┼────────────┼──────────────┤
    │  3   │  1         │  3           │
    └──────┴────────────┴──────────────┘
    

    This problem may still be reflected in a type error:

      

    Incorrect integer value: '' for column 'tag_id'

    To avoid this, we can make use of another valor to select the fields where delimitador is not SELECT or tag_id :

    INSERT INTO press_tags (press_id, tag_id)
    SELECT
      result.press_id,
      result.tag_id
    FROM (
      # consulta aqui
    ) AS result
    WHERE result.tag_id > ''
    

    What results in:

    SQL Fiddle

    INSERT INTO press_tags (press_id, tag_id)
    SELECT
      result.press_id,
      result.tag_id
    FROM (
      SELECT
          press.press_id,
          SUBSTRING_INDEX(SUBSTRING_INDEX(press.tag_id, ',', n.n), ',', -1) tag_id
      FROM press
      CROSS JOIN 
      (
          SELECT a.N + b.N * 10 + 1 n
          FROM 
          (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a,
          (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
          ORDER BY n
      ) n
      WHERE n.n <= 1 + (LENGTH(press.tag_id) - LENGTH(REPLACE(press.tag_id, ',', '')))
      ORDER BY press_id, tag_id
    ) AS result
    WHERE result.tag_id > ''
    

    Much of this solution comes from SOEN responses in this question .

        
    07.04.2015 / 20:59
    1

    Look at your response to the @Zuul answer. You can see that you have a lot of MySql experience.

    I can achieve the same result, but in a humbler way, it may even be considered a gambiarra, but it is what I have always used as a solution for cases of this type.

    PREPARE DATA

  • I make an sql that will generate a "script":
  • select concat('insert into press_tags (press_id,tag_id) select ',press_id,',tag_id from (select NULL tag_id union select ', replace(tag_id,',',' union select '),') A where tag_id IS NOT NULL;') 'query' from press;
    
      

    This statement will generate an output like this:

    insert into press_tags (press_id,tag_id) select 1,tag_id from (select NULL tag_id union select 1 union select 2 union select 3) A where tag_id IS NOT NULL;
    insert into press_tags (press_id,tag_id) select 2,tag_id from (select NULL tag_id union select 2 union select 6 union select 5) A where tag_id IS NOT NULL;
    insert into press_tags (press_id,tag_id) select 3,tag_id from (select NULL tag_id union select 10 union select 450) A where tag_id IS NOT NULL;
    

    RUN SCRIPT

  • I get this output and execute it as sql statement:
  • mysql> insert into press_tags (press_id,tag_id) select 1,tag_id from (select NULL tag_id union select 1 union selec
    t 2 union select 3) A where tag_id IS NOT NULL;
    Query OK, 3 rows affected (0.04 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    mysql> insert into press_tags (press_id,tag_id) select 2,tag_id from (select NULL tag_id union select 2 union selec
    t 6 union select 5) A where tag_id IS NOT NULL;
    Query OK, 3 rows affected (0.04 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    mysql> insert into press_tags (press_id,tag_id) select 3,tag_id from (select NULL tag_id union select 10 union sele
    ct 450) A where tag_id IS NOT NULL;
    Query OK, 2 rows affected (0.03 sec)
    Records: 2  Duplicates: 0  Warnings: 0
    

    As I said it is a simpler way, it uses only more basic commands, but it is an alternative that works;)

        
    16.04.2015 / 14:34