Remove an undetermined number of "\" in a column in the database

8

The description field in the product table contains HTML generated by CKEditor (English) to escape content:

<table cellpadding=\"\&quot;\\&quot;\\\\&quot;\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;0\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\&quot;\\\\&quot;\\&quot;\&quot;\" cellspacing=\"\&quot;\\&quot;\\\\&quot;\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;0\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\&quot;\\\\&quot;\\&quot;\&quot;\">
<thead>
    <tr>
        <td>
            <p>&nbsp;</p>
        </td>
    </tr>
</thead>
...

Problem

The problem here is that there have been a number of user copy / paste actions, where CKEditor has finally applied the \ escape character to the \ escape character and out there giving rise to which can be seen above.

With each new copy the number of bars has been increasing ...

Since the number of \ is different in each place and each record in the table, and given the number of records, manual resolution of this subject is impractical.

Question

How to apply a regular expression to an update query to the database, updating the description field with the value it contains, but clearing all those escape bars, leaving only one where more than one is found?

Theoretical example:

UPDATE product
SET description = (description - milhões de barras)
WHERE 1

Not relevant, but the issue of copying and pasting clearing previously applied escape characters is already resolved on the PHP side. New records no longer cause this problem. The question is a solution in MySQL to deal with the old records already present in the database.

    
asked by anonymous 29.01.2014 / 13:37

6 answers

3

MySQL does not support substitution for regular expressions natively (one of the reasons I prefer PosgreSQL). Therefore, I recommend that you install the UDFs lib mysql-udf-regexp. Once you have done this, just use the substitution for regular expressions:

UPDATE product
SET description = PREG_REPLACE('\\+', '\', description);

The regular expression \\+ matches any number of backslashes (the first parameter of the function). Because \ is escape character in MySQL and regular expression, you must use four \ to count as one in the regular expression. The second parameter is the substitution string, which in this case will be \ . Finally, the last parameter is the column in which the operation will be performed.

    
29.01.2014 / 14:38
2

You can run the following query multiple times until it no longer changes any records:

UPDATE product
SET description = REPLACE(description, '\\', '\')
WHERE description LIKE '%\\\\%'

MySQL should indicate that no record has changed with a message like "Query OK, 0 rows affected". You can also verify that the replacement has completed when the following query returns no records:

SELECT description FROM product
WHERE description LIKE '%\\\\%'

Explanation

The query exchanges each pair of backslashes, \ , with a single, \ . Thus, in a first execution of the query, \\ is replaced by \ , which in a second execution is replaced by \ .

Note that, in the expression REPLACE(description, '\\', '\') , the backslash is used as an escape character; thus, '\\' represents two bars.

The expression description LIKE '%\\\\%' , looks for columns that contain two consecutive backslashes, \ . This is because, in the expression LIKE , backslashes need to be "escaped" twice. According to MySQL documentation :

  

To search for "\", specify it as "\\"; This is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.

    
01.02.2014 / 03:21
0

where you have two or more than one bars, switch to one:

$var = '\&quot;\\&quot;\\\\&quot;\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\';
echo preg_replace("/\\+/",'\',$var);
// retorna: \&quot;\&quot;\&quot;\&quot;\&quot;\
    
29.01.2014 / 14:30
0

Since you want to "save" old results, then we'll give you a update in the database.

UPDATE table
SET field = REPLACE(field, '\\'', '\'')
WHERE field LIKE '%\\''
    
29.01.2014 / 14:32
0

You could try maybe a while in PHP or some other language with code similar to this:

WHILE (variable.contains('%\%')
   variable.replace('\','\');
    
29.01.2014 / 14:22
0

Because this problem is fairly simple, if you do not want or can not install an external library for substitutions with regular expressions, this is an adaptation This function of Narcodes.

The only thing I did with the function was to replace the fields VARCHAR(1000) with TEXT to handle strings of any size and remove the call to TRIM . Replacing the fields VARCHAR with TEXT has a cost in terms of performance , but increases the flexibility of the solution:

DELIMITER //
CREATE FUNCTION 'regex_replace'(pattern TEXT, replacement TEXT, original TEXT)

RETURNS TEXT
DETERMINISTIC
BEGIN
 DECLARE temp TEXT;
 DECLARE ch VARCHAR(1);
 DECLARE i INT;
 DECLARE j INT;
 DECLARE qbTemp TEXT;

 SET i = 1;
 SET j = 1;
 SET temp = '';
 SET qbTemp = '';

 IF original REGEXP pattern THEN
  loop_label: LOOP
   IF i>CHAR_LENGTH(original) THEN
    LEAVE loop_label; 
   END IF;
   SET ch = SUBSTRING(original,i,1);
   IF NOT ch REGEXP pattern THEN
    SET temp = CONCAT(temp,ch);
   ELSE
    SET temp = CONCAT(temp,replacement);
   END IF;
   SET i=i+1;
  END LOOP;
 ELSE
  SET temp = original;
 END IF;
 SET temp = REPLACE(REPLACE(REPLACE(temp , CONCAT(replacement,replacement),
   CONCAT(replacement,'#')),CONCAT('#',replacement),''),'#','');
 RETURN temp;
END// 

So we can use the regular expression of @ André:

UPDATE product
SET description = regex_replace('\\+', '\', description);

Example with SQL Fiddle

    
02.02.2014 / 16:49