Regex: preg_replace takes far too long to run

2

Recently, after developing a process, I noticed that it was taking an exorbitant 5 to 6 minutes to run, when it should take 2 seconds at most. I added timers to the code to find out which step was so slow, and I narrowed it down to this:

$html = preg_replace('~[^#]*(<Ajax>[^\~]*?</ajax>)[^#]*~', '$1', $html);

The HTML I am running the replace on has more than 2 thousand lines, so I will not post it here, but it follows this pattern:

<Ajax>
    <Sucesso>True</Sucesso>
    <DadosRetorno><![CDATA[
        <br />
        <input type="button" id="btnExportarExtNfe" class="button" value="Exportar resultado completo da pesquisa para arquivo texto" onclick="btnExportarExtNfe_click();" />
        <br />
        <br />
        <table class="painel">
            <tr class="listaHeaderEcac">
                <th><label>#</label></th>
                <th><label>Dt Emit</label></th>
                <th><label>Dt Ent/Sai</label></th>
                <th><label>IE Emit</label></th>
                <th><label>UF Emit</label></th>
                <th><label>CNPJ Emit</label></th>
                <th><label>IE Dest/Remet</label></th>
                <th><label>UF Dest/Remet</label></th>
                <th><label>CNPJ Dest/Remet</label></th>
                <th><label>Mod</label></th>
                <th><label>Série</label></th>
                <th><label>Número</label></th>
                <th><label>Total NF-e</label></th>
                <th><label>Total BC ICMS</label></th>
                <th><label>Total ICMS</label></th>
                <th><label>Total BC ICMS ST</label></th>
                <th><label>Total ICMS ST</label></th>
                <th><label>Sit</label></th>
                <th><label>E/S</label></th>
            </tr>
            <tr>
                <td><span class="linha"><a onclick="ExibeNfeCompleta('00000000000000000000000000000000000000000020')" style="cursor:pointer"><img src='../Imagens/lupa.png' alt='Visualizar' border=0></a></span></td>
                <td><span class="linha">03/08/15</span></td>
                <td><span class="linha">03/08/15</span></td>
                <td><span title='Empresa 1' class="linha">000/0000000</span></td>
                <td><span class="linha">AN</span></td>
                <td><span title='Empresa 1' class="linha">00.000.000/0000-00</span></td>
                <td><span title='Empresa 2' class="linha">000/0000000</span></td>
                <td><span class="linha">AN</span></td>
                <td><span title='Empresa 2' class="linha">00000000000000</span></td>
                <td><span class="linha">55</span></td>
                <td><span class="linha">1</span></td>
                <td><span class="linha">00000</span></td>
                <td align="right"><span class="linha">0,00</span></td>
                <td align="right"><span class="linha">0,00</span></td>
                <td align="right"><span class="linha">0,00</span></td>
                <td align="right"><span class="linha">0,00</span></td>
                <td align="right"><span class="linha">0,00</span></td>
                <td><span title='Normal' class="linha">N</span></td>
                <td><span title='Saída' class="linha">S</span></td>
            </tr>
        </table>
        <div width="000%"><span class="linha">NFes Emitidas até: <strong>00/00/0005 09:01:03</strong></span></div>
        <div width="000%" align="center">
            &nbsp;
            <SPAN title="Linha Inicial e Final da Página">Linhas de 1 a 000</SPAN> - &nbsp;
            <SPAN title="Total de Linhas Recuperadas">Total de Linhas: 000</SPAN>
            <br> &nbsp;
            <SPAN title="Total de Páginas">Páginas: 4</SPAN>
            <br> &nbsp;|&nbsp;
            <span class="menu4"><b>1</b></span>&nbsp;|&nbsp;<a href="javascript:trocaPagina(2);" style="font-weight: bold;color: #000000; text-decoration: underline;" class="LinkNavActive">2</a>&nbsp;|&nbsp;<a href="javascript:trocaPagina(3);" style="font-weight: bold;color: #000000; text-decoration: underline;" class="LinkNavActive">3</a>&nbsp;|&nbsp;<a href="javascript:trocaPagina(4);" style="font-weight: bold;color: #000000; text-decoration: underline;" class="LinkNavActive">4</a>&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="javascript:trocaPagina(2);" style="font-weight: bold;color: #000000; text-decoration: underline;" class="menu4">Próx.</a>&nbsp;&nbsp;&nbsp;:&nbsp;&nbsp;<a href="javascript:trocaPagina(4);" style="font-weight: bold;color: #000000; text-decoration: underline;" class="menu4">Final</a>
        </div>
        ]]></DadosRetorno>
</Ajax>

The real document has a few more header and footer tags around this block, which is why I need the replace. According to my debugging, it is precisely this replace that takes 5 to 6 minutes.

Does anyone know why it is so slow? Can anyone suggest a better regex?

    
asked by anonymous 19.08.2015 / 15:36

1 answer

4

Changing the quantifier [^\~]*? to [^\~]* may solve it.

The non-greedy (lazy) quantifier *? makes the engine test the rest of the regular expression after every matched character, hence the delay.

With a "greedy" quantifier, *, the engine first consumes every character that fits the group in question, up to the end of the string or until a character does not match, and only then backtracks, trying the rest of the regular expression from the end backward.
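
To make the difference concrete, here is a minimal sketch of the two quantifiers side by side. The input string is a hypothetical, shortened stand-in for the real document, and the closing tag is written </Ajax> to match the sample in the question (the pattern in the question has </ajax>). It only shows that both forms pick out the same block when there is a single closing tag; it does not measure the speed on the real input.

```php
<?php
// Hypothetical, shortened input; the real document has 2000+ lines.
$html = "<header>x</header>\n<Ajax>\n<Sucesso>True</Sucesso>\n</Ajax>\n<footer>y</footer>";

// Lazy: after each consumed character the engine tries to match </Ajax>.
$lazyPattern = '~<Ajax>[^\~]*?</Ajax>~';

// Greedy: consume everything that is not "~" first, then backtrack
// from the end of the string until </Ajax> matches.
$greedyPattern = '~<Ajax>[^\~]*</Ajax>~';

preg_match($lazyPattern, $html, $lazy);
preg_match($greedyPattern, $html, $greedy);

// With a single </Ajax> in the input, both patterns capture the same block.
var_dump($lazy[0] === $greedy[0]);
```

Note that the two forms stop at different places if the input contains more than one </Ajax>: the lazy one ends at the first closing tag, the greedy one at the last.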

But since this is XML, it is better to use an XML parser than a regular expression.
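
As a rough sketch of the parser approach, the snippet below uses PHP's SimpleXML extension. It assumes the string holds only the well-formed <Ajax> element; the extra header and footer content mentioned in the question would have to be well-formed as well, or be cut off first. The variable names and the shortened input are made up for illustration.

```php
<?php
// Hypothetical, shortened version of the response shown in the question.
$xml = <<<'XML'
<Ajax>
    <Sucesso>True</Sucesso>
    <DadosRetorno><![CDATA[
        <table class="painel"><tr><td>03/08/15</td></tr></table>
    ]]></DadosRetorno>
</Ajax>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$ajax = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($ajax === false) {
    throw new RuntimeException('Input is not well-formed XML');
}

// Child elements are read by name; casting to string yields their text.
$sucesso = (string) $ajax->Sucesso;
$dadosRetorno = trim((string) $ajax->DadosRetorno);

echo $sucesso, "\n", $dadosRetorno, "\n";
```

The HTML inside the CDATA section comes back as plain text in $dadosRetorno, so no pattern matching over the whole document is needed to isolate it.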

    
20.08.2015 / 13:49