Decrease the values of span fields with preg_replace


I'm trying to change all the values inside the span elements that have a class.

For example, the site looks like this:

<div id="isOffered">
   <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
   <span class="priceText wide UK">1/2</span>
   <span class="priceText wide EU">1.50</span>
   <span class="priceText wide US">-200</span>
   <span class="priceText wide CH">1.50</span>
   <span class="priceChangeArrow"></span>
   <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
   <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
   </a>
</div>

What is the easiest way to retrieve the 1.50, 200 and 1.50 values and decrease each original value by 20% using preg_replace?

    
asked by anonymous 23.02.2015 / 01:56

4 answers


As Tivie's answer already mentions, regular expressions are not recommended for parsing a structure like HTML; besides, HTML is not a regular language. Don't use regex when there are better tools for the job.

Read more about this in the article "Regular Expressions: Now You Have Two Problems".

I'll follow the same path as Tivie and use DOMDocument and DOMXPath to parse the HTML, but you could also use another parser, such as Simple HTML DOM Parser.

$url = "paginahtml.html";         // Page link
$outputFile = "novapagina.html";  // File where the changes will be saved

$html = file_get_contents($url); // Get the page content

$DOM = new DOMDocument();
$DOM->loadHTML($html);

$xpath = new DOMXPath($DOM);

// Pad the normalized class list with spaces so only the whole "priceText" token matches
$prices = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " priceText ")]');
$percent = 20.0 / 100.0; // 20%

foreach ($prices as $price) {
    $value = $price->nodeValue;
    $floatValue = floatval($value);
    $finalValue = $floatValue - ($percent * $floatValue);
    $price->nodeValue = $finalValue; // Save the final value with the 20% discount
}

file_put_contents($outputFile, $DOM->saveHTML()); // Save the changes
echo "Done!";


The example above uses file_get_contents to get the page's content and file_put_contents to save the changes to a new file.

The code worked as expected when given the link to the page provided in this comment. The expression passed to query returns the nodes whose class attribute contains the class name priceText. XPath's normalize-space function collapses any extra whitespace in the attribute into single spaces, and wrapping the result in spaces with concat lets the expression match priceText only as a whole class name.
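To make the class-matching step concrete, here is a minimal sketch with made-up markup (the classes priceTextOld and xpriceText are invented to show the edge cases). It compares a plain contains(@class, ...) test with the space-padded token test; only the latter should reject partial matches.

```php
<?php
// Made-up markup: only the first two spans really have the class "priceText".
$html = '<div>'
    . '<span class="priceText wide EU">1.50</span>'
    . '<span class="wide   priceText">2.00</span>'
    . '<span class="priceTextOld">3.00</span>'
    . '<span class="xpriceText">4.00</span>'
    . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Naive test: matches any class attribute containing the substring.
$naive = $xpath->query('//span[contains(@class, "priceText")]');

// Token test: normalize the whitespace and pad the class list with spaces,
// so that only the whole word "priceText" can match.
$token = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " priceText ")]'
);

echo "naive: {$naive->length}\n";
echo "token: {$token->length}\n";
foreach ($token as $span) {
    echo $span->nodeValue, "\n";
}
```

The naive test also picks up priceTextOld and xpriceText, which is why the query pads both the class list and the searched name with spaces.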

To display the changes on screen, you can echo them:

echo $DOM->saveHTML();
    
26.02.2015 / 02:51

Parsing HTML with regex is a bad idea and can lead to insanity. There are many ways a regex can fail to read HTML (e.g. uppercase tags, extra spaces between class names, extra line breaks between HTML elements, etc.).
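To illustrate two of those failure modes, here is a minimal sketch with made-up markup. The naive pattern is an assumption about what a hand-written regex might look like: it expects a lowercase tag and exactly one space after the class name, so it misses both spans, while DOMDocument (whose HTML parser lowercases tag names) combined with an XPath class test built on normalize-space() finds both.

```php
<?php
// Made-up markup: an uppercase tag and a class list split across two lines.
$html = "<div>\n"
    . "<SPAN class=\"priceText wide EU\">1.50</SPAN>\n"
    . "<span class=\"priceText\n    wide CH\">1.50</span>\n"
    . "</div>";

// Naive regex: lowercase tag, single space between the class names.
preg_match_all('/<span class="priceText wide [A-Z]+">([^<]+)<\/span>/', $html, $m);
$regexHits = count($m[1]);

// DOM + XPath: the HTML parser lowercases tag names, and normalize-space()
// collapses the newline and indentation inside the class attribute.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$spans = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " priceText ")]'
);
$domHits = $spans->length;

echo "regex: $regexHits, DOM: $domHits\n";
```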

Regex stands for "regular expression", and HTML is not a regular language. Sooner or later it will inevitably break...

That said ...

The best way is to use a real parser. Fortunately, there are several options in PHP.

I'd advise using DOMDocument and DOMXPath, which are included in PHP by default. Here's an example:

HTML

$html = '
<html>
<head></head>
<body>
    <div id="isOffered">
       <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
           <span class="priceText wide UK">1.2</span>
           <span class="priceText wide EU">1.50</span>
           <span class="priceText wide US">200</span>
           <span class="priceText wide CH">1.50</span>
           <span class="priceChangeArrow"></span>
           <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
           <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
       </a>
    </div>
</body>
</html>';

PHP code

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// List the spans that are children of div#isOffered > a,
// keeping only those that have the class 'priceText'
$nodeList = $xpath->query("//div[@id='isOffered']/a/span[contains(concat(' ', normalize-space(@class), ' '), ' priceText ')]");

foreach ($nodeList as $node) {
    if ($node instanceof \DOMElement) {
        // Read the span's value and convert it to a float
        $value = (float) $node->nodeValue;

        // Change the span's value (20% less)
        $node->nodeValue = $value * 0.8;
        var_dump($node->nodeValue);
    }
}

// Serialize the modified document
// and store it in the $newHtml variable
$newHtml = $doc->saveHTML();

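To keep DOMDocument from choking on (that is, emitting warnings about) malformed HTML, turn on libxml's internal error handling before calling loadHTML and clear the collected errors afterwards:

libxml_use_internal_errors(true); // before $doc->loadHTML($html)
// ... load and query the document ...
libxml_clear_errors();            // afterwards: discard the collected errors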
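A fuller sketch of how libxml's internal error handling is typically used with DOMDocument, using deliberately sloppy, made-up markup: save the previous setting, load the HTML, optionally inspect libxml_get_errors(), then clear the errors and restore the setting. Which messages appear (if any) depends on the libxml2 version PHP was built with, so the sketch only prints them.

```php
<?php
// Made-up, sloppy markup: unquoted attribute, unclosed <p>, stray </b>.
$html = '<div id=isOffered><p>odds <span class="priceText wide EU">1.50</span></b></div>';

// Collect parser errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Inspect what libxml complained about (messages vary by libxml2 version).
foreach (libxml_get_errors() as $error) {
    echo 'libxml: ', trim($error->message), " (line {$error->line})\n";
}

// Discard the collected errors and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The span is still reachable despite the broken markup.
$xpath = new DOMXPath($doc);
$price = $xpath->query('//div[@id="isOffered"]//span')->item(0);
echo $price->nodeValue, "\n";
```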
    
23.02.2015 / 04:14

I also recommend using a parser, but just for fun, here's a quick-and-dirty version using regex:

<?php

$html = <<<XXX
<div id="isOffered">
   <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
   <span class="priceText wide UK">1/2</span>
   <span class="priceText wide EU">1.50</span>
   <span class="priceText wide US">-200</span>
   <span class="priceText wide CH">1.50</span>
   <span class="priceChangeArrow"></span>
   <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
   <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
   </a>
</div>
XXX;

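// Group 1: the opening <span> tag whose class contains "priceText"
// Group 2: the number (digits, "/", ".", "-") right after it
$re = '/(<span[^>]*priceText[^>]*>)([\d\/.-]+)/i';

$ret = preg_replace_callback($re, function ($matches) {
    // Keep the tag and replace the number with 80% of its value
    $matches[2] = ((float) $matches[2]) * 0.8;
    return $matches[1] . $matches[2];
}, $html);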

echo $ret;
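One caveat with the float casts used here (and with floatval() in the first answer): PHP reads only the leading numeric part of a string, so the fractional UK odds "1/2" from the question's markup become 1.0 and end up as 0.8 rather than 0.4. A minimal sketch of handling that case, with a hypothetical helper oddsToFloat() that is not part of the original answer:

```php
<?php
// Hypothetical helper (not from the original answer): turn fractional odds
// such as "1/2" into a float; anything else falls back to a plain float cast.
function oddsToFloat(string $odds): float
{
    if (preg_match('~^\s*(\d+)\s*/\s*(\d+)\s*$~', $odds, $m) && (int) $m[2] !== 0) {
        return (int) $m[1] / (int) $m[2];
    }
    return (float) $odds;
}

// A plain cast stops reading at the "/", so "1/2" becomes 1.0
var_dump((float) '1/2');

// Decrease each value by 20%, treating "1/2" as 0.5
foreach (['1/2', '1.50', '-200'] as $odds) {
    var_dump(oddsToFloat($odds) * 0.8);
}
```

Note that the result is a decimal value; writing it back as a fraction would need extra formatting logic.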


    
26.02.2015 / 17:35

And now a clandestine, off-topic answer (in Perl :)

With regular expressions:

perl -pe 's/<span.*?priceText.*?>\K(.+?)(?=<)/$1*0.8/e' span.xml

With an XML parser:

#!/usr/bin/perl
use XML::DT;
my $filename = shift or die("Error: usage $0 file.html\n");

print dt($filename, 
            span => sub{$c *= 0.8 if $v{class} =~ /^pricetext/i; toxml },
            -html => 1,
        );

# $c - contents after child processing
# %v - hash of attributes
    
07.04.2015 / 16:20