How to get parts of a text in PHP


How can I extract these parts of the text:

  ASUS Motherboard for Intel LGA 1151 ATX ROG STRIX Z270E GAMING, DDR4, Aura Sync, Gamer Audio, Intel Network, SLI / CFX, Wi-Fi, USB 3.1 Front, HDMI / DP "

and

  

"price": 1095.9

Keep in mind that the name and price change depending on the link, but they always follow the patterns name.* and price.* in the text below:

$texto = "
string(43488) "HTTP/1.1 200 OK
Etag: "a6152a2c"
Content-Type: text/html; charset=ISO-8859-1
Content-Length: 188487
X-TIME: 1493043126.194
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Access-Control-Allow-Origin: *
Cache-Control: max-age=219, public
Expires: Mon, 24 Apr 2017 14:17:06 GMT
Date: Mon, 24 Apr 2017 14:13:27 GMT
Set-Cookie: incap_ses_297_582873=HDYgfiL0VibiGqTihigfBAcI/lgAAAAAOgjiY0SVKeRwJpG/EqcKgg==; path=/; Domain=.kabum.com.br
Set-Cookie: ___utmvmPwutOXo=yrlhYeFzkwB; path=/; Max-Age=900
Set-Cookie: ___utmvaPwutOXo=XmSBlTw; path=/; Max-Age=900
Set-Cookie: ___utmvbPwutOXo=pZE
    XNfOValo: vtJ; path=/; Max-Age=900
X-Iinfo: 5-61671947-0 0CNN RT(1493043207095 0) q(0 -1 -1 -1) r(0 -1)
X-CDN: Incapsula

      window.lpTag=window.lpTag||{};if(typeof window.lpTag._tagCount==='undefined'){window.lpTag={site:'85687252'||'',section:lpTag.section||'',autoStart:lpTag.autoStart===false?false:true,ovr:lpTag.ovr||{},_v:'1.6.0',_tagCount:1,protocol:'https:',events:{bind:function(app,ev,fn){lpTag.defer(function(){lpTag.events.bind(app,ev,fn);},0);},trigger:function(app,ev,json){lpTag.defer(function(){lpTag.events.trigger(app,ev,json);},1);}},defer:function(fn,fnType){if(fnType==0){this._defB=this._defB||[];this._defB.push(fn);}else if(fnType==1){this._defT=this._defT||[];this._defT.push(fn);}else{this._defL=this._defL||[];this._defL.push(fn);}},load:function(src,chr,id){var t=this;setTimeout(function(){t._load(src,chr,id);},0);},_load:function(src,chr,id){var url=src;if(!src){url=this.protocol+'//'+((this.ovr&&this.ovr.domain)?this.ovr.domain:'lptag.liveperson.net')+'/tag/tag.js?site='+this.site;}var s=document.createElement('script');s.setAttribute('charset',chr?chr:'UTF-8');if(id){s.setAttribute('id
      $(document).ready(function() {
        $('#carousel').flexslider({
              animation: 'slide',
              animationSpeed: 300,
              slideshowSpeed: 4000,
              controlNav: false,
              animationLoop: false,
              slideshow: false,
              itemWidth: 64,
              itemMargin: 5,
              asNavFor: '#slider',
              start:function(slider){
                  $('#slider .flex-direction-nav').remove();
                  $("#imagem-slide li").gkzoom();
              }
          });

          $('#slider').flexslider({
              animation: 'fade',
              animationSpeed: 300,
              controlNav: false,
              animationLoop: false,
              slideshow: false,
              sync: "#carousel",
              start: function(slider){
                 if ($('ul.slides li').size() < 11) {
                       $('ul.flex-direction-nav').remove();
                 }
              }
          });
      });
      $(document).ready(function(){

        var add_dias_uteis = function(date, dias) {
                var copiedDate = new Date(date.getTime());
                var dias_corridos = 0;
                for(i = 0; i < dias; i) {
                    copiedDate.setDate(copiedDate.getDate()+1);
                    if (!(copiedDate.getDay() == 0 || copiedDate.getDay() == 6)) {
                        i++;
                    }
                    dias_corridos++
                }
                date.setDate(date.getDate() + dias_corridos);

                return date;
            };

        $('.cep').mask('99999-999');
        var PATH = 'http://'+window.location.host;

          $("#calcula_frete").on('submit', function(ev){
              if($("#calc_cep").val().length == 9){
                  ev.preventDefault();
                  var id = "#janela1";
                  $('#table-calcular').html("");
                  $("#agendamento_texto").html("");
                  $('#table-cal');
                  var alturaTela = $(document).height();
                  var larguraTela = $(window).width();

                            if(value.valor == 0) {

          dataLayer = [{"productsShelf":[],"productsDetail":[{"position":"1","name":"Placa-Mãe ASUS p/ Intel LGA 1151 ATX ROG STRIX Z270E GAMING,DDR4,Aura Sync, Áudio Gamer, Rede Intel, SLI/CFX, Wi-Fi, USB 3.1 Frontal,HDMI/DP","category":"Hardware/Placas-mãe/P/ Processador Intel/ASUS","brand":"Asus;","price":1095.9,"id":"84264","available":true}],"visitor":"","pageType":"product","breadcrumb":[{"url":"http://www.kabum.com.br/hardware","name":"Hardware"},{"url":"http://www.kabum.com.br/hardware/placas-mae","name":"Placas-mãe"},{"url":"http://www.kabum.com.br/hardware/placas-mae/p-processador-intel","name":"P/ Processador Intel"},{"url":"http://www.kabum.com.br/hardware/placas-mae/p-processador-intel/asus","name":"ASUS"}]}];

    ";
    
asked by anonymous 24.04.2017 / 16:24

3 answers


Final code

<?php

/**
 * Downloads a KaBuM! product page and returns its title ("titulo") and price ("preco").
 * It first reads the visible HTML elements; if the title element is missing,
 * it falls back to the "productsDetail" entry of the dataLayer script in the page.
 */
function getKabum($urlCompleta) {
    // Collect libxml warnings about malformed HTML instead of printing them.
    libxml_use_internal_errors(true);
    libxml_clear_errors();

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $urlCompleta);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.kabum.com.br");
    // Forward the visitor's IP and user agent; these are only set under a web server, not on the CLI.
    if (isset($_SERVER['REMOTE_ADDR'])) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, array("X-Forwarded-For: {$_SERVER['REMOTE_ADDR']}"));
    }
    if (isset($_SERVER['HTTP_USER_AGENT'])) {
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);

    $titulo = '';
    $preco = '';
    if ($html === false || $html === '') {
        return array("titulo" => $titulo, "preco" => $preco); // the request failed
    }

    $DOM = new DOMDocument();
    $DOM->loadHTML($html);
    $xpath = new DOMXPath($DOM);

    // item(0) returns null when the element does not exist, so check before reading it.
    $tituloNode = $xpath->query('//h1[@class="titulo_det"]')->item(0);
    $precoNode = $xpath->query('//span[@class="preco_desconto"]')->item(0);
    if ($tituloNode !== null) {
        $titulo = trim($tituloNode->nodeValue);
    }
    if ($precoNode !== null) {
        $preco = trim($precoNode->nodeValue);
    }

    if ($titulo === '') {
        // Fallback: read the name and price from the dataLayer JSON inside the page's scripts.
        $texto = $DOM->textContent;
        if (preg_match('/"productsDetail":\[\{"position":"1","name":"([^"]+)"/', $texto, $t)) {
            $titulo = $t[1];
        }
        if (preg_match('/"productsDetail":\[\{"position":"1","name":".*?","price":(\d+(?:[.,]\d+)?)/', $texto, $p)) {
            $preco = $p[1];
        }
    }

    return array("titulo" => $titulo, "preco" => $preco);
}

$codigos = array(
    84264,
    84404,
    75332, // cURL received a different page layout for this product
    63735,
    85198,
    41620,
    34217,
    77987,
    63327,
);

$produto = array();
foreach ($codigos as $codigo) {
    $produto[] = getKabum("http://www.kabum.com.br/cgi-local/site/produtos/descricao.cgi?codigo=$codigo");
}

foreach ($produto as $value) {
    if ($value['titulo'] == '') {
        print_r($value); // debug: no title was found for this product
    }
    print $value['titulo'];
    print "<h1>" . $value['preco'] . "</h1><hr>";
}
    
26.04.2017 / 19:29

Keep in mind that a regex always needs a delimiter, that is, something that marks where the match ends. Saying that the text will always contain name.* and price.* is not enough to solve your problem: that only defines where the regex should start matching.
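To illustrate why a starting point alone is not enough, here is a minimal PHP sketch comparing a pattern with no end delimiter against one that stops at the closing quote. The function names and the sample line are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helpers, for illustration only.

// No end delimiter: after "name", .* consumes everything up to the end of the line.
function matchOpenEnded(string $line): string
{
    return preg_match('/"name".*/', $line, $m) ? $m[0] : '';
}

// With an end delimiter: [^"]* stops at the quote that closes the value.
function matchDelimited(string $line): string
{
    return preg_match('/"name":"([^"]*)"/', $line, $m) ? $m[1] : '';
}

$line = '{"name":"Board X","price":1095.9,"id":"84264"}';
echo matchOpenEnded($line), PHP_EOL; // "name":"Board X","price":1095.9,"id":"84264"}
echo matchDelimited($line), PHP_EOL; // Board X
```

The second pattern works because the value itself never contains a quote; if it could, you would need a stricter delimiter.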

  

You should always state the exact result you want and mention any inconsistencies that may appear in the input.

Saying only "remember that depending on the link, the name and price will be different, but they always have name.* and price.*" is not enough, but I wrote a more generic answer to your problem. Try:

("name":".*?"),("price":\d+(?:[.,]\d+)?)

In short, the first capture group, ("name":".*?"), starts at "name":" and lazily matches any characters, including special characters, up to the first quote that is followed by ,"price". On this page that means it also includes the category and brand fields that sit between the name and the price.

The second group, ("price":\d+(?:[.,]\d+)?), captures the digits (0-9) after "price":, optionally followed by . or , as the decimal separator.
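A minimal PHP sketch of running this kind of pattern with preg_match. The function name and the sample string are hypothetical; the sample is shortened from the dataLayer line in the question. Note the comma between the two groups: in the JSON, the quote that closes group 1 is followed by ,"price", so without the comma the two groups can never be adjacent.

```php
<?php
// Hypothetical helper: returns [group 1, group 2], or null when the pattern does not match.
function matchNameAndPrice(string $text): ?array
{
    // Group 1: "name":" followed lazily by any characters up to the quote before ,"price".
    // Group 2: "price": followed by digits with an optional . or , decimal part.
    $pattern = '/("name":".*?"),("price":\d+(?:[.,]\d+)?)/';
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    return array($m[1], $m[2]);
}

// Shortened sample based on the dataLayer line from the question.
$sample = '{"position":"1","name":"Board X","category":"Hardware","brand":"Asus;","price":1095.9,"id":"84264"}';
print_r(matchNameAndPrice($sample));
```

Group 1 here comes back as "name":"Board X","category":"Hardware","brand":"Asus;", so you still need to cut the name out of it, or use a separate pattern such as "name":"([^"]+)" for the name alone.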

    
24.04.2017 / 22:34

Looking at the text you posted and the PHP tag, I assume you are fetching the page with cURL.

Suggestion: HTML parser

Since this is HTML analysis, the ideal approach is to use a parser. For that, I suggest Simple Html Parser.
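Simple Html Parser is a third-party library; as a dependency-free alternative, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath instead. The function name is hypothetical, and the class names titulo_det and preco_desconto are taken from the "Final code" answer; treat them as assumptions that will break if the site changes its layout.

```php
<?php
/**
 * Hypothetical helper: reads the product title and price from the HTML elements.
 * Uses the built-in DOM extension instead of Simple Html Parser.
 * The class names come from the "Final code" answer and may change with the site layout.
 */
function extractFromHtml(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect warnings for malformed HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $title = $xpath->query('//h1[@class="titulo_det"]')->item(0);
    $price = $xpath->query('//span[@class="preco_desconto"]')->item(0);

    // item(0) is null when the element is missing.
    return array(
        'title' => $title !== null ? trim($title->textContent) : null,
        'price' => $price !== null ? trim($price->textContent) : null,
    );
}
```

Returning null for a missing element makes it easy to detect when the page layout is different and fall back to another strategy.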

Suggestion: JSON parser

Looking at the HTML around exactly what you want, you can see that the data you are extracting is inside a JSON object:

dataLayer = [{"productsShelf":[],"productsDetail":[{"position":"1","name":"Placa-Mãe ASUS p/ Intel LGA 1151 ATX ROG STRIX Z270E GAMING,DDR4,Aura Sync, Áudio Gamer, Rede Intel, SLI/CFX, Wi-Fi, USB 3.1 Frontal,HDMI/DP","category":"Hardware/Placas-mãe/P/ Processador Intel/ASUS","brand":"Asus;","price":1095.9,"id":"84264","available":true}],"visitor":"","pageType":"product","breadcrumb":[{"url":"http://www.kabum.com.br/hardware","name":"Hardware"},{"url":"http://www.kabum.com.br/hardware/placas-mae","name":"Placas-mãe"},{"url":"http://www.kabum.com.br/hardware/placas-mae/p-processador-intel","name":"P/ Processador Intel"},{"url":"http://www.kabum.com.br/hardware/placas-mae/p-processador-intel/asus","name":"ASUS"}]}];

I suggest working with that JSON instead. Just capture it and use json_decode($json, true), so the content becomes an associative array that is much easier to work with.
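A minimal sketch of that approach, assuming the dataLayer assignment sits on a single line as in the question. The function name is hypothetical. One caveat: json_decode only accepts UTF-8, and the question's headers show the page is served as ISO-8859-1, so on the real page you would convert first, for example with mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1').

```php
<?php
/**
 * Hypothetical helper: decodes the dataLayer JSON and returns the first product.
 * Assumes $html is UTF-8 and that "dataLayer = [...];" sits on a single line,
 * as in the question. Returns null if the data cannot be found or decoded.
 */
function productFromDataLayer(string $html): ?array
{
    // Greedy .* stops at the last "];" on the same line (. does not match newlines).
    if (!preg_match('/dataLayer\s*=\s*(\[.*\]);/', $html, $m)) {
        return null;
    }

    $data = json_decode($m[1], true); // true: decode JSON objects as associative arrays
    if (!is_array($data) || !isset($data[0]['productsDetail'][0])) {
        return null;
    }

    $product = $data[0]['productsDetail'][0];
    return array('name' => $product['name'], 'price' => $product['price']);
}
```

Unlike the regex approaches, this gives you the price as a number and the name without the surrounding "name":" text, and it is not confused by the breadcrumb entries that also have a "name" key.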

Solution: regex

If you still want to do it with a regex, you can use:

("name":"[^"]+")|("price":(?:\d{1,3}\.?)+[.,]\d{1,2})

See it working on regex101.

It also returns other name entries (the breadcrumb names, for example) because the search is not very specific: only "name": is matched literally.
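A minimal sketch of running this alternation with preg_match_all and separating the two kinds of match. The function name and sample are hypothetical, and the dot in \d{1,3}\.? is escaped so it only matches a literal period. Taking the first name as the product assumes the productsDetail entry comes before the breadcrumb, as it does in the question's sample.

```php
<?php
/**
 * Hypothetical helper: runs the alternation pattern from this answer with
 * preg_match_all and splits the matches by alternative.
 */
function collectNamesAndPrices(string $text): array
{
    $pattern = '/("name":"[^"]+")|("price":(?:\d{1,3}\.?)+[.,]\d{1,2})/';
    preg_match_all($pattern, $text, $m);

    // Each match fills only one alternative; the other group is an empty string.
    $names  = array_values(array_filter($m[1], 'strlen'));
    $prices = array_values(array_filter($m[2], 'strlen'));

    return array('names' => $names, 'prices' => $prices);
}

// Shortened sample: the product entry comes before the breadcrumb, as in the question.
$sample = '"productsDetail":[{"position":"1","name":"Board X","brand":"Asus;","price":1095.9}],'
        . '"breadcrumb":[{"url":"http://www.kabum.com.br/hardware","name":"Hardware"}]';
$result = collectNamesAndPrices($sample);
// $result['names'][0] is the product; the following entries come from the breadcrumb.
print_r($result);
```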

    
25.04.2017 / 14:36