Read an RSS feed description with PHP


I'm trying to display the feed's description on a website I'm building, but when I try to read it, it comes back empty. When I look at the source of the feed link, the description is there:

<description><![CDATA[
    <div>
    <a href="http://eissomesmo.com.br/blog/e-dicas/"><img title="Eisso4" src="http://eissomesmo.com.br/blog/wp-content/uploads/2016/02/Eisso4.jpg"alt="É Dicas!" width="230"  height="230" /></a>
    </div>
    Para um conteúdo cumprir a sua função, deve ser feito adaptado para ser exibido em várias plataformas de mídia e pensado estrategicamente para atrair a atenção do público-alvo e mantê-lo. Este conteúdo pode assumir diversas formas como notícias, videos instrutivos, e-books, posts de blog, guias, artigos, perguntas e respostas, imagens, entre outros. Empresas que constroem]]></description>

But PHP returns nothing when I read the description tag. This is the code I'm using:

$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, 'http://eissomesmo.com.br/blog/feed/');
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'rss');
$query = curl_exec($curl_handle);
curl_close($curl_handle);

$rss = new SimpleXMLElement($query);

echo '<pre>';
var_dump($rss->channel->item->description);
echo '</pre>';

The feed is generated by WordPress. The feed URL is http://eissomesmo.com.br/blog/feed/.

    
asked by anonymous 29.03.2016 / 16:54

1 answer


This happens because some tags in the XML contain CDATA sections. CDATA is used to pass values that should not be parsed as XML but taken literally, so that HTML content containing characters such as < and & is not confused with XML markup.
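To illustrate what CDATA does, here is a minimal sketch using a hypothetical one-item XML string (not the real feed). Inside CDATA, `<div>` and `&` are plain characters, and casting the SimpleXML element to a string returns them unchanged, even without any extra parsing flags.

```php
<?php
// Hypothetical sample: a description whose content is wrapped in CDATA,
// shaped like the WordPress feed in the question.
$xml = <<<XML
<item><description><![CDATA[<div><a href="#">link</a></div> Text & more]]></description></item>
XML;

$item = simplexml_load_string($xml);

// Inside CDATA, "<div>" and "&" are not markup, so the string cast
// returns the raw HTML text exactly as written.
$description = (string) $item->description;

echo $description, PHP_EOL;
// prints: <div><a href="#">link</a></div> Text & more
```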

  

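According to some forums, including Stack Overflow, PHP's XML libraries have a bug that keeps them from interpreting CDATA properly, and updating those libraries may fix the problem. Note also that `var_dump()` and `print_r()` do not display the CDATA content of a SimpleXMLElement, so the value can look empty even when casting it with `(string)` returns the text.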

If you would rather not update, one way around it is to have PHP merge the CDATA into ordinary text nodes by passing the LIBXML_NOCDATA option to the simplexml_load_string function when reading the feed, for example:

<?php
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, "http://eissomesmo.com.br/blog/feed/");
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, "rss");
$query = curl_exec($curl_handle);
if ($query === false) {
    // Stop early instead of handing false to the XML parser
    die("cURL error: " . curl_error($curl_handle));
}
curl_close($curl_handle);

// LIBXML_NOCDATA turns the CDATA sections into ordinary text nodes
$rss = simplexml_load_string($query, "SimpleXMLElement", LIBXML_NOCDATA);
$desc = (string) $rss->channel->item->description;

// Group 1: the link's href; group 2: the image's src
preg_match('/<a href="([^"]*)"><[^>]*?src="([^"]+)"[^>]*><\/a>/is', $desc, $valores);

echo "<pre>";
print_r($valores);
echo "</pre>";
?>
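Note that `$rss->channel->item` on its own refers only to the first `<item>`. A minimal sketch for reading every item, assuming a hypothetical two-item feed inline in place of the cURL response (the titles and contents below are made up):

```php
<?php
// Hypothetical two-item feed standing in for the response from
// http://eissomesmo.com.br/blog/feed/ (no network access assumed here).
$query = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
  <item><title>Post A</title><description><![CDATA[<p>First</p>]]></description></item>
  <item><title>Post B</title><description><![CDATA[<p>Second</p>]]></description></item>
</channel></rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes,
// so print_r/var_dump now show the description content as well.
$rss = simplexml_load_string($query, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($rss === false) {
    exit("Could not parse the feed\n");
}

// foreach visits every <item> in the channel, not just the first one.
$descriptions = [];
foreach ($rss->channel->item as $item) {
    $descriptions[(string) $item->title] = (string) $item->description;
}

print_r($descriptions);
```

Keying the array by title is only for illustration; with real posts, titles may repeat, so a plain list may be safer.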

EDIT: Since a colleague asked in a comment, I've added a regular expression that uses preg_match to extract the data inside the div (the link and the image URL), returning an array like this:

 Array (
  [0] => <a href="http://eissomesmo.com.br/blog/e-pascoa-2/"><img title="post_relacionamento_3" src="http://eissomesmo.com.br/blog/wp-content/uploads/2016/03/post_relacionamento_3.jpg"alt="É Páscoa!" width="230"  height="220" /></a>
  [1] => http://eissomesmo.com.br/blog/e-pascoa-2/
  [2] => http://eissomesmo.com.br/blog/wp-content/uploads/2016/03/post_relacionamento_3.jpg
)
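If the regular expression turns out to be brittle (for example, if WordPress changes attribute order or spacing), one alternative is to swap in PHP's DOM extension instead of a regex. This is a sketch, not the method used above: the `$html` value is a hypothetical description body with placeholder URLs. Keep in mind that `DOMDocument::loadHTML` assumes ISO-8859-1 when no charset is declared, so non-ASCII attributes such as the `alt` text "É Dicas!" may come out mangled; this sketch only reads the ASCII URLs.

```php
<?php
// Hypothetical description HTML, shaped like the ones in the question's feed.
$html = '<div><a href="http://example.com/post/"><img title="t" src="http://example.com/img.jpg" alt="x" /></a></div> Some text';

$doc = new DOMDocument();
// libxml reports warnings for HTML it considers invalid; collect them quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// For each <a> that wraps an <img>, keep the link target and the image source.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $img = $a->getElementsByTagName('img')->item(0);
    if ($img !== null) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'src'  => $img->getAttribute('src'),
        ];
    }
}

print_r($links);
```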

I tested this solution, but since I don't know which PHP version you are using, you will need to test it on your side.

    
29.03.2016 / 17:28