Compare strings with accents, UTF-8

2

Something is escaping me here. I am using cURL to fetch a weather page. When the result contains accented characters and I compare it with what looks like exactly the same string, the comparison returns false (not equal). This is purely for testing:

function get_page($url) {
    $curl = curl_init();
    // Return the response body as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_URL, $url);
    /*curl_setopt($curl, CURLOPT_TIMEOUT_MS, 1000);*/
    $return = curl_exec($curl);
    curl_close($curl);
    return $return;
}

$weather = get_page("http://www.accuweather.com/pt/pt/cascais/274007/weather-forecast/274007");

preg_match('/<span class="cond">(.*?)<\/span>/s', $weather, $cond);
preg_match('/<strong class="temp">(.*?)<span>/s', $weather, $temp);
$condition = trim($cond[1]); // "Céu Limpo" (today)
$temp = trim($temp[1]); // 27 (today)

Today (06-30-2015) the condition shown is "Céu Limpo" (clear sky), but when I test the following condition:

if(strtolower($condition) == "céu limpo") {
   ....
}

it returns false (the code inside the if is not executed).

But if I do:

$hey = "Céu Limpo";
if(strtolower($hey) == "céu limpo") {
   ....
}

it returns true and the code inside the condition is executed. I would like to know why this happens and how to fix it.

    
asked by anonymous 30.06.2015 / 17:10

2 answers

2

Your problem is related to HTML entities. If you run this:

$arrayCondition = str_split($condition);
$arrayString = str_split("Céu Limpo");
var_dump($arrayCondition);
var_dump($arrayString);

You'll notice the difference in the output:

array(14) { [0]=> string(1) "C" [1]=> string(1) "&" [2]=> string(1) "#" [3]=> string(1) "2" [4]=> string(1) "3" [5]=> string(1) "3" [6]=> string(1) ";" [7]=> string(1) "u" [8]=> string(1) " " [9]=> string(1) "L" [10]=> string(1) "i" [11]=> string(1) "m" [12]=> string(1) "p" [13]=> string(1) "o" }

array(10) { [0]=> string(1) "C" [1]=> string(1) "\xC3" [2]=> string(1) "\xA9" [3]=> string(1) "u" [4]=> string(1) " " [5]=> string(1) "L" [6]=> string(1) "i" [7]=> string(1) "m" [8]=> string(1) "p" [9]=> string(1) "o" }

(In the second array, indexes 1 and 2 hold the two bytes of é in UTF-8, 0xC3 and 0xA9. They do not print on their own, so they are written here as escapes.)

The first string, the one that comes from your cURL call, contains HTML entities: the letter é arrives as &#233;
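A quick way to confirm the mismatch without dumping every character is to compare byte lengths and look at the raw bytes. This is only a minimal sketch: `$fromPage` and `$literal` are hypothetical sample values standing in for the scraped text and the typed string, and the `\u{...}` escape assumes PHP 7 or later.

```php
<?php
// Hypothetical sample values: $fromPage mimics the scraped text,
// $literal mimics the string typed in a UTF-8 source file.
$fromPage = "C&#233;u Limpo";
$literal  = "C\u{00E9}u Limpo";   // "Céu Limpo" encoded as UTF-8

var_dump(strlen($fromPage));       // int(14): the entity takes six ASCII bytes
var_dump(strlen($literal));        // int(10): é takes two bytes in UTF-8
var_dump(bin2hex($literal));       // the é appears as c3a9 in the hex dump
var_dump($fromPage === $literal);  // bool(false)
```

The two strings look the same once a browser renders them, but byte for byte they are different, which is all that == compares.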

To solve this you can use html_entity_decode, for example:

if (strtolower(html_entity_decode($condition)) == "céu limpo") {
    echo 'it worked!!!';
}
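A slightly more defensive variant of the same idea is sketched below. It passes the target charset to html_entity_decode explicitly (before PHP 5.4 the default was ISO-8859-1, not UTF-8) and lowercases with mb_strtolower, since strtolower does not reliably lowercase accented letters such as É. The helper name `normalize_condition` is hypothetical, and the sketch assumes the mbstring extension is available and that the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: decode entities to UTF-8, then lowercase
// with a multibyte-aware function (requires the mbstring extension).
function normalize_condition(string $text): string
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return mb_strtolower(trim($decoded), 'UTF-8');
}

$condition = "C&#233;u Limpo";     // sample value standing in for the scraped text
$expected  = "c\u{00E9}u limpo";   // "céu limpo" as UTF-8

if (normalize_condition($condition) === $expected) {
    echo "match\n";
}
```

Normalizing both sides through one function keeps the comparison in a single place if the page later switches between named entities, numeric entities and raw UTF-8.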
    
30.06.2015 / 17:50
1

The problem probably occurs because the cURL response is not in UTF-8.

Try converting it to UTF-8 before comparing.

  

Functions you can use: utf8_encode or utf8_decode

Example:

if(strtolower(utf8_encode($condition)) == "céu limpo") {
....
}
// If that does not work, try the reverse; sometimes your own file is not in UTF-8

if(strtolower(utf8_decode($condition)) == "céu limpo") {
....
}
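Note that utf8_encode and utf8_decode only convert between ISO-8859-1 and UTF-8, and both are deprecated as of PHP 8.2. A minimal sketch of the same approach using the mbstring extension follows. The helper name `to_utf8` is hypothetical, and treating any non-UTF-8 input as ISO-8859-1 is an assumption that may not hold for every page.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;  // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "C\xE9u Limpo";  // "Céu Limpo" as ISO-8859-1 bytes
var_dump(to_utf8($latin1) === "C\u{00E9}u Limpo");
```

Checking the encoding first avoids double-encoding text that is already UTF-8, which is the usual failure when utf8_encode is applied blindly.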
    
30.06.2015 / 17:44