Introduction
I'm working on a mailbox reader where I will later need to filter the messages by sender. The problem is the encoding of some subjects.
I connect to the mail server with the imap_open function:
$mail_box = imap_open("{" . $incoming_server . ":" . $port . "/imap/ssl/novalidate-cert}INBOX", $username, $password) or die('Cannot connect: ' . imap_last_error());
Then I fetch the header information with imap_headerinfo:
$header = imap_headerinfo($mail_box, $num_da_mensagem);
I do not manipulate anything between these two steps; everything is handled internally by PHP itself.
Difficulty
The problem is that when I print_r the subject (imap_headerinfo returns an object, so the field is $header->subject), some records come back as an encoded string like this:
[subject] => =?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=
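What this string appears to be is the MIME "encoded-word" syntax defined in RFC 2047: `=?charset?encoding?encoded-text?=`, where `B` means Base64 and `Q` means a quoted-printable variant in which `_` stands for a space. Header fields such as Subject have traditionally had to be 7-bit ASCII, so a sending client encodes subjects containing characters like ç or õ this way and leaves pure-ASCII subjects alone, which would explain why only some records look like this. The sketch below decodes the format by hand just to make it concrete. `decode_encoded_words` is a hypothetical helper, it assumes the iconv extension is available, and it ignores details a real decoder handles (for example RFC 2231 language suffixes or malformed words).

```php
<?php
// Hypothetical, hand-rolled decoder for RFC 2047 encoded-words
// (=?charset?B-or-Q?encoded-text?=), for illustration only.
function decode_encoded_words(string $header): string
{
    // RFC 2047: whitespace between two adjacent encoded-words is not part of
    // the text, so remove it before decoding.
    $header = preg_replace('/\?=\s+=\?/', '?==?', $header);

    return preg_replace_callback(
        '/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/',
        function (array $m): string {
            [, $charset, $encoding, $text] = $m;
            if (strtoupper($encoding) === 'B') {
                // "B" encoding is plain Base64.
                $bytes = base64_decode($text);
            } else {
                // "Q" encoding: "_" means a space, "=XX" is a hex-encoded byte.
                $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
            }
            // Convert the raw bytes from the declared charset to UTF-8;
            // keep the raw bytes if the charset is unknown to iconv.
            $converted = iconv($charset, 'UTF-8', $bytes);
            return $converted === false ? $bytes : $converted;
        },
        $header
    );
}

$subject = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=';
echo decode_encoded_words($subject), PHP_EOL;
// RES: RES: [EXTERNAL] Re: Informações sobre a Ativação dos Produtos e Serviços Contratados - WJINTERNET
```

Note that the three pieces are joined without spaces: "Contrat" and "ados" come from two different encoded-words and form one word.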
To decode it, I tried htmlentities and also the custom functions below:
// Converts $string to $to_encoding; if no source encoding is given, it is detected first.
function convert_encoding($string, $to_encoding, $from_encoding = '') {
    if ($from_encoding == '')
        $from_encoding = $this->detect_encoding($string);
    // Nothing to do when source and target encodings are the same.
    if ($from_encoding == $to_encoding)
        return $string;
    return mb_convert_encoding($string, $to_encoding, $from_encoding);
}
// Returns 'UTF-8' when the string is valid UTF-8 (tab, LF, CR and printable ASCII included);
// otherwise falls back to mb_detect_encoding with a fixed list of candidates.
function detect_encoding($string) {
    if (preg_match('%^(?: [\x09\x0A\x0D\x20-\x7E] | [\xC2-\xDF][\x80-\xBF] | \xE0[\xA0-\xBF][\x80-\xBF] | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} | \xED[\x80-\x9F][\x80-\xBF] | \xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2} )*$%xs', $string))
        return 'UTF-8';
    return mb_detect_encoding($string, array('UTF-8', 'ASCII', 'ISO-8859-1', 'JIS', 'EUC-JP', 'SJIS'));
}
So the call looks like this: convert_encoding($header->subject, 'UTF-8');
But nothing happens, presumably because this is not a character encoding as such but some predefined format (that is my suspicion). I have to admit I still don't understand why some records come through normally and others look like this.
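A likely explanation for why nothing happens, sketched with standalone copies of the two helpers above (turned into free functions, since the enclosing class is not shown): every character of the encoded subject is printable ASCII, which is also valid UTF-8, so detect_encoding() reports 'UTF-8', and convert_encoding() then returns the string untouched because source and target encodings match. Charset conversion works on the bytes as they are and never looks inside the =?...?= wrapper.

```php
<?php
// Standalone copies of the question's helpers (no class, so no $this),
// used only to show what they do with the encoded subject.
function detect_encoding($string)
{
    // Valid UTF-8, including tab/LF/CR and printable ASCII, is reported as 'UTF-8'.
    if (preg_match('%^(?: [\x09\x0A\x0D\x20-\x7E] | [\xC2-\xDF][\x80-\xBF] | \xE0[\xA0-\xBF][\x80-\xBF] | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} | \xED[\x80-\x9F][\x80-\xBF] | \xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2} )*$%xs', $string)) {
        return 'UTF-8';
    }
    return mb_detect_encoding($string, array('UTF-8', 'ASCII', 'ISO-8859-1', 'JIS', 'EUC-JP', 'SJIS'));
}

function convert_encoding($string, $to_encoding, $from_encoding = '')
{
    if ($from_encoding == '') {
        $from_encoding = detect_encoding($string);
    }
    if ($from_encoding == $to_encoding) {
        return $string; // source and target match: returned unchanged
    }
    return mb_convert_encoding($string, $to_encoding, $from_encoding);
}

$subject = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=';
var_dump(detect_encoding($subject));                        // string(5) "UTF-8"
var_dump(convert_encoding($subject, 'UTF-8') === $subject); // bool(true)
```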
What I need
I would like to know why some messages arrive with a readable subject and others with an encoded one like the above. Understanding the root cause could help me see a different approach and reach a workable solution.
If it is purely a technical problem, what technique can I use to convert this encoding into something readable?
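PHP already ships decoders for RFC 2047 encoded-words, so one possible technique is simply to pass the subject through one of them. A minimal sketch, assuming the iconv extension is available (it is enabled in most PHP builds): iconv_mime_decode() decodes each encoded-word and converts the result to the requested charset, and ICONV_MIME_DECODE_CONTINUE_ON_ERROR asks it to keep going past malformed parts. `decode_subject` is a hypothetical wrapper name. Since the question already uses the imap extension, imap_utf8($header->subject) is another likely option, as is mb_decode_mimeheader() from mbstring, though they may differ in how they treat unusual charsets.

```php
<?php
// Hypothetical wrapper: decode a MIME-encoded header into UTF-8 using
// PHP's built-in iconv_mime_decode().
function decode_subject(string $raw): string
{
    $decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
    // iconv_mime_decode() returns false on failure; fall back to the raw text.
    return $decoded === false ? $raw : $decoded;
}

$subject = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=';
echo decode_subject($subject), PHP_EOL;

// Subjects that contain only ASCII are not encoded and pass through unchanged.
echo decode_subject('Plain ASCII subject'), PHP_EOL;
```

In the mailbox loop this would be applied as decode_subject($header->subject) right after imap_headerinfo(), so that filtering always works on readable text.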