PHP decoding of characters in an IMAP connection


Introduction

I'm working with a mailbox whose messages I will later have to filter by sender. The problem is the encoding of some subjects.

  • I connect to the mail server with the imap_open function:

    $mail_box = imap_open("{" . $incoming_server . ":" . $port . "/imap/ssl/novalidate-cert}INBOX", $username, $password) or die();
    
  • Then I get the header information with imap_headerinfo:

    $header = imap_headerinfo($mail_box, $num_da_mensagem);
    
  • I do not manipulate anything between these two steps; everything is handled internally by PHP itself.

    Difficulty

    The problem is that when I print_r the header, the subject ($header->subject) of some records comes back as a string encoded like this:

    [subject] => =?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=
    

    To decode it, I tried htmlentities and a custom function, shown below.

    // Converts $string to $to_encoding, detecting the source encoding when none is given
    function convert_encoding($string, $to_encoding, $from_encoding = '') {
        if ($from_encoding == '') {
            // Plain functions, not class methods, so no $this->
            $from_encoding = detect_encoding($string);
        }

        if ($from_encoding == $to_encoding) {
            return $string; // already in the target encoding: nothing to convert
        }

        return mb_convert_encoding($string, $to_encoding, $from_encoding);
    }

    // Returns 'UTF-8' when the string is valid UTF-8; otherwise lets mbstring guess
    function detect_encoding($string) {
        if (preg_match('%^(?: [\x09\x0A\x0D\x20-\x7E] | [\xC2-\xDF][\x80-\xBF] | \xE0[\xA0-\xBF][\x80-\xBF] | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} | \xED[\x80-\x9F][\x80-\xBF] | \xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2} )*$%xs', $string)) {
            return 'UTF-8';
        }

        return mb_detect_encoding($string, array('UTF-8', 'ASCII', 'ISO-8859-1', 'JIS', 'EUC-JP', 'SJIS'));
    }
    

    So the call looks like this: convert_encoding($header->subject, 'UTF-8');
    But nothing happens. I suspect this is because the string is not in a different character encoding but in some predefined format. I admit I still do not understand why some records come through normally and others look like this.
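A quick check, assuming the mbstring extension is loaded, suggests why nothing happens: the raw subject contains only ASCII characters, and every ASCII string is also valid UTF-8. So detect_encoding() reports 'UTF-8', and convert_encoding() returns the string untouched because the source and target encodings are the same. The text still has to be decoded from its MIME header format, which is a different operation from converting between character sets.

```php
<?php
// Raw subject exactly as printed by print_r in the question
$subject = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= '
         . '=?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= '
         . '=?utf-8?Q?ados_-_WJINTERNET?=';

// Every byte of the encoded subject is printable ASCII...
$isAscii = mb_check_encoding($subject, 'ASCII');
// ...and any ASCII string is also valid UTF-8, which is what detect_encoding() reports
$isUtf8 = mb_check_encoding($subject, 'UTF-8');
// Even an explicit UTF-8 -> UTF-8 conversion leaves the text unchanged
$unchanged = mb_convert_encoding($subject, 'UTF-8', 'UTF-8') === $subject;

var_dump($isAscii, $isUtf8, $unchanged); // bool(true), three times
```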

    What I need

  • I would like to know why some messages come with the subject encoded like this and others do not. Understanding the root of the problem could help me look at it from a different angle and arrive at a workable solution.

  • If it is a purely technical problem, what technique can I use to convert this encoding into something readable?

  • asked by anonymous 12.09.2018 / 16:07

    1 answer


    Every e-mail subject that contains special characters, such as accented letters or the cedilla, is encoded.

    I do not know exactly why this encoding is used, because I do not know the SMTP protocol in depth; I may study it further and improve this answer.
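For reference, these strings are MIME "encoded-words", the format RFC 2047 defines for non-ASCII text in mail headers. Each word has the form =?charset?encoding?text?=: with encoding B the text is Base64, and with Q it is a quoted-printable variant in which "_" stands for a space. A subject made only of plain ASCII usually needs no encoding, which would explain why some records arrive readable. The hand-written decoder below is only a sketch to show the mechanics: decode_encoded_words is a hypothetical name, it assumes the mbstring extension, and it skips the error handling (malformed words, unknown charsets) that a built-in such as iconv_mime_decode provides.

```php
<?php
// Sketch of an RFC 2047 encoded-word decoder (assumes ext-mbstring; minimal error handling).
// decode_encoded_words() is a hypothetical helper, not a PHP built-in.
function decode_encoded_words(string $header): string
{
    // Whitespace between two adjacent encoded-words is not part of the text (RFC 2047, 6.2)
    $header = preg_replace('/\?=\s+=\?/', '?==?', $header);

    // Each match is =?charset?encoding?text?=
    return preg_replace_callback(
        '/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/',
        function (array $m): string {
            [, $charset, $encoding, $text] = $m;

            if (strtoupper($encoding) === 'B') {
                $bytes = base64_decode($text);
            } else {
                // Q encoding: "_" means a space and "=XX" is a hex-encoded byte
                $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
            }

            // Convert the decoded bytes from the declared charset to UTF-8
            return mb_convert_encoding($bytes, 'UTF-8', $charset);
        },
        $header
    );
}

echo decode_encoded_words('=?utf-8?Q?ados_-_WJINTERNET?='), PHP_EOL; // ados - WJINTERNET
```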

    There is a PHP function that does all the heavy decoding work:

    iconv_mime_decode

    See the result:

    <?php
    $assunto = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=';
    
    echo iconv_mime_decode($assunto);
    ?>
    

    The code above returns:

    RES: RES: [EXTERNAL] Re: Informações sobre a Ativação dos Produtos e Serviços Contratados - WJINTERNET
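In the question's flow, the decoding could sit right after imap_headerinfo. A minimal sketch, assuming the iconv extension and that $header is the object returned by imap_headerinfo (a stdClass whose subject property may be missing when a message has no Subject header); readable_subject is a hypothetical helper name. Passing ICONV_MIME_DECODE_CONTINUE_ON_ERROR and an explicit 'UTF-8' output charset makes the result less dependent on php.ini settings, and plain ASCII subjects pass through unchanged.

```php
<?php
// Hypothetical helper: returns a readable UTF-8 subject from an imap_headerinfo() result.
function readable_subject(object $header): string
{
    // The subject property may be absent when the message has no Subject header
    $raw = $header->subject ?? '';

    // Decode RFC 2047 encoded-words into UTF-8, skipping malformed parts instead of failing
    $decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

    // Fall back to the raw value if decoding fails outright
    return $decoded === false ? $raw : $decoded;
}

// Usage inside the question's flow (requires a live IMAP connection):
// $header  = imap_headerinfo($mail_box, $num_da_mensagem);
// $subject = readable_subject($header);
```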
    

    References: link

        
    15.09.2018 / 06:05
    Specifying the update date of an HTML document for search engines

    According to MDN, using the %code% tag with the attribute %code% %code% lets search engines know the date the document was created, and they then display this information in the rich snippet of search results.

    Is it also possible to indicate the date the document was updated?

        

    You can build a sitemap and use its <changefreq> tag to tell search engines how often your content should be re-indexed: hourly, daily, or weekly, for example.

    Here you can see the complete, recommended protocol for building your sitemap; note that you can specify how often your content should be re-indexed: link

    How frequently the page is likely to change. This value provides general information to search engines and may not correspond exactly to how often they crawl the page. Valid values are:

    • always
    • hourly
    • daily
    • weekly
    • monthly
    • yearly
    • never

    The value "always" should be used to describe documents that change each time they are accessed. The value "never" should be used to describe archived URLs.

    Note that the value of this tag is considered a hint and not a command.
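As an illustration of the protocol, the sketch below generates a minimal sitemap with PHP's XMLWriter. The URL and date are hypothetical placeholders, and build_sitemap is a hypothetical helper name. Besides <changefreq>, it fills the <lastmod> element, which the sitemaps.org protocol defines for a page's last modification date (W3C Datetime format) and which relates to the "update date" asked about here. Rejecting invalid changefreq values is a design choice of this sketch, not something the protocol requires of generators.

```php
<?php
// Sketch: build a sitemap with XMLWriter (ext-xmlwriter, enabled by default in most builds).
// build_sitemap() is a hypothetical helper; the values in the usage below are placeholders.
function build_sitemap(array $entries): string
{
    // The only values the sitemaps.org protocol accepts for <changefreq>
    $allowed = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($entries as $entry) {
        if (!in_array($entry['changefreq'], $allowed, true)) {
            throw new InvalidArgumentException("Invalid changefreq: {$entry['changefreq']}");
        }
        $xml->startElement('url');
        $xml->writeElement('loc', $entry['loc']);
        $xml->writeElement('lastmod', $entry['lastmod']);       // W3C Datetime, e.g. 2018-09-15
        $xml->writeElement('changefreq', $entry['changefreq']); // a hint to crawlers, not a command
        $xml->endElement(); // </url>
    }

    $xml->endElement(); // </urlset>
    $xml->endDocument();

    return $xml->outputMemory();
}

echo build_sitemap([
    ['loc' => 'https://example.com/contact', 'lastmod' => '2018-09-15', 'changefreq' => 'weekly'],
]);
```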

    Even so, you cannot be totally sure that Google will honor this tag and re-index your content at the frequency you declare.

      

    "If your site's pages are properly linked, web crawlers can usually discover most of your site." But: "Using a sitemap does not guarantee that all the items in it will be crawled and indexed, because Google's processes rely on complex algorithms to schedule crawling. However, a sitemap benefits your site in most cases, and you will never be penalized for using one."

    Source: link

    Beyond that, you cannot fully control %code%. In an urgent case, you can manually request re-indexing of a URL. For example, if you make a security update to your contact page, you can ask Google to re-index that page. You can learn more about this here: link

    To manually submit a URL (also search for "Fetch as Google"): link

    To request indexing through Search Console:

    You can still request re-indexing of the entire site through Search Console!

        