Base64 verification?

2

I'm building an email application (HapiJS) and found that some emails have their text encoded in base64, while others do not.

The application needs to receive emails from all services (Gmail, Hotmail, ...), so I need a method that checks whether the text is in base64, in order to either decode it first or send it straight to the client.

I have searched a lot and so far have not found anything that works 100% the way I need, and since I'm new to programming I don't yet know enough to work it out myself...

Code that I'm using to try to verify:

let base64 = /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/;

let isBase64Valid = base64.test(mail.text); // mail.text is the string being checked

if (isBase64Valid) {
    // the text matches the base64 format
    console.log('base64');
} else {
    // the text does not match the base64 format
    console.log('String');
}
    
asked by anonymous 29.08.2017 / 15:51

2 answers

0

Base64 strings contain only the characters a-z, A-Z, 0-9, '+', '/' and '=', i.e. if there is any character other than these, such as a ', then the string is not in base64. This is the test that the regex should perform.

Try changing the regex to this one: ^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

To encode, use the btoa() function; to decode, use atob(). The following is a reference for the functions: JS base64 Encode and Decode
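As a small illustration of the two functions mentioned above, here is a minimal sketch that encodes and decodes a short string. It assumes an environment where btoa() and atob() exist as globals (browsers, or recent Node.js versions), and the sample text is made up for the example. Keep in mind that btoa() only accepts characters in the Latin-1 range.

```javascript
// Assumes global btoa()/atob() (browsers or recent Node.js versions).
const original = 'Hello World!';  // sample text, chosen for illustration
const encoded = btoa(original);   // text -> base64
const decoded = atob(encoded);    // base64 -> text

console.log(encoded); // SGVsbG8gV29ybGQh
console.log(decoded); // Hello World!
```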

    
29.08.2017 / 18:52
1
Using JavaScript, the most reliable way to check on the front end whether a given string was encoded in base64 is to wrap the call to atob() in a try/catch block and compare its re-encoded result with the original input, since the browser's JavaScript engine already throws an exception when decoding fails.

Some examples from the StackOverflow community (Portuguese, English) claim that the following approach is the most correct:

function isBase64(str) {
    try {
        return atob(str) ? true : false
    } catch(e) {
        return false
    }
}

However, this approach is incorrect, because the following example returns a false positive:

isBase64('jgjhgj hg') // true

When in fact the result of atob() for the example above is not meaningful text:

console.log(atob('jgjhgj hg')) // "á8`" (plus non-printable characters)
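The false positive comes from the fact that atob() strips ASCII whitespace from its input before decoding, so the string with the space decodes exactly like the same string without it. A minimal sketch of that behavior, assuming a global atob():

```javascript
// atob() removes ASCII whitespace from its input before decoding,
// so both calls below decode the same eight base64 characters.
const withSpace = atob('jgjhgj hg');
const withoutSpace = atob('jgjhgjhg');

console.log(withSpace === withoutSpace); // true
console.log(withSpace.length);           // 6
```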
  

The most correct front-end approach

The correct approach is to re-encode the decoded value and compare it with the input, like this:

function isBase64(str) {
    try {
        return btoa(atob(str)) === str ? true : false
    } catch(e) {
        return false
    }
}

isBase64('jgjhgj hg') // false
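To make the effect of the comparison concrete, the sketch below runs the same round-trip check over a few hand-picked inputs (the samples are made up for illustration, and global btoa()/atob() are assumed). It suggests that the check is strict: input containing whitespace or missing its '=' padding is rejected, even in cases where atob() on its own would decode it.

```javascript
// Same round-trip check as above, repeated here so the sketch is self-contained.
function isBase64(str) {
    try {
        return btoa(atob(str)) === str;
    } catch (e) {
        return false; // atob() throws on characters outside the base64 alphabet
    }
}

// Illustrative samples: canonical, padded, unpadded, with whitespace, invalid character.
const samples = ['SGVsbG8gV29ybGQh', 'SGVsbG8=', 'SGVsbG8', 'jgjhgj hg', 'SGVsbG8h!'];
const outcomes = samples.map((sample) => isBase64(sample));

samples.forEach((sample, i) => console.log(JSON.stringify(sample), outcomes[i]));
// "SGVsbG8gV29ybGQh" true
// "SGVsbG8=" true
// "SGVsbG8" false
// "jgjhgj hg" false
// "SGVsbG8h!" false
```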
  

On the backend (Node.js)

When this answer was written, Node.js had no native btoa() or atob() functions (recent versions expose both as globals, marked as legacy), so it is very common to use third-party modules or Buffer to achieve the same result.

It's important to note that not all third-party libraries throw exceptions or compare the result against the input, so it's easy to get false positives.

The following example uses Buffer to encode and decode, and also compares the result against the input:

function atob(str) {
    // Buffer.from() replaces the deprecated new Buffer() constructor
    return Buffer.from(str, 'base64').toString('binary');
}

function btoa(str) {
    const buffer = Buffer.isBuffer(str)
        ? str
        : Buffer.from(str.toString(), 'binary');
    return buffer.toString('base64');
}

function isBase64(str) {
    try {
        // Buffer decoding is lenient, so the comparison does the real checking
        return btoa(atob(str)) === str;
    } catch (ex) {
        // the original snippet was missing this return
        return false;
    }
}

Testing it shows that it does not report false positives:

console.log(isBase64('SGVsbG8gV29ybGQh')) // true

console.log(isBase64('jgjhgj hg')) // false
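It may help to see why the comparison, rather than the try/catch, does the work in the Node.js version: decoding base64 with Buffer does not throw on input like this, it silently ignores the whitespace. A short sketch of that behavior (the sample string is the one used above):

```javascript
// Buffer decoding is lenient: no exception is thrown for this input.
const decoded = Buffer.from('jgjhgj hg', 'base64');

// Re-encoding yields a normalized string that differs from the input,
// which is why comparing against the original input catches the problem.
const reencoded = decoded.toString('base64');

console.log(reencoded);                 // jgjhgjhg
console.log(reencoded === 'jgjhgj hg'); // false
```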
  

Using RegExp (opinionated)

If you cannot trust that the input string really was encoded (hence the need for verification), a RegExp should not automatically be seen as "the best option". The following example illustrates the problem:

function isBase64(str) {
    return /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/.test(str)
}

isBase64('SGVsbG8gV29ybGQh') // true

isBase64('jgjhgj hg') // false

isBase64("regexnaofunciona") // true

isBase64("hoje") // true

isBase64("errado+tanto+faz") // true

The expression above fails because it accepts any string made up only of base64 alphabet characters whose length is a multiple of 4.
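To illustrate, the sketch below feeds the same expression a few plain words of different lengths (the words are hypothetical samples, none of them produced by an encoder). Only the length decides the outcome, not whether anything was ever encoded.

```javascript
const base64Pattern = /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/;

// Plain words, none of them produced by a base64 encoder.
const words = ['hoje', 'hello', 'password', 'something'];

const results = words.map((word) => ({
    word,
    length: word.length,
    matches: base64Pattern.test(word),
}));

console.log(results);
// lengths 4 and 8 match; lengths 5 and 9 do not
```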

Note that if you cannot be sure the input string was actually encoded in base64, there is no guarantee that the RegExp above will not accept it anyway, producing a "false positive".

    
18.12.2017 / 10:11