Base62 encoding


I'd like to know where I can find a PHP implementation similar to PHP's MIME Base64, but whose output contains only the characters A-Z, a-z and 0-9.

PHP's Base64 is quite versatile, but I need an algorithm whose output does not include the characters +, /, - and =. I know I could replace those characters, as is done for URLs (which isn't even my use case), but I really want a direct encoding algorithm.

Encoding a number in base62 is straightforward, but I want to encode a binary PHP string. It must support both encoding and decoding.
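For reference, the "number" case really is simple: repeatedly divide by 62 and use each remainder as an index into the alphabet. The following is a minimal Java sketch. It assumes the alphabet 0-9, A-Z, a-z and inputs that fit in a `long`, and the class name `Base62Long` is hypothetical.

```java
// Hypothetical helper: base-62 conversion of a single non-negative number.
final class Base62Long {
    // Assumed alphabet: 0-9, then A-Z, then a-z (62 symbols).
    static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    static String encode(long n) {
        if (n < 0) throw new IllegalArgumentException("negative: " + n);
        if (n == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            // The remainder is the next digit, least significant first.
            sb.append(ALPHABET.charAt((int) (n % 62)));
            n /= 62;
        }
        return sb.reverse().toString();
    }

    static long decode(String s) {
        long n = 0;
        for (char c : s.toCharArray()) {
            int d = ALPHABET.indexOf(c);
            if (d < 0) throw new IllegalArgumentException("invalid character: " + c);
            // n = n * 62 + d, throwing ArithmeticException if it overflows a long.
            n = Math.addExact(Math.multiplyExact(n, 62L), d);
        }
        return n;
    }
}
```

For example, `encode(61)` gives `"z"` and `encode(62)` gives `"10"`. A binary string is usually far longer than 8 bytes, which is why the first answer below switches to arbitrary-precision integers (BigInteger).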

Can anyone point me to a practical implementation?

    
asked by anonymous 16.01.2015 / 16:12

2 answers


The implementation follows.

  • Implemented in two different languages: PHP and Java.
  • Lets you specify the alphabet in the constructor.
  • The size of the alphabet is obtained from the alphabet itself.
  • It should work for any alphabet size > 2 and < 256.
  • Encoding interprets the input string as a base-256 number (one digit per byte, prefixed with an extra 1 digit so that leading zero bytes are preserved) and converts it to a BigInteger. That BigInteger is then written out as a base-62 string (or in whatever base the given alphabet defines).
  • Decoding is simply the inverse of encoding. It reads the string as a base-62 (or other base) number and converts it to a BigInteger. It then converts that BigInteger back to a base-256 string (the original bytes), discarding the extra leading byte.
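Both implementations below start the accumulator at 1 rather than 0 (`$big = $this->one` in PHP, `BigInteger.ONE` in Java), and decode strips that first byte again. Read as plain numbers, the byte sequences {0, 0, 5} and {5} are both just 5. Without that extra leading digit, any leading zero bytes would therefore be lost. The small Java sketch below shows the difference. `SentinelDemo` and `toNumber` are hypothetical names used only for illustration.

```java
import java.math.BigInteger;

// Hypothetical demo: reads a byte array as an unsigned base-256 number.
final class SentinelDemo {
    static BigInteger toNumber(byte[] bytes, boolean withSentinel) {
        // Optionally start with a leading digit 1, as the BaseN classes do.
        BigInteger big = withSentinel ? BigInteger.ONE : BigInteger.ZERO;
        for (byte b : bytes) {
            // big = big * 256 + unsigned value of b
            big = big.multiply(BigInteger.valueOf(256)).add(BigInteger.valueOf(b & 0xFF));
        }
        return big;
    }
}
```

With the sentinel, {0, 0, 5} becomes 1·256³ + 5 = 16777221 and {5} becomes 1·256 + 5 = 261, so the two inputs stay distinct.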

PHP:

Here is the code:

<?php

// Requires phpseclib's Math_BigInteger class (phpseclib 1.x).
include('Math/BigInteger.php');

class BaseN {

    private $base;
    private $radix;
    private $bi256, $one, $zero;

    function __construct($base) {
        $this->base = $base;
        $this->radix = new Math_BigInteger(strlen($base));
        $this->bi256 = new Math_BigInteger(256);
        $this->zero = new Math_BigInteger(0);
        $this->one = new Math_BigInteger(1);
    }

    public function encode($text) {
        // Start at 1 (a sentinel digit) so that leading "\0" bytes are not lost.
        $big = $this->one;
        for ($j = 0; $j < strlen($text); $j++) {
            $big = $big->multiply($this->bi256)->add(new Math_BigInteger(ord($text[$j])));
        }
        $result = "";
        while (!$this->zero->equals($big)) {
            $parts = $big->divide($this->radix); // [quotient, remainder]
            $small = intval($parts[1]->toString());
            $big = $parts[0];
            $result = $this->base[$small] . $result;
        }
        return $result;
    }

    public function decode($text) {
        $big = $this->zero;
        for ($j = 0; $j < strlen($text); $j++) {
            $i = strpos($this->base, $text[$j]);
            if ($i === false) {
                throw new InvalidArgumentException("Invalid character: " . $text[$j]);
            }
            $big = $big->multiply($this->radix)->add(new Math_BigInteger($i));
        }
        $result = "";
        while (!$this->zero->equals($big)) {
            $parts = $big->divide($this->bi256);
            // Use chr() rather than toBytes(): toBytes() yields "" for a zero value,
            // which would silently drop "\0" bytes from the output.
            $small = chr(intval($parts[1]->toString()));
            $big = $parts[0];
            $result = $small . $result;
        }
        // Drop the sentinel byte that encode() added.
        return substr($result, 1);
    }
}

?>

How to use:

// Pass the alphabet as a parameter. It has 62 characters here, so there are 62 symbols.
$k = new BaseN("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
$x = "The quick brown fox jumps over a lazy dog";
echo $x . "\n";
$c = $k->encode($x);
echo $c . "\n"; // Prints "1u9WLfG65OMtVkQWPtWDcC6o8IjI5td5l9DzpilIK4Nyx81tKLRrStPj"
$d = $k->decode($c);
echo $d . "\n"; // Prints "The quick brown fox jumps over a lazy dog"

See it running on ideone. Don't be put off by the size of the code: I had to paste the whole BigInteger class there.

In Java

And, in case anyone is interested, I've also implemented it in Java. Here is the code:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * @author Victor
 */
public class BaseN {
    private static final BigInteger BI_256 = BigInteger.valueOf(256);

    private final String base;
    private final BigInteger radix;

    public BaseN(String base) {
        this.base = base;
        this.radix = BigInteger.valueOf(base.length());
    }

    public String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // Start at 1 (a sentinel digit) so that leading zero bytes are not lost.
        BigInteger big = BigInteger.ONE;
        for (byte b : bytes) {
            // Java bytes are signed; b & 0xFF gives the unsigned digit 0..255.
            big = big.multiply(BI_256).add(BigInteger.valueOf(b & 0xFF));
        }
        StringBuilder sb = new StringBuilder(bytes.length * 4 / 3 + 2);
        while (!BigInteger.ZERO.equals(big)) {
            BigInteger[] parts = big.divideAndRemainder(radix);
            int small = parts[1].intValue();
            big = parts[0];
            sb.append(base.charAt(small));
        }
        // Digits come out least significant first, so reverse them.
        return sb.reverse().toString();
    }

    public String decode(String text) {
        BigInteger big = BigInteger.ZERO;
        for (char c : text.toCharArray()) {
            int i = base.indexOf(c);
            if (i == -1) throw new IllegalArgumentException("Invalid character: " + c);
            big = big.multiply(radix).add(BigInteger.valueOf(i));
        }

        List<Byte> byteList = new ArrayList<>(text.length());
        while (!BigInteger.ZERO.equals(big)) {
            BigInteger[] parts = big.divideAndRemainder(BI_256);
            int small = parts[1].intValue();
            big = parts[0];
            byteList.add((byte) small);
        }
        Collections.reverse(byteList);

        // The first byte must be the sentinel 1 that encode() prepended.
        if (byteList.isEmpty() || byteList.get(0) != 1) {
            throw new IllegalArgumentException("Not a valid encoded string");
        }

        // Copy everything except the sentinel.
        byte[] r = new byte[byteList.size() - 1];
        for (int i = 1; i < byteList.size(); i++) {
            r[i - 1] = byteList.get(i);
        }
        return new String(r, StandardCharsets.UTF_8);
    }
}
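The class above works on `String`s, so the bytes it encodes are whatever the charset conversion produces. Arbitrary binary data (for example, a file's raw bytes) is generally not valid text and may not survive a `byte[]` → `String` → `byte[]` round trip. The question asks about binary strings, so here is an illustrative sketch of a `byte[]`-based variant of the same algorithm, with the same leading-1 sentinel. The class name `BaseNBytes` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;

// Hypothetical byte[]-based variant of BaseN for raw binary data.
final class BaseNBytes {
    private static final BigInteger BI_256 = BigInteger.valueOf(256);

    private final String alphabet;
    private final BigInteger radix;

    BaseNBytes(String alphabet) {
        this.alphabet = alphabet;
        this.radix = BigInteger.valueOf(alphabet.length());
    }

    String encode(byte[] data) {
        BigInteger big = BigInteger.ONE; // sentinel keeps leading zero bytes
        for (byte b : data) {
            big = big.multiply(BI_256).add(BigInteger.valueOf(b & 0xFF));
        }
        StringBuilder sb = new StringBuilder();
        while (big.signum() > 0) {
            BigInteger[] qr = big.divideAndRemainder(radix);
            sb.append(alphabet.charAt(qr[1].intValue()));
            big = qr[0];
        }
        return sb.reverse().toString();
    }

    byte[] decode(String text) {
        BigInteger big = BigInteger.ZERO;
        for (char c : text.toCharArray()) {
            int d = alphabet.indexOf(c);
            if (d < 0) throw new IllegalArgumentException("invalid character: " + c);
            big = big.multiply(radix).add(BigInteger.valueOf(d));
        }
        // Collect base-256 digits, least significant first.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (big.signum() > 0) {
            BigInteger[] qr = big.divideAndRemainder(BI_256);
            out.write(qr[1].intValue());
            big = qr[0];
        }
        byte[] rev = out.toByteArray();
        // The most significant digit must be the sentinel 1.
        if (rev.length == 0 || rev[rev.length - 1] != 1) {
            throw new IllegalArgumentException("not a valid encoding");
        }
        byte[] result = new byte[rev.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = rev[rev.length - 2 - i]; // reverse, skipping the sentinel
        }
        return result;
    }
}
```

For ASCII text, encoding `s.getBytes(StandardCharsets.US_ASCII)` with this class gives the same string as `BaseN.encode(s)`, because both read the bytes as the same base-256 number.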

How to use:

public class Main {
    public static void main(String[] args) {
        // Pass the alphabet as a parameter. It has 62 characters here, so there are 62 symbols.
        BaseN bn = new BaseN("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
        String x = "The quick brown fox jumps over a lazy dog";
        System.out.println(x);
        String a = bn.encode(x);
        System.out.println(a); // Prints "1u9WLfG65OMtVkQWPtWDcC6o8IjI5td5l9DzpilIK4Nyx81tKLRrStPj"
        String b = bn.decode(a);
        System.out.println(b); // Prints "The quick brown fox jumps over a lazy dog"
    }
}

See it running on ideone.

    
16.01.2015 / 22:59

You cannot represent data in base64 using only this range of characters (A-Z, a-z and 0-9). The range has only 62 characters, and base64 requires 64 distinct symbols.


So if you don't want the + and / characters in your base64 representation, you need to replace them with characters from outside this range. You'll also have to choose a replacement for =. It is used as padding to complete the last block when the input length is not a multiple of 3 bytes.
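A quick way to see where +, / and = come from is the JDK's standard encoder (`java.util.Base64`, Java 8 and later). The small sketch below wraps it. The class name `PaddingDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo: standard base64 output for a few small inputs.
final class PaddingDemo {
    // Standard RFC 4648 alphabet: A-Z, a-z, 0-9, '+', '/', with '=' padding.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String encodeAscii(String s) {
        return encode(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

`encodeAscii("abc")` gives `"YWJj"` (3 bytes, no padding), `"ab"` gives `"YWI="` and `"a"` gives `"YQ=="`. The two bytes `0xFB 0xFF` encode to `"+/8="`, which shows both non-alphanumeric symbols plus padding.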

In practice, when a base64 representation has to be included in a URL, for example, the set { + / = } is replaced with { - _ , }. RFC 4648 standardizes - and _ as the URL-safe replacements for + and /.

From a quick search, PHP does not seem to have a native function for this, so you'll have to implement your own.

Even if you don't intend to use it in a URL, the same idea should work:

function base64url_encode($plainText) {

    $base64 = base64_encode($plainText);
    // Swap the characters that are not URL-safe: + -> -, / -> _, = -> ,
    $base64url = strtr($base64, '+/=', '-_,');
    return $base64url;
}

function base64url_decode($base64url) {

    // Undo the substitution, then decode as standard base64.
    $base64 = strtr($base64url, '-_,', '+/=');
    $plainText = base64_decode($base64);
    return $plainText;
}
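For comparison, Java (8 and later) ships a URL-safe variant in `java.util.Base64`. `getUrlEncoder()` uses - and _ (the RFC 4648 "base64url" alphabet), and `withoutPadding()` drops the trailing = instead of replacing it with , as the PHP functions above do. The output is still not limited to A-Z, a-z and 0-9. The sketch below is a thin wrapper, and the class name `UrlSafe` is hypothetical.

```java
import java.util.Base64;

// Hypothetical wrapper around the JDK's built-in URL-safe base64 codec.
final class UrlSafe {
    // '-' and '_' replace '+' and '/'; the '=' padding is omitted.
    static String encode(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // The JDK decoder accepts input with or without '=' padding.
    static byte[] decode(String text) {
        return Base64.getUrlDecoder().decode(text);
    }
}
```

For the bytes `0xFB 0xFF`, `encode` returns `"-_8"`, and `decode` accepts that string with or without a trailing `=`.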

Update: it just occurred to me that you can convert your bytes to hexadecimal, which uses only 0-9 and A-F. The resulting string is twice the size of the input, noticeably larger than base64 (about 4/3 of the input), but it might serve you. PHP has built-in functions for this, bin2hex() and hex2bin() (the latter since PHP 5.4; bin2hex() emits lowercase a-f). In any case, the logic of converting bytes to hexadecimal is quite simple.
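A minimal Java sketch of that byte-to-hex logic follows. Each byte becomes two hex digits (high nibble, then low nibble), so the output is exactly twice the input length. The class name `Hex` is hypothetical, and uppercase digits are an arbitrary choice.

```java
// Hypothetical helper: hex encoding and decoding of raw bytes.
final class Hex {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    static String encode(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            int v = data[i] & 0xFF;            // unsigned value 0..255
            out[2 * i] = DIGITS[v >>> 4];      // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0F]; // low nibble
        }
        return new String(out);
    }

    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd length");
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit accepts both upper- and lowercase hex digits.
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("not a hex digit");
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

For example, the bytes `{0x00, 0x0F, 0xAB, 0xFF}` encode to `"000FABFF"`.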

    
16.01.2015 / 17:12