Polygonal Background
06 January, 2020
  • Posted By Charles Fol
  • php mt_rand mt_srand predict seed bruteforce
Full Article

Introduction

While performing a pentest on an old website, we encountered a piece of code that we had not seen in a long time:

function resetPassword($email)
{
    [...]
    $superSecretToken = md5($email . ':' . mt_rand());
    [...]
    $this->sendEmailWithResetToken($email, $superSecretToken);
}

A token, deemed secret and unguessable, was generated using an output from the PHP implementation of the Mersenne Twister.

An internal conversation ensued about the good old times of PRNG prediction and the mass of articles and tools it generated, early in the last decade. At the time, many applications used either rand() or mt_rand() to generate secret tokens or passwords. These functions are labeled cryptographically insecure, and as such should not be used for cryptographic purposes. In order to predict these values, you had to find out the seed, the original value that all of these were generated from. The consensus appeared to be that you needed to either obtain the whole internal state (624 values for mt_rand() !) or bruteforce the seed using part of those values.

However, one of our researchers was adamant that it was possible to recover the Mersenne Twister seed using only two outputs of the mt_rand() function, and without any kind of bruteforce. Nevertheless, we were unable to find any information supporting this theory, and his notes on the matter were long lost.

After crunching the numbers a little bit, and years after the PRNG-prediction circus, we proved him right. For old times' sake, we will now demonstrate how to do this.

Breaking PHP's Mersenne Twister, with 2 output values and some math

The first step in generating random numbers using mt_rand() is to use a seed, an unsigned int, to generate a state array of 624 values. This is done by either calling mt_srand($seed) or automatically, by PHP, upon requesting the first random number. After this, each call to mt_rand() will take the next state value, scramble it, and return it to the user.

Everything happens in the ext/standard/mt_rand.c file for modern PHP versions (PHP7+), or ext/standard/rand.c for older versions (PHP5-).

Knowing the value of the seed makes you able to predict every number that will be generated by the PRNG until another call to mt_srand() is made. Therefore, secret random values generated from one or several values of mt_rand() would not be secret anymore, leading to obvious security issues.

The random number generation process is a three steps process:

  1. Generate an initial state array by modular multiplication of the seed
  2. Scramble state values together to get a scrambled state array
  3. When mt_rand() is called, return a modified scrambled state value

For each step, we'll show the code, briefly describe how it works, and then how to work our way from the output to the input.

Note: PHP7+ changed the code responsible for generating the scrambled state by a bit (literally) (see both twist and twist_php #defines in mt_rand.c). This does not change much, as we are able to get the seed with each variation of the algorithm. Here, we focus on the original implementation, twist_php. The script provided at the bottom handles both versions.

Seed to initial state

#define MT_N (624)
#define N             MT_N                 /* length of state vector */
#define M             (397)                /* a period parameter */

static inline void php_mt_initialize(uint32_t seed, uint32_t *state)
{
    /* Initialize generator state with seed
       See Knuth TAOCP Vol 2, 3rd Ed, p.106 for multiplier.
       In previous versions, most significant bits (MSBs) of the seed affect
       only MSBs of the state array.  Modified 9 Jan 2002 by Makoto Matsumoto. */

    register uint32_t *s = state;
    register uint32_t *r = state;
    register int i = 1;

    *s++ = seed & 0xffffffffU;
    for( ; i < N; ++i ) {
        *s++ = ( 1812433253U * ( *r ^ (*r >> 30) ) + i ) & 0xffffffffU;
        r++;
    }
}

An initial state array of 624 values is created. The first state is the seed. Each subsequent state value is generated from the previous one.

Now, if we have knowledge of any of these state values, say state[123], can we work our way back to state[0], the seed ?

The python equivalent of the code is:

N = 624
M = 397

MAX = 0xffffffff

STATE_MULT = 1812433253

def php_mt_initialize(seed):
    """Creates the initial state array from a seed.
    """
    state = [None] * N
    state[0] = seed & 0xffffffff;
    for i in range(1, N):
        r = state[i-1]
        state[i] = ( STATE_MULT * ( r ^ (r >> 30) ) + i ) & MAX
    return state

Since the values are generated by iterations, the first thing we need to do is try and find out state[i-1] from i and state[i]. We can then repeat the operation on i-1, i-2, until we reach state[0], the seed. With S the state, and i in [1, 624], we have:

1

We can compute the modular multiplicative inverse of 1812433253 modulo 0x100000000:

2

And then:

3

In python, we get:

STATE_MULT_INV = 2520285293
MAX = 0xffffffff

def _undo_php_mt_initialize(s, i):
    s = (STATE_MULT_INV * (s - i)) & MAX
    return s ^ s >> 30

def undo_php_mt_initialize(s, p):
    """From an initial state value `s` at position `p`, find out seed.
    """
    for i in range(p, 0, -1):
        s = _undo_php_mt_initialize(s, i)
    return s

This means that if we get our hands on any initial state value, we can compute any other state value, and the original seed.

Initial state to scrambled state

#define hiBit(u)      ((u) & 0x80000000U)  /* mask all but highest   bit of u */
#define loBit(u)      ((u) & 0x00000001U)  /* mask all but lowest    bit of u */
#define loBits(u)     ((u) & 0x7FFFFFFFU)  /* mask     the highest   bit of u */
#define mixBits(u, v) (hiBit(u)|loBits(v)) /* move hi bit of u to hi bit of v */

#define twist(m,u,v)  (m ^ (mixBits(u,v)>>1) ^ ((php_uint32)(-(php_int32)(loBit(u))) & 0x9908b0dfU))

static inline void php_mt_reload(TSRMLS_D)
{
    /* Generate N new values in state
       Made clearer and faster by Matthew Bellew (matthew.bellew@home.com) */

    register php_uint32 *state = BG(state);
    register php_uint32 *p = state;
    register int i;

    for (i = N - M; i--; ++p)
        *p = twist(p[M], p[0], p[1]);
    for (i = M; --i; ++p)
        *p = twist(p[M-N], p[0], p[1]);
    *p = twist(p[M-N], p[0], state[0]);
    BG(left) = N;
    BG(next) = state;
}

The initial state array is scrambled by mixing-up values, giving a new state, the scrambled state. The first 226 (N-M) values are computed differently from the ones afterwards. In python, we have:

N = 624
M = 397

def php_mt_reload(s):
    for i in range(0, N - M):
        s[i] = _twist(s[i+M], s[i], s[i+1])
    for i in range(N - M, N - 1):
        s[i] = _twist(s[i+M-N], s[i], s[i+1])

def _twist_php(m, u, v):
    """Emulates the `twist` #define.
    """
    mask = 0x9908b0df if u & 1 else 0
    return m ^ (((u & 0x80000000) | (v & 0x7FFFFFFF)) >> 1) ^ mask

Here, we will show that we can deduce one of the initial states from two scrambled states.

A relation emerges from state 0 and state 227. If we name the initial state array by s and the new, scrambled one, by S, we have:

    S[227] = _twist(S[227 - 227], s[227], s[228])
<=> S[227] = _twist(S[0], s[227], s[228])
<=> S[227] = S[0] ^ (((s[227] & 0x80000000) | (s[228] & 0x7FFFFFFF)) >> 1) ^ (0x9908b0df if s[227] & 1 else 0)
<=> S[227] ^ S[0] = (((s[227] & 0x80000000) | (s[228] & 0x7FFFFFFF)) >> 1) ^ (0x9908b0df if s[227] & 1 else 0)

As an attacker, if we have knowledge of scrambled states S[0] and S[227], and XOR them, we get lots of information about initial state s[228], and some about s[227]. If we denote S[227] ^ S[0] by X, we have:

  • Since the left part of the XOR expression is right shifted, its MSB will always be empty. Therefore, the MSB of X is only determined by the right part, the mask, because MSB(0x9908b0df) is set. Hence MSB(X) = LSB(s[227]), and we can remove the mask from X if it is present.
  • Then, BIT(s[227], 31) = BIT(X, 30).
  • The rest of the value are bits from s[228], namely bits from 30 to 1.

To sum up, we have:

  • MSB and LSB of initial state value s[227]
  • Bits from 30 to 1 of initial state value s[228]

We therefore have 4 possible values for s[228]. But, since we know some bits of s[227] as well, we can compute a candidate s[227] from each of the four potential values for s[228] (see first part), and verify if the obtained s[227] value' LSB and MSB match the ones we have. Furthermore, we can obtain the seed associated with a candidate state s[228] (again, refer to first part as to how). We can then use this seed to regenerate a state, and check if it matches S[0]. Here's the python code that does just that:

def undo_php_mt_reload(S000, S227):
    # m S000
    # u S227
    # v S228
    X = S000 ^ S227

    # This means the mask was applied, and as such that S227's LSB is 1
    s22X_0 = bv(X, 31)
    # remove mask if present
    if s22X_0:
        X ^= 0x9908b0df

    # Another easy guess
    s227_31 = bv(X, 30)
    # remove bit if present
    if s227_31:
        X ^= 1 << 30

    # We're missing bit 0 and bit 31 here, so we have to try every possibility
    s228_1_30 = (X << 1)
    for s228_0 in range(2):
        for s228_31 in range(2):
            s228 = s228_0 | s228_31 << 31 | s228_1_30

            # Check if the results are consistent with the known bits of s227
            s227 = _undo_php_mt_initialize(s228, 228)
            if bv(s227, 0) != s22X_0:
                continue
            if bv(s227, 31) != s227_31:
                continue

            # Check if the guessed seed yields S000 as its first scrambled state
            rand = undo_php_mt_initialize(s228, 228)
            state = php_mt_initialize(rand)
            php_mt_reload(state)

            if not S000 == state[0]:
                continue

            return rand
    return None

To deduce the original seed, we used the relationship between scrambled states S[0] and S[227]. One can see that this relationship is valid for any couple of state values separated by 226 values. Therefore, with knowledge of two state values S[i] and S[i+227], and their offset i, we can recover the seed.

We can now check the last step of the mt_rand() PRNG.

Scrambled state value to mt_rand() output

When we call mt_rand(), PHP takes a value from the scrambled state array, plays with it a little bit, and returns it.

PHP_FUNCTION(mt_rand)
{
    zend_long min;
    zend_long max;
    int argc = ZEND_NUM_ARGS();

    if (argc == 0) {
        // genrand_int31 in mt19937ar.c performs a right shift
        RETURN_LONG(php_mt_rand() >> 1);
    }
    ...
}

PHPAPI uint32_t php_mt_rand(void)
{
    /* Pull a 32-bit integer from the generator state
       Every other access function simply transforms the numbers extracted here */

    register uint32_t s1;

    if (UNEXPECTED(!BG(mt_rand_is_seeded))) {
        php_mt_srand(GENERATE_SEED());
    }

    if (BG(left) == 0) {
        php_mt_reload();
    }
    --BG(left);

    s1 = *BG(next)++; // PICKS NEXT SCRAMBLED STATE VALUE
    s1 ^= (s1 >> 11);
    s1 ^= (s1 <<  7) & 0x9d2c5680U;
    s1 ^= (s1 << 15) & 0xefc60000U;
    return ( s1 ^ (s1 >> 18) );
}

s1 denotes a scrambled state value. After the 4 operations applied by php_mt_rand(), mt_rand() shifts it one bit to the right. This last operation cannot be reversed, obviously. The four other can.

We end up with the following code for undoing these transformations:

def undo_php_mt_rand(s1):
    """Retrieves the merged state value from the value sent to the user.
    """
    s1 ^= (s1 >> 18)
    s1 ^= (s1 << 15) & 0xefc60000

    s1 = undo_lshift_xor_mask(s1, 7, 0x9d2c5680)

    s1 ^= s1 >> 11
    s1 ^= s1 >> 22

    return s1

def undo_lshift_xor_mask(v, shift, mask):
    """r s.t. v = r ^ ((r << shift) & mask)
    """
    for i in range(shift, 32, shift):
        v ^= (bits(v, i - shift, shift) & bits(mask, i, shift)) << i
    return v

Given an mt_rand() output, and guessing its LSB, we can get 2 possible scrambled state values: one where the output's LSB is 0, and the other where it is 1.

Putting it together

Let's put each of these reverse steps together:

  1. Obtain 2 values from mt_rand(), separated by 226 other values: R000 and R227; and i, the number of generated values generated before R000.
  2. Get the scrambled state from these values.
  3. XOR those states and deduce the initial state 228
  4. Get the seed by working back from s228 to s0
def main(_R000, _R227, offset):
    # Both were >> 1, so the leftmost byte is unknown
    _R000 <<= 1
    _R227 <<= 1

    for R000_0 in range(2):
        for R227_0 in range(2):
            R000 = _R000 | R000_0
            R227 = _R227 | R227_0
            S000 = undo_php_mt_rand(R000)
            S227 = undo_php_mt_rand(R227)
            seed = undo_php_mt_reload(S000, S227, offset)
            if seed:
                print(seed)

Since we're missing the LSB of R000 and R227, we need to try the algorithm for each case. Often, only one of the four combinations will yield a seed, as others are impossible.

Output:

4

The full python code is available on our GitHub.

Conclusion

We demonstrated that, given two mt_rand() output values separated by 226 others, it is possible to compute, without any bruteforce, the original seed, and therefore obtain any previous or subsequent mt_rand() output, effectively breaking the PRNG.