Update: this article assumes that the encryption algorithm would not be distributed to the user openly. It would instead be stored remotely (i.e. on the server), or be distributed in an encrypted form.
Update: Arnold Daniels has responded on his blog with a significant improvement.
This article is intended for programmers who haven’t had a chance to develop tokens yet. I’m not a professional in the field, but I was recently assigned to design a token algorithm at the company I work for. The algorithm I came up with turned out to be quite bloated and twisted, but it was practically my first experience making a generator of verifiable character sequences. I won’t write about anything nearly as weird as that, since there are endless ways to spice it up. However, everyone needs a starting point, and I hope this tutorial can serve as a decent one.
Almost every commercial application out there asks you for some sort of serial key. Once you type it in, the program takes only a fraction of a second to “decide” whether the key is valid. The purpose of this tutorial is to rant about the concept of these keys (tokens) and present simple examples of implementation. So now, like real awesome folk, I’ll try to formulate the question that I’ll be answering (run-on warning!):
How can I write an algorithm which would generate a sequence of characters such that I’d be able to write a function which would take any sequence of characters as input (nothing else) and recognize whether the input adheres to the algorithm I designed?
In essence, we need to write two functions (let’s allow the caller to specify the length of the anticipated token with the parameter $length):
function make_token( $length );
function verify_token( $str );
Part 1. A braindead example.
I usually find simple interpretations of something complex quite helpful. Give me an explanation with a braindead example, and I will grasp the concept. That’s what I’m attempting to convey here.
Let’s write a useless yet clear algorithm.
function make_token( $length ) {
    // we don't want a one-digit token
    if ($length < 2) {
        return false;
    }
    // dividing the length into ~halves
    $first_half_length = intdiv($length, 2);
    $second_half_length = $length - $first_half_length;
    // getting a random key; 1..8 so that $key + 1 is still a single digit
    $key = rand(1, 8);
    $result = '';
    for ($i = 0; $i < $first_half_length; $i++) {
        $result .= $key;
    }
    for ($i = 0; $i < $second_half_length; $i++) {
        $result .= ($key + 1);
    }
    return $result;
}
Considering the audience that I’m addressing, the function should speak for itself. For example, if I run make_token(4), it might return something like 6677. All it does is take the input $length, pick a random digit to fill the first half of the string, add one to it, and fill the rest of the string with [first random digit + 1]. The key here is that we have both the question and the answer within the token string. It’s sort of like writing “What is your favblueorite color?”. The answer is hidden within the question, and the only problem is knowing how to get it out. In our case, the question is 66 and the answer is 77. However, only our algorithm is designed specifically to recognize the question, recognize the answer, and match them. (Certainly, in this stupid case it’s obvious what the algorithm is, but that’s just a stupid case.) : )
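To make the question/answer idea concrete, here is a small sketch that splits a Part 1 token into its two halves with the same floor($length / 2) rule that make_token() uses. split_token() is a hypothetical helper, not part of the tutorial’s code.

```php
<?php
// Hypothetical helper: split a Part 1 token into its "question"
// (first half) and "answer" (second half), using the same
// half-length rule as make_token().
function split_token(string $token): array {
    $first_half_length = intdiv(strlen($token), 2);
    return [
        'question' => substr($token, 0, $first_half_length),
        'answer'   => substr($token, $first_half_length),
    ];
}

$parts = split_token('6677');
echo $parts['question'], ' / ', $parts['answer'], PHP_EOL; // prints "66 / 77"
```

For an odd length such as make_token(5), which could return 66777, the answer half is one character longer: split_token('66777') gives 66 and 777.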
So let’s write the verification algorithm, although you probably already see what it is. It’s quite similar to the generating algorithm:
function verify_token( $str ) {
    // the token must be a string of at least two digits
    // (is_int() would always fail here, since a token is a string)
    if ( !is_string($str) || !ctype_digit($str) || strlen($str) < 2 ) {
        return false;
    }
    $length = strlen($str);
    /*
     Here I extract the first and second halves of the string as arrays of characters
    */
    $full_string = str_split($str);
    $first_half_length = intdiv($length, 2);
    $first_half = array_slice($full_string, 0, $first_half_length);
    $second_half = array_slice($full_string, $first_half_length);
    /*
     let's see if all digits in the left half are the same
    */
    $first_char = $first_half[0];
    foreach ($first_half as $char) {
        if ($char !== $first_char) {
            // if there is an inconsistency, fail immediately
            return false;
        }
    }
    /*
     now that we've made sure all digits in the first half are equal,
     let's ensure that every digit in the second half equals
     [first_char + 1], which also ensures that they're all consistent
    */
    $expected = (string)((int)$first_char + 1);
    foreach ($second_half as $char) {
        if ($char !== $expected) {
            return false;
        }
    }
    /*
     ok, if we got to this point, nothing failed,
     so we may feel confident giving it a green light
    */
    return true;
}
As you can see, verify_token() doesn’t need any help from a database to check the validity of the token passed into it.
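Since the generator is deterministic once the first digit is known, the check can also be done by rebuilding the token the generator would have produced and comparing whole strings. This is a sketch of an alternative to the loop-based verify_token(), assuming tokens come from the make_token() above (first digit between 1 and 8); verify_token_by_rebuild() is a hypothetical name.

```php
<?php
// Alternative sketch: rebuild the expected token from its first digit
// and compare. Mirrors make_token(): the first floor(len / 2) characters
// are $key, the rest are $key + 1, with $key in 1..8.
function verify_token_by_rebuild(string $str): bool {
    $length = strlen($str);
    if ($length < 2) {
        return false;
    }
    // a non-digit first character casts to 0 and is rejected below
    $key = (int) $str[0];
    if ($key < 1 || $key > 8) {
        return false; // make_token() never picks 0 or 9
    }
    $first_half_length = intdiv($length, 2);
    $expected = str_repeat((string) $key, $first_half_length)
              . str_repeat((string) ($key + 1), $length - $first_half_length);
    return $str === $expected;
}

var_dump(verify_token_by_rebuild('6677')); // bool(true)
var_dump(verify_token_by_rebuild('6678')); // bool(false)
```

Comparing against a rebuilt string keeps the verifier short, at the cost of making the generation rule even more visible in the code.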
This might be a good point to try something more complex.
Part 2. A complex example.
Here, we’re going to write the same two functions as above, only more complex. Once again, the only parameter we get when we make a token is $length. That means analyzing $length is the only way for us to vary the algorithm consistently, based on the function’s input.
Since $length must be a positive integer, let’s list some of the things we can say about positive integers:
- Is it odd or even?
- Is it a perfect square (does it have an integer square root) or not?
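The two properties can be computed directly from $length. This sketch uses integer arithmetic for the perfect-square test (squaring the rounded square root), which is a swapped-in technique: the tutorial’s own code instead checks whether sqrt($length) prints without a decimal point, which gives the same answer for token-sized lengths. length_traits() is a hypothetical helper name.

```php
<?php
// Sketch: the two facts about $length used in Part 2, computed with
// integer arithmetic. length_traits() is a hypothetical helper name.
function length_traits(int $length): array {
    $root = (int) round(sqrt($length));
    return [
        'odd'      => $length % 2 !== 0,
        'has_root' => $root * $root === $length, // perfect square?
    ];
}

foreach ([9, 10, 16] as $n) {
    $t = length_traits($n);
    printf("%d: odd=%s, perfect square=%s\n", $n,
        $t['odd'] ? 'yes' : 'no', $t['has_root'] ? 'yes' : 'no');
}
```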
As a first step, let’s write a function that will return a random alphanumeric character. It will be useful in many cases. The function must randomly return one of the following: a-z, A-Z, 0-9. Here is my implementation:
function rand_alphanumeric() {
    $subsets[0] = array('min' => 48, 'max' => 57);  // ascii digits
    $subsets[1] = array('min' => 65, 'max' => 90);  // ascii uppercase English letters
    $subsets[2] = array('min' => 97, 'max' => 122); // ascii lowercase English letters
    // random choice between digits, uppercase, and lowercase
    $s = rand(0, 2);
    $ascii_code = rand($subsets[$s]['min'], $subsets[$s]['max']);
    return chr( $ascii_code );
}
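Note that rand_alphanumeric() first picks a subset and then a character, so each digit comes up with probability 1/3 × 1/10 = 1/30, while each letter comes up with only 1/3 × 1/26 = 1/78. If you want every character to be equally likely, a sketch like the following draws from a single 62-character alphabet. It also swaps rand() for random_int() (PHP 7+), which uses a cryptographically secure source; rand_alphanumeric_uniform() is a hypothetical name.

```php
<?php
// Sketch of an alternative: pick uniformly from all 62 characters.
// random_int() uses a cryptographically secure generator, unlike rand().
function rand_alphanumeric_uniform(): string {
    $alphabet = '0123456789'
              . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
              . 'abcdefghijklmnopqrstuvwxyz';
    return $alphabet[random_int(0, strlen($alphabet) - 1)];
}

echo rand_alphanumeric_uniform(), PHP_EOL;
```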
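Ok, the next step is to write our token function. Once again, we only take the length as a parameter, so we’re only going to use information we can extract from the length in order to mess up the algorithm. The function starts by computing the two facts listed above (the verification function further down recomputes them the same way):
function make_token( $length ) {
    // shorter lengths make key positions collide; longer ones run out of sha1 characters
    if ($length < 8 || $length > 44) {
        return false;
    }
    $keys = array();
    // is the length odd?
    $length_odd = (($length % 2) != 0);
    // is the length a perfect square? (its square root prints without a decimal point)
    $length_has_root = ( strpos( sqrt($length), '.' ) === false );
Now that we have some variables, let’s use them. We will generate four alphanumerics and hide them somewhere within our string. We could use $length_odd to slightly alter the location where they’re hidden.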
    /*
     * Let's make an offset based on oddity
     * to mess things up a bit. Feel free to go crazy here,
     * but for the purpose of this tutorial I'll keep it simple.
     */
    $offset = $length_odd ? 1 : 0;
Time to pick four positions and use $offset to shift them.
    /*
     * Map the keys to the positions they will occupy.
     * Arrays are zero-based, hence the "- 1" in the middle two positions
     * (the last one uses "- 2" so that adding the offset keeps it in bounds).
     * We also add the offset to each position, and cast to int because
     * $length / 4 and $length / 2 aren't always whole numbers.
     * For convenience, let's gather our keys into a string too;
     * we will need it for hashing.
     */
    $key_str = '';
    $key_str .= $keys[ (int)(0 + $offset) ] = rand_alphanumeric();
    $key_str .= $keys[ (int)(($length / 4) - 1 + $offset) ] = rand_alphanumeric();
    $key_str .= $keys[ (int)(($length / 2) - 1 + $offset) ] = rand_alphanumeric();
    $key_str .= $keys[ (int)(($length - 2) + $offset) ] = rand_alphanumeric();
Notice that in the above example it doesn’t matter how I pick my positions. I decided to use those particular parts of the string to insert the key characters, but it could be done in any number of other ways. However, watch out for positions that fall out of bounds if you use $offset carelessly, and for positions that collide on short tokens: with a length of 4, for instance, the first two keys both land on position 0, so one key overwrites the other and verification breaks down.
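To see which lengths are safe, this sketch computes the four key positions the same way as the code above (with integer casts) and reports collisions. key_positions() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: the four key positions used by make_token()
// and verify_token() in Part 2, for a given $length.
function key_positions(int $length): array {
    $offset = ($length % 2 !== 0) ? 1 : 0;
    return [
        (int) (0 + $offset),
        (int) (($length / 4) - 1 + $offset),
        (int) (($length / 2) - 1 + $offset),
        (int) (($length - 2) + $offset),
    ];
}

// Show the positions for a few lengths and flag duplicates.
foreach (range(4, 12) as $n) {
    $p = key_positions($n);
    $ok = count(array_unique($p)) === 4 ? 'ok' : 'collision';
    echo $n, ': ', implode(', ', $p), " ($ok)", PHP_EOL;
}
```

Lengths 4 to 7 all produce a duplicate position (for example, 4 gives 0, 0, 1, 2), while every length from 8 upward yields four distinct positions, so 8 is a sensible minimum.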
The next step is to take care of the answer to our 4-key question. Our “answer” will be the actual “question” (our 4 chars), but encoded with some common hashing algorithms, such as sha1 or md5. To make it harder to crack, let’s pick the hashing sequence based on the $length_has_root variable we calculated earlier:
    // perfect-square lengths get sha1(md5()), all others get sha1(sha1())
    $hashed_keys = $length_has_root ? sha1(md5($key_str)) : sha1(sha1($key_str));
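The answer step on its own looks like the following sketch (answer_for() is a hypothetical wrapper around the same sha1/md5 chain). Both branches end in sha1(), whose hex output is always 40 characters long, so at most 40 filler characters are available: a token can be at most 4 + 40 = 44 characters long before the hash runs out.

```php
<?php
// Sketch of the "answer" step: hash the four key characters, choosing
// the hash chain by whether $length is a perfect square.
// answer_for() is a hypothetical helper name.
function answer_for(string $key_str, bool $length_has_root): string {
    return $length_has_root ? sha1(md5($key_str)) : sha1(sha1($key_str));
}

$answer = answer_for('aB3z', false);
echo strlen($answer), PHP_EOL; // prints 40: sha1() returns 40 hex characters
```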
Great. Last but not least, we have to use this messy hash output to fill up the remainder of the string. We have to remember that four positions are already occupied, which is why we skip them in the following loop.
    /*
     * Once again, it's easy to go crazy here, but
     * for the purpose of this tutorial, we're going to simply
     * fill up all remaining positions with the $hashed_keys string,
     * as far as we have space (a sha1 hex digest is 40 characters).
     */
    $hash_enum = 0;
    for ($i = 0; $i < $length; $i++) {
        // skip the four positions already taken by the keys
        if ( !isset($keys[$i]) ) {
            $keys[$i] = $hashed_keys[$hash_enum];
            $hash_enum++;
        }
    }
Here’s a little catch that might not be obvious. If we implode the array now, implode() will put our four special keys at the front of the string, even though their positions (indexes) don’t start from zero. That happens because we added these elements first: implode() doesn’t care about array keys, it simply walks the array in its internal order, which is the order in which the elements were inserted. That’s why we have to explicitly sort the array by key using ksort() before gluing it together:
    ksort($keys);
    return implode('', $keys);
}
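A quick way to convince yourself of the ordering issue is a three-element array whose keys are added out of order, as in this small demonstration.

```php
<?php
// Demonstration: implode() follows insertion order, not key order.
$keys = [];
$keys[2] = 'c';   // inserted first, like our four special keys
$keys[0] = 'a';
$keys[1] = 'b';

echo implode('', $keys), PHP_EOL; // prints "cab"
ksort($keys);
echo implode('', $keys), PHP_EOL; // prints "abc"
```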
Verification Algorithm for Complex Example.
The verification algorithm for our second example is almost the same as the generating algorithm. We check the length of the string, and based on the length we find the locations of all the keys that have to be fetched from the string. Then we simply collect the rest of the string and check whether it equals the beginning of the hash of the four extracted keys. The verification is not perfect, and you can find ways to make it even more reliable, but it’s good enough for this tutorial. I put some guiding comments into this script.
function verify_token( $str ) {
    $length = strlen($str);
    // Reject lengths the generator can't produce correctly: below 8 the
    // key positions collide, above 44 the 40-character sha1 runs out.
    if ($length < 8 || $length > 44) {
        return false;
    }
    $keys = str_split($str);
    // we're simply using the same algorithm
    // to find key positions based on length
    // as well as to find which hashes must be used
    $length_odd = (($length % 2) != 0);
    $length_has_root = ( strpos( sqrt($length), '.' ) === false );
    $offset = $length_odd ? 1 : 0;
    // Only this time we're extracting the keys instead of
    // generating them. And while we're at it, let's remember the positions.
    $key_str = '';
    $key_str .= $keys[ $pos1 = (int)(0 + $offset) ];
    $key_str .= $keys[ $pos2 = (int)(($length / 4) - 1 + $offset) ];
    $key_str .= $keys[ $pos3 = (int)(($length / 2) - 1 + $offset) ];
    $key_str .= $keys[ $pos4 = (int)(($length - 2) + $offset) ];
    $hashed_keys = $length_has_root ? sha1(md5($key_str)) : sha1(sha1($key_str));
    $hash_string = '';
    // we've already extracted the keys above, so here we skip them
    // and extract everything else instead
    for ($i = 0; $i < $length; $i++) {
        if ( $i != $pos1 &&
             $i != $pos2 &&
             $i != $pos3 &&
             $i != $pos4 ) {
            $hash_string .= $keys[$i];
        }
    }
    $hash_length = $length - 4;
    // compare the question to the answer; if they're equal, the key is valid.
    // hash_equals() compares strictly (a loose == would treat strings such
    // as "0e123" and "0e456" as equal numbers) and in constant time.
    return hash_equals( substr($hashed_keys, 0, $hash_length), $hash_string );
}
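For reference, here is a condensed, self-contained round trip of Part 2 that enforces the length limits following from the key positions and the 40-character sha1 output (8 to 44 characters). It is a sketch rather than the author’s exact code: it swaps in random_int() and a single 62-character alphabet for the key characters, tests for perfect squares with integer arithmetic, pre-fills the array with array_fill() so that no ksort() is needed, and uses hash_equals() for the final comparison. The function names are hypothetical.

```php
<?php
// Condensed round-trip sketch of Part 2 (not the author's exact code).
// Valid lengths: 8..44 (distinct key positions, enough sha1 characters).

function token_positions(int $length): array {
    $offset = ($length % 2 !== 0) ? 1 : 0;
    return [
        (int) (0 + $offset),
        (int) (($length / 4) - 1 + $offset),
        (int) (($length / 2) - 1 + $offset),
        (int) (($length - 2) + $offset),
    ];
}

function token_answer(string $key_str, int $length): string {
    $root = (int) round(sqrt($length));
    $has_root = ($root * $root === $length); // perfect square?
    return $has_root ? sha1(md5($key_str)) : sha1(sha1($key_str));
}

function make_complex_token(int $length): ?string {
    if ($length < 8 || $length > 44) {
        return null;
    }
    $alphabet = '0123456789'
              . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
              . 'abcdefghijklmnopqrstuvwxyz';
    // keys 0..$length-1 already in order, so implode() needs no ksort()
    $chars = array_fill(0, $length, null);
    $key_str = '';
    foreach (token_positions($length) as $pos) {
        $chars[$pos] = $alphabet[random_int(0, strlen($alphabet) - 1)];
        $key_str .= $chars[$pos];
    }
    $answer = token_answer($key_str, $length);
    $hash_enum = 0;
    for ($i = 0; $i < $length; $i++) {
        if ($chars[$i] === null) { // not a key position
            $chars[$i] = $answer[$hash_enum++];
        }
    }
    return implode('', $chars);
}

function verify_complex_token(string $str): bool {
    $length = strlen($str);
    if ($length < 8 || $length > 44) {
        return false;
    }
    $positions = token_positions($length);
    $key_str = '';
    foreach ($positions as $pos) {
        $key_str .= $str[$pos];
    }
    $rest = '';
    for ($i = 0; $i < $length; $i++) {
        if (!in_array($i, $positions, true)) {
            $rest .= $str[$i];
        }
    }
    $expected = substr(token_answer($key_str, $length), 0, $length - 4);
    return hash_equals($expected, $rest);
}

$token = make_complex_token(16);
var_dump($token, verify_complex_token($token));
```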
That’s it. Please let me know if there is something I should fix or clarify.
Enjoy!
Although this is one way to do it, I wouldn’t recommend it. Even if the pattern looks quite complex, a computer will figure it out from a handful of valid keys. It’s very difficult to come up with an algorithm that is hard to crack.
I’ve written a follow up article, giving an alternative: http://blog.adaniels.nl/?p=45
It’s a good catch. Having a secret word stored somewhere brings the cracking complexity to the next level. Relying on hiding the function itself might not be a good idea in terms of high security: in principle, given enough inputs and outputs, the algorithm can be figured out. However, if that one configuration word ever leaks out, you’ll have to come up with another word while still storing the first one. I avoided the idea of storing an extra seed that has to be maintained, but it can easily be plugged into the above functions at the places where I mentioned that “you can go crazy here”. : ) Thanks for the follow-up!
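One way to plug such a secret word into the functions above is to replace the plain sha1() in the answer step with an HMAC keyed by that secret, so the filler characters can’t be recomputed without it. This is a swapped-in technique, not necessarily what the linked follow-up article does; answer_with_secret() and the secret value are placeholders.

```php
<?php
// Sketch: plug a server-side secret into the "answer" step with HMAC,
// so the hash can't be recomputed without the secret.
// answer_with_secret() is hypothetical; the secret below is a placeholder.
function answer_with_secret(string $key_str, string $secret): string {
    return hash_hmac('sha1', $key_str, $secret); // 40 hex characters
}

$secret = 'replace-with-a-long-random-secret'; // hypothetical value
$answer = answer_with_secret('aB3z', $secret);
// Verification recomputes the HMAC and compares in constant time.
var_dump(hash_equals($answer, answer_with_secret('aB3z', $secret))); // bool(true)
```

Because the output is still 40 hex characters, it can stand in for $hashed_keys without changing the rest of the token layout.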
Hello,
I finally decided to clean up a token grid class in PHP and distribute it to the community. You can have a look at it in the PHPclasses.org repository; it is licensed under the LGPL.
You can produce a credit-card-sized printed token grid for each customer; then, each time they want to log in, you ask them (in addition to the username and the password) for the token at a specific position.
Each token (by default there are 10×10 on one card) is calculated using an application id, a user id and the position in the grid. The token generation is based on an md5 of these parameters (you can have a look in the source code).
Best regards, have a nice week-end.
Any feedback welcome!
André
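The grid idea from the comment above can be sketched as follows, under stated assumptions: each cell is the first few hex digits of an md5 over the application id, the user id and the cell position, joined with |. grid_token(), the separator and the 4-digit cell size are all hypothetical; this is not the API of the PHPclasses.org class.

```php
<?php
// Hypothetical sketch of a token grid: each cell is derived from an md5
// of the application id, the user id and the cell position.
// This is NOT the PHPclasses.org class's API.
function grid_token(string $app_id, string $user_id, int $row, int $col, int $digits = 4): string {
    $hash = md5($app_id . '|' . $user_id . '|' . $row . '|' . $col);
    return substr($hash, 0, $digits);
}

// Print a 10x10 card for one user.
for ($row = 0; $row < 10; $row++) {
    $cells = [];
    for ($col = 0; $col < 10; $col++) {
        $cells[] = grid_token('my-app', 'user-42', $row, $col);
    }
    echo implode(' ', $cells), PHP_EOL;
}
```

As written, the sketch hashes only the values listed in the comment, so anyone who knows them and the scheme could recompute a card; mixing a server-side secret into the hashed input would close that gap.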