Update: this article assumes that the encryption algorithm would not be distributed to the user openly. It would instead be stored remotely (i.e. on the server), or be distributed in an encrypted form.
Update: Arnold Daniels has responded on his blog with a significant improvement.
This article is intended for programmers who haven't had a chance to develop any tokens up to this point. I'm not a professional in the field, but recently I was assigned to design a token algorithm at the company I work for. The algorithm I came up with turned out to be quite bloated and twisted, but it was practically my first experience making a generator of verifiable sequences of characters. I will not write about anything nearly as weird as that, since there are endless possibilities in spicing it up. However, everyone needs a starting point. I hope this tutorial can serve as a decent one.
Almost every commercial application out there asks you for some sort of serial key. Once you type it in - the program only takes a fraction of a second to "decide" if the key is valid. The purpose of this tutorial is to rant about the concept of these keys (tokens), and present simple examples of implementation. So now, like real awesome folk, I'll try to form the question that I'll be answering (run-on warning!):
How can I write an algorithm which would generate a sequence of characters such that I'd be able to write a function which would take any sequence of characters on the input (nothing else), and recognize if the input adheres to the algorithm I designed?
In essence, we need to write two functions:
(let's allow the caller to specify the length of the anticipated token with the parameter $length)
<?php
function make_token( $length ) { /* returns a token of $length characters */ }
function verify_token( $str ) { /* returns true if $str is a valid token */ }
?>
Part 1. A braindead Example.
I usually find simple interpretations of something complex quite helpful. Give me an explanation on a braindead example, and I would grasp the concept. That's what I'm attempting to convey here.
Let's write a useless yet clear algorithm.
<?php
function make_token( $length ) {
    // we don't want one-digit token
    if ($length < 2)
        return false;
    // dividing the length into ~halves
    $first_half_length = floor($length / 2);
    $second_half_length = ($length - $first_half_length);
    // getting random key
    $key = rand(1, 8);
    $result = '';
    for ($i = 0; $i < $first_half_length; $i++) {
        $result .= $key;
    }
    for ($i = 0; $i < $second_half_length; $i++) {
        $result .= ($key + 1);
    }
    return $result;
}
?>
Considering the audience that I'm addressing, the function should speak for itself. For example, if I run make_token(4), it might return something like 6677. All it does is take the input $length, pick a random digit, fill the first half of the string with that digit, and fill the rest of the string with [that digit + 1]. The key here is that we have both the question and the answer within the token string. It's sort of like writing "What is your favblueorite color?". The answer is hidden within the question and the only problem is knowing how to get it out. In our case the question is 66, and the answer is 77. However, only our algorithm is designed specifically to recognize the question, recognize the answer, and match them. (Certainly, in this stupid case it's obvious what the algorithm is, but that's just a stupid case.) : )
So let's write the verification algorithm, although you probably already see what it is. It's quite similar to the generating algorithm:
<?php
function verify_token( $str ) {
    // the token arrives as a string, so is_int() would reject every token;
    // ctype_digit() checks that the string consists of digits only
    if ( !is_string($str) || !ctype_digit($str) || strlen($str) < 2 ) {
        return false;
    }
    $length = strlen($str);
    /*
    Here I extract first and second half of the string as an array of characters
    */
    $full_string = str_split($str);
    $first_half_length = (int) floor($length / 2);
    $first_half = array_slice($full_string, 0, $first_half_length);
    $second_half = array_slice($full_string, $first_half_length);
    /*
    let's see if all digits are the same in left half
    */
    $first_char = $first_half[0];
    foreach ($first_half as $char) {
        if ( $char !== $first_char ) {
            // if there is an inconsistency - fail immediately
            return false;
        }
    }
    /*
    now that we've made sure that all digits are equal in the first half,
    let's ensure that all digits in the second half are equal to
    [first_char + 1] -- which also ensures that they're all consistent
    */
    $expected = (string) ((int) $first_char + 1);
    foreach ($second_half as $char) {
        if ( $char !== $expected ) {
            return false;
        }
    }
    /*
    ok, if we got to this point -- nothing failed,
    so we may feel confident giving it a green light
    */
    return true;
}
?>
As you can see, the function verify_token() doesn't need any help from a database to verify the validity of the token passed into it.
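To make the round trip concrete, here is a minimal sketch of the same "question and answer" scheme in a more compact form. The names make_simple_token() and verify_simple_token() are made up for this illustration, and I'm assuming PHP 7 or newer (for intdiv() and random_int()). The verifier simply rebuilds the token it expects from the first digit and compares. It is slightly stricter than the version above: it also rejects first digits outside 1-8, which the generator never produces.

```php
<?php
// Hypothetical compact variant of the Part 1 scheme (names are not from the article).
function make_simple_token(int $length)
{
    if ($length < 2) {
        return false;
    }
    $first = intdiv($length, 2);
    $key = random_int(1, 8); // 1..8, so that $key + 1 is still a single digit
    return str_repeat((string) $key, $first)
         . str_repeat((string) ($key + 1), $length - $first);
}

function verify_simple_token($str): bool
{
    if (!is_string($str) || strlen($str) < 2 || !ctype_digit($str)) {
        return false;
    }
    $length = strlen($str);
    $first = intdiv($length, 2);
    $key = (int) $str[0];
    if ($key < 1 || $key > 8) {
        return false;
    }
    // Rebuild the only token that can start with this digit and compare.
    $expected = str_repeat((string) $key, $first)
              . str_repeat((string) ($key + 1), $length - $first);
    return $str === $expected;
}

var_dump(verify_simple_token('6677'));  // bool(true)
var_dump(verify_simple_token('66777')); // bool(true)
var_dump(verify_simple_token('6678'));  // bool(false)
```

Anything produced by make_simple_token() should pass verify_simple_token(), with no stored list of issued tokens involved.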
This might be a good point to try something more complex.
Part 2. A complex example.
Here, we're going to write the same two functions as above, only make them more complex. Once again, the only parameter we get when we make a token is $length. It means that analyzing $length is the only way for us to alter the algorithm consistently, based on the input into the function.
Since $length must be a positive integer, let's list some of the things we can say about positive integers:
Odd or Even?
Has root (an integer square root, as 9 and 16 do) or doesn't have root?
That data will be enough to randomize our token for this example. When you're playing with it on your own, you can come up with much more twisted ways for changing algorithm paths.
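As a rough illustration of the two properties listed above, here is a small sketch that computes both for a given length. The helper name length_traits() is hypothetical, and the perfect-square test uses plain integer arithmetic, which is just one of several ways to do it.

```php
<?php
// Hypothetical helper: collects the two facts about $length used in this part.
function length_traits(int $length): array
{
    $root = (int) round(sqrt($length));
    return [
        'odd'      => ($length % 2) !== 0,
        'has_root' => ($root * $root) === $length, // true for perfect squares
    ];
}

foreach ([8, 9, 16, 17] as $n) {
    $t = length_traits($n);
    printf("%2d: odd=%s has_root=%s\n", $n, $t['odd'] ? 'yes' : 'no', $t['has_root'] ? 'yes' : 'no');
}
// Prints:
//  8: odd=no has_root=no
//  9: odd=yes has_root=yes
// 16: odd=no has_root=yes
// 17: odd=yes has_root=no
```

Two yes/no facts give four possible combinations, so the algorithm can branch four ways on nothing but the length.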
As a first step, let's write a function that will return a random alphanumeric character. It will be useful in many cases. The function must randomly return one of the following: a-z, A-Z, 0-9. Here is my implementation:
<?php
// return random alphanumeric char
function rand_alphanumeric() {
    $subsets[0] = array('min' => 48, 'max' => 57); // ascii digits
    $subsets[1] = array('min' => 65, 'max' => 90); // ascii uppercase English letters
    $subsets[2] = array('min' => 97, 'max' => 122); // ascii lowercase English letters
    // random choice between uppercase, lowercase, and digits
    $s = rand(0, 2);
    $ascii_code = rand($subsets[$s]['min'], $subsets[$s]['max']);
    return chr( $ascii_code );
}
?>
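One side effect of the function above is that it picks one of the three subsets first, so any particular digit (1 in 3, then 1 in 10) comes up more often than any particular letter (1 in 3, then 1 in 26). If that matters to you, a possible alternative is to draw from one flat pool of 62 characters. This is only a sketch: the name rand_alphanumeric_uniform() is made up, and it assumes PHP 7 or newer for random_int().

```php
<?php
// Hypothetical alternative to rand_alphanumeric(): one flat pool, so each
// of the 62 characters is equally likely.
function rand_alphanumeric_uniform(): string
{
    static $pool = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    return $pool[random_int(0, strlen($pool) - 1)];
}

echo rand_alphanumeric_uniform(), "\n"; // one random character from the pool
```

Either version works for the rest of this tutorial, because the verifier never cares how the key characters were chosen.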
Ok, the next step is to write our token function. Once again, we only take length as a parameter, thus we're only going to use information we can extract from the length in order to mess up the algorithm.
<?php
function make_token ( $length = 16 ) {
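    // 4 key characters + at most 40 characters of sha1 output = 44;
    // below 8, the key positions picked later would start to collide
    if ($length < 8 || $length > 44) return false;
    // collecting info about the length
    $length_odd = (($length % 2) != 0);
    // a whole-number root prints without a decimal point
    $length_has_root = ( strpos( (string) sqrt($length), '.' ) === false);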
?>
Now that we have some variables, let's use them. We will generate four alphanumerics, and hide them somewhere within our string. We could use $length_odd to slightly alter the location where they're hidden.
<?php
    /*
     * Let's make an offset based on oddity
     * to mess things up a bit. Feel free to go crazy here,
     * but for the purpose of this tutorial I'll keep it simple.
     */
    $offset = $length_odd ? 1 : 0;
?>
Time to pick four positions and use $offset to shift them.
<?php
    /*
     * Mapping keys to positions that they will occupy.
     * The middle positions are computed from fractions of the length,
     * so we cast them to (int) to get valid array indexes
     * (verify_token() applies the same cast).
     * Also we're adding offset to each.
     * For convenience, let's gather our keys into string too.
     * We will need it for hashing.
     */
    $keys = array();
    $key_str = '';
    $key_str .= $keys[ (int)(0 + $offset) ] = rand_alphanumeric();
    $key_str .= $keys[ (int)(($length / 4) - 1 + $offset) ] = rand_alphanumeric();
    $key_str .= $keys[ (int)(($length / 2) - 1 + $offset) ] = rand_alphanumeric();
    $key_str .= $keys[ (int)(($length - 2) + $offset) ] = rand_alphanumeric();
?>
Notice that in the above example it doesn't matter how I pick my positions. I decided to use those particular parts of the string to insert the key characters, but it could be done in any other way, as long as the four positions are distinct and the verifier can recompute them from the length. However, you have to watch out for going out of bounds if you use $offset improperly, so that's something to take into account.
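If you want to convince yourself that the positions never collide or run out of bounds, a quick brute-force check over every supported length does the job. The sketch below uses a made-up helper, key_positions(), that mirrors the four formulas above, with an explicit (int) cast for the fractional ones.

```php
<?php
// Hypothetical helper mirroring the four position formulas above,
// with an explicit (int) cast so fractional results are truncated.
function key_positions(int $length): array
{
    $offset = ($length % 2 !== 0) ? 1 : 0;
    return [
        (int) (0 + $offset),
        (int) (($length / 4) - 1 + $offset),
        (int) (($length / 2) - 1 + $offset),
        (int) (($length - 2) + $offset),
    ];
}

// Check every supported length: positions must be distinct and inside the string.
for ($length = 8; $length <= 44; $length++) {
    $pos = key_positions($length);
    $ok = count(array_unique($pos)) === 4 && min($pos) >= 0 && max($pos) < $length;
    if (!$ok) {
        echo "Problem at length $length\n";
    }
}

print_r(key_positions(10)); // positions 0, 1, 4, 8
```

The loop prints nothing for lengths 8 through 44. Try starting it at a smaller length and you'll see why the lower limit is there.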
The next step would be to take care of the answer for our 4-key question. Our "answer" will be the actual "question" (our 4 chars) -- but we'll encode them with some common hashing algorithms, such as sha1 or md5. In order to make it harder to crack, let's pick the hashing sequence based on the $length_has_root variable which we calculated earlier.
<?php
    /*
     * Building the "answer" to the key string.
     * We'll do it by hashing the string in weird ways.
     * We'll choose a hashing sequence based on whether the length has root.
     */
    $hashed_keys = $length_has_root ? sha1(md5($key_str)) : sha1(sha1($key_str));
?>
Great. Last but not least, we have to use this messy hash output to fill up the remainder of the string. We have to remember that four positions are already occupied. This is why we'll be skipping them in the following loop.
<?php
    /*
     * Once again, it's easy to go crazy here, but
     * for the purpose of this tutorial, we're going to simply
     * fill up all remaining positions with the hashed_keys string
     * as far as we have space
     */
    $hash_enum = 0;
    for ($i = 0; $i < $length; $i++) {
        // isset() skips the four positions already taken by key characters
        // without warnings about missing array keys
        if ( !isset($keys[$i]) ) {
            $keys[$i] = $hashed_keys[$hash_enum];
            $hash_enum++;
        }
    }
?>
Here's a little catch that might not be obvious. If we implode the array now, implode() will put our four special keys at the front of the string, even though their indexes don't start from zero. That happens because we added these elements first: PHP arrays keep their elements in insertion order, and implode() walks them in that order without looking at the keys. That's why we have to explicitly sort the array by key using ksort().
<?php
    ksort($keys);
    // implode() takes the separator first; the reversed order is no longer accepted as of PHP 8
    return implode('', $keys);
} // and we're done
?>
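The ordering catch described above is easy to see in isolation. Here is a tiny sketch with a four-element array whose last index is assigned first:

```php
<?php
$keys = [];
$keys[3] = 'X'; // added first, but belongs at index 3
$keys[0] = 'a';
$keys[1] = 'b';
$keys[2] = 'c';

$unsorted = implode('', $keys);
echo $unsorted, "\n"; // Xabc

ksort($keys); // reorder the elements by key
$sorted = implode('', $keys);
echo $sorted, "\n"; // abcX
```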
Verification Algorithm for Complex Example.
The verification algorithm for our second example is almost the same as the generating algorithm. We check the length of the string, and based on the length we find the locations of all the keys that have to be fetched from the string. Then we collect the rest of the string and check whether it equals the beginning of the hash of the key characters (only the first $length - 4 characters of the hash fit into the token). The verification is not perfect, and you can find ways to make it even more reliable, but it's good enough for this tutorial. I put some guiding comments into this script.
<?php
function verify_token ( $str ) {
    if ( !is_string($str) ) return false;
    $length = strlen($str);
    // same limits as in make_token()
    if ($length < 8 || $length > 44) return false;
    $keys = str_split($str);
    // we're simply using the same algorithm
    // to find key positions based on length
    // as well as find which hashes must be used
    $length_odd = (($length % 2) != 0);
    $length_has_root = ( strpos( (string) sqrt($length), '.' ) === false);
    $offset = $length_odd ? 1 : 0;
    // Only this time we're extracting the keys instead of
    // generating them. And while we're at it, let's remember the positions.
    $key_str = '';
    $key_str .= $keys[ $pos1 = (int)(0 + $offset) ];
    $key_str .= $keys[ $pos2 = (int)(($length / 4) - 1 + $offset) ];
    $key_str .= $keys[ $pos3 = (int)(($length / 2) - 1 + $offset) ];
    $key_str .= $keys[ $pos4 = (int)(($length - 2) + $offset) ];
    $hashed_keys = $length_has_root ? sha1(md5($key_str)) : sha1(sha1($key_str));
    $hash_string = '';
    // we've already extracted the keys above, so here we should skip them,
    // and instead extract everything else
    for ($i = 0; $i < $length; $i++) {
        if ( $i != $pos1 &&
             $i != $pos2 &&
             $i != $pos3 &&
             $i != $pos4 ) {
            $hash_string .= $keys[$i];
        }
    }
    $hash_length = $length - 4;
    // returning the comparison of question to the answer;
    // if they're equal - the key is valid.
    // The comparison is strict: with ==, two different strings that both
    // look like numbers (e.g. "0e12" and "0e34") would compare as equal
    return ( $hash_string === substr($hashed_keys, 0, $hash_length) );
}
?>
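Since the generator and the verifier share so much logic, one possible refactoring is to compute everything that depends only on the length in a single helper and call it from both sides. The sketch below is my own arrangement, not a drop-in replacement: the names token_layout(), token_answer(), make_token_sketch() and verify_token_sketch() are all hypothetical, the position formulas and hash chains are the ones from this article, and it assumes PHP 7 or newer for random_int().

```php
<?php
// Hypothetical helper: everything that depends only on $length.
function token_layout(int $length): array
{
    $offset = ($length % 2 !== 0) ? 1 : 0;
    $root = (int) round(sqrt($length));
    return [
        'positions' => [
            (int) (0 + $offset),
            (int) (($length / 4) - 1 + $offset),
            (int) (($length / 2) - 1 + $offset),
            (int) (($length - 2) + $offset),
        ],
        'has_root' => ($root * $root) === $length,
    ];
}

// The "answer": as much of the hash chain as fits next to the 4 key characters.
function token_answer(string $key_str, bool $has_root, int $length): string
{
    $hash = $has_root ? sha1(md5($key_str)) : sha1(sha1($key_str));
    return substr($hash, 0, $length - 4);
}

function make_token_sketch(int $length = 16)
{
    if ($length < 8 || $length > 44) {
        return false;
    }
    $pool = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $layout = token_layout($length);
    $key_str = '';
    for ($i = 0; $i < 4; $i++) {
        $key_str .= $pool[random_int(0, strlen($pool) - 1)];
    }
    $answer = token_answer($key_str, $layout['has_root'], $length);
    $token = '';
    $k = 0; // index of the next key character
    $a = 0; // index of the next answer character
    for ($i = 0; $i < $length; $i++) {
        $token .= in_array($i, $layout['positions'], true) ? $key_str[$k++] : $answer[$a++];
    }
    return $token;
}

function verify_token_sketch($str): bool
{
    if (!is_string($str) || strlen($str) < 8 || strlen($str) > 44) {
        return false;
    }
    $length = strlen($str);
    $layout = token_layout($length);
    $key_str = '';
    $rest = '';
    for ($i = 0; $i < $length; $i++) {
        if (in_array($i, $layout['positions'], true)) {
            $key_str .= $str[$i];
        } else {
            $rest .= $str[$i];
        }
    }
    return $rest === token_answer($key_str, $layout['has_root'], $length);
}

$token = make_token_sketch(16);
var_dump(verify_token_sketch($token)); // bool(true)

// Replace the last character (a hash character at this length) with a non-hex one.
$tampered = substr($token, 0, -1) . 'z';
var_dump(verify_token_sketch($tampered)); // bool(false)
```

Keeping the layout in one place means that if you later change how positions or hash chains are picked, the generator and the verifier can't drift apart.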
That's it. Please let me know if there is something I should fix or clarify.
Enjoy!