Cyber Security: Hashing

Hashing is the name given to the process of transforming any given key or a string of characters into another value. It is a mathematical process that converts data of any size into data of fixed length known as the ‘hash’ (alternative names include message digest, hash codes, hash sums or hash values).

This is done by a 'hashing algorithm'.

The result is a fixed-length value that stands in for the 'key' it was computed from. This makes it easier and safer to store than the original string. Hashing operates in one direction only, so it is computationally infeasible to deduce the original data from the resultant hash.
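The fixed-length property can be checked with a few lines of Python. This is only a sketch: it assumes SHA-256 (one member of the SHA-2 family) from the standard hashlib module, and the helper name sha256_hex is made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# One byte of input versus one million bytes of input.
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)

# Both digests have the same length (64 hex characters, i.e. 256 bits),
# regardless of how much data was hashed.
print(len(short_digest), len(long_digest))
```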

The intention of hashing is not to preserve the contents of the data but to create an identifier that is, for practical purposes, unique to each piece of data.


Hashing can show that data has not changed in transmission, but on its own cannot demonstrate that the data originated with its supposed author. To do that, a digital signature should be used.

One-way approach for hashing functions

In the context of passwords, a "hash" is a scrambled version of text that is reproducible if you know which hashing algorithm was used.

So, if I 'hash' the word "password" using MD5 hashing software, the output hash is:

5f4dcc3b5aa765d61d8327deb882cf99

Then if another person hashes the word "password" using MD5 hashing software, they will also get

5f4dcc3b5aa765d61d8327deb882cf99

We know when seeing a hash of '5f4dcc3b5aa765d61d8327deb882cf99' that the word "password" is the secret code... but most people won't have a clue!
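The example above can be reproduced in Python with the standard hashlib module. This is a minimal sketch; the helper name md5_hex is an assumption made for illustration, and MD5 is used here only because it is the algorithm in the example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` (encoded as UTF-8) in hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Anyone who hashes the same input with the same algorithm gets the same output.
print(md5_hex("password"))  # 5f4dcc3b5aa765d61d8327deb882cf99
```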

For this reason, the passwords you use on websites are stored on servers as hashes rather than in plain text like "password", so that if someone views them, in theory they won't know the actual password.

You can't do the reverse.

A hash digest like 5f4dcc3b5aa765d61d8327deb882cf99 can't be reverse-computed to produce the word "password" that was used to make it in the first place.

This one-way approach for hashing functions is by design.

When a file is published on the internet, the author may choose to publish the hash value for that file.

Hashing Algorithms

A large number of hashing algorithms have been developed; the most widespread are algorithms called:

MD5 (Message-Digest Algorithm 5)

SHA-1 (Secure Hash Algorithm 1) and SHA-2 (Secure Hash Algorithm 2).

Here is some information published by the GnuPG encryption software authors on their website (in this case, hashes generated with the SHA-1 algorithm):

Each long line of numbers and letters on the left is a hash; the text on the right is the name of the file.

If you download one of these programs, you can then run SHA-1 yourself on your download and obtain a hash. If your file exactly matches the original, the two hashes will be identical.
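Checking a download against a published hash might look like the sketch below. It is illustrative rather than definitive: it uses SHA-256 instead of the SHA-1 mentioned above, and the function names file_sha256 and matches_published_hash are hypothetical.

```python
import hashlib
import hmac


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the value published by the author."""
    # Normalise whitespace and letter case before comparing.
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)
```

You would call matches_published_hash with the path of your download and the hash string copied from the author's website; a result of False means the file differs from the one the author hashed.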

A variation of a single bit of data between two otherwise identical files will result in vastly different hash values. Any edit to a file between two hashing operations will therefore produce a different hash value, revealing that the data has been tampered with and should not be trusted.
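The effect of a one-bit change can be observed directly. The sketch below assumes SHA-256 and counts how many of the 256 output bits differ between two inputs; the helper name differing_bits and the sample text are made up for illustration.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


original = b"hello world"
# Flip the lowest bit of the first byte: 'h' (0x68) becomes 'i' (0x69).
tampered = bytes([original[0] ^ 0x01]) + original[1:]

print(differing_bits(original, tampered), "of 256 bits differ")
```

For a well-behaved hash function the count is typically close to half of the output bits, even though the inputs differ in just one bit.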

Although MD5 and SHA-1 are in common use, both have been found to be flawed.

Under certain circumstances ‘collisions’ can occur where two pieces of different data can generate the same hash value (albeit under specifically controlled conditions).

This weakness in the MD5 hashing algorithm has been used in malware targeting Microsoft Windows computers.

Since practical collisions have been demonstrated for both algorithms, they are considered ‘broken’ and should not be used. The United States government requires all hashes to be generated using the newer SHA-2 algorithm, which has not shown any such weaknesses.

So how do hackers who steal hashes from websites ultimately end up with a list of real life passwords?

Hackers solve this problem by brute force cracking the passwords instead.

In this context, cracking involves making a list of all combinations of characters on your keyboard and then hashing them. By finding matches between this list and the hashes from the stolen passwords, hackers can figure out the true password.
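To see why short or simple passwords offer little protection even when hashed, the matching process can be sketched as follows. This is a toy illustration under stated assumptions: unsalted MD5 hashes, a lowercase-only alphabet, and a maximum length of three characters; the function names are hypothetical.

```python
import hashlib
import string
from itertools import product
from typing import Optional


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` in hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, alphabet: str, max_length: int) -> Optional[str]:
    """Hash every combination up to max_length and return the one that matches."""
    for length in range(1, max_length + 1):
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if md5_hex(candidate) == target_hash:
                return candidate
    return None  # No candidate in the search space produced the target hash.


# A three-letter password falls after at most 26 + 26**2 + 26**3 attempts.
known_hash = md5_hex("cab")
print(brute_force(known_hash, string.ascii_lowercase, 3))  # cab
```

The search space grows exponentially with password length and alphabet size, which is why longer passwords drawn from more characters take far longer to find this way.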

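This is combatted by 'salting the hash': a random value (the 'salt') is combined with each password before it is hashed, so that identical passwords no longer produce identical hashes and a precomputed list of hashes no longer matches.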
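A salted password hash could be sketched as below. The choices here are assumptions rather than anything the text prescribes: a 16-byte random salt, and PBKDF2 with SHA-256 and 100,000 iterations from Python's hashlib, which is one common way to combine a salt with a password. The function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # Assumed work factor; tune for your own hardware.


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The salt is stored alongside the digest. Because each user gets a different salt, two users with the password "password" end up with different stored values, and an attacker has to repeat the cracking work for every account separately.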

Hash Tables

The most popular use for hashing is the implementation of hash tables. A hash table stores key and value pairs in a list that is accessible through its index.

Because the number of possible keys is unlimited while the table has a fixed size, the hash function maps each key to a position within the table.

A hash value then becomes the index for a specific element.
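A toy hash table makes the key-to-index mapping concrete. This is a sketch under stated assumptions: a fixed table of 8 buckets, SHA-256 reduced modulo the table size as the hash function (real hash tables normally use much faster non-cryptographic functions), and lists inside each bucket to hold keys that land on the same index. All names are made up for illustration.

```python
import hashlib

TABLE_SIZE = 8  # Assumed fixed size for this illustration.


def bucket_index(key: str, table_size: int = TABLE_SIZE) -> int:
    """Map a key of any length to an index in the range 0..table_size-1."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % table_size


class HashTable:
    def __init__(self, size: int = TABLE_SIZE) -> None:
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def put(self, key: str, value) -> None:
        bucket = self.buckets[bucket_index(key, self.size)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # Overwrite the existing entry.
                return
        bucket.append((key, value))

    def get(self, key: str):
        for existing_key, value in self.buckets[bucket_index(key, self.size)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable()
table.put("alice", 30)
table.put("bob", 25)
print(table.get("alice"))  # 30
```

Looking up a key only requires hashing it and inspecting one bucket, rather than scanning every stored pair.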

A hash function generates a new value according to a mathematical hashing algorithm; this value is known as a hash value or simply a hash.

To prevent the conversion of a hash back into the original key, a good hash always uses a one-way hashing algorithm. Hashing is relevant to, but not limited to, data indexing and retrieval, digital signatures, cybersecurity and cryptography.