Image by John Salvino via Unsplash (copyright-free)
Below I'll give you brief descriptions of what hashing, encryption and obfuscation are and aren't, and examples of when to use them.
A data hash is sometimes referred to as a data fingerprint.
The hash value has a constant length, regardless of the size of the hashed data. Anyone who has access to the data can calculate its hash value. In practice, different data will give different hash values (collisions exist in theory, but a good hash function makes them infeasible to find). Lastly, given only the hash, you cannot reconstruct the original data.
This is an important message → b00f3941c94d5d6c11c97057800ab7bbe5de23e99c20375f4f5a6d5fb483a844
Another important message → 0f90316561cc421784f23567cbf06c8b3f0543bce18e3619710f12b71b4178b2
Hash of the file at Openssl-3.0.0-alpha5.tar.gz →
As you can see, even for similar data, hash values turn out to be very different. Hash value length stays the same and is relatively short.
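The properties above can be checked with a few lines of Python. This is a minimal sketch that assumes SHA-256: the article does not name the algorithm, and although the 64-character values shown are consistent with a 256-bit digest, the exact bytes that were hashed (encoding, trailing newline) are unknown, so the digests printed here may not match them. The helper name `sha256_hex` is made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


# Two similar short messages and one much larger input.
short = sha256_hex(b"This is an important message")
similar = sha256_hex(b"Another important message")
large = sha256_hex(b"x" * 1_000_000)

for label, digest in [("short", short), ("similar", similar), ("large", large)]:
    # Every digest has the same length, whatever the input size.
    print(f"{label:8} {len(digest)} hex chars  {digest}")
```

Whatever the input size, each digest is 64 hex characters long, and hashing the same bytes again always gives the same value.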
What can we use it for?
We can use it to make sure that data was not modified in transit or in storage. Suppose you have a large file that you can't store locally and have to rely on a third-party provider. Calculate the file's hash, save it locally, then send the file to storage. When you retrieve the file in the future, calculate the hash of the retrieved file and compare it to the value you saved before, and voila: you can tell whether the file was modified.
This is an integrity check.
It is worth noting that the hash value must come from a reliable source: storing or transporting it together with the data doesn't work. Anyone able to modify the stored or transported data will likely be able to modify the hash value as well, and then your integrity check wouldn't reveal any tampering.
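The store-then-verify workflow described above might look roughly like this. It is a sketch under assumptions: SHA-256 is chosen for illustration, the function names `file_sha256` and `is_unmodified` are hypothetical, and a temporary file stands in for the third-party storage.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: Path, trusted_hash: str) -> bool:
    """Compare the file's current hash with one kept in a trusted place."""
    return hmac.compare_digest(file_sha256(path), trusted_hash)


# Demo: a temporary file plays the role of the remote copy (assumption).
with tempfile.TemporaryDirectory() as tmp:
    stored = Path(tmp) / "backup.bin"
    stored.write_bytes(b"large file contents")
    trusted_hash = file_sha256(stored)           # save this locally, not next to the file

    intact = is_unmodified(stored, trusted_hash)

    stored.write_bytes(b"large file c0ntents")   # simulate tampering in storage
    still_intact = is_unmodified(stored, trusted_hash)

print(intact, still_intact)  # True False
```

The check is only as good as the place where `trusted_hash` is kept, which is exactly the point made above.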
Other use cases include password checking (the server stores a hash of the password instead of the password itself) and data structures (e.g. hash maps).
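As a rough illustration of the password use case, the sketch below stores a salted, deliberately slow hash and later checks a login attempt against it. The salt, the choice of PBKDF2 and the iteration count are assumptions that go beyond what this article describes; they are included because a single fast hash is generally not considered sufficient for passwords. The function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption: illustrative value, tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); the server stores both, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, stored_digest = hash_password("correct horse battery staple")
good_login = verify_password("correct horse battery staple", salt, stored_digest)
bad_login = verify_password("wrong guess", salt, stored_digest)
print(good_login, bad_login)  # True False
```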
Hashing does not ensure data confidentiality.
Encryption is used to maintain confidentiality. Essentially, encryption is the conversion of data into an unreadable form using an encryption key. For example, encrypting "Another important message" can give
Another important message → 9CBBE5820FF5CF3E26AA306C6D6FF63D0E1F99738DF8EF2C3231D2FEB3CF1B81
Now, seeing this data, we can't really tell what the plain text was.
There are two types of encryption algorithms: symmetric and asymmetric. With symmetric encryption, the same key is used to encrypt and decrypt the data. With asymmetric encryption, one (public) key is used to encrypt, while a different (private) key is used to decrypt. Here we focus on symmetric encryption; asymmetric encryption will be covered in another blog entry.
Encryption is reversible: you can later decrypt the ciphertext back to the original data using only the key. This means your encryption key must be kept secret; otherwise, anyone who obtains it could decrypt your data.
The encrypted data will be similar in length to the original data. Encryption on its own does not provide data integrity: even a recipient in possession of the correct key cannot tell whether the message they decrypted was modified in flight.
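To make reversibility and the lack of integrity concrete, here is a toy symmetric cipher built only from the Python standard library. It is emphatically not AES and not safe for real use; it is a hypothetical construction (a SHA-256-based keystream XORed with the data) chosen so the example runs without third-party packages. The key, nonce and function names are all made up for illustration.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + nonce + counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypts and decrypts: XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


key = b"0123456789abcdef"        # hypothetical secret key
nonce = b"unique-per-msg"        # hypothetical per-message value
plaintext = b"Another important message"

ciphertext = xor_cipher(key, nonce, plaintext)
recovered = xor_cipher(key, nonce, ciphertext)   # same key reverses it

# An attacker flips bits in transit without knowing the key.
tampered = bytearray(ciphertext)
tampered[0] ^= ord("A") ^ ord("a")
decrypted_tampered = xor_cipher(key, nonce, bytes(tampered))

print(recovered)            # b'Another important message'
print(decrypted_tampered)   # b'another important message'
```

Decryption of the tampered ciphertext succeeds without any error, and the recipient has no way of noticing that the first letter changed.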
Now you can see how useful a combination of encryption and hashing can be to ensure both integrity and confidentiality.
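One common way to combine the two is to attach a keyed hash (HMAC) to the ciphertext, so the recipient can detect tampering before decrypting. Note the swap: a plain hash sent alongside the ciphertext could simply be recomputed by whoever modified it, so this sketch uses HMAC-SHA256 with a separate secret key. The key value and function names are hypothetical; the ciphertext bytes are the hex example shown earlier.

```python
import hashlib
import hmac


def make_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the ciphertext."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def is_authentic(mac_key: bytes, ciphertext: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(mac_key, ciphertext), received_tag)


mac_key = b"a-separate-secret-mac-key"   # hypothetical; not the encryption key
ciphertext = bytes.fromhex(
    "9CBBE5820FF5CF3E26AA306C6D6FF63D0E1F99738DF8EF2C3231D2FEB3CF1B81"
)
tag = make_tag(mac_key, ciphertext)      # sent along with the ciphertext

# Recipient side: verify first, decrypt only if the tag matches.
untouched_ok = is_authentic(mac_key, ciphertext, tag)

modified = bytearray(ciphertext)
modified[0] ^= 0x01                      # a single flipped bit in transit
modified_ok = is_authentic(mac_key, bytes(modified), tag)

print(untouched_ok, modified_ok)  # True False
```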
One last bit to go through:
Obfuscating means obscuring the meaning of data. While encryption is meant to hide every detail of the information, obfuscation is a more generic term. Besides encryption, what else can we do to obfuscate data?
With data masking, it is possible to substitute false, but realistic data for the original. There's no going back from masked data to the original.
This approach can ensure privacy while keeping the data consistent enough to be used for measurements or testing. One can anonymise a dataset that originally contained personal user information to achieve a reduction in scope. Once anonymity is achieved, and for as long as the individuals in the dataset cannot be identified, data protection laws may no longer apply. Nevertheless, the obfuscated data set remains useful for measurements and running tests.
Given a data set with
Name, Surname, Street, HouseNo, PostCode, City, Country, Age, CreditCard
fields, you could consistently mask out (e.g. by substitution) the Name, City, PostCode and CreditCard information and still be able to tell how many users there are in certain age groups or in the same city.
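A minimal sketch of such consistent masking is shown below. The sample records are invented, and for simplicity the substitutes are opaque random tokens rather than realistic-looking values (generating realistic fake data would need a dedicated generator). The substitution table lives only inside `mask_records` and is discarded when it returns, so the masked output cannot be mapped back.

```python
import secrets
from collections import Counter

MASKED_FIELDS = ("Name", "City", "PostCode", "CreditCard")


def mask_records(records: list[dict]) -> list[dict]:
    """Replace each distinct value in the masked fields with a random token.

    The same original value always maps to the same token, so grouping
    and counting still work on the masked data.
    """
    substitutes = {field: {} for field in MASKED_FIELDS}
    masked = []
    for record in records:
        row = dict(record)
        for field in MASKED_FIELDS:
            mapping = substitutes[field]
            if row[field] not in mapping:
                mapping[row[field]] = f"{field}-{secrets.token_hex(8)}"
            row[field] = mapping[row[field]]
        masked.append(row)
    return masked  # the substitution table is dropped here


# Hypothetical sample data.
users = [
    {"Name": "Alice", "Surname": "Example", "Street": "Main St", "HouseNo": "1",
     "PostCode": "00-001", "City": "Springfield", "Country": "XX", "Age": 34,
     "CreditCard": "0000 0000 0000 0001"},
    {"Name": "Bob", "Surname": "Sample", "Street": "Oak Ave", "HouseNo": "7",
     "PostCode": "00-001", "City": "Springfield", "Country": "XX", "Age": 37,
     "CreditCard": "0000 0000 0000 0002"},
    {"Name": "Carol", "Surname": "Test", "Street": "Elm Rd", "HouseNo": "12",
     "PostCode": "00-002", "City": "Shelbyville", "Country": "XX", "Age": 52,
     "CreditCard": "0000 0000 0000 0003"},
]

masked_users = mask_records(users)

# Statistics survive masking: users per (masked) city and per age decade.
per_city = Counter(u["City"] for u in masked_users)
per_decade = Counter((u["Age"] // 10) * 10 for u in masked_users)

print(sorted(per_city.values()))  # [1, 2]
print(dict(per_decade))           # {30: 2, 50: 1}
```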
Another use case is obfuscating application code (sometimes it doesn't take obfuscation software to make code unreadable!). Here's a piece:
It is still valid code: it prints the current time in an ASCII-graphical form, and the purpose of the obfuscation was to hide how it operates. So you still have a working application, but it's pretty hard to say what it does or to modify it to do what you want.
These principles have many use cases: a software company may try to hide its algorithm implementation from the public, while a bad actor could produce differently obfuscated versions of their malware to try to fool antivirus tools.
That's it for now. Next time we will go into more detail on hashing and password cracking!
Online Crypto Tools
AES128-CBC Key: 7234753778214125, IV: 635166546A576E5A (HEX encoding)