What is a hashcode? Is it unique?
A hashcode is basically a fingerprint
We Are Trying To Uniquely Identify Someone
I am a detective, on the lookout for a criminal. Let us call him Mr Cruel. (He was a notorious murderer when I was a kid - he broke into a house, kidnapped and murdered a poor girl, dumped her body, and he’s still on the loose - but that’s a separate matter.) Mr Cruel has certain peculiar characteristics that I can use to uniquely identify him amongst a sea of people. We have 25 million people in Australia. One of them is Mr Cruel. How can we find him?
Bad ways of Identifying Mr Cruel
Apparently Mr Cruel has blue eyes. That’s not much help because almost half the population in Australia also has blue eyes.
Good ways of Identifying Mr Cruel
What else can I use? I know: I will use a fingerprint!
Advantages:
- It is really, really hard for two people to have the same fingerprint (not impossible, but extremely unlikely).
- Mr Cruel’s fingerprint will never change.
- Every single part of Mr Cruel’s entire being (his looks, hair colour, personality, eating habits etc.) must ideally be reflected in his fingerprint, such that if he has a brother (who is very similar but not the same person), the two should have different fingerprints. I say “should” because we cannot guarantee 100% that two people in this world will have different fingerprints.
- But we can always guarantee that Mr Cruel will always have the same fingerprint - and that his fingerprint will NEVER change.
The above characteristics generally make for good hash functions.
Basically, for a given input we want, as much as possible, a unique output. If we get the same output no matter what our input is, then that makes for a very bad hash function, because it cannot tell any two inputs apart.
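To make those properties concrete, here is a minimal sketch in Python. It assumes SHA-256 (from the standard `hashlib` module) as the "good" fingerprint. The names `fingerprint` and `bad_fingerprint` and the description strings are made up purely for illustration.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bad_fingerprint(text: str) -> int:
    """A deliberately useless hash: every input gets the same output."""
    return 42


# Hypothetical descriptions, used only as example inputs.
mr_cruel = "Mr Cruel, blue eyes"
brother = "Mr Cruel's brother, blue eyes"

# The same input always produces the same fingerprint.
same_every_time = fingerprint(mr_cruel) == fingerprint(mr_cruel)

# Similar but different inputs produce different fingerprints.
brothers_differ = fingerprint(mr_cruel) != fingerprint(brother)

# The bad hash cannot tell the two apart.
bad_cannot_distinguish = bad_fingerprint(mr_cruel) == bad_fingerprint(brother)

print(same_every_time, brothers_differ, bad_cannot_distinguish)
```

All three values come out as `True`: the good fingerprint is repeatable and separates the brothers, while the constant one matches everybody.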
Collisions
So imagine if I get a lead and I find someone matching Mr Cruel’s fingerprint at the crime scene. Does this mean I have found Mr Cruel?
...Perhaps! I must take a closer look. If I am using SHA256 and I am looking in a small town with only 5 people, then there is a very good chance I have found him! But if I am using MD5 and checking for fingerprints in a town with more than 2^1000 people, then many entirely different people must share the same fingerprint, because MD5 can only produce 2^128 different outputs.
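The "more people than possible fingerprints" situation can be sketched at a small scale. The example below is only an illustration: it assumes a toy fingerprint that keeps just the first byte of a SHA-256 digest (so only 256 possible values), and the names `tiny_fingerprint`, `find_collision` and the `person-N` labels are invented for the demo.

```python
import hashlib


def tiny_fingerprint(text: str) -> int:
    """First byte of the SHA-256 digest: only 256 possible values."""
    return hashlib.sha256(text.encode("utf-8")).digest()[0]


def find_collision(names):
    """Return the first pair of distinct names sharing a tiny fingerprint, or None."""
    seen = {}  # fingerprint -> first name that produced it
    for name in names:
        fp = tiny_fingerprint(name)
        if fp in seen:
            return seen[fp], name
        seen[fp] = name
    return None


# 257 distinct people but only 256 possible fingerprints:
# at least two of them are forced to share one.
townsfolk = [f"person-{i}" for i in range(257)]
collision = find_collision(townsfolk)
print(collision)
```

With a real 256-bit fingerprint the same thing can happen in principle, but the town would have to be unimaginably large before it became likely.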
So what is the benefit of all this anyways?
You’d use it if you want to find out whether two people are different. If the prints don’t match then you know it’s definitely NOT Mr Cruel. If the fingerprints match at two different locations then you know the chances are good that the same person committed both crimes. Or if you want to know whether Katy Wong, age 25, is in your database already, you can just check if Katy’s hashcode is present, instead of comparing all her details against every record and searching for a match that way. The latter will take a lot longer. (If her hashcode is present, you only need to compare details against the few records sharing that hashcode to rule out a collision.)
It’s the same thing with DNA. The benefit of using DNA is that if the DNA samples don’t match then you know 100% that you’ve got the wrong person. That’s basically how Ronald Cotton was exonerated. It was someone else’s DNA, not his. (Google him - it’s a tragic case of mistaken identity.)
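The Katy Wong lookup can be sketched with Python's built-in `set`, which finds candidates by hashcode first and only then compares them for equality. This is a rough illustration, not how any particular database works: the `Person` class and the "John Smith" record are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen + default eq gives us __hash__ and __eq__
class Person:
    name: str
    age: int


# A set locates entries by hashcode, then confirms with an equality check.
database = {Person("Katy Wong", 25), Person("John Smith", 40)}

# Equal details always give an equal hashcode, so Katy is found quickly.
katy_present = Person("Katy Wong", 25) in database

# Different details: not in the database.
other_katy_present = Person("Katy Wong", 26) in database

# The same person always returns the same hashcode within a run.
same_hash = hash(Person("Katy Wong", 25)) == hash(Person("Katy Wong", 25))

print(katy_present, other_katy_present, same_hash)
```

The equality check after the hash lookup is the "closer look" from the Collisions section: a matching hashcode narrows the search, and the full comparison settles it.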
Key Summary
- Two different people/objects can theoretically still have the same fingerprint. In other words, if you have two fingerprints that are the same, they need not both come from the same person/object.
- Buuuuuut, the same person/object will always return the same fingerprint.
- So basically a hashcode is a fingerprint.
I hope this helps someone because it took a lot of grief for me to learn it all!