Hash - clarifying facts behind Java39s implementation of HashSetHashMap

De openkb
Aller à : Navigation, rechercher

Sommaire

Questions

1. I understand the different hash map mechanisms and the ways in which key collisions are handled (either open addressing -linear/quadratic probing, chaining, extendable hashing, etc. Which one does HashSet/HashMap make use of?

2. I realise that a good HashMap relies on a good hash function. How does Java s HashSet/HashMap hash the objects? I know that there is a hash function but so far for strings I have not needed to implement this. What if I now want to Hash a Java Object that I create - do I need to implement the hash function? Or does Java have a built in way of creating a hash code?

I know that the default implementation cannot be relied on as it bases the hash function on the memory address which is not constant.

Answers

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/util/HashMap.java/ http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/util/HashMap.java/

(Hint: you can usually find the source code for Java SE classes using Google; e.g. search for "java.util.HashMap source".)

I understand the different hash map mechanisms and the ways in which key collisions are handled (either open addressing -linear/quadratic probing, chaining, extendable hashing, etc. Which one does HashSet/HashMap make use of?

Chaining. See the source code. (Line 154 in the version I linked to).

How does Java s HashSet/HashMap hash the objects?

It doesn t. The object s hashcode method is called to do this. See the source code. (line 360).

If you look at the code you will see some interesting wrinkles:

    • The code (in the version I linked to) is hashing Strings using a special method. (It appears that this is to allow hashing of strings to be "tuned" at the platform level. I didn t dig into this ...)
    • The hashcode returned by the Object.hashcode() call is "scrambled" further to reduce the chance of collisions. (Read the comment!)
What if I now want to Hash a Java Object that I create - do I need to implement the hash function?

You can do that.

Whether you need to do this depends on how you have defined equals for the class. Specifically, Java s HashMap, HashSet and related classes place the following requirement on hashcode() and equals(Object):

    • If a.equals(a) then a.hashcode() == b.hashcode().
    • While a is in a HashSet or is a key in a HashMap, the value returned by a.hashcode() must not change.
    • if !a.equals(b), then the probability that a.hashcode() == b.hashcode() should be low, especially if a and b are probably hash keys for the application.

(The last requirement for performance reasons. If you you have a "poor" hash function that results in a high probability that different keys hash the same hashcode, you get lots of collisions. The hash chains will become unbalanced, and you won t get the average O(1) performance that is normally expected of hash table operations. In the worst case, performance will be O(N); i.e. equivalent to a linear search of a linked list.)

Or does Java have a built in way of creating a hash code?

Every class inherits a default hashcode() method from Object (unless this is overridden). It uses what is known as an "identity hash code"; i.e. a hash value that is based on the object s identity (its reference). This matches the default implementation of equals(Object) ... which simply uses == to compare references.

I know that the default implementation cannot be relied on as it bases the hash function on the memory address which is not constant.

This is incorrect.

The default hashcode() method returns the "identity hashcode". This is typically based on the object s memory address at some point time, but it is NOT the object s memory address.

In particular, if an object is moved by the garbage collector, its "identity hashcode" is guaranteed not to change. Yes. That s right, it DOES NOT CHANGE ... even though the object was moved!

http://stackoverflow.com/a/3796963/139985 http://stackoverflow.com/a/3796963/139985

The bottom line is that the default Object.hashcode() method satisfies all of the requirements that I listed above. It can be relied on.

Source

License : cc by-sa 3.0

http://stackoverflow.com/questions/20849809/clarifying-facts-behind-javas-implementation-of-hashset-hashmap

Related

Outils personnels
Espaces de noms

Variantes
Actions
Navigation
Outils