Wednesday, July 16, 2008
Object.hashCode implementation

After the previous post, I just had to look. The implementation of Object.equals is, as was previously noted, just "return this == obj", but the implementation of Object.hashCode is far more complicated.

Taken straight from the latest hg-pulled OpenJDK sources, Object.hashCode is a native method registered from Object.c that calls into a Hotspot-exported function, JVM_IHashCode(), from hotspot\src\share\vm\prims\jvm.cpp:

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
// as implemented in the classic virtual machine; return 0 if object is NULL
return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;

which in turn calls ObjectSynchronizer::FastHashCode, defined in hotspot\src\share\vm\runtime\synchronizer.cpp as:

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
if (UseBiasedLocking) {
// NOTE: many places throughout the JVM do not expect a safepoint
// to be taken here, in particular most operations on perm gen
// objects. However, we only ever bias Java instances and all of
// the call sites of identity_hash that might revoke biases have
// been checked to make sure they can handle a safepoint. The
// added check of the bias pattern is to avoid useless calls to
// thread-local storage.
if (obj->mark()->has_bias_pattern()) {
// Box and unbox the raw reference just in case we cause a STW safepoint.
Handle hobj (Self, obj) ;
// Relaxing assertion for bug 6320749.
assert (Universe::verify_in_progress() ||
"biases should not be seen by VM thread here");
BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
obj = hobj() ;
assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");

// hashCode() is a heap mutator ...
// Relaxing assertion for bug 6320749.
assert (Universe::verify_in_progress() ||
!SafepointSynchronize::is_at_safepoint(), "invariant") ;
assert (Universe::verify_in_progress() ||
Self->is_Java_thread() , "invariant") ;
assert (Universe::verify_in_progress() ||
((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

ObjectMonitor* monitor = NULL;
markOop temp, test;
intptr_t hash;
markOop mark = ReadStableMark (obj);

// object should remain ineligible for biased locking
assert (!mark->has_bias_pattern(), "invariant") ;

if (mark->is_neutral()) {
hash = mark->hash(); // this is a normal header
if (hash) { // if it has hash, just return it
return hash;
hash = get_next_hash(Self, obj); // allocate a new hash code
temp = mark->copy_set_hash(hash); // merge the hash code into header
// use (machine word version) atomic operation to install the hash
test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
if (test == mark) {
return hash;
// If atomic operation failed, we must inflate the header
// into heavy weight monitor. We could add more code here
// for fast path, but it does not worth the complexity.
} else if (mark->has_monitor()) {
monitor = mark->monitor();
temp = monitor->header();
assert (temp->is_neutral(), "invariant") ;
hash = temp->hash();
if (hash) {
return hash;
// Skip to the following code to reduce code size
} else if (Self->is_lock_owned((address)mark->locker())) {
temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
assert (temp->is_neutral(), "invariant") ;
hash = temp->hash(); // by current thread, check if the displaced
if (hash) { // header contains hash code
return hash;
// The displaced header is strictly immutable.
// It can NOT be changed in ANY cases. So we have
// to inflate the header into heavyweight monitor
// even the current thread owns the lock. The reason
// is the BasicLock (stack slot) will be asynchronously
// read by other threads during the inflate() function.
// Any change to stack may not propagate to other threads
// correctly.

// Inflate the monitor to set hash code
monitor = ObjectSynchronizer::inflate(Self, obj);
// Load displaced header and check it has hash code
mark = monitor->header();
assert (mark->is_neutral(), "invariant") ;
hash = mark->hash();
if (hash == 0) {
hash = get_next_hash(Self, obj);
temp = mark->copy_set_hash(hash); // merge hash code into header
assert (temp->is_neutral(), "invariant") ;
test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
if (test != mark) {
// The only update to the header in the monitor (outside GC)
// is install the hash code. If someone add new usage of
// displaced header, please update this code
hash = test->hash();
assert (test->is_neutral(), "invariant") ;
assert (hash != 0, "Trivial unexpected object/monitor header usage.");
// We finally get the hash
return hash;

Hope this answers all the debates. :-)

Editor's note: Yes, I know it's a long quotation of code completely out of context; my goal here is simply to suggest that the hashCode() implementation is not just a integerification of the object's address in memory, as was suggested in other discussions. For whatever it's worth, the get_next_hash() implementation that's referenced in the FastHashCode() method looks like:

// hashCode() generation :
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
// 2654435761 = 2^32 * Phi (golden ratio)
// HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
// in undesirable regularity in the hashCode values of adjacent objects
// (objects allocated back-to-back, in particular). This could potentially
// result in hashtable collisions and reduced hashtable efficiency.
// There are simple ways to "diffuse" the middle address bits over the
// generated hashCode values:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0 ;
if (hashCode == 0) {
// This form uses an unguarded global Park-Miller RNG,
// so it's possible for two threads to race and generate the same RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random() ;
} else
if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
intptr_t addrBits = intptr_t(obj) >> 3 ;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
} else
if (hashCode == 2) {
value = 1 ; // for sensitivity testing
} else
if (hashCode == 3) {
value = ++GVars.hcSequence ;
} else
if (hashCode == 4) {
value = intptr_t(obj) ;
} else {
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX ;
t ^= (t << 11) ;
Self->_hashStateX = Self->_hashStateY ;
Self->_hashStateY = Self->_hashStateZ ;
Self->_hashStateZ = Self->_hashStateW ;
unsigned v = Self->_hashStateW ;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
Self->_hashStateW = v ;
value = v ;

value &= markOopDesc::hash_mask;
if (value == 0) value = 0xBAD ;
assert (value != markOopDesc::no_hash, "invariant") ;
return value;

Thus (hopefully) putting the idea that it might be allocating a hash based on the object's identity completely to rest.

For the record, this is all from the OpenJDK source base--naturally, it's possible that earlier VM implementations did something entirely different.

Wednesday, July 16, 2008 7:57:37 AM (Pacific Standard Time, UTC-08:00)
That code actually does calculate the hash using the object's identity. Of course, it has to cache it after that since the object can move. In some VMs, the object doesn't move, and they can just use the address.
Wednesday, August 6, 2008 2:54:42 AM (Pacific Standard Time, UTC-08:00)
Very, very interesting stuff.. I'm writing a Java profiler which focusses exclusively on the universe that starts with hashCode(). So, if I may be so bold, I'd love anyone interested in the above to also visit my market research page and give me some feedback on what you'd like to see in a profiling tool that deals exclusively with hashing containers. Curious? Just head for
Wednesday, August 6, 2008 2:57:29 AM (Pacific Standard Time, UTC-08:00)
Eh.. actually, correction to the previous post. Maybe best if you go via .. which will give you more context before the market research questions.
Comments are closed.