Original article · Chinese translation: 丁一
// Single threaded version
class Foo {
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null)
      helper = new Helper();
    return helper;
  }
  // other functions and members...
}
// Correct multithreaded version
class Foo {
  private Helper helper = null;
  public synchronized Helper getHelper() {
    if (helper == null)
      helper = new Helper();
    return helper;
  }
  // other functions and members...
}
// Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null)
      synchronized(this) {
        if (helper == null)
          helper = new Helper();
      }
    return helper;
  }
  // other functions and members...
}
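Even if the compiler does not reorder those writes (the writes that initialize the Helper object and the write to the helper field), on a multiprocessor the processor or the memory system may reorder them, and threads running on other processors may then observe the effects of that reordering.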
Doug Lea has written a more detailed description of compiler-based reordering.
Paul Jakubik found an example in which double-checked locking does not work correctly. A slightly cleaned-up version of his code follows:
public class DoubleCheckTest {

  // static data to aid in creating N singletons
  static final Object dummyObject = new Object(); // for reference init
  static final int A_VALUE = 256;   // value to initialize 'a' to
  static final int B_VALUE = 512;   // value to initialize 'b' to
  static final int C_VALUE = 1024;  // value to initialize 'c' to
  static ObjectHolder[] singletons; // array of static references
  static Thread[] threads;          // array of racing threads
  static int threadCount;           // number of threads to create
  static int singletonCount;        // number of singletons to create

  static volatile int recentSingleton;

  // I am going to set a couple of threads racing,
  // trying to create N singletons. Basically the
  // race is to initialize a single array of
  // singleton references. The threads will use
  // double checked locking to control who
  // initializes what. Any thread that does not
  // initialize a particular singleton will check
  // to see if it sees a partially initialized view.
  // To keep from getting accidental synchronization,
  // each singleton is stored in an ObjectHolder
  // and the ObjectHolder is used for
  // synchronization. In the end the structure
  // is not exactly a singleton, but should be a
  // close enough approximation.

  // This class contains data and simulates a
  // singleton. The static reference is stored in
  // a static array in DoubleCheckTest.
  static class Singleton {
    public int a;
    public int b;
    public int c;
    public Object dummy;

    public Singleton() {
      a = A_VALUE;
      b = B_VALUE;
      c = C_VALUE;
      dummy = dummyObject;
    }
  }

  static void checkSingleton(Singleton s, int index) {
    int s_a = s.a;
    int s_b = s.b;
    int s_c = s.c;
    Object s_d = s.dummy;
    if (s_a != A_VALUE)
      System.out.println("[" + index + "] Singleton.a not initialized " + s_a);
    if (s_b != B_VALUE)
      System.out.println("[" + index + "] Singleton.b not initialized " + s_b);
    if (s_c != C_VALUE)
      System.out.println("[" + index + "] Singleton.c not initialized " + s_c);
    if (s_d != dummyObject) {
      if (s_d == null)
        System.out.println("[" + index + "] Singleton.dummy not initialized,"
                           + " value is null");
      else
        System.out.println("[" + index + "] Singleton.dummy not initialized,"
                           + " value is garbage");
    }
  }

  // Holder used for synchronization of
  // singleton initialization.
  static class ObjectHolder {
    public Singleton reference;
  }

  static class TestThread implements Runnable {
    public void run() {
      for (int i = 0; i < singletonCount; ++i) {
        ObjectHolder o = singletons[i];
        if (o.reference == null) {
          synchronized (o) {
            if (o.reference == null) {
              o.reference = new Singleton();
              recentSingleton = i;
            }
            // shouldn't have to check singleton here
            // mutex should provide consistent view
          }
        } else {
          checkSingleton(o.reference, i);
          int j = recentSingleton - 1;
          if (j > i) i = j;
        }
      }
    }
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("usage: java DoubleCheckTest"
                         + " <numThreads> <numSingletons>");
      return; // stop here: both arguments are required below
    }
    // read values from args
    threadCount = Integer.parseInt(args[0]);
    singletonCount = Integer.parseInt(args[1]);

    // create arrays
    threads = new Thread[threadCount];
    singletons = new ObjectHolder[singletonCount];

    // fill singleton array
    for (int i = 0; i < singletonCount; ++i)
      singletons[i] = new ObjectHolder();

    // fill thread array
    for (int i = 0; i < threadCount; ++i)
      threads[i] = new Thread(new TestThread());

    // start threads
    for (int i = 0; i < threadCount; ++i)
      threads[i].start();

    // wait for threads to finish
    for (int i = 0; i < threadCount; ++i) {
      try {
        System.out.println("waiting to join " + i);
        threads[i].join();
      } catch (InterruptedException ex) {
        System.out.println("interrupted");
      }
    }
    System.out.println("done");
  }
}
The code above does not work correctly when run on a system using the Symantec JIT. In particular, the Symantec JIT compiles
singletons[i].reference = new Singleton();
into the following (the Symantec JIT uses a handle-based object allocation system):
0206106A   mov         eax,0F97E78h
0206106F   call        01F6B210                  ; allocate space for
                                                 ; Singleton, return result in eax
02061074   mov         dword ptr [ebp],eax       ; EBP is &singletons[i].reference
                                                 ; store the unconstructed object here.
02061077   mov         ecx,dword ptr [eax]       ; dereference the handle to
                                                 ; get the raw pointer
02061079   mov         dword ptr [ecx],100h      ; Next 4 lines are
0206107F   mov         dword ptr [ecx+4],200h    ; Singleton's inlined constructor
02061086   mov         dword ptr [ecx+8],400h
0206108D   mov         dword ptr [ecx+0Ch],0F84030h
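In the listing, the reference is stored into singletons[i].reference (at 02061074) before the inlined constructor writes 100h, 200h, 400h and the dummy reference. A thread doing the unsynchronized first check inside that window sees a non-null reference to an object whose fields are still zero. The sketch below replays that single interleaving deterministically in one thread; it is an illustration, not a real race, and splitting the constructor into a no-arg constructor plus a hypothetical `constructorBody()` method is an assumption made only so the steps can be run by hand.

```java
// Deterministic replay of the order in the Symantec listing:
// allocate, publish the reference, and only then run the constructor body.
class ReorderSimulation {
    static final int A_VALUE = 256;  // 100h in the listing
    static final int B_VALUE = 512;  // 200h in the listing

    // Hypothetical stand-in: fields start at their default value 0,
    // and the "inlined constructor" is an explicit method called later.
    static class Singleton {
        int a;
        int b;
        void constructorBody() {
            a = A_VALUE;
            b = B_VALUE;
        }
    }

    static class ObjectHolder {
        Singleton reference;
    }

    /** Returns the value of field a seen by a "reader" that runs between steps 2 and 4. */
    static int replayBadInterleaving() {
        ObjectHolder holder = new ObjectHolder();

        Singleton raw = new Singleton();   // 1. allocate: a == 0, b == 0
        holder.reference = raw;            // 2. publish before construction (02061074)

        // 3. A reader's unsynchronized first check now sees a non-null
        //    reference, skips the lock, and reads the object's fields.
        Singleton seen = holder.reference;
        int observedA = seen.a;

        raw.constructorBody();             // 4. the constructor's writes land afterwards
        return observedA;
    }

    public static void main(String[] args) {
        System.out.println("reader saw a = " + replayBadInterleaving()); // prints "reader saw a = 0"
    }
}
```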
// (Still) Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null) {
      Helper h;
      synchronized(this) {
        h = helper;
        if (h == null)
          synchronized (this) {
            h = new Helper();
          } // release inner synchronization lock
        helper = h;
      }
    }
    return helper;
  }
  // other functions and members...
}
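Unfortunately, that intuition (that releasing the inner lock acts as a memory barrier which keeps the assignment to helper from being reordered with the initialization of the Helper object) is completely wrong. Synchronization does not work that way. The rule for a monitorexit (that is, leaving a synchronized block) is that actions before the monitorexit must be performed before the monitor is released. However, no rule says that actions after the monitorexit may not be performed before the monitor is released. It is therefore perfectly reasonable and legal for the compiler to move the assignment helper = h; inside the synchronized block, which brings us back to the problem described earlier. Many processors offer instructions that perform exactly this kind of one-way memory barrier. Changing the semantics so that releasing a lock acts as a full, two-way memory barrier would carry a performance penalty.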
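Concretely, the rule lets the compiler move the write `helper = h;`, which follows the inner monitorexit, up into the inner synchronized block. The sketch below is hand-written to show one such legal rewrite; it is not actual compiler output, and the nested `Helper` is a hypothetical placeholder. Inside the inner block, the constructor's writes and the write to `helper` are unordered with respect to an unsynchronized reader, which is the broken idiom once again.

```java
// Hand-written illustration (not real compiler output) of a legal rewrite of
// the "(still) broken" getHelper(): the write helper = h, which follows the
// inner monitorexit, has been moved up into the inner synchronized block.
class TransformedFoo {
    static class Helper { }          // hypothetical placeholder

    private Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {        // unsynchronized first check, unchanged
            Helper h;
            synchronized (this) {
                h = helper;
                if (h == null)
                    synchronized (this) {
                        h = new Helper();
                        helper = h;  // moved inside: monitorexit only orders the actions
                                     // before it, and within this block the constructor's
                                     // writes and this write may still be reordered
                    }
            }
        }
        return helper;
    }

    public static void main(String[] args) {
        TransformedFoo foo = new TransformedFoo();
        // Single-threaded, the rewrite is indistinguishable from the original;
        // only a racing reader in the first check can tell the difference.
        System.out.println(foo.getHelper() == foo.getHelper()); // prints "true"
    }
}
```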
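Why? Because processors keep their own locally cached copies of memory. On some processors, unless the processor executes a cache coherence instruction (that is, a memory barrier), reads can be served from stale locally cached copies, even if other processors used memory barriers to force their writes out to main memory.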
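More typically, higher-level cleverness, such as using an internal merge sort rather than an exchange sort (see the SPECJVM DB benchmark), has a much larger impact.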
class HelperSingleton {
  static Helper singleton = new Helper();
}
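The JVM initializes HelperSingleton only when the field is first used, and the JVM itself synchronizes class initialization, so every thread that reads HelperSingleton.singleton sees a fully constructed Helper without any explicit locking. A small check of that behavior, assuming a hypothetical Helper that counts its constructions: several threads read the field concurrently, and all of them should observe the same instance, built exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class HolderDemo {
    /** Starts n threads that read the static field concurrently; true if all saw one instance. */
    static boolean runRace(int n) {
        Helper[] seen = new Helper[n];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int idx = i;
            Thread t = new Thread(() -> seen[idx] = HelperSingleton.singleton);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();            // join() also makes seen[idx] visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        for (Helper h : seen) {
            if (h != seen[0] || h.value != 42) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(runRace(8));               // prints "true"
        System.out.println(Helper.CONSTRUCTED.get()); // prints "1"
    }
}

// Hypothetical Helper: records how many times it has been constructed.
class Helper {
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();
    final int value;
    Helper() {
        value = 42;
        CONSTRUCTED.incrementAndGet();
    }
}

// Same shape as the example above: initialized on first use of the field.
class HelperSingleton {
    static Helper singleton = new Helper();
}
```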
// Correct Double-Checked Locking for 32-bit primitives
class Foo {
  private int cachedHashCode = 0;
  public int hashCode() {
    int h = cachedHashCode;
    if (h == 0)
      synchronized(this) {
        if (cachedHashCode != 0) return cachedHashCode;
        h = computeHashCode();
        cachedHashCode = h;
      }
    return h;
  }
  // other functions and members...
}
// Lazy initialization 32-bit primitives
// Thread-safe if computeHashCode is idempotent
class Foo {
  private int cachedHashCode = 0;
  public int hashCode() {
    int h = cachedHashCode;
    if (h == 0) {
      h = computeHashCode();
      cachedHashCode = h;
    }
    return h;
  }
  // other functions and members...
}
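The lock-free version is safe because reads and writes of an int are atomic in Java, so a racing thread sees either 0 or the complete hash, and an idempotent computeHashCode makes a duplicate computation harmless. Reading the field once into the local h also matters: re-reading cachedHashCode for the return value could, under a data race, yield 0 even though the first read was non-zero. A runnable sketch, assuming a hypothetical Foo whose hash depends only on a final String field; mapping a computed 0 to 1 is an extra assumption so that 0 can keep meaning "not cached yet".

```java
import java.util.ArrayList;
import java.util.List;

class RacyHashDemo {
    /** n threads call hashCode() on the same Foo; true if all agree with computeHashCode(). */
    static boolean allThreadsAgree(Foo foo, int n) {
        int[] results = new int[n];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int idx = i;
            Thread t = new Thread(() -> results[idx] = foo.hashCode());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        for (int r : results) {
            if (r != foo.computeHashCode()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allThreadsAgree(new Foo("double-checked"), 8)); // prints "true"
    }
}

class Foo {
    private final String name;       // final, so safely visible to every thread
    private int cachedHashCode = 0;  // racy cache; 0 means "not computed yet"

    Foo(String name) { this.name = name; }

    // Idempotent: depends only on immutable state, so every thread computes the same value.
    int computeHashCode() {
        int h = name.hashCode();
        return h != 0 ? h : 1;       // assumption: never return 0, the "empty" marker
    }

    @Override
    public int hashCode() {
        int h = cachedHashCode;      // read the field exactly once
        if (h == 0) {
            h = computeHashCode();   // several threads may get here; they all agree
            cachedHashCode = h;      // int writes are atomic
        }
        return h;
    }
}
```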
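If explicit memory barrier instructions are available, it is possible to make DCL work. For example, if you are programming in C++, you can use the code from the book by Doug Schmidt et al.: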
// C++ implementation with explicit memory barriers
// Should work on any platform, including DEC Alphas
// From "Patterns for Concurrent and Distributed Objects",
// by Doug Schmidt
template <class TYPE, class LOCK> TYPE *
Singleton<TYPE, LOCK>::instance (void) {
  // First check
  TYPE* tmp = instance_;
  // Insert the CPU-specific memory barrier instruction
  // to synchronize the cache lines on multi-processor.
  // ("memoryBarrier" is a placeholder for the real instruction.)
  asm ("memoryBarrier");
  if (tmp == 0) {
    // Ensure serialization (guard
    // constructor acquires lock_).
    Guard<LOCK> guard (lock_);
    // Double check.
    tmp = instance_;
    if (tmp == 0) {
      tmp = new TYPE;
      // Insert the CPU-specific memory barrier instruction
      // to synchronize the cache lines on multi-processor.
      asm ("memoryBarrier");
      instance_ = tmp;
    }
  }
  // Return on both paths, including when the first check succeeds.
  return tmp;
}
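Java has no `asm` escape hatch, but since Java 9 `java.lang.invoke.VarHandle` offers per-access ordering modes that can play the role of the two barriers. The sketch below is a swapped-in technique, not part of the original article: it assumes Java 9+ and a hypothetical Helper, and mirrors the C++ structure with an acquire read for the first check and a release write for the publication.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.ArrayList;
import java.util.List;

class AcquireReleaseDemo {
    /** n threads race on one AcquireReleaseFoo; true if all got the same, fully built Helper. */
    static boolean allSeeSameHelper(AcquireReleaseFoo foo, int n) {
        Helper[] seen = new Helper[n];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int idx = i;
            Thread t = new Thread(() -> seen[idx] = foo.getHelper());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        for (Helper h : seen) {
            if (h != seen[0] || h.value != 42) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allSeeSameHelper(new AcquireReleaseFoo(), 8)); // prints "true"
    }
}

// Hypothetical Helper whose field is set by the constructor (deliberately not final).
class Helper {
    int value;
    Helper() { value = 42; }
}

class AcquireReleaseFoo {
    private Helper helper; // plain field; ordered accesses go through HELPER

    private static final VarHandle HELPER;
    static {
        try {
            HELPER = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseFoo.class, "helper", Helper.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public Helper getHelper() {
        // First check with an acquire load: plays the role of the reader-side barrier.
        Helper h = (Helper) HELPER.getAcquire(this);
        if (h == null) {
            synchronized (this) {
                h = helper;             // re-check under the lock; the lock orders this read
                if (h == null) {
                    h = new Helper();
                    // Release store: plays the role of the writer-side barrier, so the
                    // constructor's writes become visible no later than the reference.
                    HELPER.setRelease(this, h);
                }
            }
        }
        return h;
    }
}
```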
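Alexander Terekhov (TEREKHOV@de.ibm.com) came up with a clever way to implement DCL using thread-local storage. Each thread keeps its own flag recording whether that thread has already performed the required synchronization.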
class Foo {
  /** If perThreadInstance.get() returns a non-null value, this thread
      has done synchronization needed to see initialization
      of helper */
  private final ThreadLocal<Object> perThreadInstance = new ThreadLocal<Object>();
  private Helper helper = null;
  public Helper getHelper() {
    if (perThreadInstance.get() == null) createHelper();
    return helper;
  }
  private final void createHelper() {
    synchronized(this) {
      if (helper == null)
        helper = new Helper();
    }
    // Any non-null value would do as the argument here
    perThreadInstance.set(perThreadInstance);
  }
}
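The performance of this technique depends heavily on which JDK implementation you use. In Sun's 1.2 implementation, ThreadLocals were very slow. They became significantly faster in 1.3 and were expected to be faster still in 1.4. Doug Lea has analyzed the performance of several techniques for implementing lazy initialization.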
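With the per-thread flag, each thread pays for the synchronized block once, on its first call, and afterwards only performs a ThreadLocal lookup. The sketch below assumes Java 8+ and adds a hypothetical counter purely for instrumentation (slowPathCount is not part of Terekhov's technique); it checks that with n threads the locking path runs exactly n times, however many calls each thread makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadFlagDemo {
    /** n threads each call getHelper() `calls` times; returns how often createHelper() ran. */
    static int lockAcquisitions(int n, int calls) {
        Foo foo = new Foo();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                for (int k = 0; k < calls; k++) foo.getHelper();
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return foo.slowPathCount.get();
    }

    public static void main(String[] args) {
        System.out.println(lockAcquisitions(4, 1000)); // prints "4"
    }
}

class Helper { } // hypothetical placeholder

class Foo {
    // Instrumentation for this sketch only: counts entries into createHelper().
    final AtomicInteger slowPathCount = new AtomicInteger();

    /** Non-null for a thread once it has synchronized and can safely see helper. */
    private final ThreadLocal<Object> perThreadInstance = new ThreadLocal<>();
    private Helper helper = null;

    public Helper getHelper() {
        if (perThreadInstance.get() == null) createHelper(); // first call in this thread only
        return helper;
    }

    private void createHelper() {
        slowPathCount.incrementAndGet();
        synchronized (this) {
            if (helper == null) helper = new Helper();
        }
        perThreadInstance.set(perThreadInstance); // any non-null marker works
    }
}
```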
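JDK 5 and later extend the semantics of volatile: a volatile write may no longer be reordered with any read or write that precedes it, and a volatile read may no longer be reordered with any read or write that follows it. See Jeremy Manson's blog for more details.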
// Works with acquire/release semantics for volatile
// Broken under the pre-JDK 5 semantics for volatile
class Foo {
  private volatile Helper helper = null;
  public Helper getHelper() {
    if (helper == null) {
      synchronized(this) {
        if (helper == null)
          helper = new Helper();
      }
    }
    return helper;
  }
}
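Under the JDK 5 volatile semantics described above, the volatile version is correct. A runnable sketch, assuming Java 8+ for the lambdas, a hypothetical Helper, and a hypothetical creation counter added only for checking; the local variable h is a common refinement, not in the original, that reads the volatile field once on the fast path instead of twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class VolatileDclDemo {
    /** n threads race on a fresh Foo; true if all saw one fully built Helper, created once. */
    static boolean race(int n) {
        Foo foo = new Foo();
        Helper[] seen = new Helper[n];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int idx = i;
            Thread t = new Thread(() -> seen[idx] = foo.getHelper());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        for (Helper h : seen) {
            if (h != seen[0] || h.value != 42) return false;
        }
        return foo.creations.get() == 1;
    }

    public static void main(String[] args) {
        System.out.println(race(8)); // prints "true"
    }
}

// Hypothetical Helper with a constructor-initialized, non-final field.
class Helper {
    int value;
    Helper() { value = 42; }
}

class Foo {
    final AtomicInteger creations = new AtomicInteger(); // instrumentation for this sketch
    private volatile Helper helper = null;

    public Helper getHelper() {
        Helper h = helper;               // single volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;
                if (h == null) {
                    creations.incrementAndGet();
                    h = new Helper();
                    helper = h;          // volatile write publishes the fully built Helper
                }
            }
        }
        return h;
    }
}
```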