[ViewVC] Diff of: jsr166/jsr166/src/jsr166e/StripedAdder.java

Comparing jsr166/src/jsr166e/StripedAdder.java (file contents):
Revision 1.2 by jsr166, Wed Jul 20 16:06:19 2011 UTC vs.
Revision 1.6 by dl, Tue Jul 26 17:16:36 2011 UTC

#	Line 16 | Line 16 | import java.io.ObjectOutputStream;
16
17		/**
18		* A set of variables that together maintain a sum. When updates
19	<	* (method {@link #add}) are contended across threads, the set of
20	<	* adders may grow to reduce contention. Method {@link #sum} returns
21	<	* the current combined total across these adders. This value is
22	<	* <em>NOT</em> an atomic snapshot (concurrent updates may occur while
23	<	* the sum is being calculated), and so cannot be used alone for
24	<	* fine-grained synchronization control.
19	>	* (method {@link #add}) are contended across threads, this set of
20	>	* adder variables may grow dynamically to reduce contention. Method
21	>	* {@link #sum} returns the current combined total across these
22	>	* adders. This value is <em>NOT</em> an atomic snapshot (concurrent
23	>	* updates may occur while the sum is being calculated), and so cannot
24	>	* be used alone for fine-grained synchronization control.
25		*
26		* <p> This class may be applicable when many threads frequently
27		* update a common sum that is used for purposes such as collecting
28		* statistics. In this case, performance may be significantly faster
29		* than using a shared {@link AtomicLong}, at the expense of using
30	<	* significantly more space. On the other hand, if it is known that
31	<	* only one thread can ever update the sum, performance may be
32	<	* significantly slower than just updating a local variable.
30	>	* much more space. On the other hand, if it is known that only one
31	>	* thread can ever update the sum, performance may be significantly
32	>	* slower than just updating a local variable.
33	>	*
34	>	* <p>A StripedAdder may optionally be constructed with a given
35	>	* expected contention level; i.e., the number of threads that are
36	>	* expected to concurrently update the sum. Supplying an accurate
37	>	* value may improve performance by reducing the need for dynamic
38	>	* adjustment.
39		*
40		* @author Doug Lea
41		*/
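The class comment above describes a sum that many threads update and that is read only now and then, for example to collect statistics. The sketch below assumes a Java 8+ runtime and uses `java.util.concurrent.atomic.LongAdder`, the JDK descendant of this jsr166e class, since `StripedAdder` itself is not in the JDK. The class name `HitCounterDemo` and its method are hypothetical. It also shows the caveat from the comment: `sum()` is exact only once no updates are in flight.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

/** Hypothetical demo: counting events from many threads two ways. */
public class HitCounterDemo {
    /** Increments both counters `events` times from a parallel stream; returns {atomic, striped}. */
    static long[] countBothWays(int events) {
        AtomicLong atomic = new AtomicLong();  // one shared variable: every update CASes the same word
        LongAdder striped = new LongAdder();   // extra cells are created only under contention
        IntStream.range(0, events).parallel().forEach(i -> {
            atomic.incrementAndGet();
            striped.increment();
        });
        // forEach has returned, so no updates are in flight and sum() is exact.
        // Called while updates are still running, sum() would only be an estimate.
        return new long[] { atomic.get(), striped.sum() };
    }

    public static void main(String[] args) {
        long[] r = countBothWays(100_000);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

The trade-off is the one the comment names: the striped counter uses more memory per instance in exchange for fewer failed CASes when many threads update at once.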
#	Line 37 | Line 43 | public class StripedAdder implements Ser
43		private static final long serialVersionUID = 7249069246863182397L;
44
45		/*
46	<	* Overview: We maintain a table of AtomicLongs (padded to reduce
47	<	* false sharing). The table is indexed by per-thread hash codes
48	<	* that are initialized as random values. The table doubles in
49	<	* size upon contention (as indicated by failed CASes when
50	<	* performing add()), but is capped at the nearest power of two >=
51	<	* #cpus: At that point, contention should be infrequent if each
52	<	* thread has a unique index; so we instead adjust hash codes to
53	<	* new random values upon contention rather than expanding. A
54	<	* single spinlock is used for resizing the table as well as
46	>	* A StripedAdder maintains a table of Atomic long variables. The
47	>	* table is indexed by per-thread hash codes.
48	>	*
49	>	* By default, the table is lazily initialized, to minimize
50	>	* footprint until adders are used. On first use, the table is set
51	>	* to size DEFAULT_INITIAL_SIZE (currently 8). Table size is
52	>	* bounded by the number of CPUS (if larger than the default
53	>	* size).
54	>	*
55	>	* Per-thread hash codes are initialized to random values.
56	>	* Collisions are indicated by failed CASes when performing an add
57	>	* operation (see method retryAdd). Upon a collision, if the table
58	>	* size is less than the capacity, it is doubled in size unless
59	>	* some other thread holds lock. If a hashed slot is empty, and
60	>	* lock is available, a new Adder is created. Otherwise, if the
61	>	* slot exists, a CAS is tried. Retries proceed by "double
62	>	* hashing", using a secondary hash (Marsaglia XorShift) to try to
63	>	* find a free slot.
64	>	*
65	>	* The table size is capped because, when there are more threads
66	>	* than CPUs, supposing that each thread were bound to a CPU,
67	>	* there would exist a perfect hash function mapping threads to
68	>	* slots that eliminates collisions. When we reach capacity, we
69	>	* search for this mapping by randomly varying the hash codes of
70	>	* colliding threads. Because search is random, and failures only
71	>	* become known via CAS failures, convergence will be slow, and
72	>	* because threads are typically not bound to CPUS forever, may
73	>	* not occur at all. However, despite these limitations, observed
74	>	* contention is typically low in these cases.
75	>	*
76	>	* Table entries are of class Adder; a form of AtomicLong padded
77	>	* to reduce cache contention on most processors. Padding is
78	>	* overkill for most Atomics because they are usually irregularly
79	>	* scattered in memory and thus don't interfere much with each
80	>	* other. But Atomic objects residing in arrays will tend to be
81	>	* placed adjacent to each other, and so will most often share
82	>	* cache lines without this precaution. Adders are by default
83	>	* constructed upon first use, which further improves per-thread
84	>	* locality and helps reduce footprint.
85	>	*
86	>	* A single spinlock is used for resizing the table as well as
87		* populating slots with new Adders. Upon lock contention, threads
88	<	* just try other slots rather than blocking. We guarantee that at
88	>	* try other slots rather than blocking. After initialization, at
89		* least one slot exists, so retries will eventually find a
90	<	* candidate Adder.
90	>	* candidate Adder. During these retries, there is increased
91	>	* contention and reduced locality, which is still better than
92	>	* alternatives.
93		*/
94
95		/**
96	<	* Number of processors, to place a cap on table growth.
57	<	*/
58	<	static final int NCPU = Runtime.getRuntime().availableProcessors();
59	<
60	<	/**
61	<	* Version of AtomicLong padded to avoid sharing cache
62	<	* lines on most processors
96	>	* Padded version of AtomicLong
97		*/
98		static final class Adder extends AtomicLong {
99	<	long p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pa, pb, pc, pd;
99	>	long p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pa, pb, pc, pd, pe;
100		Adder(long x) { super(x); }
101		}
102
103	+	private static final int NCPU = Runtime.getRuntime().availableProcessors();
104	+
105		/**
106	<	* Holder for the thread-local hash code.
106	>	* Table bounds. DEFAULT_INITIAL_SIZE is the table size set upon
107	>	* first use under default constructor, and must be a power of
108	>	* two. There is not much point in making size a lot smaller than
109	>	* that of Adders though. CAP is the maximum allowed table size.
110	>	*/
111	>	private static final int DEFAULT_INITIAL_SIZE = 8;
112	>	private static final int CAP = Math.max(NCPU, DEFAULT_INITIAL_SIZE);
113	>
114	>	/**
115	>	* Holder for the thread-local hash code. The code is initially
116	>	* random, but may be set to a different value upon collisions.
117		*/
118		static final class HashCode {
119	+	static final Random rng = new Random();
120		int code;
121	<	HashCode(int h) { code = h; }
121	>	HashCode() {
122	>	int h = rng.nextInt();
123	>	code = (h == 0) ? 1 : h; // ensure nonzero
124	>	}
125		}
126
127		/**
128		* The corresponding ThreadLocal class
129		*/
130		static final class ThreadHashCode extends ThreadLocal<HashCode> {
131	<	static final Random rng = new Random();
82	<	public HashCode initialValue() {
83	<	int h = rng.nextInt();
84	<	return new HashCode((h == 0) ? 1 : h); // ensure nonzero
85	<	}
131	>	public HashCode initialValue() { return new HashCode(); }
132		}
133
134		/**
135		* Static per-thread hash codes. Shared across all StripedAdders
136	<	* because adjustments due to collisions in one table are likely
137	<	* to be appropriate for others.
136	>	* to reduce ThreadLocal pollution and because adjustments due to
137	>	* collisions in one table are likely to be appropriate for
138	>	* others.
139		*/
140		static final ThreadHashCode threadHashCode = new ThreadHashCode();
141
142		/**
143	<	* Table of adders. Initially of size 2; grows to be at most NCPU.
143	>	* Table of adders. Size is power of two, grows to be at most CAP.
144		*/
145		private transient volatile Adder[] adders;
146
147		/**
148		* Serves as a lock when resizing and/or creating Adders. There
149	<	* is no need for a blocking lock: When busy, other threads try
150	<	* other slots.
149	>	* is no need for a blocking lock: Except during initialization
150	>	* races, when busy, other threads try other slots. However,
151	>	* during (double-checked) initializations, we use the
152	>	* "synchronized" lock on this object.
153		*/
154		private final AtomicInteger mutex;
155
156		/**
157	<	* Marsaglia XorShift for rehashing on collisions
157	>	* Creates a new adder with zero sum.
158		*/
159	<	private static int xorShift(int r) {
160	<	r ^= r << 13;
161	<	r ^= r >>> 17;
113	<	return r ^ (r << 5);
159	>	public StripedAdder() {
160	>	this.mutex = new AtomicInteger();
161	>	// remaining initialization on first call to add.
162		}
163
164		/**
165	<	* Creates a new adder with initially zero sum.
165	>	* Creates a new adder with zero sum, and with stripes presized
166	>	* for the given expected contention level.
167	>	*
168	>	* @param expectedContention the expected number of threads that
169	>	* will concurrently update the sum.
170		*/
171	<	public StripedAdder() {
172	<	Adder[] as = new Adder[2];
173	<	as[0] = new Adder(0); // ensure at least one available adder
171	>	public StripedAdder(int expectedContention) {
172	>	int cap = (expectedContention < CAP) ? expectedContention : CAP;
173	>	int size = 1;
174	>	while (size < cap)
175	>	size <<= 1;
176	>	Adder[] as = new Adder[size];
177	>	for (int i = 0; i < size; ++i)
178	>	as[i] = new Adder(0);
179		this.adders = as;
180		this.mutex = new AtomicInteger();
181		}
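The new `StripedAdder(int expectedContention)` constructor clamps the requested level to `CAP` (the larger of the CPU count and `DEFAULT_INITIAL_SIZE`) and then rounds it up to a power of two. The sketch below isolates that arithmetic in a hypothetical `tableSize` helper; `cap` is passed in as a parameter rather than derived from `availableProcessors()`, so the logic can be checked on its own. A consequence worth noting: when `CAP` is not a power of two (say 12 on a 12-CPU machine), rounding up produces a table larger than `CAP`.

```java
/** Hypothetical helper mirroring the sizing loop in StripedAdder(int). */
public class TableSizing {
    static int tableSize(int expectedContention, int cap) {
        // Clamp to the cap, as the constructor does.
        int limit = (expectedContention < cap) ? expectedContention : cap;
        // Round up to the next power of two; non-positive input yields 1.
        int size = 1;
        while (size < limit)
            size <<= 1;
        return size;
    }

    public static void main(String[] args) {
        int cap = Math.max(Runtime.getRuntime().availableProcessors(), 8);
        for (int c : new int[] { 0, 1, 3, 5, 8, 100 })
            System.out.println(c + " -> " + tableSize(c, cap));
    }
}
```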
#	Line 129 | Line 186 | public class StripedAdder implements Ser
186		* @param x the value to add
187		*/
188		public void add(long x) {
189	+	Adder[] as; Adder a; int n; long v; // locals to hold volatile reads
190		HashCode hc = threadHashCode.get();
191	<	for (int h = hc.code;;) {
192	<	Adder[] as = adders;
193	<	int n = as.length;
194	<	Adder a = as[h & (n - 1)];
195	<	if (a != null) {
196	<	long v = a.get();
197	<	if (a.compareAndSet(v, v + x))
198	<	break;
199	<	if (n >= NCPU) { // Collision when table at max
200	<	h = hc.code = xorShift(h); // change code
201	<	continue;
191	>	int h = hc.code;
192	>	if ((as = adders) == null || (n = as.length) < 1 ||
193	>	(a = as[(n - 1) & h]) == null \|\|
194	>	!a.compareAndSet(v = a.get(), v + x))
195	>	retryAdd(x, hc);
196	>	}
197	>
198	>	/**
199	>	* Handle cases of add involving initialization, resizing,
200	>	* creating new Adders, and/or contention. See above for
201	>	* explanation.
202	>	*/
203	>	private void retryAdd(long x, HashCode hc) {
204	>	int h = hc.code;
205	>	final AtomicInteger mutex = this.mutex;
206	>	int collisions = 1 - mutex.get(); // first guess: collides if not locked
207	>	for (;;) {
208	>	Adder[] as; Adder a; long v; int k, n;
209	>	while ((as = adders) == null \|\| (n = as.length) < 1) {
210	>	synchronized(mutex) { // Try to initialize
211	>	if (adders == null) {
212	>	Adder[] rs = new Adder[DEFAULT_INITIAL_SIZE];
213	>	rs[h & (DEFAULT_INITIAL_SIZE - 1)] = new Adder(0);
214	>	adders = rs;
215	>	}
216	>	}
217	>	collisions = 0;
218	>	}
219	>
220	>	if ((a = as[k = (n - 1) & h]) == null) { // Try to add slot
221	>	if (mutex.get() == 0 && mutex.compareAndSet(0, 1)) {
222	>	try {
223	>	if (adders == as && as[k] == null)
224	>	a = as[k] = new Adder(x);
225	>	} finally {
226	>	mutex.set(0);
227	>	}
228	>	if (a != null)
229	>	break;
230		}
231	+	collisions = 0;
232		}
233	<	final AtomicInteger mutex = this.mutex;
234	<	if (mutex.get() != 0)
148	<	h = xorShift(h); // Try elsewhere
149	<	else if (mutex.compareAndSet(0, 1)) {
150	<	boolean created = false;
233	>	else if (collisions != 0 && n < CAP && // Try to expand table
234	>	mutex.get() == 0 && mutex.compareAndSet(0, 1)) {
235		try {
236	<	Adder[] rs = adders;
237	<	if (a != null && rs == as) // Resize table
238	<	rs = adders = Arrays.copyOf(as, as.length << 1);
239	<	int j = h & (rs.length - 1);
240	<	if (rs[j] == null) { // Create adder
157	<	rs[j] = new Adder(x);
158	<	created = true;
236	>	if (adders == as) {
237	>	Adder[] rs = new Adder[n << 1];
238	>	for (int i = 0; i < n; ++i)
239	>	rs[i] = as[i];
240	>	adders = rs;
241		}
242		} finally {
243		mutex.set(0);
244		}
245	<	if (created) {
164	<	hc.code = h; // Use this adder next time
165	<	break;
166	<	}
245	>	collisions = 0;
246		}
247	+	else if (a.compareAndSet(v = a.get(), v + x))
248	+	break;
249	+	else
250	+	collisions = 1;
251	+	h ^= h << 13; // Rehash
252	+	h ^= h >>> 17;
253	+	h ^= h << 5;
254		}
255	+	hc.code = h;
256		}
257
258		/**
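Revision 1.6 splits `add` into a short fast path (a single CAS on the slot chosen by the thread's hash code) and `retryAdd`, which handles lazy initialization, slot creation, table growth, and XorShift rehashing. The sketch below keeps only the fast path and the rehash-and-retry loop, on a fixed-size `AtomicLongArray` with no padding, spinlock, or growth. `SimpleStripedCounter`, its `ThreadLocal<int[]>` probe holder, and `rehash` are hypothetical simplifications, not the jsr166e code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical fixed-size striped counter: fast-path CAS plus XorShift rehash on failure. */
public class SimpleStripedCounter {
    private final AtomicLongArray cells;
    private final int mask;

    // Per-thread hash code, random and nonzero (XorShift would map 0 to 0 forever).
    private static final ThreadLocal<int[]> PROBE = ThreadLocal.withInitial(() -> {
        int h = ThreadLocalRandom.current().nextInt();
        return new int[] { (h == 0) ? 1 : h };
    });

    public SimpleStripedCounter(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1)
            throw new IllegalArgumentException("size must be a power of two");
        cells = new AtomicLongArray(sizePowerOfTwo);
        mask = sizePowerOfTwo - 1;
    }

    /** Marsaglia XorShift step, the same shifts (13, 17, 5) used in retryAdd. */
    static int rehash(int h) {
        h ^= h << 13;
        h ^= h >>> 17;
        h ^= h << 5;
        return h;
    }

    public void add(long x) {
        int[] probe = PROBE.get();
        int h = probe[0];
        for (;;) {
            int k = h & mask;                 // slot index from the low bits
            long v = cells.get(k);
            if (cells.compareAndSet(k, v, v + x))
                break;                        // uncontended (or lucky) CAS
            h = rehash(h);                    // collision: try another slot
        }
        probe[0] = h;                         // remember the slot that worked
    }

    /** Not an atomic snapshot if adds run concurrently. */
    public long sum() {
        long s = 0L;
        for (int i = 0; i < cells.length(); ++i)
            s += cells.get(i);
        return s;
    }
}
```

Storing the updated hash back into the thread-local mirrors the final `hc.code = h;` in `retryAdd`: a thread that collided keeps using the slot where it last succeeded.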
#	Line 176 | Line 263 | public class StripedAdder implements Ser
263		* @return the estimated sum
264		*/
265		public long sum() {
266	<	long sum = 0;
266	>	long sum = 0L;
267		Adder[] as = adders;
268	<	int n = as.length;
269	<	for (int i = 0; i < n; ++i) {
270	<	Adder a = as[i];
271	<	if (a != null)
272	<	sum += a.get();
268	>	if (as != null) {
269	>	int n = as.length;
270	>	for (int i = 0; i < n; ++i) {
271	>	Adder a = as[i];
272	>	if (a != null)
273	>	sum += a.get();
274	>	}
275		}
276		return sum;
277		}
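Because the default constructor now leaves `adders` null until the first `add`, `sum` (and likewise `reset` and `sumAndReset`) gained an `as != null` guard in addition to skipping empty slots. A standalone sketch of that traversal over a plain `AtomicLong[]`; the class and helper names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical null-tolerant traversal matching the guarded sum() loop. */
public class NullTolerantSum {
    /** Sums a possibly-null table whose slots may also be null; not an atomic snapshot. */
    static long sumCells(AtomicLong[] cells) {
        long sum = 0L;
        if (cells != null) {          // table not yet lazily initialized: sum is 0
            for (AtomicLong a : cells) {
                if (a != null)        // slot never populated
                    sum += a.get();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        AtomicLong[] table = { new AtomicLong(2), null, new AtomicLong(5) };
        System.out.println(sumCells(null) + " " + sumCells(table));
    }
}
```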
#	Line 194 | Line 283 | public class StripedAdder implements Ser
283		*/
284		public void reset() {
285		Adder[] as = adders;
286	<	int n = as.length;
287	<	for (int i = 0; i < n; ++i) {
288	<	Adder a = as[i];
289	<	if (a != null)
290	<	a.set(0L);
286	>	if (as != null) {
287	>	int n = as.length;
288	>	for (int i = 0; i < n; ++i) {
289	>	Adder a = as[i];
290	>	if (a != null)
291	>	a.set(0L);
292	>	}
293		}
294		}
295
#	Line 222 | Line 313 | public class StripedAdder implements Ser
313		* @return the estimated sum
314		*/
315		public long sumAndReset() {
316	<	long sum = 0;
316	>	long sum = 0L;
317		Adder[] as = adders;
318	<	int n = as.length;
319	<	for (int i = 0; i < n; ++i) {
320	<	Adder a = as[i];
321	<	if (a != null) {
322	<	sum += a.get();
323	<	a.set(0L);
318	>	if (as != null) {
319	>	int n = as.length;
320	>	for (int i = 0; i < n; ++i) {
321	>	Adder a = as[i];
322	>	if (a != null) {
323	>	sum += a.get();
324	>	a.set(0L);
325	>	}
326		}
327		}
328		return sum;
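`sumAndReset` reads and then clears each slot as two separate steps, so an `add` that lands on a slot between its `get` and its `set(0L)` is overwritten and appears in neither the returned total nor the next one. It therefore suits quiescent points or statistics that tolerate this. The sketch below shows the periodic-statistics pattern with `LongAdder.sumThenReset()` (Java 8+), whose JDK documentation carries the same caveat about concurrent updates; `IntervalStats` is a hypothetical name.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-interval request counter built on LongAdder (Java 8+). */
public class IntervalStats {
    private final LongAdder requests = new LongAdder();

    public void onRequest() { requests.increment(); }

    /**
     * Returns the count for the interval just ended and starts a new one.
     * Not atomic: with concurrent updates the result is not guaranteed to
     * be the exact count before the reset.
     */
    public long drainInterval() { return requests.sumThenReset(); }

    public static void main(String[] args) {
        IntervalStats s = new IntervalStats();
        s.onRequest();
        s.onRequest();
        System.out.println(s.drainInterval() + " " + s.drainInterval());
    }
}
```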
#	Line 244 | Line 337 | public class StripedAdder implements Ser
337		private void readObject(ObjectInputStream s)
338		throws IOException, ClassNotFoundException {
339		s.defaultReadObject();
247	–	long c = s.readLong();
248	–	Adder[] as = new Adder[2];
249	–	as[0] = new Adder(c);
250	–	this.adders = as;
340		mutex.set(0);
341	+	add(s.readLong());
342		}
343
344		}
255	–
256	–
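In revision 1.6, `readObject` no longer builds a two-slot table by hand. After `defaultReadObject` the transient `adders` field is null, so the saved long is re-applied with `add(s.readLong())`, which takes the same lazy-initialization path as a freshly constructed adder. The sketch below applies that pattern (serialize only the sum, rebuild striped state on read) to a hypothetical `CounterSnapshot` wrapper around `LongAdder`, assuming Java 8+.

```java
import java.io.*;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical serializable counter that persists only its current sum. */
public class CounterSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient LongAdder adder = new LongAdder(); // striped state is not serialized

    public void add(long x) { adder.add(x); }
    public long sum() { return adder.sum(); }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeLong(adder.sum());        // serialized form: a single long
    }

    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();           // field initializers do not run: adder is null here
        adder = new LongAdder();         // rebuild fresh striped state
        adder.add(s.readLong());         // re-apply the saved sum
    }

    /** Serializes and deserializes a counter in memory. */
    static CounterSnapshot roundTrip(CounterSnapshot c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CounterSnapshot) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        CounterSnapshot c = new CounterSnapshot();
        c.add(40);
        c.add(2);
        System.out.println(roundTrip(c).sum()); // prints 42
    }
}
```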

Diff Legend

–	Removed lines
+	Added lines
<	Changed lines
>	Changed lines
