root/jsr166/jsr166/src/main/java/util/concurrent/Exchanger.java
Revision: 1.77
Committed: Fri Jun 3 10:45:44 2016 UTC by dl
Branch: MAIN
Changes since 1.76: +1 -1 lines
Log Message:
Revert to volatile read

/*
 * Written by Doug Lea, Bill Scherer, and Michael Scott with
 * assistance from members of JCP JSR-166 Expert Group and released to
 * the public domain, as explained at
 * http://creativecommons.org/publicdomain/zero/1.0/
 */

package java.util.concurrent;

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.locks.LockSupport;

/**
 * A synchronization point at which threads can pair and swap elements
 * within pairs. Each thread presents some object on entry to the
 * {@link #exchange exchange} method, matches with a partner thread,
 * and receives its partner's object on return. An Exchanger may be
 * viewed as a bidirectional form of a {@link SynchronousQueue}.
 * Exchangers may be useful in applications such as genetic algorithms
 * and pipeline designs.
 *
 * <p><b>Sample Usage:</b>
 * Here are the highlights of a class that uses an {@code Exchanger}
 * to swap buffers between threads so that the thread filling the
 * buffer gets a freshly emptied one when it needs it, handing off the
 * filled one to the thread emptying the buffer.
 * <pre> {@code
 * class FillAndEmpty {
 *   Exchanger<DataBuffer> exchanger = new Exchanger<>();
 *   DataBuffer initialEmptyBuffer = ... a made-up type
 *   DataBuffer initialFullBuffer = ...
 *
 *   class FillingLoop implements Runnable {
 *     public void run() {
 *       DataBuffer currentBuffer = initialEmptyBuffer;
 *       try {
 *         while (currentBuffer != null) {
 *           addToBuffer(currentBuffer);
 *           if (currentBuffer.isFull())
 *             currentBuffer = exchanger.exchange(currentBuffer);
 *         }
 *       } catch (InterruptedException ex) { ... handle ... }
 *     }
 *   }
 *
 *   class EmptyingLoop implements Runnable {
 *     public void run() {
 *       DataBuffer currentBuffer = initialFullBuffer;
 *       try {
 *         while (currentBuffer != null) {
 *           takeFromBuffer(currentBuffer);
 *           if (currentBuffer.isEmpty())
 *             currentBuffer = exchanger.exchange(currentBuffer);
 *         }
 *       } catch (InterruptedException ex) { ... handle ... }
 *     }
 *   }
 *
 *   void start() {
 *     new Thread(new FillingLoop()).start();
 *     new Thread(new EmptyingLoop()).start();
 *   }
 * }}</pre>
 *
 * <p>Memory consistency effects: For each pair of threads that
 * successfully exchange objects via an {@code Exchanger}, actions
 * prior to the {@code exchange()} in each thread
 * <a href="package-summary.html#MemoryVisibility"><i>happen-before</i></a>
 * those subsequent to a return from the corresponding {@code exchange()}
 * in the other thread.
 *
 * @since 1.5
 * @author Doug Lea and Bill Scherer and Michael Scott
 * @param <V> The type of objects that may be exchanged
 */
public class Exchanger<V> {

    /*
     * Overview: The core algorithm is, for an exchange "slot",
     * and a participant (caller) with an item:
     *
     * for (;;) {
     *   if (slot is empty) {                       // offer
     *     place item in a Node;
     *     if (can CAS slot from empty to node) {
     *       wait for release;
     *       return matching item in node;
     *     }
     *   }
     *   else if (can CAS slot from node to empty) { // release
     *     get the item in node;
     *     set matching item in node;
     *     release waiting thread;
     *   }
     *   // else retry on CAS failure
     * }
     *
     * This is among the simplest forms of a "dual data structure" --
     * see Scott and Scherer's DISC 04 paper and
     * http://www.cs.rochester.edu/research/synchronization/pseudocode/duals.html
     *
     * This works great in principle. But in practice, like many
     * algorithms centered on atomic updates to a single location, it
     * scales horribly when there are more than a few participants
     * using the same Exchanger. So the implementation instead uses a
     * form of elimination arena that spreads out this contention by
     * arranging that some threads typically use different slots,
     * while still ensuring that eventually, any two parties will be
     * able to exchange items. That is, we cannot completely partition
     * across threads, but instead give threads arena indices that
     * will on average grow under contention and shrink under lack of
     * contention. We approach this by defining the Nodes that we need
     * anyway as ThreadLocals, and include in them per-thread index
     * and related bookkeeping state. (We can safely reuse per-thread
     * nodes rather than creating them fresh each time because slots
     * alternate between pointing to a node vs null, so cannot
     * encounter ABA problems. However, we do need some care in
     * resetting them between uses.)
     *
     * Implementing an effective arena requires allocating a bunch of
     * space, so we only do so upon detecting contention (except on
     * uniprocessors, where they wouldn't help, so aren't used).
     * Otherwise, exchanges use the single-slot slotExchange method.
     * On contention, not only must the slots be in different
     * locations, but the locations must not encounter memory
     * contention due to being on the same cache line (or more
     * generally, the same coherence unit). Because, as of this
     * writing, there is no way to determine cacheline size, we define
     * a value that is enough for common platforms. Additionally,
     * extra care elsewhere is taken to avoid other false/unintended
     * sharing and to enhance locality, including adding padding (via
     * @Contended) to Nodes and embedding "bound" as an Exchanger field.
     *
     * The arena starts out with only one used slot. We expand the
     * effective arena size by tracking collisions; i.e., failed CASes
     * while trying to exchange. By nature of the above algorithm, the
     * only kinds of collision that reliably indicate contention are
     * when two attempted releases collide -- one of two attempted
     * offers can legitimately fail to CAS without indicating
     * contention by more than one other thread. (Note: it is possible
     * but not worthwhile to more precisely detect contention by
     * reading slot values after CAS failures.) When a thread has
     * collided at each slot within the current arena bound, it tries
     * to expand the arena size by one. We track collisions within
     * bounds by using a version (sequence) number on the "bound"
     * field, and conservatively reset collision counts when a
     * participant notices that bound has been updated (in either
     * direction).
     *
     * The effective arena size is reduced (when there is more than
     * one slot) by giving up on waiting after a while and trying to
     * decrement the arena size on expiration. The value of "a while"
     * is an empirical matter. We implement by piggybacking on the
     * use of spin->yield->block that is essential for reasonable
     * waiting performance anyway -- in a busy exchanger, offers are
     * usually almost immediately released, in which case context
     * switching on multiprocessors is extremely slow/wasteful. Arena
     * waits just omit the blocking part, and instead cancel. The spin
     * count is empirically chosen to be a value that avoids blocking
     * 99% of the time under maximum sustained exchange rates on a
     * range of test machines. Spins and yields entail some limited
     * randomness (using a cheap xorshift) to avoid regular patterns
     * that can induce unproductive grow/shrink cycles. (Using a
     * pseudorandom also helps regularize spin cycle duration by
     * making branches unpredictable.) Also, during an offer, a
     * waiter can "know" that it will be released when its slot has
     * changed, but cannot yet proceed until match is set. In the
     * mean time it cannot cancel the offer, so instead spins/yields.
     * Note: It is possible to avoid this secondary check by changing
     * the linearization point to be a CAS of the match field (as done
     * in one case in the Scott & Scherer DISC paper), which also
     * increases asynchrony a bit, at the expense of poorer collision
     * detection and inability to always reuse per-thread nodes. So
     * the current scheme is typically a better tradeoff.
     *
     * On collisions, indices traverse the arena cyclically in reverse
     * order, restarting at the maximum index (which will tend to be
     * sparsest) when bounds change. (On expirations, indices instead
     * are halved until reaching 0.) It is possible (and has been
     * tried) to use randomized, prime-value-stepped, or double-hash
     * style traversal instead of simple cyclic traversal to reduce
     * bunching. But empirically, whatever benefits these may have
     * don't overcome their added overhead: We are managing operations
     * that occur very quickly unless there is sustained contention,
     * so simpler/faster control policies work better than more
     * accurate but slower ones.
     *
     * Because we use expiration for arena size control, we cannot
     * throw TimeoutExceptions in the timed version of the public
     * exchange method until the arena size has shrunken to zero (or
     * the arena isn't enabled). This may delay response to timeout
     * but is still within spec.
     *
     * Essentially all of the implementation is in methods
     * slotExchange and arenaExchange. These have similar overall
     * structure, but differ in too many details to combine. The
     * slotExchange method uses the single Exchanger field "slot"
     * rather than arena array elements. However, it still needs
     * minimal collision detection to trigger arena construction.
     * (The messiest part is making sure interrupt status and
     * InterruptedExceptions come out right during transitions when
     * both methods may be called. This is done by using null return
     * as a sentinel to recheck interrupt status.)
     *
     * As is too common in this sort of code, methods are monolithic
     * because most of the logic relies on reads of fields that are
     * maintained as local variables so can't be nicely factored
     * (mainly, here, bulky spin->yield->block/cancel code), and are
     * heavily dependent on intrinsics (VarHandles) to use inlined
     * embedded CAS and related memory access operations (that tend
     * not to be as readily inlined by dynamic compilers when they are
     * hidden behind other methods that would more nicely name and
     * encapsulate the intended effects). This includes the use of
     * putXRelease to clear fields of the per-thread Nodes between
     * uses. Note that field Node.item is not declared as volatile
     * even though it is read by releasing threads, because they only
     * do so after CAS operations that must precede access, and all
     * uses by the owning thread are otherwise acceptably ordered by
     * other operations. (Because the actual points of atomicity are
     * slot CASes, it would also be legal for the write to Node.match
     * in a release to be weaker than a full volatile write. However,
     * this is not done because it could allow further postponement of
     * the write, delaying progress.)
     */

    /**
     * The byte distance (as a shift value) between any two used slots
     * in the arena. 1 << ASHIFT should be at least cacheline size.
     */
    private static final int ASHIFT = 7;

    /**
     * The maximum supported arena index. The maximum allocatable
     * arena size is MMASK + 1. Must be a power of two minus one, less
     * than (1<<(31-ASHIFT)). The cap of 255 (0xff) more than suffices
     * for the expected scaling limits of the main algorithms.
     */
    private static final int MMASK = 0xff;

    /**
     * Unit for sequence/version bits of bound field. Each successful
     * change to the bound also adds SEQ.
     */
    private static final int SEQ = MMASK + 1;

    /** The number of CPUs, for sizing and spin control */
    private static final int NCPU = Runtime.getRuntime().availableProcessors();

    /**
     * The maximum slot index of the arena: The number of slots that
     * can in principle hold all threads without contention, or at
     * most the maximum indexable value.
     */
    static final int FULL = (NCPU >= (MMASK << 1)) ? MMASK : NCPU >>> 1;

    /**
     * The bound for spins while waiting for a match. The actual
     * number of iterations will on average be about twice this value
     * due to randomization. Note: Spinning is disabled when NCPU==1.
     */
    private static final int SPINS = 1 << 10;

    /**
     * Value representing null arguments/returns from public
     * methods. Needed because the API originally didn't disallow null
     * arguments, which it should have.
     */
    private static final Object NULL_ITEM = new Object();

    /**
     * Sentinel value returned by internal exchange methods upon
     * timeout, to avoid need for separate timed versions of these
     * methods.
     */
    private static final Object TIMED_OUT = new Object();

    /**
     * Nodes hold partially exchanged data, plus other per-thread
     * bookkeeping. Padded via @Contended to reduce memory contention.
     */
    @jdk.internal.vm.annotation.Contended static final class Node {
        int index;              // Arena index
        int bound;              // Last recorded value of Exchanger.bound
        int collides;           // Number of CAS failures at current bound
        int hash;               // Pseudo-random for spins
        Object item;            // This thread's current item
        volatile Object match;  // Item provided by releasing thread
        volatile Thread parked; // Set to this thread when parked, else null
    }

    /** The corresponding thread local class */
    static final class Participant extends ThreadLocal<Node> {
        public Node initialValue() { return new Node(); }
    }

    /**
     * Per-thread state.
     */
    private final Participant participant;

    /**
     * Elimination array; null until enabled (within slotExchange).
     * Element accesses use emulation of volatile gets and CAS.
     */
    private volatile Node[] arena;

    /**
     * Slot used until contention detected.
     */
    private volatile Node slot;

    /**
     * The index of the largest valid arena position, OR'ed with SEQ
     * number in high bits, incremented on each update. The initial
     * update from 0 to SEQ is used to ensure that the arena array is
     * constructed only once.
     */
    private volatile int bound;

    /**
     * Exchange function when arenas enabled. See above for explanation.
     *
     * @param item the (non-null) item to exchange
     * @param timed true if the wait is timed
     * @param ns if timed, the maximum wait time, else 0L
     * @return the other thread's item; or null if interrupted; or
     *         TIMED_OUT if timed and timed out
     */
    private final Object arenaExchange(Object item, boolean timed, long ns) {
        Node[] a = arena;
        int alen = a.length;
        Node p = participant.get();
        for (int i = p.index;;) {                      // access slot at i
            int b, m, c;
            int j = (i << ASHIFT) + ((1 << ASHIFT) - 1);
            if (j < 0 || j >= alen)
                j = alen - 1;
            Node q = (Node)AA.getVolatile(a, j);
            if (q != null && AA.compareAndSet(a, j, q, null)) {
                Object v = q.item;                     // release
                q.match = item;
                Thread w = q.parked;
                if (w != null)
                    LockSupport.unpark(w);
                return v;
            }
            else if (i <= (m = (b = bound) & MMASK) && q == null) {
                p.item = item;                         // offer
                if (AA.compareAndSet(a, j, null, p)) {
                    long end = (timed && m == 0) ? System.nanoTime() + ns : 0L;
                    Thread t = Thread.currentThread(); // wait
                    for (int h = p.hash, spins = SPINS;;) {
                        Object v = p.match;
                        if (v != null) {
                            MATCH.setRelease(p, null);
                            p.item = null;             // clear for next use
                            p.hash = h;
                            return v;
                        }
                        else if (spins > 0) {
                            h ^= h << 1; h ^= h >>> 3; h ^= h << 10; // xorshift
                            if (h == 0)                // initialize hash
                                h = SPINS | (int)t.getId();
                            else if (h < 0 &&          // approx 50% true
                                     (--spins & ((SPINS >>> 1) - 1)) == 0)
                                Thread.yield();        // two yields per wait
                        }
                        else if (AA.getVolatile(a, j) != p)
                            spins = SPINS;      // releaser hasn't set match yet
                        else if (!t.isInterrupted() && m == 0 &&
                                 (!timed ||
                                  (ns = end - System.nanoTime()) > 0L)) {
                            p.parked = t;              // minimize window
                            if (AA.getVolatile(a, j) == p) {
                                if (ns == 0L)
                                    LockSupport.park(this);
                                else
                                    LockSupport.parkNanos(this, ns);
                            }
                            p.parked = null;
                        }
                        else if (AA.getVolatile(a, j) == p &&
                                 AA.compareAndSet(a, j, p, null)) {
                            if (m != 0)                // try to shrink
                                BOUND.compareAndSet(this, b, b + SEQ - 1);
                            p.item = null;
                            p.hash = h;
                            i = p.index >>>= 1;        // descend
                            if (Thread.interrupted())
                                return null;
                            if (timed && m == 0 && ns <= 0L)
                                return TIMED_OUT;
                            break;                     // expired; restart
                        }
                    }
                }
                else
                    p.item = null;                     // clear offer
            }
            else {
                if (p.bound != b) {                    // stale; reset
                    p.bound = b;
                    p.collides = 0;
                    i = (i != m || m == 0) ? m : m - 1;
                }
                else if ((c = p.collides) < m || m == FULL ||
                         !BOUND.compareAndSet(this, b, b + SEQ + 1)) {
                    p.collides = c + 1;
                    i = (i == 0) ? m : i - 1;          // cyclically traverse
                }
                else
                    i = m + 1;                         // grow
                p.index = i;
            }
        }
    }

    /**
     * Exchange function used until arenas enabled. See above for explanation.
     *
     * @param item the item to exchange
     * @param timed true if the wait is timed
     * @param ns if timed, the maximum wait time, else 0L
     * @return the other thread's item; or null if either the arena
     *         was enabled or the thread was interrupted before completion; or
     *         TIMED_OUT if timed and timed out
     */
    private final Object slotExchange(Object item, boolean timed, long ns) {
        Node p = participant.get();
        Thread t = Thread.currentThread();
        if (t.isInterrupted()) // preserve interrupt status so caller can recheck
            return null;

        for (Node q;;) {
            if ((q = slot) != null) {
                if (SLOT.compareAndSet(this, q, null)) {
                    Object v = q.item;
                    q.match = item;
                    Thread w = q.parked;
                    if (w != null)
                        LockSupport.unpark(w);
                    return v;
                }
                // create arena on contention, but continue until slot null
                if (NCPU > 1 && bound == 0 &&
                    BOUND.compareAndSet(this, 0, SEQ))
                    arena = new Node[(FULL + 2) << ASHIFT];
            }
            else if (arena != null)
                return null; // caller must reroute to arenaExchange
            else {
                p.item = item;
                if (SLOT.compareAndSet(this, null, p))
                    break;
                p.item = null;
            }
        }

        // await release
        int h = p.hash;
        long end = timed ? System.nanoTime() + ns : 0L;
        int spins = (NCPU > 1) ? SPINS : 1;
        Object v;
        while ((v = p.match) == null) {
            if (spins > 0) {
                h ^= h << 1; h ^= h >>> 3; h ^= h << 10;
                if (h == 0)
                    h = SPINS | (int)t.getId();
                else if (h < 0 && (--spins & ((SPINS >>> 1) - 1)) == 0)
                    Thread.yield();
            }
            else if (slot != p)
                spins = SPINS;
            else if (!t.isInterrupted() && arena == null &&
                     (!timed || (ns = end - System.nanoTime()) > 0L)) {
                p.parked = t;
                if (slot == p) {
                    if (ns == 0L)
                        LockSupport.park(this);
                    else
                        LockSupport.parkNanos(this, ns);
                }
                p.parked = null;
            }
            else if (SLOT.compareAndSet(this, p, null)) {
                v = timed && ns <= 0L && !t.isInterrupted() ? TIMED_OUT : null;
                break;
            }
        }
        MATCH.setRelease(p, null);
        p.item = null;
        p.hash = h;
        return v;
    }

    /**
     * Creates a new Exchanger.
     */
    public Exchanger() {
        participant = new Participant();
    }

    /**
     * Waits for another thread to arrive at this exchange point (unless
     * the current thread is {@linkplain Thread#interrupt interrupted}),
     * and then transfers the given object to it, receiving its object
     * in return.
     *
     * <p>If another thread is already waiting at the exchange point then
     * it is resumed for thread scheduling purposes and receives the object
     * passed in by the current thread. The current thread returns immediately,
     * receiving the object passed to the exchange by that other thread.
     *
     * <p>If no other thread is already waiting at the exchange then the
     * current thread is disabled for thread scheduling purposes and lies
     * dormant until one of two things happens:
     * <ul>
     * <li>Some other thread enters the exchange; or
     * <li>Some other thread {@linkplain Thread#interrupt interrupts}
     * the current thread.
     * </ul>
     * <p>If the current thread:
     * <ul>
     * <li>has its interrupted status set on entry to this method; or
     * <li>is {@linkplain Thread#interrupt interrupted} while waiting
     * for the exchange,
     * </ul>
     * then {@link InterruptedException} is thrown and the current thread's
     * interrupted status is cleared.
     *
     * @param x the object to exchange
     * @return the object provided by the other thread
     * @throws InterruptedException if the current thread was
     *         interrupted while waiting
     */
    @SuppressWarnings("unchecked")
    public V exchange(V x) throws InterruptedException {
        Object v;
        Node[] a;
        Object item = (x == null) ? NULL_ITEM : x; // translate null args
        if (((a = arena) != null ||
             (v = slotExchange(item, false, 0L)) == null) &&
            ((Thread.interrupted() || // disambiguates null return
              (v = arenaExchange(item, false, 0L)) == null)))
            throw new InterruptedException();
        return (v == NULL_ITEM) ? null : (V)v;
    }

    /**
     * Waits for another thread to arrive at this exchange point (unless
     * the current thread is {@linkplain Thread#interrupt interrupted} or
     * the specified waiting time elapses), and then transfers the given
     * object to it, receiving its object in return.
     *
     * <p>If another thread is already waiting at the exchange point then
     * it is resumed for thread scheduling purposes and receives the object
     * passed in by the current thread. The current thread returns immediately,
     * receiving the object passed to the exchange by that other thread.
     *
     * <p>If no other thread is already waiting at the exchange then the
     * current thread is disabled for thread scheduling purposes and lies
     * dormant until one of three things happens:
     * <ul>
     * <li>Some other thread enters the exchange; or
     * <li>Some other thread {@linkplain Thread#interrupt interrupts}
     * the current thread; or
     * <li>The specified waiting time elapses.
     * </ul>
     * <p>If the current thread:
     * <ul>
     * <li>has its interrupted status set on entry to this method; or
     * <li>is {@linkplain Thread#interrupt interrupted} while waiting
     * for the exchange,
     * </ul>
     * then {@link InterruptedException} is thrown and the current thread's
     * interrupted status is cleared.
     *
     * <p>If the specified waiting time elapses then {@link
     * TimeoutException} is thrown. If the time is less than or equal
     * to zero, the method will not wait at all.
     *
     * @param x the object to exchange
     * @param timeout the maximum time to wait
     * @param unit the time unit of the {@code timeout} argument
     * @return the object provided by the other thread
     * @throws InterruptedException if the current thread was
     *         interrupted while waiting
     * @throws TimeoutException if the specified waiting time elapses
     *         before another thread enters the exchange
     */
    @SuppressWarnings("unchecked")
    public V exchange(V x, long timeout, TimeUnit unit)
        throws InterruptedException, TimeoutException {
        Object v;
        Object item = (x == null) ? NULL_ITEM : x;
        long ns = unit.toNanos(timeout);
        if ((arena != null ||
             (v = slotExchange(item, true, ns)) == null) &&
            ((Thread.interrupted() ||
              (v = arenaExchange(item, true, ns)) == null)))
            throw new InterruptedException();
        if (v == TIMED_OUT)
            throw new TimeoutException();
        return (v == NULL_ITEM) ? null : (V)v;
    }

    // VarHandle mechanics
    private static final VarHandle BOUND;
    private static final VarHandle SLOT;
    private static final VarHandle MATCH;
    private static final VarHandle AA;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            BOUND = l.findVarHandle(Exchanger.class, "bound", int.class);
            SLOT = l.findVarHandle(Exchanger.class, "slot", Node.class);
            MATCH = l.findVarHandle(Node.class, "match", Object.class);
            AA = MethodHandles.arrayElementVarHandle(Node[].class);
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

}
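
For reference, a minimal standalone sketch of the pair-and-swap behavior the class javadoc describes, using the JDK's shipped java.util.concurrent.Exchanger. The class name ExchangerDemo and its swap helper are illustrative only, not part of this file: each of two threads presents an item and receives the item its partner presented.

```java
import java.util.concurrent.Exchanger;

// Illustrative sketch (not part of Exchanger.java): two threads meet at an
// Exchanger; each receives the item the other presented.
public class ExchangerDemo {
    static String[] swap() throws InterruptedException {
        Exchanger<String> exchanger = new Exchanger<>();
        String[] results = new String[2]; // [0] = main's receipt, [1] = partner's
        Thread partner = new Thread(() -> {
            try {
                // Blocks until main arrives; returns main's item.
                results[1] = exchanger.exchange("from-partner");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        partner.start();
        // Blocks until the partner thread arrives; returns its item.
        results[0] = exchanger.exchange("from-main");
        partner.join(); // join also ensures results[1] is visible here
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] r = swap();
        System.out.println("main received: " + r[0]);
        System.out.println("partner received: " + r[1]);
    }
}
```

Per the memory consistency note above, each thread's writes before its exchange() happen-before the other thread's return from the corresponding exchange().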
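
A similar illustrative sketch (class name TimedExchangeDemo is hypothetical) of the timed exchange path: with no partner present, the timed public exchange method throws TimeoutException once the wait elapses, as permitted by the expiration-based arena sizing discussed in the overview.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch (not part of Exchanger.java): a timed exchange with no
// partner times out rather than blocking forever.
public class TimedExchangeDemo {
    static boolean timesOut() throws InterruptedException {
        Exchanger<Integer> exchanger = new Exchanger<>();
        try {
            // No second thread will ever arrive, so this waits ~50ms then fails.
            exchanger.exchange(42, 50, TimeUnit.MILLISECONDS);
            return false; // unreachable in practice: no partner exists
        } catch (TimeoutException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timed out: " + timesOut());
    }
}
```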