Comparing jsr166/src/jsr166y/LinkedTransferQueue.java (file contents):
Revision 1.47 by jsr166, Thu Oct 22 09:06:38 2009 UTC vs.
Revision 1.53 by jsr166, Tue Oct 27 19:59:43 2009 UTC

# Line 105 | Line 105 | public class LinkedTransferQueue<E> exte
105       * successful atomic operation per enq/deq pair. But it also
106       * enables lower cost variants of queue maintenance mechanics. (A
107       * variation of this idea applies even for non-dual queues that
108 <     * support deletion of embedded elements, such as
108 >     * support deletion of interior elements, such as
109       * j.u.c.ConcurrentLinkedQueue.)
110       *
111 <     * Once a node is matched, its item can never again change.  We
112 <     * may thus arrange that the linked list of them contains a prefix
113 <     * of zero or more matched nodes, followed by a suffix of zero or
114 <     * more unmatched nodes. (Note that we allow both the prefix and
115 <     * suffix to be zero length, which in turn means that we do not
116 <     * use a dummy header.)  If we were not concerned with either time
117 <     * or space efficiency, we could correctly perform enqueue and
118 <     * dequeue operations by traversing from a pointer to the initial
119 <     * node; CASing the item of the first unmatched node on match and
120 <     * CASing the next field of the trailing node on appends.  While
121 <     * this would be a terrible idea in itself, it does have the
122 <     * benefit of not requiring ANY atomic updates on head/tail
123 <     * fields.
111 >     * Once a node is matched, its match status can never again
112 >     * change.  We may thus arrange that the linked list of them
113 >     * contain a prefix of zero or more matched nodes, followed by a
114 >     * suffix of zero or more unmatched nodes. (Note that we allow
115 >     * both the prefix and suffix to be zero length, which in turn
116 >     * means that we do not use a dummy header.)  If we were not
117 >     * concerned with either time or space efficiency, we could
118 >     * correctly perform enqueue and dequeue operations by traversing
119 >     * from a pointer to the initial node; CASing the item of the
120 >     * first unmatched node on match and CASing the next field of the
121 >     * trailing node on appends. (Plus some special-casing when
122 >     * initially empty).  While this would be a terrible idea in
123 >     * itself, it does have the benefit of not requiring ANY atomic
124 >     * updates on head/tail fields.
125       *
126       * We introduce here an approach that lies between the extremes of
127 <     * never versus always updating queue (head and tail) pointers
128 <     * that reflects the tradeoff of sometimes requiring extra traversal
129 <     * steps to locate the first and/or last unmatched nodes, versus
130 <     * the reduced overhead and contention of fewer updates to queue
131 <     * pointers. For example, a possible snapshot of a queue is:
127 >     * never versus always updating queue (head and tail) pointers.
128 >     * This offers a tradeoff between sometimes requiring extra
129 >     * traversal steps to locate the first and/or last unmatched
130 >     * nodes, versus the reduced overhead and contention of fewer
131 >     * updates to queue pointers. For example, a possible snapshot of
132 >     * a queue is:
133       *
134       *  head           tail
135       *    |              |
# Line 139 | Line 141 | public class LinkedTransferQueue<E> exte
141       * similarly for "tail") is an empirical matter. We have found
142       * that using very small constants in the range of 1-3 works best
143       * over a range of platforms. Larger values introduce increasing
144 <     * costs of cache misses and risks of long traversal chains.
144 >     * costs of cache misses and risks of long traversal chains, while
145 >     * smaller values increase CAS contention and overhead.
146       *
147       * Dual queues with slack differ from plain M&S dual queues by
148       * virtue of only sometimes updating head or tail pointers when
# Line 158 | Line 161 | public class LinkedTransferQueue<E> exte
161       * targets.  Even when using very small slack values, this
162       * approach works well for dual queues because it allows all
163       * operations up to the point of matching or appending an item
164 <     * (hence potentially releasing another thread) to be read-only,
165 <     * thus not introducing any further contention. As described
166 <     * below, we implement this by performing slack maintenance
167 <     * retries only after these points.
164 >     * (hence potentially allowing progress by another thread) to be
165 >     * read-only, thus not introducing any further contention. As
166 >     * described below, we implement this by performing slack
167 >     * maintenance retries only after these points.
168       *
169       * As an accompaniment to such techniques, traversal overhead can
170       * be further reduced without increasing contention of head
171 <     * pointer updates.  During traversals, threads may sometimes
172 <     * shortcut the "next" link path from the current "head" node to
173 <     * be closer to the currently known first unmatched node. Again,
174 <     * this may be triggered using thresholds or randomization.
171 >     * pointer updates: Threads may sometimes shortcut the "next" link
172 >     * path from the current "head" node to be closer to the currently
173 >     * known first unmatched node, and similarly for tail. Again, this
174 >     * may be triggered using thresholds or randomization.
175       *
176       * These ideas must be further extended to avoid unbounded amounts
177       * of costly-to-reclaim garbage caused by the sequential "next"
# Line 196 | Line 199 | public class LinkedTransferQueue<E> exte
199       * mechanics because an update may leave head at a detached node.
200       * And while direct writes are possible for tail updates, they
201       * increase the risk of long retraversals, and hence long garbage
202 <     * chains which can be much more costly than is worthwhile
202 >     * chains, which can be much more costly than is worthwhile
203       * considering that the cost difference of performing a CAS vs
204       * write is smaller when they are not triggered on each operation
205       * (especially considering that writes and CASes equally require
206       * additional GC bookkeeping ("write barriers") that are sometimes
207       * more costly than the writes themselves because of contention).
208       *
209 <     * Removal of internal nodes (due to timed out or interrupted
210 <     * waits, or calls to remove or Iterator.remove) uses a scheme
211 <     * roughly similar to that in Scherer, Lea, and Scott
212 <     * SynchronousQueue. Given a predecessor, we can unsplice any node
213 <     * except the (actual) tail of the queue. To avoid build-up of
214 <     * cancelled trailing nodes, upon a request to remove a trailing
215 <     * node, it is placed in field "cleanMe" to be unspliced later.
209 >     * Removal of interior nodes (due to timed out or interrupted
210 >     * waits, or calls to remove(x) or Iterator.remove) can use a
211 >     * scheme roughly similar to that described in Scherer, Lea, and
212 >     * Scott's SynchronousQueue. Given a predecessor, we can unsplice
213 >     * any node except the (actual) tail of the queue. To avoid
214 >     * build-up of cancelled trailing nodes, upon a request to remove
215 >     * a trailing node, it is placed in field "cleanMe" to be
216 >     * unspliced upon the next call to unsplice any other node.
217 >     * Situations needing such mechanics are not common but do occur
218 >     * in practice; for example when an unbounded series of short
219 >     * timed calls to poll repeatedly time out but never otherwise
220 >     * fall off the list because of an untimed call to take at the
221 >     * front of the queue. Note that maintaining field cleanMe does
222 >     * not otherwise much impact garbage retention even if never
223 >     * cleared by some other call because the held node will
224 >     * eventually either directly or indirectly lead to a self-link
225 >     * once off the list.
226       *
227       * *** Overview of implementation ***
228       *
229 <     * We use a threshold-based approach to updates, with a target
230 <     * slack of two.  The slack value is hard-wired: a path greater
229 >     * We use a threshold-based approach to updates, with a slack
230 >     * threshold of two -- that is, we update head/tail when the
231 >     * current pointer appears to be two or more steps away from the
232 >     * first/last node. The slack value is hard-wired: a path greater
233       * than one is naturally implemented by checking equality of
234       * traversal pointers except when the list has only one element,
235 <     * in which case we keep max slack at one. Avoiding tracking
236 <     * explicit counts across situations slightly simplifies an
235 >     * in which case we keep slack threshold at one. Avoiding tracking
236 >     * explicit counts across method calls slightly simplifies an
237       * already-messy implementation. Using randomization would
238       * probably work better if there were a low-quality dirt-cheap
239       * per-thread one available, but even ThreadLocalRandom is too
240       * heavy for these purposes.
241       *
242 <     * With such a small slack value, path short-circuiting is rarely
243 <     * worthwhile. However, it is used (in awaitMatch) immediately
244 <     * before a waiting thread starts to block, as a final bit of
245 <     * helping at a point when contention with others is extremely
246 <     * unlikely (since if other threads that could release it are
247 <     * operating, then the current thread wouldn't be blocking).
242 >     * With such a small slack threshold value, it is rarely
243 >     * worthwhile to augment this with path short-circuiting; i.e.,
244 >     * unsplicing nodes between head and the first unmatched node, or
245 >     * similarly for tail, rather than advancing head or tail
246 >     * proper. However, it is used (in awaitMatch) immediately before
247 >     * a waiting thread starts to block, as a final bit of helping at
248 >     * a point when contention with others is extremely unlikely
249 >     * (since if other threads that could release it are operating,
250 >     * then the current thread wouldn't be blocking).
251 >     *
252 >     * We allow both the head and tail fields to be null before any
253 >     * nodes are enqueued; initializing upon first append.  This
254 >     * simplifies some other logic, as well as providing more
255 >     * efficient explicit control paths instead of letting JVMs insert
256 >     * implicit NullPointerExceptions when they are null.  While not
257 >     * currently fully implemented, we also leave open the possibility
258 >     * of re-nulling these fields when empty (which is complicated to
259 >     * arrange, for little benefit.)
260       *
261       * All enqueue/dequeue operations are handled by the single method
262       * "xfer" with parameters indicating whether to act as some form
263       * of offer, put, poll, take, or transfer (each possibly with
264       * timeout). The relative complexity of using one monolithic
265       * method outweighs the code bulk and maintenance problems of
266 <     * using nine separate methods.
266 >     * using separate methods for each case.
267       *
268       * Operation consists of up to three phases. The first is
269       * implemented within method xfer, the second in tryAppend, and
# Line 249 | Line 276 | public class LinkedTransferQueue<E> exte
276       *    case matching it and returning, also if necessary updating
277       *    head to one past the matched node (or the node itself if the
278       *    list has no other unmatched nodes). If the CAS misses, then
279 <     *    a retry loops until the slack is at most two. Traversals
280 <     *    also check if the initial head is now off-list, in which
281 <     *    case they start at the new head.
279 >     *    a loop retries advancing head by two steps until either
280 >     *    success or the slack is at most two. By requiring that each
281 >     *    attempt advances head by two (if applicable), we ensure that
282 >     *    the slack does not grow without bound. Traversals also check
283 >     *    if the initial head is now off-list, in which case they
284 >     *    start at the new head.
285       *
286       *    If no candidates are found and the call was untimed
287       *    poll/offer, (argument "how" is NOW) return.
288       *
289       * 2. Try to append a new node (method tryAppend)
290       *
291 <     *    Starting at current tail pointer, try to append a new node
292 <     *    to the list (or if head was null, establish the first
293 <     *    node). Nodes can be appended only if their predecessors are
294 <     *    either already matched or are of the same mode. If we detect
295 <     *    otherwise, then a new node with opposite mode must have been
296 <     *    appended during traversal, so must restart at phase 1. The
297 <     *    traversal and update steps are otherwise similar to phase 1:
298 <     *    Retrying upon CAS misses and checking for staleness.  In
299 <     *    particular, if a self-link is encountered, then we can
300 <     *    safely jump to a node on the list by continuing the
301 <     *    traversal at current head.
291 >     *    Starting at current tail pointer, find the actual last node
292 >     *    and try to append a new node (or if head was null, establish
293 >     *    the first node). Nodes can be appended only if their
294 >     *    predecessors are either already matched or are of the same
295 >     *    mode. If we detect otherwise, then a new node with opposite
296 >     *    mode must have been appended during traversal, so we must
297 >     *    restart at phase 1. The traversal and update steps are
298 >     *    otherwise similar to phase 1: Retrying upon CAS misses and
299 >     *    checking for staleness.  In particular, if a self-link is
300 >     *    encountered, then we can safely jump to a node on the list
301 >     *    by continuing the traversal at current head.
302       *
303       *    On successful append, if the call was ASYNC, return.
304       *
305       * 3. Await match or cancellation (method awaitMatch)
306       *
307       *    Wait for another thread to match node; instead cancelling if
308 <     *    current thread was interrupted or the wait timed out. On
308 >     *    the current thread was interrupted or the wait timed out. On
309       *    multiprocessors, we use front-of-queue spinning: If a node
310       *    appears to be the first unmatched node in the queue, it
311       *    spins a bit before blocking. In either case, before blocking
# Line 290 | Line 320 | public class LinkedTransferQueue<E> exte
320       *    to decide to occasionally perform a Thread.yield. While
321       *    yield has underdefined specs, we assume that it might help,
322       *    and will not hurt in limiting impact of spinning on busy
323 <     *    systems.  We also use much smaller (1/4) spins for nodes
324 <     *    that are not known to be front but whose predecessors have
325 <     *    not blocked -- these "chained" spins avoid artifacts of
323 >     *    systems.  We also use smaller (1/2) spins for nodes that are
324 >     *    not known to be front but whose predecessors have not
325 >     *    blocked -- these "chained" spins avoid artifacts of
326       *    front-of-queue rules which otherwise lead to alternating
327       *    nodes spinning vs blocking. Further, front threads that
328       *    represent phase changes (from data to request node or vice
329       *    versa) compared to their predecessors receive additional
330 <     *    spins, reflecting the longer code path lengths necessary to
331 <     *    release them under contention.
330 >     *    chained spins, reflecting longer paths typically required to
331 >     *    unblock threads during phase changes.
332       */
333  
334      /** True if on multiprocessor */
# Line 306 | Line 336 | public class LinkedTransferQueue<E> exte
336          Runtime.getRuntime().availableProcessors() > 1;
337  
338      /**
339 <     * The number of times to spin (with on average one randomly
340 <     * interspersed call to Thread.yield) on multiprocessor before
341 <     * blocking when a node is apparently the first waiter in the
342 <     * queue.  See above for explanation. Must be a power of two. The
343 <     * value is empirically derived -- it works pretty well across a
344 <     * variety of processors, numbers of CPUs, and OSes.
339 >     * The number of times to spin (with randomly interspersed calls
340 >     * to Thread.yield) on multiprocessor before blocking when a node
341 >     * is apparently the first waiter in the queue.  See above for
342 >     * explanation. Must be a power of two. The value is empirically
343 >     * derived -- it works pretty well across a variety of processors,
344 >     * numbers of CPUs, and OSes.
345       */
346      private static final int FRONT_SPINS   = 1 << 7;
347  
348      /**
349       * The number of times to spin before blocking when a node is
350 <     * preceded by another node that is apparently spinning.
350 >     * preceded by another node that is apparently spinning.  Also
351 >     * serves as an increment to FRONT_SPINS on phase changes, and as
352 >     * base average frequency for yielding during spins. Must be a
353 >     * power of two.
354       */
355 <    private static final int CHAINED_SPINS = FRONT_SPINS >>> 2;
355 >    private static final int CHAINED_SPINS = FRONT_SPINS >>> 1;
356  
357      /**
358       * Queue nodes. Uses Object, not E, for items to allow forgetting
# Line 469 | Line 502 | public class LinkedTransferQueue<E> exte
502                      if (isData == haveData)   // can't match
503                          break;
504                      if (p.casItem(item, e)) { // match
505 <                        Thread w = p.waiter;
506 <                        while (p != h) {      // update head
507 <                            Node n = p.next;  // by 2 unless singleton
508 <                            if (n != null)
509 <                                p = n;
477 <                            if (head == h && casHead(h, p)) {
505 >                        for (Node q = p; q != h;) {
506 >                            Node n = q.next;  // update head by 2
507 >                            if (n != null)    // unless singleton
508 >                                q = n;
509 >                            if (head == h && casHead(h, q)) {
510                                  h.forgetNext();
511                                  break;
512                              }                 // advance and retry
513                              if ((h = head)   == null ||
514 <                                (p = h.next) == null || !p.isMatched())
514 >                                (q = h.next) == null || !q.isMatched())
515                                  break;        // unless slack < 2
516                          }
517 <                        LockSupport.unpark(w);
517 >                        LockSupport.unpark(p.waiter);
518                          return item;
519                      }
520                  }
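
Note on the hunk above: the head-update loop only runs when the matched node p differs from the head h that the traversal started from, which is how the "slack threshold of two" is checked without keeping a counter. The following is a minimal single-threaded sketch of that idea, not the jsr166y code. The class SlackHeadSketch, its add and demo methods, and the headUpdates counter are all hypothetical names introduced for illustration, and the retry loop that runs after a failed head CAS is omitted.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; every name here is hypothetical.
class SlackHeadSketch {
    static final class Node {
        final AtomicReference<Object> item; // holds null once matched
        volatile Node next;
        Node(Object x) { item = new AtomicReference<>(x); }
    }

    final AtomicReference<Node> head = new AtomicReference<>();
    int headUpdates; // demo counter of successful head CASes; not thread-safe

    // Appends by walking from head (no tail pointer). Not thread-safe;
    // kept deliberately naive so the sketch stays short.
    void add(Object x) {
        Node s = new Node(x);
        Node p = head.get();
        if (p == null) { head.set(s); return; }
        while (p.next != null) p = p.next;
        p.next = s;
    }

    // Matches the first unmatched node; moves head only if it lags behind.
    Object poll() {
        Node h = head.get();
        for (Node p = h; p != null; p = p.next) {
            Object x = p.item.get();
            if (x != null && p.item.compareAndSet(x, null)) { // match
                if (p != h) {                 // head is now 2+ steps behind
                    Node n = p.next;
                    Node target = (n != null) ? n : p; // one past p unless last
                    if (head.compareAndSet(h, target))
                        headUpdates++;
                }
                return x;
            }
        }
        return null; // no unmatched node found
    }

    // Adds n items, polls them all, and reports how many head CASes occurred.
    static int demo(int n) {
        SlackHeadSketch q = new SlackHeadSketch();
        for (int i = 0; i < n; i++) q.add(i);
        for (int i = 0; i < n; i++) q.poll();
        return q.headUpdates;
    }

    public static void main(String[] args) {
        System.out.println("head CASes for 4 polls: " + demo(4));
    }
}
```

In this sketch roughly every second poll performs a head CAS, which is the tradeoff the class comment describes: fewer pointer updates in exchange for an occasional extra traversal step.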
# Line 497 | Line 529 | public class LinkedTransferQueue<E> exte
529                  if (pred == null)
530                      continue retry;           // lost race vs opposite mode
531                  if (how >= SYNC)
532 <                    return awaitMatch(pred, s, e, how, nanos);
532 >                    return awaitMatch(s, pred, e, how, nanos);
533              }
534              return e; // not waiting
535          }
# Line 506 | Line 538 | public class LinkedTransferQueue<E> exte
538      /**
539       * Tries to append node s as tail.
540       *
509 -     * @param haveData true if appending in data mode
541       * @param s the node to append
542 +     * @param haveData true if appending in data mode
543       * @return null on failure due to losing race with append in
544       * different mode, else s's predecessor, or s itself if no
545       * predecessor
546       */
547      private Node tryAppend(Node s, boolean haveData) {
548 <        for (Node t = tail, p = t;;) { // move p to actual tail and append
548 >        for (Node t = tail, p = t;;) { // move p to last node and append
549              Node n, u;                        // temps for reads of next & tail
550              if (p == null && (p = head) == null) {
551                  if (casHead(null, s))
# Line 521 | Line 553 | public class LinkedTransferQueue<E> exte
553              }
554              else if (p.cannotPrecede(haveData))
555                  return null;                  // lost race vs opposite mode
556 <            else if ((n = p.next) != null)    // Not tail; keep traversing
556 >            else if ((n = p.next) != null)    // not last; keep traversing
557                  p = p != t && t != (u = tail) ? (t = u) : // stale tail
558                      (p != n) ? n : null;      // restart if off list
559              else if (!p.casNext(null, s))
560                  p = p.next;                   // re-read on CAS failure
561              else {
562 <                if (p != t) {                 // Update if slack now >= 2
562 >                if (p != t) {                 // update if slack now >= 2
563                      while ((tail != t || !casTail(t, s)) &&
564                             (t = tail)   != null &&
565                             (s = t.next) != null && // advance and retry
# Line 541 | Line 573 | public class LinkedTransferQueue<E> exte
573      /**
574       * Spins/yields/blocks until node s is matched or caller gives up.
575       *
544 -     * @param pred the predecessor of s or s or null if none
576       * @param s the waiting node
577 +     * @param pred the predecessor of s, or s itself if it has no
578 +     * predecessor, or null if unknown (the null case does not occur
579 +     * in any current calls but may in possible future extensions)
580       * @param e the comparison value for checking match
581       * @param how either SYNC or TIMEOUT
582       * @param nanos timeout value
583       * @return matched item, or e if unmatched on interrupt or timeout
584       */
585 <    private Object awaitMatch(Node pred, Node s, Object e,
585 >    private Object awaitMatch(Node s, Node pred, Object e,
586                                int how, long nanos) {
587          long lastTime = (how == TIMEOUT) ? System.nanoTime() : 0L;
588          Thread w = Thread.currentThread();
# Line 571 | Line 605 | public class LinkedTransferQueue<E> exte
605                  if ((spins = spinsFor(pred, s.isData)) > 0)
606                      randomYields = ThreadLocalRandom.current();
607              }
608 <            else if (spins > 0) {             // spin, occasionally yield
609 <                if (randomYields.nextInt(FRONT_SPINS) == 0)
610 <                    Thread.yield();
611 <                --spins;
608 >            else if (spins > 0) {             // spin
609 >                if (--spins == 0)
610 >                    shortenHeadPath();        // reduce slack before blocking
611 >                else if (randomYields.nextInt(CHAINED_SPINS) == 0)
612 >                    Thread.yield();           // occasionally yield
613              }
614              else if (s.waiter == null) {
615 <                shortenHeadPath();            // reduce slack before blocking
581 <                s.waiter = w;                 // request unpark
615 >                s.waiter = w;                 // request unpark then recheck
616              }
617              else if (how == TIMEOUT) {
618                  long now = System.nanoTime();
# Line 588 | Line 622 | public class LinkedTransferQueue<E> exte
622              }
623              else {
624                  LockSupport.park(this);
625 +                s.waiter = null;
626                  spins = -1;                   // spin if front upon wakeup
627              }
628          }
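
Note on the hunk above: the waiting loop spins first, then publishes its thread in the waiter field, rechecks, and only then parks. The sketch below shows that ordering in isolation, assuming a single waiter and a single fulfilling thread. SpinThenParkSketch, await, fulfill, and demo are hypothetical names. Interrupts, timeouts, and the shortenHeadPath call are left out, and the spin count is an arbitrary illustrative value rather than the one used in the class.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;

// Illustrative sketch only; every name here is hypothetical.
class SpinThenParkSketch {
    static final int SPINS = 1 << 6;  // arbitrary power of two for the demo

    volatile Object item;             // set once by fulfill
    volatile Thread waiter;           // set by the waiting thread before parking

    // Spins, occasionally yields, then parks until item becomes non-null.
    // No interrupt or timeout handling in this sketch.
    Object await() {
        int spins = SPINS;
        Thread w = Thread.currentThread();
        for (;;) {
            Object x = item;
            if (x != null)
                return x;
            if (spins > 0) {                       // spin
                --spins;
                if (ThreadLocalRandom.current().nextInt(SPINS) == 0)
                    Thread.yield();                // occasionally yield
            }
            else if (waiter == null)
                waiter = w;                        // request unpark, then recheck
            else
                LockSupport.park(this);            // may return spuriously; loop rechecks
        }
    }

    // Publishes the item first, then wakes the waiter if one is registered.
    void fulfill(Object x) {
        item = x;
        LockSupport.unpark(waiter);                // no effect when waiter is null
    }

    // Runs one waiter against one fulfiller and returns what the waiter saw.
    static Object demo() {
        SpinThenParkSketch s = new SpinThenParkSketch();
        Object[] result = new Object[1];
        Thread t = new Thread(() -> result[0] = s.await());
        t.start();
        try {
            Thread.sleep(50);                      // give the waiter time to park
            s.fulfill("done");
            t.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return null;
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the waiter writes the waiter field before rechecking item, and the fulfiller writes item before reading waiter, at least one side always observes the other's write, so a wakeup cannot be lost in this two-thread setting.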
# Line 599 | Line 634 | public class LinkedTransferQueue<E> exte
634       */
635      private static int spinsFor(Node pred, boolean haveData) {
636          if (MP && pred != null) {
637 <            boolean predData = pred.isData;
638 <            if (predData != haveData)         // front and phase change
639 <                return FRONT_SPINS + (FRONT_SPINS >>> 1);
605 <            if (predData != (pred.item != null)) // probably at front
637 >            if (pred.isData != haveData)      // phase change
638 >                return FRONT_SPINS + CHAINED_SPINS;
639 >            if (pred.isMatched())             // probably at front
640                  return FRONT_SPINS;
641              if (pred.waiter == null)          // pred apparently spinning
642                  return CHAINED_SPINS;
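
Note on the hunk above: with CHAINED_SPINS now defined as FRONT_SPINS >>> 1, the three visible branches yield 192, 128, and 64 spins. The sketch below restates that selection as a standalone method so the values can be checked. SpinsForSketch and its Node are hypothetical stand-ins, and the isMatched test is inferred from the replaced expression predData != (pred.item != null). Two parts are assumptions because the hunk does not show them: the multiprocessor flag is passed as a parameter instead of read from the MP field, and the fall-through case returns 0.

```java
// Illustrative sketch only; every name here is hypothetical.
class SpinsForSketch {
    static final int FRONT_SPINS   = 1 << 7;
    static final int CHAINED_SPINS = FRONT_SPINS >>> 1;

    // Minimal stand-in for the queue's node type.
    static final class Node {
        final boolean isData;
        volatile Object item;
        volatile Thread waiter;
        Node(Object item, boolean isData) {
            this.item = item;
            this.isData = isData;
        }
        // A data node is matched once its item is null; a request node
        // is matched once its item is non-null.
        boolean isMatched() { return (item == null) == isData; }
    }

    // mp is a parameter here (assumption) so the result is deterministic.
    static int spinsFor(Node pred, boolean haveData, boolean mp) {
        if (mp && pred != null) {
            if (pred.isData != haveData)      // phase change
                return FRONT_SPINS + CHAINED_SPINS;
            if (pred.isMatched())             // probably at front
                return FRONT_SPINS;
            if (pred.waiter == null)          // pred apparently spinning
                return CHAINED_SPINS;
        }
        return 0;                             // assumed fall-through: do not spin
    }

    public static void main(String[] args) {
        Node unmatchedData = new Node("x", true);
        System.out.println(spinsFor(unmatchedData, false, true));
        System.out.println(spinsFor(unmatchedData, true, true));
    }
}
```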
# Line 754 | Line 788 | public class LinkedTransferQueue<E> exte
788          s.forgetContents(); // clear unneeded fields
789          /*
790           * At any given time, exactly one node on list cannot be
791 <         * deleted -- the last inserted node. To accommodate this, if
792 <         * we cannot delete s, we save its predecessor as "cleanMe",
791 >         * unlinked -- the last inserted node. To accommodate this, if
792 >         * we cannot unlink s, we save its predecessor as "cleanMe",
793           * processing the previously saved version first. Because only
794           * one node in the list can have a null next, at least one of
795           * node s or the node previously saved can always be
# Line 1158 | Line 1192 | public class LinkedTransferQueue<E> exte
1192          }
1193      }
1194  
1161 -
1195      // Unsafe mechanics
1196  
1197      private static final sun.misc.Unsafe UNSAFE = getUnsafe();
# Line 1181 | Line 1214 | public class LinkedTransferQueue<E> exte
1214          }
1215      }
1216  
1217 +    /**
1218 +     * Returns a sun.misc.Unsafe.  Suitable for use in a 3rd party package.
1219 +     * Replace with a simple call to Unsafe.getUnsafe when integrating
1220 +     * into a jdk.
1221 +     *
1222 +     * @return a sun.misc.Unsafe
1223 +     */
1224      private static sun.misc.Unsafe getUnsafe() {
1225          try {
1226              return sun.misc.Unsafe.getUnsafe();

Diff Legend

- Removed lines
+ Added lines
< Changed lines
> Changed lines