
Comparing jsr166/src/jsr166y/ForkJoinWorkerThread.java (file contents):
Revision 1.27 by dl, Mon Aug 3 13:01:15 2009 UTC vs.
Revision 1.28 by dl, Mon Aug 3 13:40:07 2009 UTC

# Line 57 | Line 57 | public class ForkJoinWorkerThread extend
57       * considered individually, is not wait-free. One thief cannot
58       * successfully continue until another in-progress one (or, if
59       * previously empty, a push) completes.  However, in the
60 <     * aggregate, we ensure at least probabilistic non-blockingness. If
61 <     * an attempted steal fails, a thief always chooses a different
62 <     * random victim target to try next. So, in order for one thief to
63 <     * progress, it suffices for any in-progress deq or new push on
64 <     * any empty queue to complete. One reason this works well here is
65 <     * that apparently-nonempty often means soon-to-be-stealable,
66 <     * which gives threads a chance to activate if necessary before
67 <     * stealing (see below).
60 >     * aggregate, we ensure at least probabilistic
61 >     * non-blockingness. If an attempted steal fails, a thief always
62 >     * chooses a different random victim target to try next. So, in
63 >     * order for one thief to progress, it suffices for any
64 >     * in-progress deq or new push on any empty queue to complete. One
65 >     * reason this works well here is that apparently-nonempty often
66 >     * means soon-to-be-stealable, which gives threads a chance to
67 >     * activate if necessary before stealing (see below).
68       *
69       * This approach also enables support for "async mode" where local
70       * task processing is in FIFO, not LIFO order; simply by using a
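
[Editor's sketch] The retargeting behaviour described in the hunk above can be sketched in a few lines of Java: a thief whose steal attempt fails always moves on to a different randomly chosen victim, so one thief's progress never depends on any particular queue becoming free. The class and method names below are illustrative stand-ins, not the jsr166y internals, and the sketch models only the thief side; the truncated "async mode" remark concerns the order in which an owner drains its own queue.

import java.util.concurrent.ThreadLocalRandom;

/**
 * Illustrative sketch only (not the jsr166y implementation): a failed
 * steal attempt retargets to a different random victim, so in the
 * aggregate progress only needs some in-progress deq or push on an
 * empty queue to complete somewhere.
 */
final class StealScanSketch {
    interface Victim {
        /** Returns a task, or null if the queue was empty or contended. */
        Object tryDeq();
    }

    /** One bounded scan; returns a stolen task, or null if none was found. */
    static Object scan(Victim[] victims) {
        int n = victims.length;
        if (n == 0)
            return null;
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int idx = rnd.nextInt(n);
        for (int attempts = 2 * n; attempts > 0; attempts--) {
            Object task = victims[idx].tryDeq();
            if (task != null)
                return task;                 // success: caller runs the task
            int next;                        // retarget: pick a different victim
            do {
                next = rnd.nextInt(n);
            } while (n > 1 && next == idx);
            idx = next;
        }
        return null;   // caller may rescan, or inactivate (see the later hunks)
    }
}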
# Line 80 | Line 80 | public class ForkJoinWorkerThread extend
80       * protected by volatile base reads, reads of the queue array and
81       * its slots do not need volatile load semantics, but writes (in
82       * push) require store order and CASes (in pop and deq) require
83 <     * (volatile) CAS semantics. Since these combinations aren't
84 <     * supported using ordinary volatiles, the only way to accomplish
85 <     * these efficiently is to use direct Unsafe calls. (Using external
83 >     * (volatile) CAS semantics.  (See "Idempotent work stealing" by
84 >     * Michael, Saraswat, and Vechev, PPoPP 2009
85 >     * http://portal.acm.org/citation.cfm?id=1504186 for an algorithm
86 >     * with similar properties, but without support for nulling
87 >     * slots.)  Since these combinations aren't supported using
88 >     * ordinary volatiles, the only way to accomplish these
89 >     * efficiently is to use direct Unsafe calls. (Using external
90       * AtomicIntegers and AtomicReferenceArrays for the indices and
91       * array is significantly slower because of memory locality and
92 <     * indirection effects.) Further, performance on most platforms is
93 <     * very sensitive to placement and sizing of the (resizable) queue
94 <     * array.  Even though these queues don't usually become all that
95 <     * big, the initial size must be large enough to counteract cache
92 >     * indirection effects.)
93 >     *
94 >     * Further, performance on most platforms is very sensitive to
95 >     * placement and sizing of the (resizable) queue array.  Even
96 >     * though these queues don't usually become all that big, the
97 >     * initial size must be large enough to counteract cache
98       * contention effects across multiple queues (especially in the
99       * presence of GC cardmarking). Also, to improve thread-locality,
100       * queues are currently initialized immediately after the thread
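
[Editor's sketch] The mix of ordering requirements described in this hunk -- plain slot reads guarded by a volatile read of base, a store-ordered write in push, and a slot-nulling CAS in pop and deq -- can be approximated with the Atomic classes, as below. This is deliberately simplified: the comment itself notes that the real code uses Unsafe precisely because the AtomicReferenceArray form is measurably slower; resizing and the careful index arithmetic are omitted; and the capacity constant is an illustrative assumption, not the jsr166y value.

import java.util.concurrent.atomic.AtomicReferenceArray;

/**
 * Illustration only: the ordering recipe from the comment, expressed
 * with Atomic classes instead of Unsafe. lazySet supplies the ordered
 * store that push needs; compareAndSet supplies the slot-nulling CAS
 * that pop and deq need; ordinary get() suffices for slot reads once
 * the volatile read of base/sp has been performed.
 */
final class OrderedDequeSketch<T> {
    // Power-of-two capacity; a generously sized array also helps keep
    // different threads' queues out of each other's cache lines.
    private static final int CAPACITY = 1 << 13;          // illustrative value
    private final AtomicReferenceArray<T> slots =
        new AtomicReferenceArray<>(CAPACITY);
    private volatile int base;    // next index thieves deq from
    private volatile int sp;      // next index the owner pushes into

    /** Owner only: ordered store of the task, then publish via volatile sp. */
    void push(T task) {
        int s = sp;
        slots.lazySet(s & (CAPACITY - 1), task);   // store-ordered write
        sp = s + 1;                                // volatile write publishes it
    }

    /** Owner only: LIFO pop; the CAS keeps a racing deq from taking the slot too. */
    T pop() {
        int s = sp - 1;
        if (s < base)
            return null;
        int i = s & (CAPACITY - 1);
        T t = slots.get(i);
        if (t != null && slots.compareAndSet(i, t, null)) {
            sp = s;
            return t;
        }
        return null;              // a thief won the race for the last task
    }

    /** Thieves: FIFO deq; failure under contention just sends the thief elsewhere. */
    T deq() {
        int b = base;             // volatile read guards the plain slot read
        if (b >= sp)
            return null;
        int i = b & (CAPACITY - 1);
        T t = slots.get(i);
        if (t != null && slots.compareAndSet(i, t, null)) {
            base = b + 1;
            return t;
        }
        return null;
    }
}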
# Line 105 | Line 111 | public class ForkJoinWorkerThread extend
111       * counter (activeCount) held by the pool. It uses an algorithm
112       * similar to that in Herlihy and Shavit section 17.6 to cause
113       * threads to eventually block when all threads declare they are
114 <     * inactive. (See variable "scans".)  For this to work, threads
115 <     * must be declared active when executing tasks, and before
116 <     * stealing a task. They must be inactive before blocking on the
117 <     * Pool Barrier (awaiting a new submission or other Pool
118 <     * event). In between, there is some free play which we take
119 <     * advantage of to avoid contention and rapid flickering of the
120 <     * global activeCount: If inactive, we activate only if a victim
121 <     * queue appears to be nonempty (see above).  Similarly, a thread
122 <     * tries to inactivate only after a full scan of other threads.
123 <     * The net effect is that contention on activeCount is rarely a
124 <     * measurable performance issue. (There are also a few other cases
125 <     * where we scan for work rather than retry/block upon
126 <     * contention.)
114 >     * inactive. For this to work, threads must be declared active
115 >     * when executing tasks, and before stealing a task. They must be
116 >     * inactive before blocking on the Pool Barrier (awaiting a new
117 >     * submission or other Pool event). In between, there is some free
118 >     * play which we take advantage of to avoid contention and rapid
119 >     * flickering of the global activeCount: If inactive, we activate
120 >     * only if a victim queue appears to be nonempty (see above).
121 >     * Similarly, a thread tries to inactivate only after a full scan
122 >     * of other threads.  The net effect is that contention on
123 >     * activeCount is rarely a measurable performance issue. (There
124 >     * are also a few other cases where we scan for work rather than
125 >     * retry/block upon contention.)
126       *
127       * 3. Selection control. We maintain policy of always choosing to
128       * run local tasks rather than stealing, and always trying to

Diff Legend

- Removed lines
+ Added lines
< Changed lines
> Changed lines