55      * considered individually, is not wait-free. One thief cannot
56      * successfully continue until another in-progress one (or, if
57      * previously empty, a push) completes. However, in the
58 <    * aggregate, we ensure at least probabilistic non-blockingness. If
59 <    * an attempted steal fails, a thief always chooses a different
60 <    * random victim target to try next. So, in order for one thief to
61 <    * progress, it suffices for any in-progress deq or new push on
62 <    * any empty queue to complete. One reason this works well here is
63 <    * that apparently-nonempty often means soon-to-be-stealable,
64 <    * which gives threads a chance to activate if necessary before
65 <    * stealing (see below).
58 >    * aggregate, we ensure at least probabilistic
59 >    * non-blockingness. If an attempted steal fails, a thief always
60 >    * chooses a different random victim target to try next. So, in
61 >    * order for one thief to progress, it suffices for any
62 >    * in-progress deq or new push on any empty queue to complete. One
63 >    * reason this works well here is that apparently-nonempty often
64 >    * means soon-to-be-stealable, which gives threads a chance to
65 >    * activate if necessary before stealing (see below).
66      *
67      * This approach also enables support for "async mode" where local
68      * task processing is in FIFO, not LIFO order; simply by using a
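The retry policy described in this hunk can be illustrated with a toy sketch. This is not the jsr166 implementation; the class, queues, and method names below are hypothetical stand-ins, and `ConcurrentLinkedQueue.poll` stands in for one steal attempt. The point it shows is only the victim-selection rule: a failed attempt always moves to a *different* random victim, so one thief's progress requires only that some queue somewhere has completable work.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Toy sketch (illustrative names only) of retrying a different random victim. */
public class RandomVictimSketch {
    // Hypothetical stand-ins for the per-worker task queues.
    static ConcurrentLinkedQueue<Integer>[] queues;

    @SuppressWarnings("unchecked")
    static void init(int n) {            // assumes n >= 2
        queues = new ConcurrentLinkedQueue[n];
        for (int i = 0; i < n; i++)
            queues[i] = new ConcurrentLinkedQueue<>();
    }

    /** Try up to maxTries random victims; return null if every attempt failed. */
    static Integer steal(Random rnd, int maxTries) {
        int n = queues.length;
        int victim = rnd.nextInt(n);
        for (int tries = 0; tries < maxTries; tries++) {
            Integer task = queues[victim].poll();     // one steal attempt
            if (task != null)
                return task;
            // Failed: pick a *different* random victim for the next attempt.
            victim = (victim + 1 + rnd.nextInt(n - 1)) % n;
        }
        return null;
    }

    public static void main(String[] args) {
        init(4);
        queues[2].add(42);                // only queue 2 has work
        System.out.println("stole: " + steal(new Random(1), 1000));
    }
}
```

With only one nonempty queue, any single attempt can fail, but the different-victim rule makes repeated failure vanishingly unlikely, which is the "probabilistic non-blockingness" the comment claims.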
78      * protected by volatile base reads, reads of the queue array and
79      * its slots do not need volatile load semantics, but writes (in
80      * push) require store order and CASes (in pop and deq) require
81 <    * (volatile) CAS semantics. Since these combinations aren't
82 <    * supported using ordinary volatiles, the only way to accomplish
83 <    * these efficiently is to use direct Unsafe calls. (Using external
81 >    * (volatile) CAS semantics. (See "Idempotent work stealing" by
82 >    * Michael, Saraswat, and Vechev, PPoPP 2009
83 >    * http://portal.acm.org/citation.cfm?id=1504186 for an algorithm
84 >    * with similar properties, but without support for nulling
85 >    * slots.) Since these combinations aren't supported using
86 >    * ordinary volatiles, the only way to accomplish these
87 >    * efficiently is to use direct Unsafe calls. (Using external
88      * AtomicIntegers and AtomicReferenceArrays for the indices and
89      * array is significantly slower because of memory locality and
90 <    * indirection effects.) Further, performance on most platforms is
91 <    * very sensitive to placement and sizing of the (resizable) queue
92 <    * array. Even though these queues don't usually become all that
93 <    * big, the initial size must be large enough to counteract cache
90 >    * indirection effects.)
91 >    *
92 >    * Further, performance on most platforms is very sensitive to
93 >    * placement and sizing of the (resizable) queue array. Even
94 >    * though these queues don't usually become all that big, the
95 >    * initial size must be large enough to counteract cache
96      * contention effects across multiple queues (especially in the
97      * presence of GC cardmarking). Also, to improve thread-locality,
98      * queues are currently initialized immediately after the thread
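The ordering recipe above (plain array reads protected by a volatile read of base, store-ordered writes in push, CAS semantics for steals that null out slots) can be sketched portably. The real code uses direct Unsafe calls precisely because, as the comment notes, external atomics are slower; this sketch deliberately accepts that cost and uses AtomicReferenceArray/AtomicInteger as stand-ins, with lazySet playing the role of the ordered store. All names are illustrative, not jsr166 fields, and the sketch omits resizing and the visibility handling of sp that real concurrent use requires.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Sketch of the push/deq ordering recipe; capacity must be a power of two. */
public class OrderedDequeSketch {
    final AtomicReferenceArray<Object> slots;
    final AtomicInteger base = new AtomicInteger(); // steal end (volatile)
    int sp;                                         // owner's top (volatile in real code)

    OrderedDequeSketch(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    /** Owner-only push: needs a store-ordered write, not a full volatile write. */
    void push(Object task) {
        slots.lazySet(sp & (slots.length() - 1), task); // ordered store
        sp++;
    }

    /** Steal from the base end: needs CAS semantics, nulling the slot. */
    Object deq() {
        int b = base.get();                  // volatile read guards the slot read
        if (b == sp)
            return null;                     // appears empty
        int i = b & (slots.length() - 1);
        Object t = slots.get(i);
        // CAS nulls the slot so each task is taken at most once.
        if (t != null && slots.compareAndSet(i, t, null)) {
            base.lazySet(b + 1);
            return t;
        }
        return null;                         // lost a race; caller rescans
    }
}
```

Nulling the stolen slot is the property the cited Michael/Saraswat/Vechev algorithm lacks; here it is what lets a plain read of a slot plus one CAS decide ownership.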
109     * counter (activeCount) held by the pool. It uses an algorithm
110     * similar to that in Herlihy and Shavit section 17.6 to cause
111     * threads to eventually block when all threads declare they are
112 <   * inactive. (See variable "scans".) For this to work, threads
113 <   * must be declared active when executing tasks, and before
114 <   * stealing a task. They must be inactive before blocking on the
115 <   * Pool Barrier (awaiting a new submission or other Pool
116 <   * event). In between, there is some free play which we take
117 <   * advantage of to avoid contention and rapid flickering of the
118 <   * global activeCount: If inactive, we activate only if a victim
119 <   * queue appears to be nonempty (see above). Similarly, a thread
120 <   * tries to inactivate only after a full scan of other threads.
121 <   * The net effect is that contention on activeCount is rarely a
122 <   * measurable performance issue. (There are also a few other cases
123 <   * where we scan for work rather than retry/block upon
124 <   * contention.)
112 >   * inactive. For this to work, threads must be declared active
113 >   * when executing tasks, and before stealing a task. They must be
114 >   * inactive before blocking on the Pool Barrier (awaiting a new
115 >   * submission or other Pool event). In between, there is some free
116 >   * play which we take advantage of to avoid contention and rapid
117 >   * flickering of the global activeCount: If inactive, we activate
118 >   * only if a victim queue appears to be nonempty (see above).
119 >   * Similarly, a thread tries to inactivate only after a full scan
120 >   * of other threads. The net effect is that contention on
121 >   * activeCount is rarely a measurable performance issue. (There
122 >   * are also a few other cases where we scan for work rather than
123 >   * retry/block upon contention.)
124     *
125     * 3. Selection control. We maintain policy of always choosing to
126     * run local tasks rather than stealing, and always trying to
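The activation discipline in point 2 reduces to two asymmetric transitions on a shared counter: activate eagerly but only when a victim queue looks nonempty, inactivate lazily only after a full empty scan. A minimal sketch of just that counter protocol, with illustrative names rather than the jsr166 fields:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the activeCount discipline; names are illustrative only. */
public class ActivationSketch {
    final AtomicInteger activeCount = new AtomicInteger();

    /** Called when a victim queue appears nonempty, before stealing. */
    boolean tryActivate(boolean active) {
        if (!active)                          // bump only on a real transition,
            activeCount.incrementAndGet();    // avoiding rapid flickering
        return true;                          // new "active" state
    }

    /** Called only after a full scan of other threads found no work. */
    boolean tryInactivate(boolean active) {
        if (active)
            activeCount.decrementAndGet();
        return false;                         // new "active" state
    }

    /** When this reaches zero, all threads may block on the pool barrier. */
    boolean allIdle() { return activeCount.get() == 0; }
}
```

Because both transitions are gated (activation on apparent work, inactivation on a completed scan), most scans touch the counter rarely, which is why the comment can claim contention on activeCount is seldom measurable.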
386     */
387     protected void onTermination(Throwable exception) {
388         // Execute remaining local tasks unless aborting or terminating
389 <       while (exception == null && !pool.isTerminating() && base != sp) {
389 >       while (exception == null && pool.isProcessingTasks() && base != sp) {
390             try {
391                 ForkJoinTask<?> t = popTask();
392                 if (t != null)
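The shape of the drain loop changed by this hunk can be sketched in isolation. This is not the jsr166 method: a plain Deque of Runnables stands in for the worker's task array, a boolean field stands in for `pool.isProcessingTasks()`, and `pollLast` stands in for `popTask()`; the loop structure mirrors the `while` condition shown above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of the onTermination drain; stand-in types, illustrative only. */
public class DrainSketch {
    final Deque<Runnable> local = new ArrayDeque<>();
    boolean processingTasks = true;  // stand-in for pool.isProcessingTasks()

    /** Run remaining local tasks unless exiting abnormally; returns count run. */
    int drain(Throwable exception) {
        int ran = 0;
        // Mirrors: while (exception == null && pool.isProcessingTasks() && base != sp)
        while (exception == null && processingTasks && !local.isEmpty()) {
            Runnable t = local.pollLast();   // LIFO pop, like popTask()
            if (t != null) {
                t.run();
                ran++;
            }
        }
        return ran;                          // tasks left behind on abnormal exit
    }
}
```

The hunk's substitution of `pool.isProcessingTasks()` for `!pool.isTerminating()` keeps the same guard shape: stop draining as soon as the pool no longer wants tasks run.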