57      * considered individually, is not wait-free. One thief cannot
58      * successfully continue until another in-progress one (or, if
59      * previously empty, a push) completes. However, in the
60  <   * aggregate, we ensure at least probabilistic non-blockingness. If
61  <   * an attempted steal fails, a thief always chooses a different
62  <   * random victim target to try next. So, in order for one thief to
63  <   * progress, it suffices for any in-progress deq or new push on
64  <   * any empty queue to complete. One reason this works well here is
65  <   * that apparently-nonempty often means soon-to-be-stealable,
66  <   * which gives threads a chance to activate if necessary before
67  <   * stealing (see below).
60  >   * aggregate, we ensure at least probabilistic
61  >   * non-blockingness. If an attempted steal fails, a thief always
62  >   * chooses a different random victim target to try next. So, in
63  >   * order for one thief to progress, it suffices for any
64  >   * in-progress deq or new push on any empty queue to complete. One
65  >   * reason this works well here is that apparently-nonempty often
66  >   * means soon-to-be-stealable, which gives threads a chance to
67  >   * activate if necessary before stealing (see below).
68      *
69      * This approach also enables support for "async mode" where local
70      * task processing is in FIFO, not LIFO order; simply by using a
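To make the retry policy in this passage concrete, here is a minimal sketch, not the actual jsr166 code: WorkQueue, tryPoll, and scanForWork are illustrative names. Each failed steal picks a fresh random victim, so one thief's progress requires only that some in-progress deq, or a push onto an empty queue, complete somewhere.

    import java.util.concurrent.ForkJoinTask;
    import java.util.concurrent.ThreadLocalRandom;

    // Minimal stand-in for the per-worker deque; hypothetical interface.
    interface WorkQueue {
        boolean isEmpty();
        ForkJoinTask<?> tryPoll(); // one steal attempt; null on failure or race
    }

    final class StealSketch {
        // A failed steal never respins on the same victim: each retry
        // targets a different random queue, so aggregate progress needs
        // only *some* concurrent deq or push to finish.
        static ForkJoinTask<?> scanForWork(WorkQueue[] queues, int tries) {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            while (tries-- > 0) {
                WorkQueue q = queues[rnd.nextInt(queues.length)];
                if (q != null && !q.isEmpty()) {       // apparently nonempty
                    ForkJoinTask<?> t = q.tryPoll();   // may fail under contention
                    if (t != null)
                        return t;
                }
            }
            return null; // every try failed; caller may inactivate (see below)
        }
    }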
80      * protected by volatile base reads, reads of the queue array and
81      * its slots do not need volatile load semantics, but writes (in
82      * push) require store order and CASes (in pop and deq) require
83  <   * (volatile) CAS semantics. Since these combinations aren't
84  <   * supported using ordinary volatiles, the only way to accomplish
85  <   * these efficiently is to use direct Unsafe calls. (Using external
83  >   * (volatile) CAS semantics. (See "Idempotent work stealing" by
84  >   * Michael, Saraswat, and Vechev, PPoPP 2009
85  >   * http://portal.acm.org/citation.cfm?id=1504186 for an algorithm
86  >   * with similar properties, but without support for nulling
87  >   * slots.) Since these combinations aren't supported using
88  >   * ordinary volatiles, the only way to accomplish these
89  >   * efficiently is to use direct Unsafe calls. (Using external
90      * AtomicIntegers and AtomicReferenceArrays for the indices and
91      * array is significantly slower because of memory locality and
92  <   * indirection effects.) Further, performance on most platforms is
93  <   * very sensitive to placement and sizing of the (resizable) queue
94  <   * array. Even though these queues don't usually become all that
95  <   * big, the initial size must be large enough to counteract cache
92  >   * indirection effects.)
93  >   *
94  >   * Further, performance on most platforms is very sensitive to
95  >   * placement and sizing of the (resizable) queue array. Even
96  >   * though these queues don't usually become all that big, the
97  >   * initial size must be large enough to counteract cache
98      * contention effects across multiple queues (especially in the
99      * presence of GC cardmarking). Also, to improve thread-locality,
100     * queues are currently initialized immediately after the thread
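As a rough illustration of these ordering rules, the sketch below pairs an ordered slot store in push with a volatile read of base plus a slot-nulling CAS in deq, all via direct Unsafe calls. It is a simplification under stated assumptions, not the actual jsr166 code: the field names, the fixed power-of-two array, and the elided resizing and pop path are all placeholders.

    import java.util.concurrent.ForkJoinTask;
    import sun.misc.Unsafe;

    final class OrderingSketch {
        static final Unsafe U = Unsafe.getUnsafe(); // only works on the bootclasspath
        static final long ABASE = U.arrayBaseOffset(ForkJoinTask[].class);
        static final int ASHIFT =
            31 - Integer.numberOfLeadingZeros(U.arrayIndexScale(ForkJoinTask[].class));

        volatile int base;                  // next slot to steal from
        volatile int sp;                    // next slot to push at
        final ForkJoinTask<?>[] queue =
            new ForkJoinTask<?>[1 << 13];   // power of two; resizing elided

        static long slotOffset(ForkJoinTask<?>[] q, int i) {
            return ABASE + ((long) (i & (q.length - 1)) << ASHIFT);
        }

        void push(ForkJoinTask<?> t) {      // owner only
            int s = sp;
            U.putOrderedObject(queue, slotOffset(queue, s), t); // store order only
            sp = s + 1;                     // bump publishes the slot
        }

        ForkJoinTask<?> deq() {             // FIFO end: stealers, async mode
            int b = base;                   // volatile read; the plain array and
            ForkJoinTask<?>[] q = queue;    // slot reads below need no fences
            if (sp - b <= 0)
                return null;                // apparently empty
            long off = slotOffset(q, b);
            ForkJoinTask<?> t = (ForkJoinTask<?>) U.getObject(q, off);
            if (t != null && U.compareAndSwapObject(q, off, t, null)) {
                base = b + 1;               // CAS nulled the slot; advance base
                return t;
            }
            return null;                    // lost a race; caller tries elsewhere
        }
    }

An owner-side pop would CAS at the sp end analogously, and the slot-nulling CAS is exactly the point of contrast with the Michael/Saraswat/Vechev algorithm cited above. Async mode also falls out naturally here: local processing can simply take tasks from the deq (FIFO) end instead of popping.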
111     * counter (activeCount) held by the pool. It uses an algorithm
112     * similar to that in Herlihy and Shavit section 17.6 to cause
113     * threads to eventually block when all threads declare they are
114 <   * inactive. (See variable "scans".) For this to work, threads
115 <   * must be declared active when executing tasks, and before
116 <   * stealing a task. They must be inactive before blocking on the
117 <   * Pool Barrier (awaiting a new submission or other Pool
118 <   * event). In between, there is some free play which we take
119 <   * advantage of to avoid contention and rapid flickering of the
120 <   * global activeCount: If inactive, we activate only if a victim
121 <   * queue appears to be nonempty (see above). Similarly, a thread
122 <   * tries to inactivate only after a full scan of other threads.
123 <   * The net effect is that contention on activeCount is rarely a
124 <   * measurable performance issue. (There are also a few other cases
125 <   * where we scan for work rather than retry/block upon
126 <   * contention.)
114 >   * inactive. For this to work, threads must be declared active
115 >   * when executing tasks, and before stealing a task. They must be
116 >   * inactive before blocking on the Pool Barrier (awaiting a new
117 >   * submission or other Pool event). In between, there is some free
118 >   * play which we take advantage of to avoid contention and rapid
119 >   * flickering of the global activeCount: If inactive, we activate
120 >   * only if a victim queue appears to be nonempty (see above).
121 >   * Similarly, a thread tries to inactivate only after a full scan
122 >   * of other threads. The net effect is that contention on
123 >   * activeCount is rarely a measurable performance issue. (There
124 >   * are also a few other cases where we scan for work rather than
125 >   * retry/block upon contention.)
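The two guarded transitions described here might look roughly like the following sketch. The names activeCount, activateBeforeSteal, and inactivateAfterFullScan are illustrative; the real code folds these checks into its scan loop rather than isolating them as methods.

    import java.util.concurrent.atomic.AtomicInteger;

    final class ActivationSketch {
        final AtomicInteger activeCount = new AtomicInteger(); // pool-wide counter
        boolean active;                     // this worker's declared state

        // Called only when a victim queue appears nonempty, so an
        // inactive thread touches the shared counter only when work
        // is likely to be stealable soon.
        void activateBeforeSteal() {
            if (!active) {
                active = true;
                activeCount.incrementAndGet();
            }
        }

        // Called only after a full scan of the other threads finds
        // nothing, which limits flickering of the global count.
        void inactivateAfterFullScan() {
            if (active) {
                active = false;
                if (activeCount.decrementAndGet() == 0) {
                    // all threads now inactive: workers may block on the
                    // pool barrier awaiting a new submission or pool event
                }
            }
        }
    }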
126     *
127     * 3. Selection control. We maintain policy of always choosing to
128     * run local tasks rather than stealing, and always trying to