   1     /*
   2      * Written by Doug Lea with assistance from members of JCP JSR-166
   3      * Expert Group and released to the public domain, as explained at
   4  <   * http://creativecommons.org/licenses/publicdomain
   4  >   * http://creativecommons.org/publicdomain/zero/1.0/
   5      */
   6
   7     package jsr166y;

 151      * Updates tend not to contend with each other except during
 152      * bursts while submitted tasks begin or end. In some cases when
 153      * they do contend, threads can instead do something else
 154  <   * (usually, scan for tesks) until contention subsides.
 154  >   * (usually, scan for tasks) until contention subsides.
 155      *
 156      * To enable packing, we restrict maximum parallelism to (1<<15)-1
 157      * (which is far in excess of normal operating range) to allow
 195      * shutdown schemes.
 196      *
 197      * Wait Queuing. Unlike HPC work-stealing frameworks, we cannot
 198  <   * let workers spin indefinitely scanning for tasks when none are
 199  <   * can be immediately found, and we cannot start/resume workers
 200  <   * unless there appear to be tasks available. On the other hand,
 201  <   * we must quickly prod them into action when new tasks are
 202  <   * submitted or generated. We park/unpark workers after placing
 203  <   * in an event wait queue when they cannot find work. This "queue"
 204  <   * is actually a simple Treiber stack, headed by the "id" field of
 205  <   * ctl, plus a 15bit counter value to both wake up waiters (by
 206  <   * advancing their count) and avoid ABA effects. Successors are
 207  <   * held in worker field "nextWait". Queuing deals with several
 208  <   * intrinsic races, mainly that a task-producing thread can miss
 209  <   * seeing (and signalling) another thread that gave up looking for
 210  <   * work but has not yet entered the wait queue. We solve this by
 211  <   * requiring a full sweep of all workers both before (in scan())
 212  <   * and after (in awaitWork()) a newly waiting worker is added to
 213  <   * the wait queue. During a rescan, the worker might release some
 214  <   * other queued worker rather than itself, which has the same net
 215  <   * effect.
 198  >   * let workers spin indefinitely scanning for tasks when none can
 199  >   * be found immediately, and we cannot start/resume workers unless
 200  >   * there appear to be tasks available. On the other hand, we must
 201  >   * quickly prod them into action when new tasks are submitted or
 202  >   * generated. We park/unpark workers after placing in an event
 203  >   * wait queue when they cannot find work. This "queue" is actually
 204  >   * a simple Treiber stack, headed by the "id" field of ctl, plus a
 205  >   * 15bit counter value to both wake up waiters (by advancing their
 206  >   * count) and avoid ABA effects. Successors are held in worker
 207  >   * field "nextWait". Queuing deals with several intrinsic races,
 208  >   * mainly that a task-producing thread can miss seeing (and
 209  >   * signalling) another thread that gave up looking for work but
 210  >   * has not yet entered the wait queue. We solve this by requiring
 211  >   * a full sweep of all workers both before (in scan()) and after
 212  >   * (in tryAwaitWork()) a newly waiting worker is added to the wait
 213  >   * queue. During a rescan, the worker might release some other
 214  >   * queued worker rather than itself, which has the same net
 215  >   * effect. Because enqueued workers may actually be rescanning
 216  >   * rather than waiting, we set and clear the "parked" field of
 217  >   * ForkJoinWorkerThread to reduce unnecessary calls to unpark.
 218  >   * (Use of the parked field requires a secondary recheck to avoid
 219  >   * missed signals.)
 220      *
 221      * Signalling. We create or wake up workers only when there
 222      * appears to be at least one task they might be able to find and
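The Treiber-stack scheme in the new text can be illustrated standalone. The sketch below is illustrative only (the class and method names are not the pool's): it packs a worker id into the low 16 bits of an atomic long with a counter above it, and advances the counter on every pop so a reused id is never mistaken for a stale top (the ABA problem). In the pool itself the counter additionally serves as the event count that wakes waiters; that part is omitted here.

    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative sketch only -- not the pool's actual code or field layout.
    // head packs (counter << 16) | id; id 0 means "empty".
    class WaitStackSketch {
        static final int ID_MASK = (1 << 16) - 1;
        final AtomicLong head = new AtomicLong();
        final int[] nextWait;                 // per-worker successor links

        WaitStackSketch(int maxWorkers) { nextWait = new int[maxWorkers + 1]; }

        void push(int id) {                   // worker gives up and enqueues itself
            for (;;) {
                long h = head.get();
                nextWait[id] = (int) h & ID_MASK;      // remember current top
                long nh = (h & ~(long) ID_MASK) | id;  // same counter, new top
                if (head.compareAndSet(h, nh))
                    return;
            }
        }

        int pop() {                           // signaller releases one waiter; 0 if none
            for (;;) {
                long h = head.get();
                int id = (int) h & ID_MASK;
                if (id == 0)
                    return 0;
                long counter = (h >>> 16) + 1;         // advancing this defeats ABA
                if (head.compareAndSet(h, (counter << 16) | nextWait[id]))
                    return id;
            }
        }
    }

Pushing leaves the counter alone; only removal advances it, which is all that ABA avoidance requires here.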
 233      * Trimming workers. To release resources after periods of lack of
 234      * use, a worker starting to wait when the pool is quiescent will
 235      * time out and terminate if the pool has remained quiescent for
 236  <   * SHRINK_RATE nanosecs.
 236  >   * SHRINK_RATE nanosecs. This will slowly propagate, eventually
 237  >   * terminating all workers after long periods of non-use.
 238      *
 239      * Submissions. External submissions are maintained in an
 240      * array-based queue that is structured identically to
 241  <   * ForkJoinWorkerThread queues (which see) except for the use of
 242  <   * submissionLock in method addSubmission. Unlike worker queues,
 243  <   * multiple external threads can add new submissions.
 241  >   * ForkJoinWorkerThread queues except for the use of
 242  >   * submissionLock in method addSubmission. Unlike the case for
 243  >   * worker queues, multiple external threads can add new
 244  >   * submissions, so adding requires a lock.
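The asymmetry is worth making concrete: a worker's deque has a single owner thread doing all adds, while the submission queue is pushed by arbitrary external threads, so its add path must be serialized. A minimal sketch under those assumptions (the class is hypothetical; only the method name addSubmission echoes the text above):

    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative sketch of an externally-pushed array queue.
    // Only the add path needs the lock.
    class SubmissionQueueSketch {
        private final ReentrantLock submissionLock = new ReentrantLock();
        private Runnable[] array = new Runnable[8]; // power of two, grown by doubling
        private volatile int queueBase;             // next slot to take
        private int queueTop;                       // next slot to add

        void addSubmission(Runnable task) {
            final ReentrantLock lock = this.submissionLock;
            lock.lock();                            // serializes concurrent external adders
            try {
                Runnable[] a = array;
                if (queueTop - queueBase == a.length)
                    a = array = grow(a);
                a[queueTop & (a.length - 1)] = task;
                ++queueTop;
            } finally {
                lock.unlock();
            }
        }

        private Runnable[] grow(Runnable[] oldQ) {
            Runnable[] newQ = new Runnable[oldQ.length << 1];
            for (int i = queueBase; i < queueTop; ++i)  // re-home each live element
                newQ[i & (newQ.length - 1)] = oldQ[i & (oldQ.length - 1)];
            return newQ;
        }
    }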
 245      *
 246      * Compensation. Beyond work-stealing support and lifecycle
 247      * control, the main responsibility of this framework is to take

 278      * if blocking would leave less than one active (non-waiting,
 279      * non-blocked) worker. Additionally, to avoid some false alarms
 280      * due to GC, lagging counters, system activity, etc, compensated
 281  <   * blocking for joins is only attempted after a number of rechecks
 282  <   * proportional to the current apparent deficit (where retries are
 283  <   * interspersed with Thread.yield, for good citizenship). The
 284  <   * variable blockedCount, incremented before blocking and
 285  <   * decremented after, is sometimes needed to distinguish cases of
 286  <   * waiting for work vs blocking on joins or other managed sync,
 287  <   * but both the cases are equivalent for most pool control, so we
 288  <   * can update non-atomically. (Additionally, contention on
 289  <   * blockedCount alleviates some contention on ctl).
 281  >   * blocking for joins is only attempted after rechecks stabilize
 282  >   * (retries are interspersed with Thread.yield, for good
 283  >   * citizenship). The variable blockedCount, incremented before
 284  >   * blocking and decremented after, is sometimes needed to
 285  >   * distinguish cases of waiting for work vs blocking on joins or
 286  >   * other managed sync. Both cases are equivalent for most pool
 287  >   * control, so we can update non-atomically. (Additionally,
 288  >   * contention on blockedCount alleviates some contention on ctl).
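The pre/post bracket around blocking reads roughly as follows; a simplified sketch using an AtomicInteger in place of the pool's plain field updates (which, as the text notes, need not even be atomic):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of the compensation bracket: count the thread as blocked only
    // for the duration of the managed wait, so pool control can tell
    // "no work anywhere" from "work exists but joiners are blocked".
    class CompensationSketch {
        final AtomicInteger blockedCount = new AtomicInteger();

        void managedWait(Runnable blockUntilDone) {
            blockedCount.incrementAndGet();      // "blocked on a join", not "idle"
            try {
                blockUntilDone.run();            // e.g. park until the joined task completes
            } finally {
                blockedCount.decrementAndGet();
            }
        }
    }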
 289      *
 290      * Shutdown and Termination. A call to shutdownNow atomically sets
 291      * the ctl stop bit and then (non-atomically) sets each workers

 483      * negative, there is at least one waiting worker, and when e is
 484      * negative, the pool is terminating. To deal with these possibly
 485      * negative fields, we use casts in and out of "short" and/or
 486  <   * signed shifts to maintain signedness. Note: AC_SHIFT is
 487  <   * redundantly declared in ForkJoinWorkerThread in order to
 488  <   * integrate a surplus-threads check.
 486  >   * signed shifts to maintain signedness.
 487      */
 488     volatile long ctl;
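A small demo of this packing, using the AC_SHIFT/TC_SHIFT layout named in this file; the demo class and helper names are illustrative. AC and TC are stored as deltas from the target parallelism, so the signed extraction is exactly what makes "negative means waiting/terminating" checks cheap:

    // Illustrative demo; shift/mask values mirror the constants in this
    // file, but this class is not the pool's code.
    class CtlSketch {
        static final int  AC_SHIFT = 48, TC_SHIFT = 32;
        static final long AC_MASK  = 0xffffL << AC_SHIFT;
        static final long TC_MASK  = 0xffffL << TC_SHIFT;

        static int ac(long ctl) { return (int) (ctl >> AC_SHIFT); }    // signed shift keeps sign
        static int tc(long ctl) { return (short) (ctl >>> TC_SHIFT); } // short cast keeps sign
        static int e(long ctl)  { return (int) ctl; }                  // negative => waiter on stack

        public static void main(String[] args) {
            long ctl = (((long) -2 << AC_SHIFT) & AC_MASK)   // 2 fewer active than parallelism
                     | (((long) -1 << TC_SHIFT) & TC_MASK);  // 1 fewer total than parallelism
            System.out.println(ac(ctl));  // prints -2
            System.out.println(tc(ctl));  // prints -1
        }
    }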
 489

 527
 528     /**
 529      * Index (mod submission queue length) of next element to take
 530  <   * from submission queue.
 530  >   * from submission queue. Usage is identical to that for
 531  >   * per-worker queues -- see ForkJoinWorkerThread internal
 532  >   * documentation.
 533      */
 534     volatile int queueBase;
 535
 536     /**
 537      * Index (mod submission queue length) of next element to add
 538  <   * in submission queue.
 538  >   * in submission queue. Usage is identical to that for
 539  >   * per-worker queues -- see ForkJoinWorkerThread internal
 540  >   * documentation.
 541      */
 542     int queueTop;
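Since both indices only grow and are reduced mod the power-of-two array length at access time, a worked example may help; the numbers below are arbitrary:

    // Worked example: array length 8, mask 7.
    class QueueIndexDemo {
        public static void main(String[] args) {
            int queueBase = 13, queueTop = 16, mask = 8 - 1;
            System.out.println(queueTop - queueBase); // 3 submissions pending
            System.out.println(queueBase & mask);     // next take comes from slot 5
            System.out.println(queueTop & mask);      // next add goes into slot 0
            // empty exactly when queueBase == queueTop
        }
    }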
 543

 575     private int nextWorkerIndex;
 576
 577     /**
 578  <   * SeqLock and index masking for for updates to workers array.
 579  <   * Locked when SG_UNIT is set. Unlocking clears bit by adding
 578  >   * SeqLock and index masking for updates to workers array. Locked
 579  >   * when SG_UNIT is set. Unlocking clears bit by adding
 580      * SG_UNIT. Staleness of read-only operations can be checked by
 581      * comparing scanGuard to value before the reads. The low 16 bits
 582      * (i.e, anding with SMASK) hold (the smallest power of two
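A seqlock of this shape can be sketched in isolation; the constant and field names mirror the description above, but the class is illustrative. Writers set the lock bit, mutate, then add SG_UNIT, which simultaneously clears the bit and advances the sequence; readers snapshot the guard, read, and treat the reads as stale if the guard moved:

    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative sketch; SG_UNIT and SMASK mirror the text above.
    class SeqLockSketch {
        static final int SG_UNIT = 1 << 16;  // lock bit; bits above it form the sequence
        static final int SMASK   = 0xffff;   // low 16 bits carry the array mask
        final AtomicInteger scanGuard = new AtomicInteger();
        volatile Object[] workers = new Object[1];

        void update(Object[] newWorkers, int newMask) {
            int g;
            do {                                   // acquire: set the SG_UNIT bit
                g = scanGuard.get();
            } while ((g & SG_UNIT) != 0 ||
                     !scanGuard.compareAndSet(g, g | SG_UNIT));
            workers = newWorkers;
            // release: adding SG_UNIT clears the bit and bumps the sequence
            scanGuard.set((((g | SG_UNIT) + SG_UNIT) & ~SMASK) | newMask);
        }

        Object[] readConsistent() {
            for (;;) {
                int g = scanGuard.get();
                if ((g & SG_UNIT) == 0) {          // not mid-update
                    Object[] ws = workers;         // the guarded read(s)
                    if (scanGuard.get() == g)      // staleness check, as in scan()
                        return ws;
                }
            }
        }
    }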
 714      */
 715     private boolean scan(ForkJoinWorkerThread w, int a) {
 716         int g = scanGuard; // mask 0 avoids useless scans if only one active
 717  <      int m = parallelism == 1 - a? 0 : g & SMASK;
 717  >      int m = (parallelism == 1 - a && blockedCount == 0) ? 0 : g & SMASK;
 718         ForkJoinWorkerThread[] ws = workers;
 719         if (ws == null || ws.length <= m)   // staleness check
 720             return false;

 761     }
 762
 763     /**
 764  <   * Tries to enqueue worker in wait queue and await change in
 765  <   * worker's eventCount. Before blocking, rescans queues to avoid
 766  <   * missed signals. If the pool is quiescent, possibly terminates
 767  <   * worker upon exit.
 764  >   * Tries to enqueue worker w in wait queue and await change in
 765  >   * worker's eventCount. If the pool is quiescent, possibly
 766  >   * terminates worker upon exit. Otherwise, before blocking,
 767  >   * rescans queues to avoid missed signals. Upon finding work,
 768  >   * releases at least one worker (which may be the current
 769  >   * worker). Rescans restart upon detected staleness or failure to
 770  >   * release due to contention. Note the unusual conventions about
 771  >   * Thread.interrupt here and elsewhere: Because interrupts are
 772  >   * used solely to alert threads to check termination, which is
 773  >   * checked here anyway, we clear status (using Thread.interrupted)
 774  >   * before any call to park, so that park does not immediately
 775  >   * return due to status being set via some other unrelated call to
 776  >   * interrupt in user code.
 777      *
 778      * @param w the calling worker
 779      * @param c the ctl value on entry
 781      */
 782     private boolean tryAwaitWork(ForkJoinWorkerThread w, long c) {
 783         int v = w.eventCount;
 784  <      w.nextWait = (int)c; // w's successor record
 784  >      w.nextWait = (int)c; // w's successor record
 785         long nc = (long)(v & E_MASK) | ((c - AC_UNIT) & (AC_MASK|TC_MASK));
 786         if (ctl != c || !UNSAFE.compareAndSwapLong(this, ctlOffset, c, nc)) {
 787  <          long d = ctl; // return true if lost to a deq, to force rescan
 787  >          long d = ctl; // return true if lost to a deq, to force scan
 788             return (int)d != (int)c && ((d - c) & AC_MASK) >= 0L;
 789         }
 790  <      boolean rescanned = false;
 791  <      for (int sc;;) {
 790  >      for (int sc = w.stealCount; sc != 0;) { // accumulate stealCount
 791  >          long s = stealCount;
 792  >          if (UNSAFE.compareAndSwapLong(this, stealCountOffset, s, s + sc))
 793  >              sc = w.stealCount = 0;
 794  >          else if (w.eventCount != v)
 795  >              return true;    // update next time
 796  >      }
 797  >      if (parallelism + (int)(nc >> AC_SHIFT) == 0 &&
 798  >          blockedCount == 0 && quiescerCount == 0)
 799  >          idleAwaitWork(w, nc, c, v);  // quiescent
 800  >      for (boolean rescanned = false;;) {
 801             if (w.eventCount != v)
 802                 return true;
 803  <          if ((sc = w.stealCount) != 0) {
 779  <              long s = stealCount; // accumulate stealCount
 780  <              if (UNSAFE.compareAndSwapLong(this, stealCountOffset, s, s+sc))
 781  <                  w.stealCount = 0;
 782  <          }
 783  <          else if (!rescanned) {
 803  >          if (!rescanned) {
 804                 int g = scanGuard, m = g & SMASK;
 805                 ForkJoinWorkerThread[] ws = workers;
 806                 if (ws != null && m < ws.length) {
 824                 else
 825                     Thread.interrupted(); // clear before park
 826             }
 807  -          else if (parallelism + (int)(ctl >> AC_SHIFT) == 0 &&
 808  -              blockedCount == 0 && quiescerCount == 0)
 809  -              idleAwaitWork(w, v); // quiescent -- maybe shrink
 827             else {
 828                 w.parked = true; // must recheck
 829                 if (w.eventCount != v) {
 837     }
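Both this method and idleAwaitWork below lean on the same park protocol: clear interrupt status, publish the parked flag, recheck the awaited condition, and only then park, while the signalling side advances the count before testing parked. A distilled sketch (single signaller assumed; field names mirror the real ones, but the class is illustrative):

    import java.util.concurrent.locks.LockSupport;

    // Distilled sketch of the protocol; not the pool's actual code.
    class ParkProtocolSketch {
        volatile boolean parked;
        volatile int eventCount;

        void awaitEvent(int v) {               // waiter: v is the count seen at enqueue
            while (eventCount == v) {
                Thread.interrupted();          // clear status so park cannot return early
                parked = true;                 // publish intent before the recheck
                if (eventCount == v)           // secondary recheck closes the race
                    LockSupport.park(this);
                parked = false;
            }
        }

        void signalEvent(Thread waiter) {      // signaller
            eventCount++;                      // advance first: recheck or wakeup sees it
            if (parked)
                LockSupport.unpark(waiter);    // skipped while the waiter is still rescanning
        }
    }

Because both fields are volatile, a signaller that reads parked as false is guaranteed the waiter's subsequent recheck sees the advanced count, so no signal is lost either way.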
 838

 839     /**
 840  <   * If pool is quiescent, checks for termination, and waits for
 841  <   * event signal for up to SHRINK_RATE nanosecs. On timeout, if ctl
 842  <   * has not changed, terminates the worker. Upon its termination
 843  <   * (see deregisterWorker), it may wake up another worker to
 844  <   * possibly repeat this process.
 840  >   * If inactivating worker w has caused pool to become
 841  >   * quiescent, check for pool termination, and wait for event
 842  >   * for up to SHRINK_RATE nanosecs (rescans are unnecessary in
 843  >   * this case because quiescence reflects consensus about lack
 844  >   * of work). On timeout, if ctl has not changed, terminate the
 845  >   * worker. Upon its termination (see deregisterWorker), it may
 846  >   * wake up another worker to possibly repeat this process.
 847      *
 848      * @param w the calling worker
 849  <   * @param v the eventCount w must wait until changed
 850  <   */
 851  <  private void idleAwaitWork(ForkJoinWorkerThread w, int v) {
 852  <      ForkJoinTask.helpExpungeStaleExceptions(); // help clean weak refs
 853  <      if (shutdown)
 854  <          tryTerminate(false);
 855  <      long c = ctl;
 856  <      long nc = (((c & (AC_MASK|TC_MASK)) + AC_UNIT) |
 857  <                 (long)(w.nextWait & E_MASK)); // ctl value to release w
 858  <      if (w.eventCount == v &&
 859  <          parallelism + (int)(c >> AC_SHIFT) == 0 &&
 860  <          blockedCount == 0 && quiescerCount == 0) {
 861  <          long startTime = System.nanoTime();
 862  <          Thread.interrupted();
 863  <          if (w.eventCount == v) {
 849  >   * @param currentCtl the ctl value after enqueuing w
 850  >   * @param prevCtl the ctl value if w terminated
 851  >   * @param v the eventCount w awaits change
 852  >   */
 853  >  private void idleAwaitWork(ForkJoinWorkerThread w, long currentCtl,
 854  >                             long prevCtl, int v) {
 855  >      if (w.eventCount == v) {
 856  >          if (shutdown)
 857  >              tryTerminate(false);
 858  >          ForkJoinTask.helpExpungeStaleExceptions(); // help clean weak refs
 859  >          while (ctl == currentCtl) {
 860  >              long startTime = System.nanoTime();
 861                 w.parked = true;
 862  <             if (w.eventCount == v)
 862  >             if (w.eventCount == v) // must recheck
 863                     LockSupport.parkNanos(this, SHRINK_RATE);
 864                 w.parked = false;
 865  <          if (w.eventCount == v && ctl == c &&
 866  <              System.nanoTime() - startTime >= SHRINK_RATE &&
 867  <              UNSAFE.compareAndSwapLong(this, ctlOffset, c, nc)) {
 868  <              w.terminate = true;
 869  <              w.eventCount = ((int)c + EC_UNIT) & E_MASK;
 865  >             if (w.eventCount != v)
 866  >                 break;
 867  >             else if (System.nanoTime() - startTime <
 868  >                      SHRINK_RATE - (SHRINK_RATE / 10)) // timing slop
 869  >                 Thread.interrupted();                 // spurious wakeup
 870  >             else if (UNSAFE.compareAndSwapLong(this, ctlOffset,
 871  >                                                currentCtl, prevCtl)) {
 872  >                 w.terminate = true;                   // restore previous
 873  >                 w.eventCount = ((int)currentCtl + EC_UNIT) & E_MASK;
 874  >                 break;
 875             }
 876         }
 877     }

 907
 908     /**
 909      * Creates or doubles submissionQueue array.
 910  <   * Basically identical to ForkJoinWorkerThread version
 910  >   * Basically identical to ForkJoinWorkerThread version.
 911      */
 912     private void growSubmissionQueue() {
 913         ForkJoinTask<?>[] oldQ = submissionQueue;

1012             joinMe.tryAwaitDone(0L);
1013             postBlock();
1014         }
1015  <      if ((ctl & STOP_BIT) != 0L)
1015  >      else if ((ctl & STOP_BIT) != 0L)
1016             joinMe.cancelIgnoringExceptions();
1017         }
1018     }

1147             ws[k] = w;
1148             nextWorkerIndex = k + 1;
1149             int m = g & SMASK;
1150  <          g = k >= m? ((m << 1) + 1) & SMASK : g + (SG_UNIT<<1);
1150  >          g = k > m? ((m << 1) + 1) & SMASK : g + (SG_UNIT<<1);
1151         }
1152     } finally {
1153         scanGuard = g;
|
if ((int)(c >> AC_SHIFT) != -parallelism) |
1228 |
|
return false; |
1229 |
|
if (!shutdown || blockedCount != 0 || quiescerCount != 0 || |
1230 |
< |
queueTop - queueBase > 0) { |
1230 |
> |
queueBase != queueTop) { |
1231 |
|
if (ctl == c) // staleness check |
1232 |
|
return false; |
1233 |
|
continue; |
1236 |
|
if (UNSAFE.compareAndSwapLong(this, ctlOffset, c, c | STOP_BIT)) |
1237 |
|
startTerminating(); |
1238 |
|
} |
1239 |
< |
if ((short)(c >>> TC_SHIFT) == -parallelism) { |
1240 |
< |
submissionLock.lock(); |
1241 |
< |
termination.signalAll(); |
1242 |
< |
submissionLock.unlock(); |
1239 |
> |
if ((short)(c >>> TC_SHIFT) == -parallelism) { // signal when 0 workers |
1240 |
> |
final ReentrantLock lock = this.submissionLock; |
1241 |
> |
lock.lock(); |
1242 |
> |
try { |
1243 |
> |
termination.signalAll(); |
1244 |
> |
} finally { |
1245 |
> |
lock.unlock(); |
1246 |
> |
} |
1247 |
|
} |
1248 |
|
return true; |
1249 |
|
} |
1250 |
|
|
1251 |
|
/** |
1252 |
|
* Runs up to three passes through workers: (0) Setting |
1253 |
< |
* termination status for each worker, followed by wakeups up |
1254 |
< |
* queued workers (1) helping cancel tasks (2) interrupting |
1253 |
> |
* termination status for each worker, followed by wakeups up to |
1254 |
> |
* queued workers; (1) helping cancel tasks; (2) interrupting |
1255 |
|
* lagging threads (likely in external tasks, but possibly also |
1256 |
|
* blocked in joins). Each pass repeats previous steps because of |
1257 |
|
* potential lagging thread creation. |
1297 |
|
|
1298 |
|
/** |
1299 |
|
* Tries to set the termination status of waiting workers, and |
1300 |
< |
* then wake them up (after which they will terminate). |
1300 |
> |
* then wakes them up (after which they will terminate). |
1301 |
|
*/ |
1302 |
|
private void terminateWaiters() { |
1303 |
|
ForkJoinWorkerThread[] ws = workers; |
1753 |
|
|
1754 |
|
/** |
1755 |
|
* Returns an estimate of the number of tasks submitted to this |
1756 |
< |
* pool that have not yet begun executing. This meThod may take |
1756 |
> |
* pool that have not yet begun executing. This method may take |
1757 |
|
* time proportional to the number of submissions. |
1758 |
|
* |
1759 |
|
* @return the number of queued submissions |
1990 |
|
* {@code isReleasable} must return {@code true} if blocking is |
1991 |
|
* not necessary. Method {@code block} blocks the current thread |
1992 |
|
* if necessary (perhaps internally invoking {@code isReleasable} |
1993 |
< |
* before actually blocking). The unusual methods in this API |
1994 |
< |
* accommodate synchronizers that may, but don't usually, block |
1995 |
< |
* for long periods. Similarly, they allow more efficient internal |
1996 |
< |
* handling of cases in which additional workers may be, but |
1997 |
< |
* usually are not, needed to ensure sufficient parallelism. |
1998 |
< |
* Toward this end, implementations of method {@code isReleasable} |
1999 |
< |
* must be amenable to repeated invocation. |
1993 |
> |
* before actually blocking). These actions are performed by any |
1994 |
> |
* thread invoking {@link ForkJoinPool#managedBlock}. The |
1995 |
> |
* unusual methods in this API accommodate synchronizers that may, |
1996 |
> |
* but don't usually, block for long periods. Similarly, they |
1997 |
> |
* allow more efficient internal handling of cases in which |
1998 |
> |
* additional workers may be, but usually are not, needed to |
1999 |
> |
* ensure sufficient parallelism. Toward this end, |
2000 |
> |
* implementations of method {@code isReleasable} must be amenable |
2001 |
> |
* to repeated invocation. |
2002 |
|
* |
2003 |
|
* <p>For example, here is a ManagedBlocker based on a |
2004 |
|
* ReentrantLock: |
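A sketch of such a blocker: isReleasable opportunistically tryLocks so that managedBlock can often avoid the compensation machinery entirely, and block performs the real acquisition only when needed:

    import java.util.concurrent.locks.ReentrantLock;
    import jsr166y.ForkJoinPool;

    // Sketch of the lock-based blocker described above.
    class ManagedLocker implements ForkJoinPool.ManagedBlocker {
        final ReentrantLock lock;
        boolean hasLock = false;
        ManagedLocker(ReentrantLock lock) { this.lock = lock; }
        public boolean block() {
            if (!hasLock)
                lock.lock();                   // the pool has been told this may block
            return true;                       // lock is held; blocker is done
        }
        public boolean isReleasable() {
            return hasLock || (hasLock = lock.tryLock()); // avoid blocking if possible
        }
    }

A caller acquires via ForkJoinPool.managedBlock(new ManagedLocker(lock)), which lets the pool compensate with a spare worker if this thread does block.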