737        * function of number of idle workers.
738        */
739       final int getEstimatedSurplusTaskCount() {
740 <         // The halving approximates weighting idle vs non-idle workers
741 <         return (sp - base) - (pool.getIdleThreadCount() >>> 1);
740 >         /*
741 >          * The goal here is to provide a very cheap heuristic guide
742 >          * for task partitioning when programmers, frameworks, tools,
743 >          * or languages have little or no idea about task granularity.
744 >          * In essence, by offering this method, we ask users only about
745 >          * tradeoffs in overhead vs expected throughput and its
746 >          * variance, rather than how finely to partition tasks.
747 >          *
748 >          * In a steady-state strict (tree-structured) computation,
749 >          * each thread makes available for stealing enough tasks for
750 >          * other threads to remain active. Inductively, if all threads
751 >          * play by the same rules, each thread should make available
752 >          * only a constant number of tasks.
753 >          *
754 >          * The minimum useful constant is just 1. But using a value of
755 >          * 1 would require immediate replenishment upon each steal to
756 >          * maintain enough tasks, which is infeasible. Further,
757 >          * partitionings/granularities of offered tasks should
758 >          * minimize steal rates, which in general means that threads
759 >          * nearer the top of the computation tree should generate more
760 >          * than those nearer the bottom. In a perfect steady state,
761 >          * each thread is at approximately the same level of the
762 >          * computation tree. However, producing extra tasks amortizes
763 >          * the uncertainty of progress and diffusion assumptions.
764 >          *
765 >          * So, users will want to use values larger, but not much
766 >          * larger, than 1 both to smooth over transient shortages and
767 >          * to hedge against uneven progress, traded off against the
768 >          * cost of extra task overhead. We leave the user to pick a
769 >          * threshold value to compare with the results of this call to
770 >          * guide decisions, but recommend values such as 3.
771 >          *
772 >          * When all threads are active, it is on average OK to
773 >          * estimate surplus strictly locally. In steady state, if one
774 >          * thread is maintaining, say, 2 surplus tasks, then so are
775 >          * others. So we can just use the estimated queue length
776 >          * (although note that (sp - base) can be an overestimate
777 >          * because of stealers lagging increments of base).
778 >          *
779 >          * However, this strategy alone leads to serious mis-estimates
780 >          * in some non-steady-state conditions (ramp-up, ramp-down,
781 >          * other stalls). We can detect many of these by further
782 >          * considering the number of "idle" threads, which are known
783 >          * to have zero queued tasks. A straight compensation would
784 >          * weight the queued task estimate by a function of the
785 >          * proportion of idle threads. However, we don't want to
786 >          * waste much calculation for the sake of weightings that
787 >          * only apply transiently, so we cheapen this by (a) not
788 >          * bothering to weight at all unless there is more than one
789 >          * queued task, and (b) rather than compensating by a factor
790 >          * of (#idle/#active) threads, just subtracting out a function
791 >          * of #idle that is a good enough approximation for conditions
792 >          * near the borderlines for threshold testing. This errs in
793 >          * the direction of reporting a more extreme lack of surplus
794 >          * (as in returning negative values) in cases where users
795 >          * should almost surely be generating tasks anyway.
796 >          */
797 >         int n = sp - base;
798 >         return n > 1 ? n - (pool.getIdleThreadCount() >>> 2) : n;
799       }
800
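As a sanity check of the new arithmetic: with sp - base == 5 and eight idle workers, the new code returns 5 - (8 >>> 2) == 3, while a single queued task (n <= 1) is reported as-is. The comment deliberately leaves threshold selection to callers; the sketch below shows the intended idiom, assuming the public wrapper ForkJoinTask.getSurplusQueuedTaskCount() from the released java.util.concurrent API (the SumTask class and THRESHOLD constant are hypothetical, with the threshold set to the recommended value of 3). It is an illustration of the technique, not code from this changeset.

    import java.util.ArrayDeque;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.ForkJoinTask;
    import java.util.concurrent.RecursiveTask;

    // Hypothetical divide-and-conquer task using the threshold idiom:
    // keep splitting while this worker's estimated surplus is small,
    // then fall back to sequential summation.
    class SumTask extends RecursiveTask<Long> {
        static final int THRESHOLD = 3;   // the "values such as 3" above
        final long[] array;
        final int lo, hi;

        SumTask(long[] array, int lo, int hi) {
            this.array = array; this.lo = lo; this.hi = hi;
        }

        protected Long compute() {
            int l = lo, h = hi;
            ArrayDeque<SumTask> forked = new ArrayDeque<SumTask>();
            // While other workers may be short of work, halve the range
            // and offer the right half for stealing.
            while (h - l > 1 &&
                   ForkJoinTask.getSurplusQueuedTaskCount() <= THRESHOLD) {
                int mid = (l + h) >>> 1;
                SumTask right = new SumTask(array, mid, h);
                right.fork();
                forked.push(right);
                h = mid;
            }
            long sum = 0;
            for (int i = l; i < h; ++i)
                sum += array[i];
            // Join in LIFO order: the most recently forked task is the
            // likeliest to still be in this worker's own queue, so join()
            // can often execute it directly instead of blocking.
            while (!forked.isEmpty())
                sum += forked.pop().join();
            return sum;
        }
    }

Invoked as new ForkJoinPool().invoke(new SumTask(data, 0, data.length)), this generates just enough surplus tasks to keep idle workers fed, without committing to a fixed partitioning up front.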
801       /**