--- jsr166/src/jsr166e/StripedAdder.java 2011/07/20 16:06:19 1.2 +++ jsr166/src/jsr166e/StripedAdder.java 2011/07/26 17:16:36 1.6 @@ -16,20 +16,26 @@ import java.io.ObjectOutputStream; /** * A set of variables that together maintain a sum. When updates - * (method {@link #add}) are contended across threads, the set of - * adders may grow to reduce contention. Method {@link #sum} returns - * the current combined total across these adders. This value is - * NOT an atomic snapshot (concurrent updates may occur while - * the sum is being calculated), and so cannot be used alone for - * fine-grained synchronization control. + * (method {@link #add}) are contended across threads, this set of + * adder variables may grow dynamically to reduce contention. Method + * {@link #sum} returns the current combined total across these + * adders. This value is NOT an atomic snapshot (concurrent + * updates may occur while the sum is being calculated), and so cannot + * be used alone for fine-grained synchronization control. * *
This class may be applicable when many threads frequently * update a common sum that is used for purposes such as collecting * statistics. In this case, performance may be significantly faster * than using a shared {@link AtomicLong}, at the expense of using - * significantly more space. On the other hand, if it is known that - * only one thread can ever update the sum, performance may be - * significantly slower than just updating a local variable. + * much more space. On the other hand, if it is known that only one + * thread can ever update the sum, performance may be significantly + * slower than just updating a local variable. + * + *
A StripedAdder may optionally be constructed with a given
+ * expected contention level; i.e., the number of threads that are
+ * expected to concurrently update the sum. Supplying an accurate
+ * value may improve performance by reducing the need for dynamic
+ * adjustment.
*
* @author Doug Lea
*/
@@ -37,88 +43,139 @@ public class StripedAdder implements Ser
private static final long serialVersionUID = 7249069246863182397L;
/*
- * Overview: We maintain a table of AtomicLongs (padded to reduce
- * false sharing). The table is indexed by per-thread hash codes
- * that are initialized as random values. The table doubles in
- * size upon contention (as indicated by failed CASes when
- * performing add()), but is capped at the nearest power of two >=
- * #cpus: At that point, contention should be infrequent if each
- * thread has a unique index; so we instead adjust hash codes to
- * new random values upon contention rather than expanding. A
- * single spinlock is used for resizing the table as well as
+ * A StripedAdder maintains a table of Atomic long variables. The
+ * table is indexed by per-thread hash codes.
+ *
+ * By default, the table is lazily initialized, to minimize
+ * footprint until adders are used. On first use, the table is set
+ * to size DEFAULT_INITIAL_SIZE (currently 8). Table size is
+ * bounded by the number of CPUS (if larger than the default
+ * size).
+ *
+ * Per-thread hash codes are initialized to random values.
+ * Collisions are indicated by failed CASes when performing an add
+ * operation (see method retryAdd). Upon a collision, if the table
+ * size is less than the capacity, it is doubled in size unless
+ * some other thread holds lock. If a hashed slot is empty, and
+ * lock is available, a new Adder is created. Otherwise, if the
+ * slot exists, a CAS is tried. Retries proceed by "double
+ * hashing", using a secondary hash (Marsaglia XorShift) to try to
+ * find a free slot.
+ *
+ * The table size is capped because, when there are more threads
+ * than CPUs, supposing that each thread were bound to a CPU,
+ * there would exist a perfect hash function mapping threads to
+ * slots that eliminates collisions. When we reach capacity, we
+ * search for this mapping by randomly varying the hash codes of
+ * colliding threads. Because search is random, and failures only
+ * become known via CAS failures, convergence will be slow, and
+ * because threads are typically not bound to CPUS forever, may
+ * not occur at all. However, despite these limitations, observed
+ * contention is typically low in these cases.
+ *
+ * Table entries are of class Adder; a form of AtomicLong padded
+ * to reduce cache contention on most processors. Padding is
+ * overkill for most Atomics because they are usually irregularly
+ * scattered in memory and thus don't interfere much with each
+ * other. But Atomic objects residing in arrays will tend to be
+ * placed adjacent to each other, and so will most often share
+ * cache lines without this precaution. Adders are by default
+ * constructed upon first use, which further improves per-thread
+ * locality and helps reduce footprint.
+ *
+ * A single spinlock is used for resizing the table as well as
* populating slots with new Adders. Upon lock contention, threads
- * just try other slots rather than blocking. We guarantee that at
+ * try other slots rather than blocking. After initialization, at
* least one slot exists, so retries will eventually find a
- * candidate Adder.
+ * candidate Adder. During these retries, there is increased
+ * contention and reduced locality, which is still better than
+ * alternatives.
*/
/**
- * Number of processors, to place a cap on table growth.
- */
- static final int NCPU = Runtime.getRuntime().availableProcessors();
-
- /**
- * Version of AtomicLong padded to avoid sharing cache
- * lines on most processors
+ * Padded version of AtomicLong
*/
static final class Adder extends AtomicLong {
- long p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pa, pb, pc, pd;
+ long p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pa, pb, pc, pd, pe;
Adder(long x) { super(x); }
}
+ private static final int NCPU = Runtime.getRuntime().availableProcessors();
+
/**
- * Holder for the thread-local hash code.
+ * Table bounds. DEFAULT_INITIAL_SIZE is the table size set upon
+ * first use under default constructor, and must be a power of
+ * two. There is not much point in making size a lot smaller than
+ * that of Adders though. CAP is the maximum allowed table size.
+ */
+ private static final int DEFAULT_INITIAL_SIZE = 8;
+ private static final int CAP = Math.max(NCPU, DEFAULT_INITIAL_SIZE);
+
+ /**
+ * Holder for the thread-local hash code. The code is initially
+ * random, but may be set to a different value upon collisions.
*/
static final class HashCode {
+ static final Random rng = new Random();
int code;
- HashCode(int h) { code = h; }
+ HashCode() {
+ int h = rng.nextInt();
+ code = (h == 0) ? 1 : h; // ensure nonzero
+ }
}
/**
* The corresponding ThreadLocal class
*/
static final class ThreadHashCode extends ThreadLocal