ViewVC Help
View File | Revision Log | Show Annotations | Download File | Root Listing
root/jsr166/jsr166/src/jsr166e/StripedAdder.java
(Generate patch)

Comparing jsr166/src/jsr166e/StripedAdder.java (file contents):
Revision 1.1 by dl, Wed Jul 20 15:00:56 2011 UTC vs.
Revision 1.3 by dl, Fri Jul 22 13:25:12 2011 UTC

# Line 16 | Line 16 | import java.io.ObjectOutputStream;
16  
17   /**
18   * A set of variables that together maintain a sum.  When updates
19 < * (method {@link #add}) are contended across threads, the set of
20 < * adders may grow to reduce contention. Method {@link #sum} returns
21 < * the current combined total across these adders. This value is
22 < * <em>NOT</em> an atomic snapshot (concurrent updates may occur while
23 < * the sum is being calculated), and so cannot be used alone for
24 < * fine-grained synchronization control.
25 < *
19 > * (method {@link #add}) are contended across threads, this set of
20 > * adder variables may grow dynamically to reduce contention. Method
21 > * {@link #sum} returns the current combined total across these
22 > * adders. This value is <em>NOT</em> an atomic snapshot (concurrent
23 > * updates may occur while the sum is being calculated), and so cannot
24 > * be used alone for fine-grained synchronization control.
25 > *
26   * <p> This class may be applicable when many threads frequently
27   * update a common sum that is used for purposes such as collecting
28   * statistics. In this case, performance may be significantly faster
# Line 31 | Line 31 | import java.io.ObjectOutputStream;
31   * only one thread can ever update the sum, performance may be
32   * significantly slower than just updating a local variable.
33   *
34 < * @author Doug Lea
34 > * <p>A StripedAdder may optionally be constructed with a given
35 > * expected contention level; i.e., the number of threads that are
36 > * expected to concurrently update the sum. Supplying an accurate
37 > * value may improve performance by reducing the need for dynamic
38 > * adjustment.
39 > *
40 > * @author Doug Lea
41   */
42   public class StripedAdder implements Serializable {
43      private static final long serialVersionUID = 7249069246863182397L;
44  
45      /*
46 <     * Overview: We maintain a table of AtomicLongs (padded to reduce
47 <     * false sharing). The table is indexed by per-thread hash codes
48 <     * that are initialized as random values.  The table doubles in
49 <     * size upon contention (as indicated by failed CASes when
50 <     * performing add()), but is capped at the nearest power of two >=
51 <     * #cpus: At that point, contention should be infrequent if each
52 <     * thread has a unique index; so we instead adjust hash codes to
53 <     * new random values upon contention rather than expanding. A
54 <     * single spinlock is used for resizing the table as well as
46 >     * Overview: We maintain a table of Atomic long variables. The
47 >     * table is indexed by per-thread hash codes that are initialized
48 >     * to random values.
49 >     *
50 >     * The table doubles in size upon contention (as indicated by
51 >     * failed CASes when performing add()), but is capped at the
52 >     * nearest power of two >= #CPUS. This reflects the idea that,
53 >     * when there are more threads than CPUs, then if each thread were
54 >     * bound to a CPU, there would exist a perfect hash function
55 >     * mapping threads to slots that eliminates collisions. When we
56 >     * reach capacity, we search for this mapping by randomly varying
57 >     * the hash codes of colliding threads.  Because search is random,
58 >     * and failures only become known via CAS failures, convergence
59 >     * will be slow, and because threads are typically not bound to
60 >     * CPUS forever, may not occur at all. However, despite these
61 >     * limitations, observed contention is typically very low in these
62 >     * cases.
63 >     *
64 >     * Table entries are of class Adder; a form of AtomicLong padded
65 >     * to reduce cache contention on most processors. Padding is
66 >     * overkill for most Atomics because they are most often
67 >     * irregularly scattered in memory and thus don't interfere much
68 >     * with each other. But Atomic objects residing in arrays will
69 >     * tend to be placed adjacent to each other, and so will most
70 >     * often share cache lines without this precaution.  Except for
71 >     * slot adders[0], Adders are constructed upon first use, which
72 >     * further improves per-thread locality and helps reduce (an
73 >     * already large) footprint.
74 >     *
75 >     * A single spinlock is used for resizing the table as well as
76       * populating slots with new Adders. Upon lock contention, threads
77 <     * just try other slots rather than blocking. We guarantee that at
78 <     * least one slot exists, so retries will eventually find a
79 <     * candidate Adder.
77 >     * try other slots rather than blocking. We guarantee that at
78 >     * least one slot (0) exists, so retries will eventually find a
79 >     * candidate Adder. During these retries, there is increased
80 >     * contention and reduced locality, which is still better than
81 >     * alternatives.
82       */
83  
84 <    /**
84 >    /**
85       * Number of processors, to place a cap on table growth.
86       */
87      static final int NCPU = Runtime.getRuntime().availableProcessors();
88  
89      /**
90 <     * Version of AtomicLong padded to avoid sharing cache
62 <     * lines on most processors
90 >     * Padded version of AtomicLong
91       */
92      static final class Adder extends AtomicLong {
93          long p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pa, pb, pc, pd;
94          Adder(long x) { super(x); }
95      }
96  
97 <    /**
98 <     * Holder for the thread-local hash code.
97 >    /**
98 >     * Holder for the thread-local hash code. The code starts off with
99 >     * a given random value, but may be set to a different
100 >     * pseudo-random value (using a cheaper but adequate xorshift
101 >     * generator) upon collisions.
102       */
103      static final class HashCode {
104          int code;
# Line 79 | Line 110 | public class StripedAdder implements Ser
110       */
111      static final class ThreadHashCode extends ThreadLocal<HashCode> {
112          static final Random rng = new Random();
113 <        public HashCode initialValue() {
113 >        public HashCode initialValue() {
114              int h = rng.nextInt();
115              return new HashCode((h == 0) ? 1 : h); // ensure nonzero
116          }
# Line 93 | Line 124 | public class StripedAdder implements Ser
124      static final ThreadHashCode threadHashCode = new ThreadHashCode();
125  
126      /**
127 <     * Table of adders. Initially of size 2; grows to be at most NCPU.
127 >     * Table of adders. Minimum size 2. Size grows to be at most NCPU.
128       */
129      private transient volatile Adder[] adders;
130  
# Line 105 | Line 136 | public class StripedAdder implements Ser
136      private final AtomicInteger mutex;
137  
138      /**
139 <     * Marsaglia XorShift for rehashing on collisions
139 >     * Marsaglia XorShift random generator for rehashing on collisions
140       */
141 <    private static int xorShift(int r) {
141 >    private static int xorShift(int r) {
142          r ^= r << 13;
143          r ^= r >>> 17;
144          return r ^ (r << 5);
145      }
146  
147      /**
148 <     * Creates a new adder with initially zero sum.
148 >     * Creates a new adder with zero sum.
149       */
150      public StripedAdder() {
151 <        Adder[] as = new Adder[2];
151 >        this(2);
152 >    }
153 >
154 >    /**
155 >     * Creates a new adder with zero sum, and with stripes presized
156 >     * for the given expected contention level.
157 >     *
158 >     * @param expectedContention the expected number of threads that
159 >     * will concurrently update the sum.
160 >     */
161 >    public StripedAdder(int expectedContention) {
162 >        int cap = (expectedContention < NCPU) ? expectedContention : NCPU;
163 >        int size = 2;
164 >        while (size < cap)
165 >            size <<= 1;
166 >        Adder[] as = new Adder[size];
167          as[0] = new Adder(0); // ensure at least one available adder
168          this.adders = as;
169          this.mutex = new AtomicInteger();
# Line 161 | Line 207 | public class StripedAdder implements Ser
207                      mutex.set(0);
208                  }
209                  if (created) {
210 <                    hc.code = h;                // Use this adder next time
210 >                    hc.code = h;                 // Use this adder next time
211                      break;
212                  }
213              }
# Line 172 | Line 218 | public class StripedAdder implements Ser
218       * Returns an estimate of the current sum.  The result is
219       * calculated by summing multiple variables, so may not be
220       * accurate if updates occur concurrently with this method.
221 <     *
221 >     *
222       * @return the estimated sum
223       */
224      public long sum() {

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines