Concurrent Programming in Java
© 1996-1999 Doug Lea

Cancellation


This set of excerpts from section 3.1 includes the main discussions of cancellation techniques that are further exemplified throughout the book.

When activities in one thread fail or change course, it may be necessary or desirable to cancel activities in other threads, regardless of what they are doing. Cancellation requests introduce inherently unforeseeable failure conditions for running threads. The asynchronous nature of cancellation leads to design tactics reminiscent of those in distributed systems where failures may occur at any time due to crashes and disconnections. Concurrent programs have the additional obligation to ensure consistent states of internal objects participating in other threads.

Cancellation is a natural occurrence in most multithreaded programs, seen in:

Footnote: The two-l spelling of cancellation seems to be most common in concurrent programming.

Interruption

The best-supported techniques for approaching cancellation rely on per-thread interruption status that is set by method Thread.interrupt, inspected by Thread.isInterrupted, cleared (and inspected) by Thread.interrupted, and sometimes responded to by throwing InterruptedException.

Footnote: Interruption facilities were not supported in JDK 1.0. Changes in policies and mechanisms across releases account for some of the irregularities in cancellation support.

Thread interrupts serve as requests that activities be cancelled. Nothing stops anyone from using interrupts for other purposes, but this is the intended convention. Interrupt-based cancellation relies on a protocol between cancellers and cancellees to ensure that objects that might be used across multiple threads do not become damaged when cancelled threads terminate. Most (ideally all) classes in the java.* packages conform to this protocol.

In almost all circumstances, cancelling the activity associated with a thread should cause the thread to terminate. But there is nothing about interrupt that forces immediate termination. This gives any interrupted thread a chance to clean up before dying, but also imposes obligations for code to check interruption status and take appropriate action on a timely basis.

This ability to postpone or even ignore cancellation requests provides a mechanism for writing code that is both very responsive and very robust. Lack of interruption may be used as a precondition checked at safe points before doing anything that would be difficult or impossible to undo later. The range of available responses includes most of the options discussed in §3.1.1:

Continuation (ignoring or clearing interruptions) may apply to threads that are intended not to terminate; for example, those that perform database management services essential to a program's basic functionality. Upon interrupt, the particular task being performed by the thread can be aborted, allowing the thread to continue to process other tasks. However, even here, it can be more manageable instead to replace the thread with a fresh one starting off in a known good initial state.

Abrupt termination (for example throwing Error) generally applies to threads that provide isolated services that do not require any cleanup beyond that provided in a finally clause at the base of a run method. However, when threads are performing services relied on by other threads (see §4.3), they should also somehow alert them or set status indicators. (Exceptions themselves are not automatically propagated across threads.)

Rollback or roll-forward techniques must be applied in threads using objects that are also relied on by other threads.
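The abrupt-termination option above mentions a finally clause at the base of a run method and status indicators for dependent threads. A minimal sketch of that combination follows, assuming hypothetical names (StatusReportingTask, serve, finished, failure) that do not appear in the book:

```java
class StatusReportingTask implements Runnable {
    private volatile boolean finished;   // set however run() exits
    private volatile Throwable failure;  // non-null after an abrupt exit

    @Override
    public void run() {
        try {
            serve();
        } catch (RuntimeException | Error e) {
            // The exception itself will not reach other threads, so record it.
            failure = e;
        } finally {
            finished = true;             // cleanup at the base of run()
        }
    }

    // Stand-in for the real service; here it always fails (hypothetical).
    private void serve() {
        throw new IllegalStateException("simulated failure");
    }

    boolean isFinished() { return finished; }
    Throwable failure() { return failure; }

    /** Runs the task in its own thread and checks the recorded status. */
    static boolean demo() {
        StatusReportingTask task = new StatusReportingTask();
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return task.isFinished() && task.failure() instanceof IllegalStateException;
    }

    public static void main(String[] args) {
        System.out.println("status visible to other threads: " + demo());
    }
}
```

Threads that rely on the service can poll isFinished and failure rather than waiting for an exception that will never arrive in their own call stack.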

You can control how responsive your code is to interrupts in part by deciding how often to check status via Thread.currentThread().isInterrupted(). Checks need not occur especially frequently to be effective. For example, if it takes on the order of 10,000 instructions to perform all the actions associated with the cancellation and you check for cancellation about every 10,000 instructions, then on average, it would take 15,000 instructions total from cancellation request to shutdown (an average wait of 5,000 instructions until the next check, plus 10,000 for the cancellation actions). So long as it is not actually dangerous to continue activities, this order of magnitude suffices for the majority of applications. Typically, such reasoning leads you to place interrupt-detection code only at those program points where it is both most convenient and most important to check cancellation. In performance-critical applications, it may be worthwhile to construct analytic models or collect empirical measurements to determine more accurately the best trade-offs between responsiveness and throughput (see also §4.4.1.7).
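As a rough illustration of infrequent status checks, the following sketch polls Thread.currentThread().isInterrupted() only once per batch of work. The class name PollingWorker, the CHECK_INTERVAL constant, and the doOneUnit method are assumptions made for this example, and the interval is counted in work units rather than instructions:

```java
import java.util.concurrent.atomic.AtomicLong;

class PollingWorker implements Runnable {
    // Hypothetical tuning knob: work units performed between status checks.
    private static final int CHECK_INTERVAL = 10_000;
    final AtomicLong unitsDone = new AtomicLong();

    @Override
    public void run() {
        long n = 0;
        while (true) {
            // Check only at batch boundaries, which serve as safe points.
            if (n % CHECK_INTERVAL == 0
                    && Thread.currentThread().isInterrupted()) {
                return; // nothing is half-done here, so simply stop
            }
            doOneUnit();
            n++;
        }
    }

    // Stand-in for a small piece of real work (hypothetical).
    private void doOneUnit() {
        unitsDone.incrementAndGet();
    }

    /** Starts a worker, interrupts it, and reports whether it shut down. */
    static boolean demo() {
        Thread t = new Thread(new PollingWorker());
        t.start();
        t.interrupt();
        try {
            t.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker terminated: " + demo());
    }
}
```

Because isInterrupted does not clear the status, any code that runs after the check can still observe the request.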

Checks for interruption are performed automatically within Object.wait, Thread.join, Thread.sleep, and their derivatives. These methods abort upon interrupt by throwing InterruptedException, allowing threads to wake up and apply cancellation code.

By convention, interruption status is cleared when InterruptedException is thrown. This is sometimes necessary to support clean-up efforts, but it can also be the source of error and confusion. When you need to propagate interruption status after handling an InterruptedException, you must either rethrow the exception or reset the status via Thread.currentThread().interrupt(). If code in threads you create calls other code that does not properly preserve interruption status (for example, ignoring InterruptedException without resetting status), you may be able to circumvent problems by maintaining a field that remembers cancellation status, setting it whenever calling interrupt and checking it upon return from these problematic calls.
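The status-resetting idiom described above can be sketched as follows. The class InterruptPreserver and its pause and demo methods are names invented for this example. The point is only that the catch clause calls Thread.currentThread().interrupt() so that callers can still see the request:

```java
class InterruptPreserver {
    /** Sleeps up to millis; on interrupt, restores the status for callers. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // The status was cleared when the exception was thrown; set it again.
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if the interruption was still visible after pause returned. */
    static boolean demo() {
        final boolean[] seen = new boolean[1];
        Thread t = new Thread(() -> {
            pause(10_000);
            seen[0] = Thread.currentThread().isInterrupted();
        });
        t.start();
        t.interrupt();
        try {
            t.join(); // join makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("status preserved: " + demo());
    }
}
```

Had pause swallowed the exception without resetting the status, the later isInterrupted check would report false and the cancellation request would be lost.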

There are two situations in which threads remain dormant without being able to check interruption status or receive InterruptedException: blocking on synchronized locks and on IO. Threads do not respond to interrupts while waiting for a lock used in a synchronized method or block. However, as discussed in §2.5, lock utility classes can be used when you need to drastically reduce the possibility of getting stuck waiting for locks during cancellation. Code using lock classes dormantly blocks only to access the lock objects themselves, but not the code they protect. These blockages are intrinsically very brief (although times cannot be strictly guaranteed).
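The lock utility classes of §2.5 are the book's own. As an assumption that goes beyond the original text, the sketch below uses java.util.concurrent.locks.ReentrantLock from later Java releases, whose lockInterruptibly method lets a thread waiting for the lock respond to an interrupt. The class and method names InterruptibleLockDemo and demo are invented for this example:

```java
import java.util.concurrent.locks.ReentrantLock;

class InterruptibleLockDemo {
    /** Returns true if a thread waiting for the lock responded to interrupt. */
    static boolean demo() {
        final ReentrantLock lock = new ReentrantLock();
        final boolean[] sawInterrupt = new boolean[1];
        lock.lock(); // the calling thread holds the lock throughout
        try {
            Thread waiter = new Thread(() -> {
                try {
                    lock.lockInterruptibly(); // waits, but aborts on interrupt
                    lock.unlock();            // only reached if acquired
                } catch (InterruptedException e) {
                    sawInterrupt[0] = true;   // cancelled instead of stuck
                }
            });
            waiter.start();
            waiter.interrupt();
            waiter.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
        return sawInterrupt[0];
    }

    public static void main(String[] args) {
        System.out.println("waiter cancelled: " + demo());
    }
}
```

A thread blocked on entry to a synchronized method in the same situation would stay blocked until the lock holder released it.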

IO and resource revocation

Some IO support classes (notably java.net.Socket and related classes) provide optional means to time out on blocked reads, in which case you can check for interruption on time-out.

An alternative approach is adopted in other java.io classes - a particular form of resource revocation. If one thread performs s.close() on an IO object (for example, an InputStream) s, then any other thread attempting to use s (for example, s.read()) will receive an IOException. Revocation affects all threads using the closed IO objects and causes the IO objects to be unusable. If necessary, new IO objects can be created to replace them.
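A small sketch of revocation follows, assuming a scratch temporary file and the invented names RevocationDemo and demo. It shows only the simplest case, a read attempted after another thread has already closed the stream. It does not show what happens to a read that is already blocked when close is called:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class RevocationDemo {
    /** Returns true if a read after another thread's close fails. */
    static boolean demo() {
        final boolean[] readFailed = new boolean[1];
        try {
            File f = File.createTempFile("revocation", ".dat"); // scratch file
            f.deleteOnExit();
            final FileInputStream s = new FileInputStream(f);
            s.close(); // revocation, performed by the calling thread
            Thread user = new Thread(() -> {
                try {
                    s.read();
                } catch (IOException e) {
                    readFailed[0] = true; // the closed stream is unusable here too
                }
            });
            user.start();
            user.join();
        } catch (IOException e) {
            return false; // could not set up the scratch file
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return readFailed[0];
    }

    public static void main(String[] args) {
        System.out.println("read failed after close: " + demo());
    }
}
```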

This ties in well with other uses of resource revocation (for example, for security purposes). The policy also protects applications from having a possibly shared IO object automatically rendered unusable by the act of cancelling only one of the threads using it. Most classes in java.io do not, and cannot, clean-fail upon IO exceptions. For example, if a low-level IO exception occurs in the midst of a StreamTokenizer or ObjectInputStream operation, there is no sensible recovery action that will preserve the intended guarantees. So, as a matter of policy, JVMs do not automatically interrupt IO operations.

This imposes an additional obligation on code dealing with cancellation. If a thread may be performing IO, any attempt to cancel it in the midst of IO operations must be aware of the IO object being used and must be willing to close the IO object. If this is acceptable, you may instigate cancellation by both closing the IO object and interrupting the thread. For example:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

class CancellableReader {                        // Incomplete
  private Thread readerThread; // only one at a time supported
  private FileInputStream dataFile;

  public synchronized void startReaderThread() 
   throws IllegalStateException, FileNotFoundException {
    if (readerThread != null) throw new IllegalStateException();
    dataFile = new FileInputStream("data");
    readerThread = new Thread(new Runnable() {
      public void run() { doRead(); }
    });
    readerThread.start();
  }

  protected synchronized void closeFile() { // utility method
    if (dataFile != null) {
      try { dataFile.close(); } 
      catch (IOException ignore) {}
      dataFile = null;
    }
  }

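  protected void doRead() {
    FileInputStream in;
    synchronized(this) { in = dataFile; } // snapshot; closeFile may null the field
    try {
      while (in != null && !Thread.interrupted()) {
        try {
          int c = in.read(); // throws IOException if closed by a canceller
          if (c == -1) break;
          else process(c);
        }
        catch (IOException ex) {
          break; // perhaps first do other cleanup
        }
      }
    }
    finally {
      closeFile();
      synchronized(this) { readerThread = null; }
    }
  }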

  public synchronized void cancelReaderThread() {
    if (readerThread != null) readerThread.interrupt();
    closeFile();
  }
}
  
Most other cases of cancelled IO arise from the need to interrupt threads waiting for input that you somehow know will not arrive, or will not arrive in time to do anything about. With most socket-based streams, you can manage this by setting socket time-out parameters. With others, you can rely on InputStream.available, and hand-craft your own timed polling loop to avoid blocking in IO during a time-out (see §4.1.5). These constructions can use a timed back-off retry protocol similar to the one described in §3.1.1.5. For example:

class ReaderWithTimeout {                // Generic code sketch
  // ...
  void attemptRead(InputStream stream, long timeout) throws... {
    long startTime = System.currentTimeMillis();
    try {
      for (;;) {
        if (stream.available() > 0) {
          int c = stream.read();
          if (c != -1) process(c);
          else break; // eof
        }
        else {
          try {
            Thread.sleep(100); // arbitrary fixed back-off time
          }
          catch (InterruptedException ie) {
            /* ... quietly wrap up and return ... */ 
          }
          long now = System.currentTimeMillis();
          if (now - startTime >= timeout) {
            /* ... fail ...*/
          }
        }
      }
    }
    catch (IOException ex) { /* ... fail ... */ }
  }
}
  
Footnote: Some JDK releases also supported InterruptedIOException, but it was only partially implemented, and only on some platforms. As of this writing, future releases are projected to discontinue support, due in part to its undesirable consequence of rendering IO objects unusable. But since InterruptedIOException was defined as a subclass of IOException, the constructions here work approximately as described on releases that include InterruptedIOException support, although with an additional uncertainty: An interrupt may show up as either an InterruptedIOException or InterruptedException. One partial solution is to catch InterruptedIOException and then rethrow it as InterruptedException.

Asynchronous termination

The stop method was originally included in class Thread, but its use has since been deprecated. Thread.stop causes a thread to abruptly throw a ThreadDeath exception regardless of what it is doing. (Like interrupt, stop does not abort waits for locks or IO. But, unlike interrupt, it is not strictly guaranteed to abort wait, sleep, or join.)

This can be an arbitrarily dangerous operation. Because Thread.stop generates asynchronous signals, activities can be terminated while they are in the midst of operations or code segments that absolutely must roll back or roll forward for the sake of program safety and object consistency. For a bare generic example, consider:

class C {                                         // Fragments
  private int v;  // invariant: v >= 0

  synchronized void f() {
    v = -1;     // temporarily set to illegal value as flag
    compute();  // possible stop point (*)
    v = 1;      // set to legal value
  }

  synchronized void g() { 
    while (v != 0) { 
      --v; 
      something(); 
    } 
  }
} 
  
If a Thread.stop happens to cause termination at line (*), then the object will be broken: Upon thread termination, it will remain in an inconsistent state because variable v is set to an illegal value. Any calls on the object from other threads might make it perform undesired or dangerous actions. For example, here the loop in method g will spin about 2*Integer.MAX_VALUE times as v wraps around through the negatives.
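The size of that spin can be checked on a narrower type, where the loop finishes quickly. The sketch below is an illustration rather than part of class C: it repeats the loop from method g on a short that starts at the corrupted value -1 and counts the iterations. The names WrapAroundDemo and spinsFromMinusOne are invented for this example:

```java
class WrapAroundDemo {
    /** Counts iterations of the g-style loop when v starts at -1. */
    static int spinsFromMinusOne() {
        short v = -1;      // the illegal value left behind by the stopped f()
        int spins = 0;
        while (v != 0) {
            --v;           // wraps from Short.MIN_VALUE to Short.MAX_VALUE
            ++spins;
        }
        return spins;
    }

    public static void main(String[] args) {
        System.out.println(spinsFromMinusOne()); // prints 65535
    }
}
```

The count is 2*Short.MAX_VALUE + 1. By the same arithmetic, the int version in class C runs 2*Integer.MAX_VALUE + 1 iterations before v reaches zero.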

The use of stop makes it extremely difficult to apply rollback or roll-forward recovery techniques. At first glance, this problem might not seem so serious - after all, any uncaught exception thrown by the call to compute would also corrupt state. However, the effects of Thread.stop are more insidious since there is nothing you can do in these methods that would eliminate the ThreadDeath exception (thrown by Thread.stop) while still propagating cancellation requests. Further, unless you place a catch(ThreadDeath) after every line of code, you cannot reconstruct the current object state precisely enough to recover, and so you may encounter undetected corruption. In contrast, you can usually bullet-proof code to eliminate or deal with other kinds of run-time exceptions without such heroic efforts.

In other words, the reason for deprecating Thread.stop was not to fix its faulty logic, but to correct for misjudgments about its utility. It is humanly impossible to write all methods in ways that allow a cancellation exception to occur at every bytecode. (This fact is well known to developers of low-level operating system code. Programming even those few, very short routines that must be asynch-cancel-safe can be a major undertaking.)

Note that any executing method is allowed to catch and then ignore the ThreadDeath exception thrown by stop. Thus, stop is no more guaranteed to terminate a thread than is interrupt; it is merely more dangerous. Any use of stop implicitly reflects an assessment that the potential damage of attempting to abruptly terminate an activity is less than the potential damage of not doing so.

Resource control

Cancellation may play a part in the design of any system that loads and executes foreign code. Attempts to cancel code that does not conform to standard protocols face a difficult problem. The code may just ignore all interrupts, and even catch and discard ThreadDeath exceptions, in which case invocations of Thread.interrupt and Thread.stop will have no effect.

You cannot control exactly what foreign code does or how long it does it. But you can and should apply standard security measures to limit undesirable effects. One approach is to create and use a SecurityManager and related classes that deny all checked resource requests when a thread has run too long. (Details go beyond the scope of this book; see Further Readings.) This form of resource denial, in conjunction with the resource revocation strategies discussed in §3.1.2.2, can prevent foreign code from taking any actions that might otherwise contend for resources with other threads that should continue. As a byproduct, these measures often eventually cause threads to fail due to exceptions.

Additionally, you can minimize contention for CPU resources by invoking setPriority(Thread.MIN_PRIORITY) for a thread. A SecurityManager may be used to prevent the thread from re-raising its priority.

Multiphase cancellation

Sometimes, even ordinary code must be cancelled with more extreme prejudice than you would ordinarily like. To deal with such possibilities, you can set up a generic multiphase cancellation facility that tries to cancel tasks in the least disruptive manner possible and, if they do not terminate soon, tries a more disruptive technique.

Multiphase cancellation is a pattern seen at the process level in most operating systems. For example, it is used in Unix shutdowns, which first try to terminate tasks using kill -1, followed if necessary by kill -9. An analogous strategy is used by the task managers in most window systems.

Here is a sketch of a sample version. (More details on the use of Thread.join seen here may be found in §4.3.2.)

class Terminator {

  // Try to kill; return true if known to be dead

  static boolean terminate(Thread t, long maxWaitToDie) { 

    if (!t.isAlive()) return true;  // already dead

    // phase 1 -- graceful cancellation

    t.interrupt();       
    try { t.join(maxWaitToDie); } 
    catch(InterruptedException e){} //  ignore 

    if (!t.isAlive()) return true;  // success

    // phase 2 -- trap all security checks

    theSecurityMgr.denyAllChecksFor(t); // a made-up method
    try { t.join(maxWaitToDie); } 
    catch(InterruptedException ex) {} 

    if (!t.isAlive()) return true; 

    // phase 3 -- minimize damage

    t.setPriority(Thread.MIN_PRIORITY);
    return false;
  }

} 
  
Notice here that the terminate method itself ignores interrupts. This reflects the policy choice that cancellation attempts must continue once they have begun. Cancelling a cancellation otherwise invites problems in dealing with code that has already started termination-related cleanup.

Because of variations in the behavior of Thread.isAlive on different JVM implementations (see §1.1.2), it is possible for this method to return true before all traces of the killed thread have disappeared.


Doug Lea
Last modified: Mon Oct 18 06:43:04 EDT 1999