Chapter 22: Coordination | |
Coordinator objects introduced in Chapter 18, as well as other composite objects, may not be in ``full control'' of their operation participants. This complicates translation of guards and effects into concrete form. Generally, any guard evaluating conditions on objects accessed via nonexclusively managed links must ensure that the referenced objects do not change state between the test and the beginning of the operation. Similarly, the coordinator must assure that actions across all participants remain atomic with respect to external observers. Thus, the coordinator must transiently act as an exclusive controller of these objects. A number of approaches are available for carrying this out. To describe them, we must first address the nature of the problems they are intended to solve.
An intrinsic danger of compositional design is that since we are always building objects by linking together other objects, it is possible to lose track of where any link may actually lead. This creates the potential for interference.
Interference occurs when two competing operations or sets of operations are applied to the same object. Generally, any concrete operation that contains the possibility of interference cannot always make good on its abstract effects, and is potentially incorrect. There are two possible results of interference:
While these two effects can normally be traded off for one another, neither is the least bit desirable. In both cases, the source of interference may be far removed from sight. Interference may occur when two clients (perhaps oblivious to each other's existence) both try to manipulate the same object. For example, two different Transaction objects, both holding a link to the same Account, may both be trying to step the account through mutually incompatible operations.
In our design framework, no two objects may interfere with each other if they are interested in a single operation in a single object. Our noninterruptibility rules prevent contention. Sadly enough, these provisions cannot automatically extend to transactions in which a sequence of operations on possibly many objects must be triggered in a certain way and processed in a certain order, without interleaving any other stray operations. While this is most obviously problematic in systems of concurrently interacting objects, it is just as disastrous in purely sequential designs in which subtasks interfere with the main tasks they are allegedly supporting.
Interference is not restricted to situations involving multiple controlling objects. Local interference occurs when a single object ``inadvertently'' tries to use another (perhaps even itself!) in two different roles. This aliasing results when two different locally accessible links are attached to the same object. The most obvious forms of aliasing are similar to those found in any system involving references, pointers, or any other link-like construct. For example, in the matrix multiplication routine:
op matMul(lft: Matrix, rgt: Matrix, result: Matrix);
It may be the case that two or more of lft, rgt, and result are actually connected to the same matrix object. If the usual matrix multiplication algorithm is applied, and result is bound to the same matrix as either of the sources, the procedure will produce incorrect results.
Aliasing is at least in part a human factors issue. It is just too easy to overlook aliasing possibilities. Often enough, when potential problems are identified, solutions are not hard to come by. Unfortunately, some object-oriented design constructions tend to hide aliasing opportunities. Potential aliasing is hard to discern when arguments or components are of different declared type, but could match the same object. For example, suppose we had:
class Account ... end class SavingsAccount is Account ... end class CheckingAccount is Account ... end class BahamaAccount is SavingsAccount, CheckingAccount ... end op transfer(src: SavingsAccount, dest: CheckingAccount);
If the operations on src and dest within transfer interfere with each other, then it would be a very bad idea to invoke transfer(x,x), for some x:BahamaAccount. Of course, it may be the case that transfer will simply leave x in its original state, in which case all would be well. But unless you are alerted to the possibility, you are unlikely to notice that this might be a problem. A similar situation involves aliasing between an argument and a component. For example:
class TransactionLog ... end class SystemLog is TransactionLog ... end class AccountLog is TransactionLog ... end
class Account ... logger: AccountLog; op update(... syslog: TransactionLog) { ...; logger.recordBalanceChange; syslog.recordTransaction } end
If the update operation were invoked with an argument that happened to match an account's own logger, there may be two records in the same log, which is probably not what was intended.
Yet another way to mask aliasing opportunities is through relays and other objects that are used as ``pass by role'' intermediaries. For example, if a dependency manager maintains a set of objects interested in particular change notices, a design may inadvertently result in an object being notified of its own changes. Again, maybe this is OK; maybe not.
In standard OO systems, any operation has the power to detect simple aliasing (just through a link identity test) and then perform evasive action.
For example, a matrix multiply routine may invoke a slow-but-safe multiply In Place procedure when it receives aliased arguments. This idea may be generalized to arbitrary projections. Any two-parameter operation, say, p(a,b), may delegate to a special form p1(ab) when a is known to be the same as b. Three-parameter operations (e.g., q(a,b,c)) may delegate to two- and one-parameter versions (e.g., q1(a,bc), q2(ab,c), q3(ac,b), q4(abc)). This is a bit tedious. But it is also a useful efficiency measure. Actions for aliased cases are often either significantly faster or significantly slower than others. By splitting them out, the best method may be applied to the case at hand.
However, this strategy works only when distinct objects are entirely
independent and self-contained, thus sharing no internal links. This
is typical, but not definitionally forced for common matrix classes.
For example, if one of the arguments to matMul were some kind of
wrapper or view of another (e.g., if lft were of a class that
held a link to rgt and forwarded all operations to it), then the
same problems result even though lft ~=
rgt.
There are ways of specifying classes to (declaratively) preclude aliasing. For matrices, this would first require a set of guarantees stating that no two cells in a single matrix were aliased. For example, a Matrix class (or subclass) might include a function to determine whether the identities of two cells are the same:
fn eqCell(i: int, j: int, ii: int, jj: int): bool = (at(i, j) = at(ii, jj)) = ((i = ii) /\ (j = jj))
This version says that the identities of two cells are the same only if the indices are. Another function allIndependent, which loops through all possible indices and checks eqCell, may then be declared as inv. A relational class containing two or three matrices may then rely on these invariants to construct additional functions advertising independence between the cells of different matrices. The complexity of such declarations corresponds to the extensiveness of independence guarantees demanded by standard implementations of matMul.
Without such assurance, alias checks via identity tests are only definitive when they report true. In that case, the two links really are connected to the same object. In all other cases, lack of top-level identity need not imply that two objects are independent and cannot always be used to prevent interference.
There cannot be a magical cure for interference. The potential for interference is intrinsic to any system of agents communicating through sharable channels. In special cases, exhaustive static formal analysis of a design might prove it interference-free across all possible executions. However, this is both exceedingly rare and difficult to undertake. Instead, a mixture of prevention, detection, and control must be used. Strategies include centralization, export control, sender rules, sender-receiver protocols, and recovery techniques. We have discussed techniques for centralization (unique objects) and controlling exports (by protecting links, making copies and views, and using collections that never reveal identities of their members) in previous chapters. However, there are limits to these safeguards. It is impossible to remove all link exports without removing all potential for interaction. We address the other methods in turn.
As a design rule, all senders may promise to access target objects only if they are sure that the targets are under their (temporary) exclusive control. This ensures that all operations on the target objects are atomic and interference-free with respect to all other observers.
Clients may obtain and advertise control by holding locks. Clients use lockable objects by first locking, then operating, then releasing. If another client has control over the target, others will wait (blocking on lock) until they are done, assuming that all senders play by these rules. After obtaining a lock, one object may issue messages to another with the same confidence about noninterference as does the holder of an unexported own link. Objects must release locks immediately when they are no longer needed. Lockable forms of any class are easy to define:
class Lockable locked: bool; op lock: () when ~locked then locked' else pend end op release: () ==> ~locked' end end class LockableX is Lockable, X end
There is often no reason for each object to possess its own lock. The repositories described in Chapter 18 may manage locks for all of their members. Traversable forms may also support collection-wide locking. A single centralized lock manager may even be employed:
class LockMgr locked(x: Any): bool; op lock(x: Any): () when ~locked(x) then locked(x)' else pend end op release(x: Any): () ==> ~locked(x)' end end
Note that lock operations are simple renamings of Semaphore operations. Locks may be implemented as views of semaphores. The safest locking protocol is two phase, in which all locks for all objects involved in an operation are obtained before any are released (see, e.g., [12] for details).
Locking frameworks are not necessarily effective as defined thus far. Suppose a Square (as in Chapter 17) were of a type that contained exported (shared) links to the inner Point objects. A lock on the square would not gain control over the component points. Interference would still be possible. For this reason, any lock-based framework must also include nested locking protocols that are dependent on the linkage details of classes. For example:
class LockableSquare is Square ... locals lowerLeft: LockablePoint; upperRight: LockablePoint; end locked: bool; op lock: () when ~locked then lowerLeft.lock',upperRight.lock',locked' else pend end op release:() ==> ~lowerLeft.locked',~upperRight.locked',~locked' end end
Nested locking requires care. Objects reachable via more than one path should not receive multiple lock requests.
Regular locks are excessive when clients merely want to inspect their targets (i.e., invoke fns), not try to change them (i.e., invoke ops). Any number of ``readers'' may coexist without contention or interference. Bottlenecks may be alleviated by differentiating between read locks used only for inspections (e.g., in guard evaluation), and write locks used in any actions sending state-changing messages. Any number of readers may simultaneously hold read locks, but only one may hold a write lock. Care is needed to implement fair access to both read and write locks (see, e.g., [5] for details). One basic form is as follows:
class LockMgr2 ... readLocks(x: Any): int; writeLocked(x: Any): bool; op rLock(x: Any): () when ~writeLocked(x) then readLocks(x)' = readLocks(x) + 1 else pend end op rRelease(x: Any): () ==> readLocks(x)' = readLocks(x) - 1 end op wLock(x: Any): () when readLocks(x) = 0 /\ ~writeLocked(x) then writeLocked(x)' else pend end op wRelease(x: Any): () ==> ~writeLocked(x)' end end
Rather than depending on senders to obey locking protocols, receivers may willingly ``enslave'' themselves to selected ``masters''. Objects may keep track of their masters and listen only to them. For example:
class SlavePoint ... locals master: Any; p: Point; end op shiftX(v: real, sender: Any) when sender = master then p.shiftX(v)' else % error % end end
Note that SlavePoint is a view of a Point, not a subclass.
The shiftX operation must be supplied a sender argument that must match the master. This may be generalized to support access control lists that maintain a set of privileged senders. Of course, this design does not in itself provide full security. Without system-level support for this protocol, other objects may be able to send master as an argument and thus obtain access.
Unbalanced protection techniques placing nearly all responsibility on either senders or receivers can be fragile. For example, locking depends heavily on the correctness of all message senders in properly obtaining and releasing locks. These protocols may be made more secure by associating keys with locks. The receivers should also know the keys and check them. All other operations can be reparameterized to require keys. For example, using simple locks and integer-based keys generated by some KeyMgr:
class LockablePoint3 ... locals currentKey: int; p: Point; end locked: bool; op lock(key: int): () when ~locked then locked', currentKey' = key elsewhen key = currentKey then % empty % else pend end op release(key: int): () when key = currentKey then ~locked' else % error % end op shiftX(v: real, key: int): () when key = currentKey then p.shiftX(v)' else % error % end end
Keys help solve the nested locking problem. The empty effect associated with the condition for a lock request with the same key as already being used allows objects that are reachable from multiple paths to receive multiple lock requests without causing lock-up. Keys thus serve the role of ``visit markers'' necessary in any linked graph traversal algorithm (see, e.g., [2,40]).
The use of keys alone is not foolproof. A client may tell a third object about a key, allowing the other object to obtain incorrect access. Inappropriate key distribution may be controlled by wrapping view-based classes around the keys themselves. An extreme tactic is to create one-shot key objects that fail to work after a single use. For example, assuming simple integer keys where zero is never a legal key value:
class OneShotKey own k: Int <> own used: Bool <> init ~used? op key: int { if ~used? then used.t!; reply k? else reply 0 end } end
Lock operations may then use such objects rather than raw keys. There are many useful variations and improvements. These include support for a fixed number of uses (rather than one), means for requesting additional use once a key has elapsed, sharing keys, ``aging'' keys on the basis of time rather than use, and operations allowing receivers to revoke locks when necessary. It is, of course, a very bad idea to define a clone operation for smart keys. Construction must be managed carefully.
There is no conceptual difference between keys, pseudo-IDs, and ordinary links. All are employed to obtain access to particular objects. An entire system based on smart links may impose arbitrary security and locking mechanisms ``behind the scenes'' of normal design efforts. Given the complexity and fragility of access and control mechanisms, this strategy is very attractive for large development efforts.
Simpler, special-purpose versions of smart keys and links include a set of policies and techniques called islands [24]. These partially automate alias-free import/export protocols. The idea is for objects to support a sendAndNullOutL operation for each link L that is used when sending components as arguments to operations that form the ``bridge'' to a set (``island'') of alias-propagating objects. The sendAndNullOutL operation sends out the current identity, but then unbinds the link so that it may no longer be used by the sender object. It may later be rebound, typically after receiving a message from the original recipient that it may do so. A similar transfer operation is supported in Hermes [42] .
Interference control, access control, and locking mechanisms can become arbitrarily complex. The strategies we have listed in this section are geared more toward accident prevention than protection against malicious invasion. The more hostile the environment and the more critical the consequences of interference, the more elaborate and ``heavyweight'' need be the protocols. These may include combinations of authentication, multilevel access, certification, encryption, and randomization mechanisms, as provided by system tools and services (see, e.g., [39,43,14]).
Unless proven otherwise, no system is free from safety and/or liveness failures stemming from interference. There is hardly ever a real choice to be had between designing for safety versus liveness. Safety failures normally lead to undetectable corruption, but liveness failures lead to deadlock, which is at least in principle detectable, and thus recoverable from.
Under any of the listed safety measures, potentially interfering operations may cause each other to block forever. Even simple aliasing may suffice to break things:
op rotate2(x: LockableSquare, y: LockableSquare) { lock(x); lock(y); doRotation(x, y); release(x); release(y) } op user(a: LockableSquare) { rotate2(a, a) }
Here, the second call to lock on a will block forever. Even an aliasing check of `` x=y'' in rotate2 would not necessarily repair this if the top-level objects were different but contained shared components with simple nested locks.
Any design using widespread locking should include provisions for dynamically detecting and dealing with deadlock conditions. Some algorithms for doing so are described in [15,3]. However, deadlock detection is often supported in a much simpler fashion. Objects requesting locks may use time-outs (Chapter 20). They treat the time-out replies as indications of deadlock and initiate the associated recovery measures. This is itself risky, since time-outs may occur for other reasons. But the use of conservative recovery strategies means that false alarms do no more than slow systems down, not cause yet other errors.
The most defensible recovery protocol is one in which all modifications to all targets are either committed to in the normal case, or aborted if deadlock or other failures occur, where aborts leave all objects in their initial states, available for later retry. In-between results are not allowed.
These mechanics also apply in many other contexts, including cases in which safety failures happen to be detectable before they do any permanent harm, machines crash, access rights are denied, or software errors are encountered.
In simple cases where actions do not depend on each other, this may be accomplished by storing up all mutative operations (perhaps via a wrapper queue), and then committing by executing the queued operations, or aborting by clearing the queue. The queued operations may be rearranged and optimized before execution. Another set of tractable cases involves operations that have unique ``undo'' or ``antimessage'' counterparts that reverse the effects of operations. On failure, a sequence of undo requests may be sent to the participants.
More general transaction control mechanisms save the states of participants before an operation begins and ``roll back'' objects to these saved states on failure. One way to control such mechanics is to support operations in a mixin class such as:
class TransactionParticipant ... op beginTransaction(t: TransactionID); op commitTransaction(t: TransactionID); op abortTransaction(t: TransactionID); end;
However, many situations requiring locking and recovery also involve interactions with database managers or other services that provide particular transaction protocols of their own that must be accommodated. Indeed, for large system designs, it is all but impossible to maintain scattershot selections of methods. Commitment to particular tools, services, conventions, and protocols becomes necessary.
Many transaction support packages exist, but only a few are designed specifically for OO systems. One example is Kala [41] . Kala provides several OO transaction support mechanisms among its other services. Kala deals well with the fact that an object's logical state may be distributed among many other objects connected by links. As we have seen, this fact complicates copying, alias detection, and locking strategies that might be needed in transaction support. Kala addresses this through a mechanism in which each state (or version) of an object may be independently saved without at the same time always recursively saving all linked objects. However, user (programmer) level links are actually ``baskets'' of lower level links, normally directed at the current versions. These connections can be changed upon transaction commits, rollbacks, and partial rollbacks stemming from failures of nested subtasks.
We may finally address the main issue of converting guarded multiparticipant transition specifications into concrete form. Joint actions are operations in which state changes in one or more objects lead to coordinated effects in one or more other participants. All transitions involving references to external participants are joint actions, whether OOA models describe them as such or not. In particular, operations within relation-based coordinator objects are often of this form. These normally require transformations that provide ``handshaking'' to control interference.
We will first describe dynamic coordination using an example that is ``all control''. Analysis models and abstract classes may describe effects that occur ``automatically'', whenever one or more participating objects are in the proper state. For example, assume an OOA transition that says that a transfer should be started whenever a checking account is overdrawn, a savings account is underdrawn, and the customer has requested that the checking account should be automatically transferred. (This is an illustrative variation of the overdraft protection service described in Chapter 10.) We may represent this initially in a form that is concrete in actions but not in synchronization control:
class TransferMgr locals ch: fixed Checking; sv: fixed Savings; cmr: fixed Customer; amt: fixed Cash; op doTransfer { sv.withdraw(amt); ch.deposit(amt); ... } end op transfer when ch.overdrawn /\ sv.canWithdraw(amt) /\ rqd(cmr, ch) then { doTransfer; self.transfer } else pend end end op mkTransfer(x: Checking, y: Savings, z: Customer, a: Cash) { t := new TransferMgr(ch := x, sv := y, cmr := z, amt := a); t.transfer; }
For the moment, we have arranged that the transfer operation be ``always requested'' by artificially generating operation events and self-propagating them along.
The primary design issue here is that transfer cannot be concretely defined by sequentially testing ch .overdrawn,
then sv .canWithdraw(amt), then rqd( cmr, ch), and then executing doTransfer if all conditions hold. Assuming that none of these objects are exclusively managed by TransferMgr, all may change state between test time and trigger time.
A second, closely related issue stems from the fact that OOA models require all effects produced within action specifications to be logically atomic. The Transfer Mgr :: doTransfer operation must ensure that both balances are properly updated before returning. Neither participant may engage in any other activities until this action is complete. Both participants must be under the control of the TransferMgr for the duration of the operation.
Unless it can somehow be proven that interference is impossible, dynamic control methods are required to address these problems. Beyond this, a strategy is needed for transforming logical guards into computations. The main alternatives are polling and notifications. Polling methods are generally poorer, but still sometimes useful.
In a polling approach, the coordinator for a joint action repeatedly asks participant objects about their states, and then performs the indicated actions if all conditions hold. The self-propagation strategy used at the abstract level may be replaced with a simple loop. Locking is needed in order to freeze participants during testing and/or to control them exclusively during actions. This is facilitated by the use of separate read and write locks.
For example, assuming that the doTransfer operation does not send any messages to the Customer but only reads its status while evaluating rqd in the guard, it may be read locked, while the others are write (or read/write) locked:
op transfer { while true do ch.wLock; sv.wLock; cmr.rLock; if ch.overdrawn /\ sv.canWithdraw(amt) /\ rqd(cmr, ch) then cmr.rRelease; doTransfer else cmr.rRelease end; ch.wRelease; sv.wRelease end }
Underneath the lock control, this is just a variant of the original specification.
Polling need not require interference protocols if the guards and operations are somehow known not to interfere with other processing. Polling itself can be limited to those intervals when this is known to be true. For example, assuming that no transactions are made at night, a nightly transaction logger that only recorded differences over a 24-hour period could maintain a list of all accounts and their balances, recheck them once per night to discover which ones changed, and write log files accordingly.
Polling techniques may test many times for proper conditions before firing. While the number of polling iterations may be bounded in special cases (see [21]), polling is usually both inefficient and unreliable.
Notification-based strategies instead use more efficient event-driven processing. In the case of guards for joint actions, the participants themselves may notify the coordinator that an action may need to be invoked. Notification-based techniques are also more reliable. Assuming queuing, the coordinator can never ``miss'' an opportunity to trigger an operation.
When participant objects change state in ways that might trigger a joint action, they may send a notification to the coordinator, while perhaps also locking themselves in preparation for possible control. (A usually better option is to wait for the coordinator to issue the lock via a callback sequence.) When there are multiple participants, notification from any one of them can trigger a lock-and-check sequence for the others. For example, using keyed locks:
class LockableChecking is Checking ... locals mgr: TransferMgr; keyMgr: KeyMgr; end op withdraw(...) { ... if overdrawn then key := keyMgr.nextKey; self.lock(key); mgr.checkingOvd(key); end } end class TransferMgrV2 is TransferMgr ... op checkingOvd(k: Key) { sv.lock(k); cmr.lock(k); if sv.canWithdraw(amt) /\ rqd(cmr, ch) then doTransfer end; ch.release(k); sv.release(k); cmr.release(k) } op savingsOver; % similar op customerApproved; % similar end
The main disadvantage of notification methods is that without planning, they can involve a fair amount of mangling of existing classes in order to insert the right notifications at the right times. To avoid this, classes may be designed to support notifications on any attribute change, as described in Chapter 17. The specific notifications may be added later, even dynamically during execution.
Notification techniques open up a large and varied design space. In the remainder of this section, we survey some common variants.
As with polling, notification techniques do not always require interference control measures. In fact, unless an object must enter a controlled transition or interaction sequence as a result of the state change, it is fine to use simple one-way message passing for notification purposes.
Some triggering conditions are ``persistent''. Once true, they remain true forever, or at least over the lifetimes of the relevant objects. To stretch an example, suppose that once a customer approved automatic transfer services, they were irrevocable. In this case, only a single notification need be sent to the TransferMgr. It would never need to be checked again. However, even this is wasteful. It would be simpler to construct the TransferMgr itself only when the customer requested the service.
Objects that check and respond to single persistent conditions are sometimes termed watches and event monitors. The most common persistent conditions are time-based. Because time increases monotonically, any guard depending on it being after a certain time or after a certain event will stay true persistently.
When persistent conditions hold on a certain periodic basis, daemons may be constructed to manage the corresponding actions. For example, the following daemon could be used to implement the Automatic Payment Service described in Chapter 10:
class PeriodicAction period: TimePeriod; action: Wrapper; end
class TimeTriggerManager local s: TABLE[PeriodicAction]; local t: fixed Timer; op alarm(id: int): () { if s.has(id) then s.at(id).action; timer.replyAfter(s.at(id).period, WRAP(alarm(id))) end } op put(c: PeriodicAction): int { local job: int := s.put(c); timer.replyAfter(s.at(job).period, WRAP(alarm(job))); reply job } op remove(id: int) { s.remove(id) } end
Some joint action specifications require an action whenever an object is changed in any way. These changes are difficult to sense by other objects. Particular attribute values are not important, only the fact that they have been changed. For example:
class Shape ... viewer: Viewer; op setX(newX: real) { if xCoord ~= newX then ... viewer.redraw; ... end } end
A form of this tactic may be used to translate the sv.canWithdraw(amt) condition in TransferMgr into notifications. Because the savings account does not know the transfer amount, it should simply notify the TransferMgr when its balance changes. The manager may then perform the complete test.
Some joint action effects and/or relational constraints are amenable to faster processing than is possible with generic propagation techniques. For example, if a set of Shape objects must always bear a certain geometric relation to each other (e.g., must be spaced uniformly within some region), then changes in any one of them may trigger a special-purpose constraint handler (e.g., a quadratic equation solver; see [20]) that simultaneously changes attributes in all of the constituent objects.
A single class may be used to mediate notification events among many different sets of objects. Dependency-based designs may be better decoupled and organized through relay classes similar to those described in Chapter 21. These maintain sets of connections between objects and mediate their communication. For example:
class ViewerRelay ... m: MAP[Shape, Viewer]; op register(s: Shape, v: Viewer) { m.put(s, v) } op changeNotice(s: Shape) { if m.has(s) then m.at(s).redraw end } end
Relays are exceedingly common and useful designs for coping with input events that may affect varied and changing audiences. Here, change sources need not know the exact identities of their audiences, as long as they know of the appropriate relay object. This technique may be extended to support ``pass by interest'' protocols in which senders describe the characteristics of audiences and/or the nature of the state change. The relay then determines the best recipient for the message at hand. It is also possible to add more intelligence to relays in order to actively mediate, rather than blindly forward events.
As should already be obvious, we do not predefine a true broadcast primitive in ODL. We instead adopt the more common (and much better supported) view that object-oriented message passing is intrinsically point to point, although often mediated through dispatching and routing. However, the same strategies used for notification relays may be adapted readily to obtain the effects of uncoordinated broadcast. A relay object can register a set of objects that might be interested in receiving some notice and then generate multiple propagation messages. For example:
class Rcvr op receive(m: Message); end class Broadcaster members: SET[Rcvr]; op bcast(m: Message); % relay msg to all members op attach(r: Rcvr); % add r to rcvrs op detach(r: Rcvr); % delete r from rcvrs end
This may be further refined and extended. For example, the object may offer a filtering service that allows members to receive only those kinds of messages for which they express interest.
An effective merger of polling and notification strategies is to externalize queuing to form a standard buffered producer-consumer design. Notifier (producer) objects send messages to a common queue serving as a ``blackboard''. The coordinator (consumer) object repeatedly takes these messages and performs the associated actions. In the simplest case, the consumer may just take the form:
op mainLoop { while true do msg := blackboard.take; perform(msg); end }
The blackboard may actually be split into several queues, perhaps even one per message type. This makes it easier for consumers to control the kinds of messages they are waiting for. Many such designs are discussed by Carriero and Galerntner [11]. These are readily amenable for expression in an object-oriented framework.
The coordination of joint actions does not always require explicit coordinator objects. Each of the participating objects may include protocols for dealing directly with the others. For example, the notification-based version of TransferMgr design could be further transformed to allow each of the participants to step the others through a transaction under appropriate conditions. The transformations are similar to those seen for double-dispatching in Chapter 21. For example:
class Checking_3 is Checking ... locals sv: fixed Savings; cmr: fixed Customer; keyMgr: KeyMgr; end op withdraw(...) { ... if overdrawn then key := keyMgr.nextKey; sv.lock(key); cmr.lock(key); if sv.canWithdraw(amt) /\ rqd(cmr, self) then ... end; ... end } end
The other classes must be modified accordingly. This eliminates the need for mediation at the expense of extreme identity and protocol coupling among participants, sometimes resulting in designs that are difficult to maintain.
When objects are already structurally coupled for other reasons, similar localization techniques may be applied in a less disruptive fashion. Notification techniques may be used to move responsibility for managing class and relational constraints down to component objects rather than their hosts. For example, in our Square class we could define special versions of Point that maintain proper distance from each other whenever either is changed:
class DistancedPoint ... locals _x: Real <>; _y: Real <>; nbr: DistancedPoint; end x: real { _x? } y: real { _y? } op noPropShiftX(v: real) { _x.add(v); } op shiftX(v: real): () { _x.add(v); nbr.noPropShiftX(v); } end class Square_5 locals lowerLeft: DistancedPoint; upperRight: DistancedPoint; end inv upperRight.x > lowerLeft.x, upperRight.x - lowerLeft.x = upperRight.y - lowerLeft.y inv lowerLeft.nbr = upperRight, upperRight.nbr = lowerLeft op shiftHorizontally(offset: real): () { lowerLeft.shiftX(offset); } end
The invocation of lowerLeft.shiftX in shiftHorizontally maintains all invariants without explicit action on the part of the Square. The two forms of Point:: shiftX (or any of several similar setups) are necessary to prevent looping of notifications.
Group interaction frameworks may be used to translate models usually denoted by single classes at the analysis level. These include a number of ``master-slave'' or ``host-helper'' designs, in which a controller schedules the actions of a group of worker objects. The normal intent is to hide the fact that these workers exist. These designs represent increasingly larger scale versions of basic delegation techniques.
In most controller designs it is important to ensure that the ``outer'' controller object appear as a single entity to all outside clients, not as a visible composite of specialists or helpers. This normally requires that none of the internal helpers ever ``leak'' their identities to clients. In many situations, this is arranged merely by ensuring that the controller serve as a barrier, forwarding requests to helpers, and later forwarding back results. A similar practice is to recast bidirectional client-host interactions into those in which the delegate performs a callback to the original client through an anonymous wrapper.
As long as the delegates perform only internal operations, there is nothing more to worry about. However, these designs do not intrinsically ensure that operations within slave or helper classes do not send messages that reveal identities to other objects who may thereafter bypass the host and deal directly (and incorrectly) with the helpers. Any message sent from a delegate that includes self as an argument could lead to this.
One way to avoid such problems automatically is to arrange that hosts send self to delegates as an extra argument for all delegated messages. Delegates then uniformly employ the received apparentSelf for all other messages. Some OO languages (e.g., SELF [44] ) use variants of this protocol as built-in mechanisms, and to some, this is the only ``true'' sense of delegation (see, for example, [16,46]). One obstacle to using this strategy is that the hosts must be prepared to ``catch'' all messages ordinarily coming back to their delegates in order to redelegate them or do anything else with them. Enabling hosts to do this by adding new public forwarding operations to outer interfaces is at best tedious, especially when delegation links may be rebound. For such reasons, OO languages using such protocols build them into the language proper.
However, as they scale up, these strategies encounter further problems revolving around visibility, identity, and interference. The structures surveyed in the remainder of this section are most appropriate when interactions between controllers and helpers are communication-closed. When this does not hold, the architectures and protocols must be modified to employ the kinds of coordination techniques described earlier in this chapter.
A collection may be distributed among a set of objects, controlled by a single master that scatters requests and then gathers replies. Requests for set membership, etc., may then be multicast to all subcollections and run in parallel. The master does not usually need to gather all replies. For example, a set membership inquiry may be satisfied when any one of the slave subcollections replies affirmatively.
This idea leads to more general designs in which clients of a master-slave system may sequentially access any number of ``answers'' to queries, tasks, or problems. These answers may be maintained in a queue or similar collection held by the master, and accessed by clients. A bit of bookkeeping is necessary to keep track of things. For example:
class Master locals slaves: SET[Slave]; answers: QUEUE[Answer]; solving: Bool; init ~solving? problemID: Int; init problemID? = 0; nAnswers: Int; end op query(p: Problem) when ~solving? then { solving.t!; clear(answers); nAnswers.clr; problemID.inc; slaves.applyS(WRAP1(#1.doProblem(p, problemID?))) } else pend end op slaveReply(a: Answer, pid: int) when solving /\ pid=problemID? then { answers.put(a); nAnswers.inc; if nAnswers? = slaves.size then solving.f! end } else end op getNextAnswer: Answer { reply answers.take } end
This design only supports solution of one problem at a time. Multiple problems may be handled by associating different queues with different problems and then reparameterizing things accordingly.
Such designs may be expressed more easily via the definition of classes describing the higher-level structure and protocols. Worker groups (sometimes called process groups) are SETs or other collections of objects that all band together in computing a task or service. Worker groups may have features including:
Hiding these details within groups themselves better encapsulates protocol mechanics while also simplifying the design of classes that use them. One such protocol is synchronous control, in which a controller steps other objects through actions in a way that is known to conform to all task dependencies and interaction requirements. For example, the following design solves the ImageCell problem (Chapter 19) in a somewhat simpler and possibly more efficient way than our first attempt:
class CellStepper ... local g: WorkerGroup[ImageCell]; op step: () { g.bcastA(WRAP1(#1.getNorth)); g.bcastA(WRAP1(#1.getSouth)); g.bcastA(WRAP1(#1.getEast)); g.bcastA(WRAP1(#1.getWest)); g.bcastA(WRAP1(#1.updateBrightness)) } end
We assume here that WorkerGroup::bcastA is a multicast protocol that sends a message to all members and then waits for a completion reply from all of them before returning. The ImageCell operations must, of course, be redefined accordingly.
This arrangement avoids within-object dependency tracking at the price of synchronous processing. However, this can be an asset. It opens up a set of design methods based on ``data parallel programming'' [23], where a group of objects all do one thing, and then all another, and so on. These strategies are well suited to many fine-grained parallel programming environments. They are useful even in asynchronous systems. The framework results in virtual synchrony among slaves, thus simplifying controller design.
Many other coordination tasks become simpler when objects are structured into managed groups for the sake of performing particular tasks. For example, most, if not all intragroup communication may be performed via group multicast. Individual members do not need to keep track of others directly. For another example, locking may be performed via token passing, in which a single lock token is passed from member to member. Members only execute when they possess the token.
Worker group designs may be employed to improve fault tolerance. For example, rather than distributing helper objects that maintain or solve different parts of tasks, they may all handle the same task. In this way, if any of them fail, answers may still be obtained.
Replicated service objects may be arranged in server groups in which client service requests are somehow multicast to all replicates. Any of several alternative designs may be employed, including:
To achieve fault tolerance, each replicate should be self-contained, and thus share no (or at least few) connections with other downstream service objects. This often requires the replication of additional objects that must also be kept in synchrony to maintain global consistency. For example, in a standby design, all server objects might maintain separate versions of some x:Int. The primary object should multicast all updates to standby objects. However, these updates should be acted upon only when it is known that the primary has successfully completed its task.
Controllers and group managers themselves may be replicated. Further capabilities may be added to controllers in order to manage computation and detect problems. For example, a controller may send occasional probes that track participants, monitor progress, and detect failures. Some architectures and algorithms are described by Andrews [6]. Dealing with failure of a controller is a much more difficult issue. The members of the group must come to agreement about the nature of the failure and responsibility for recovery (see, e.g., [30]).
ISIS [9,8] is a toolkit of protocols and related mechanisms (implemented in C) that facilitate development of reliable group interaction designs. ISIS includes support for fault-tolerant tracking and control of process groups, along with other higher-level protocols useful in common design architectures. ISIS supports a causal multicast protocol that ensures that each of a series of messages to a group are received in issued order by all members, along with coordinator-cohort support, group-based locking protocols, and so on.
Further refinements apply to problems that must be solved iteratively by looping across phases that (1) divide the task across slaves, (2) have them compute some results, (3) gather up results, and (4) check if a full solution has finally been reached. Many large-scale scientific and engineering problems are readily, almost mechanically, decomposable in such terms. For example, assuming other appropriate declarations, a special kind of worker group could be designed as follows:
class ComputeGroup locals slaves: SET[Slave]; currentTask: Task; ctl: ProblemSplitter; numberReporting: Int; busy: Bool; client: Client; op scatter { numberReporting.clear; ctl.distributeProblem(slaves, currentTask); slaves.applyS(WRAP1(#1.computeAndReport)) } end op gather(c: Chunk, s: Slave) { currentTask.incorporate(c); numberReporting.inc; if numberReporting? = slaves.size then if ~currentTask.done then scatter else client.result(currentTask); busy.f! end end } op newTask(t: Task, c: Client) when ~busy? then { busy.t!; currentTask := t; client := c; scatter } else pend end end
Extensions include designs employing dynamic load balancing in which tasks are reconfigured based on the time it took for slaves to perform previous steps.
An alternative framework for iterative computation is to have the central coordinator serve as a blackboard, or work queue for the compute objects. This design is a minor variant of blackboard-based notification structures. In this case, the host initializes a set of worker objects, each of the form:
class TaskPerformer ... job: Job; m: TaskManager; op mainLoop { while true do job := m.take; transform(job); m.put(job) end } end
The take operation may be parameterized so that different kinds of TaskPerformers perform steps on jobs needing only their special talents. The results may then be fed back to be picked up by different specialists.
This is almost a mirror-image of previous designs. Rather than having the master send messages that may sit in queues until objects are ready to process them, the slaves themselves only take new tasks when they are done with others. Conversion from one form to the other is straightforward.
Open systems, in the sense coined by Hewitt [22] and Agha [1], are those in which new objects may dynamically join configurations, enter contracts, and generally behave as evolving self-organizing societies of interacting intelligent agents. Many of the specific architectural elements proposed for open systems (e.g., groups of receptionists that deal with external events) as well as for related distributed artificial intelligence frameworks (see, e.g., [26,17]) are very much usable in the design of less experimental applications. However, techniques for reliably supporting the general paradigm do not appear well enough developed to admit routine exploitation.
Among the primary hurdles in constructing open systems is supporting full-scale metalevel reasoning. Since new objects belonging to new classes may be introduced at any point, some or all objects must ``understand'' the underlying meaning of the base language so they can dynamically decode and create new attributes and operations, construct and instantiate new classes, and so on. In a sense, the objects themselves must be able to act as designers. Support requires more extensive forms of metaclasses and the like than are presented in Chapters 8 and 18. It also requires further exploitation of reflection than we have so far described. Objects and groups of objects may need to reconfigure themselves dynamically to deal with new situations, as described in [31].
While perhaps exotic, many metalevel facilities are currently supported in languages including CLOS [28] and have been used to good effect. For example, Paepcke [33,34,35,36] discusses the extension of CLOS into PCLOS through the use of metalevel features in which objects behave as local, persistent, and/or cached entities depending on context. Similarly, Rao [37] describes a windowing system supporting programmer-accessible protocol specialization. For example, objects do not need to perform expensive screen updates when they find themselves embedded in nonoverlapping cells in a spreadsheet. Both of these applications are made simpler through the use of CLOS mechanisms enabling state and protocol descriptions to be dynamically modified rather than hard-wired.
Massively distributed object systems may soon pervade the planet. These open object systems do not require the use of fine-grained metalevel reasoning facilities. In fact, because of their scale and intended range of applicability, they require only the most minimal assumptions and information about participating coarse-grained distributed objects. This information generally consists only of a listing of those services that each object is willing to provide to others, especially those residing in foreign systems. Service specifications must be expressed in a common interface description language (e.g., CORBA IDL[32] -- see Chapter 23).
The specification, design, and implementation of global open object systems is a very different enterprise than that of the application systems discussed in this book. In Chapter 23, we briefly discuss strategies for dealing with foreign systems. However, these methods can break down in the development of such systems themselves. Service interface descriptions are quite a bit weaker than specifications possible in self-contained systems. For example, one may no longer rely on object identity as an analysis, design, or implementation construct. Essentially all communication must use transparent service-based routing mechanisms resting on these weaker assumptions. Available coordination protocols, security measures, and testing techniques are similarly limited.
Transitions interfere when two or more sequences of actions may be interleaved in undesirable ways. Pragmatically, it is impossible to guarantee that no interference will occur in a system. However, safeguards, design policies, and control protocols may be used as needed to ensure atomicity and consistency. The translation of guarded transitions into concrete form must take potential interference into account.
Delegation provides a basis for scaling up control architectures. A number of ready-made design architectures exist and may be plugged into design efforts.
There are currently no general accounts or catalogs of OO design architectures. However, beyond the references cited in the text, many articles and papers describe case studies and overviews of particular examples. These may be found in proceedings of the OOPSLA, ECOOP, Usenix C++, and TOOLS conferences, as well as the Journal of Object-Oriented Programming and other magazines and journals.
Synchronization and interference control are central concerns in systems of all sorts. Chandy and Misra [13] describe design issues from a non-OO perspective. Andrews [5] is an excellent technical reference for many control and protocol issues. Apt and Olderog [7] discuss verification. Burns et al [10] describe joint action control in Ada . Aksit [3], Detlefs et al [15], and Guerraoui et al [19] describe particular OO transaction architectures.
Further discussions of aliasing in OO contexts may be found in Hogg et al [25]. Language-based interference analysis methods are discussed by Reynolds [38], America and de Boer [4], Jones [27] and Wills [45].