Chapter 2: Introduction to Analysis
The analysis phase of object-oriented software development is an activity aimed at clarifying the requirements of an intended software system, with:
The final item in this list distinguishes object-oriented analysis from other approaches, such as Structured Analysis [14] and Jackson's method [6].
Constructing a complex artifact is an error-prone process. The intangibility of the intermediate results in the development of a software product amplifies sensitivity to early errors. For example, Davis [4] reports the results of studies done in the early 1970s at GTE, TRW, and IBM regarding the costs of repairing errors made in the different phases of the life cycle. As the following summary table shows, there is about a factor of 30 between the cost of fixing an error during the requirements phase and fixing that error during acceptance testing, and a factor of 100 with respect to the maintenance phase. Given that maintenance can be a sizable fraction of software costs, getting the requirements correct should yield a substantial payoff.
Development Phase | Relative Cost of Repair
------------------|------------------------
Requirements      | 0.1 -- 0.2
Design            | 0.5
Coding            | 1
Unit test         | 2
Acceptance test   | 5
Maintenance       | 20
Further savings are indeed possible. Rather than being aimed at a particular target system, an analysis may attempt to understand a domain that is common to a family of systems to be developed. A domain analysis factors out what is otherwise repeated for each system in the family. Domain analysis lays the foundation for reuse across systems.
There are several common input scenarios, generally corresponding to the amount of ``homework'' done by the customer:
Across such scenarios we may classify inputs into four categories: functional requirements, resource constraints, performance constraints, and miscellaneous (auxiliary) requirements.
Not all these categories are present in all inputs. It is the task of the analyst to alert the customer when there are omissions.
As observed by Rumbaugh et al. [10], the input of a fuzzy target specification is liable to change due to the very nature of the analysis activity. Increased understanding of the task at hand may lead to deviations from the initial problem characterization. Feedback from the analyst to the initiating customer is therefore crucial. Failure to provide it leads to the following consideration [10]: if an analyst does exactly what the customer asked for, but the result does not meet the customer's real need, the analyst will be blamed anyway.
The output of an analysis for a single target system is, in a sense, the same as the input, and may be classified into the same categories. The main task of the analysis activity is to elaborate, to detail, and to fill in ``obvious'' omissions. Resource and miscellaneous requirements often pass right through, although these categories may be expanded as the result of new insights obtained during analysis.
The output of the analysis activity should be channeled in two directions. The client who provided the initial target specification is one recipient. The client should be convinced that the disambiguated specification describes the intended system faithfully and in sufficient detail. The analysis output might thus serve as the basis for a contractual agreement between the customer and a third party (the second recipient) that will design and implement the described system. Of course, especially for small projects, the client, user, analyst, designer, and implementor parties may overlap, and may even all be the same people.
An analyst must deal with the delicate question of the feasibility of the client's requirements. For example, a transportation system with the requirement that it provide interstellar travel times of the order of seconds is quite unambiguous, but its realization violates our common knowledge. Transposed into the realm of software systems, we should insist that desired behavior be plausibly implementable.
Unrealistic resource and/or performance constraints are clear reasons for nonrealizability. Less obvious reasons often hide in behavioral characterizations. Complex operations such as Understand, Deduce, Solve, Decide, Induce, Generalize, Induct, Abduct, and Infer are not recommended in a system description unless these notions correspond to well-defined concepts in a particular technical community.
Even if analysts accept in good faith the feasibility of the stated requirements, they certainly do not have the last word in this matter. Designers and implementors may come up with arguments that demonstrate infeasibility. System tests may show nonsatisfaction of requirements. When repeated fixes in the implementation and/or design do not solve the problem, backtracking becomes necessary in order to renegotiate the requirements. When the feasibility of requirements is suspect, prototyping of a relevant ``vertical slice'' is recommended. A mini-analysis and mini-design for such a prototype can prevent rampant throwaway prototyping.
Most attention in the analysis phase is given to an elaboration of functional requirements. This is performed via the construction of models describing objects, classes, relations, interactions, etc.
We quote from Alan Davis [4]:
A Software Requirements Specification is a document containing a complete description of what the software will do without describing how it will do it.
Subsequently, he argues that this what/how distinction is less obvious than it seems. He suggests that in analogy to the saying ``One person's floor is another person's ceiling'' we have ``One person's how is another person's what''. He gives examples of multiple what/how layers that connect all the way from user needs to the code.
We will follow a few steps of his reasoning using the single requirement from our ATM example that clients can obtain cash from any of their accounts.
The ability of clients to obtain cash is an example of functionality specified by the user.
The requirements already exclude a human intermediary. Thus, we can consider different techniques for human-machine interaction, for example, screen-and-keyboard interaction, touch-screen interaction, and audio and voice recognition. We can also consider different authentication techniques, such as PINs, iris analysis, and hand-line analysis. These considerations address the how dimension.
The suggestion to construct this set of all systems (and apply behavior abstraction?) strikes us as artificial for the present discussion, although an enumeration of techniques may be important to facilitate a physical design decision.
This is debatable and depends on the intended meaning of ``exact behavior''. If this refers to the mechanism of the intended system, then we subscribe to the quotation. However, it could also encompass the removal of ambiguity by elaborating on the description of the customer-ATM interaction. If so, we still reside in what-country. For example, we may want to make explicit that customer authentication precedes all transactions, that account selection is to be done by those customers who own multiple accounts, etc.
We may indeed be more specific by elaborating an interaction sequence, as in: ``A client can obtain cash from an ATM by doing the following things: obtaining proper access to the ATM, selecting one of his or her accounts when more than one is owned, selecting the cash dispense option, indicating the desired amount, and obtaining the delivered cash.'' We can go further by detailing what it means to obtain proper access to an ATM, by stipulating that a bank card has to be inserted and that a PIN has to be entered after the system has asked for it.
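To show how such an elaboration pins down the what without committing to a mechanism, the sequence can be rendered as a simple state machine. The following Java sketch is our own illustration; the class, state, and method names (WithdrawalSequence, selectDispenseOption, and so on) are invented for this example and are not part of the requirements.

    // Minimal sketch: the elaborated withdrawal interaction as a state net.
    // Every name here is hypothetical; only the ordering constraints matter.
    public class WithdrawalSequence {
        enum State { IDLE, CARD_INSERTED, AUTHENTICATED, ACCOUNT_SELECTED,
                     OPTION_SELECTED, AMOUNT_ENTERED, CASH_DELIVERED }

        private State state = State.IDLE;

        void insertCard()           { advance(State.IDLE, State.CARD_INSERTED); }
        void enterPin()             { advance(State.CARD_INSERTED, State.AUTHENTICATED); }
        void selectAccount()        { advance(State.AUTHENTICATED, State.ACCOUNT_SELECTED); }
        void selectDispenseOption() { advance(State.ACCOUNT_SELECTED, State.OPTION_SELECTED); }
        void enterAmount()          { advance(State.OPTION_SELECTED, State.AMOUNT_ENTERED); }
        void deliverCash()          { advance(State.AMOUNT_ENTERED, State.CASH_DELIVERED); }

        // Each step is legal only in the state the elaboration prescribes:
        // authentication precedes all transactions, account selection
        // precedes the dispense option, and so on.
        private void advance(State expected, State next) {
            if (state != expected)
                throw new IllegalStateException(state + " -> " + next);
            state = next;
        }
    }

Note that the sketch says nothing about how a real ATM realizes any step; it only fixes the externally observable ordering.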
Davis continues by arguing that one can define what these components do without describing how they work internally.
In spite of such attempts to blur how versus what, we feel that these labels still provide a good initial demarcation of the analysis versus the design phase.
On the other hand, analysis methods (and not only OO analysis methods) do have a how flavor. This is a general consequence of any modeling technique. Making a model of an intended system is a constructive affair. A model of the dynamic dimension of the intended system describes how that system behaves. However, analysts venture into how-country only to capture the intended externally observable behavior, while ignoring the mechanisms that realize this behavior.
The object-oriented paradigm puts another twist on this discussion. OO analysis models are grounded in object models that often retain their general form from analysis (via design) into an implementation. As a result, there is an illusion that what and how get blurred (or even should be blurred). We disagree with this fuzzification. It is favorable indeed that the transitions between analysis, design, and implementation are easier (as discussed in Chapter 15), but we do want to keep separate the different orientations inherent in analysis, design, and implementation activities.
We should also note that the use of models of any form is debatable. A model often contains spurious elements that are not strictly demanded to represent the requirements. The coherence and concreteness of a model and its resulting (mental) manipulability is, however, a distinct advantage.
OO analysis models center on objects. The definition of objects given in Chapter 1 is refined here for the analysis phase. The bird's eye view definition is that an object is a conceptual entity that:
Since we are staying away from solution construction in the analysis phase, the objects allowed in this stage are constrained. The output of the analysis should make sense to the customer of the system development activity. Thus we should insist that the objects correspond with customers' notions, and add:
Another ramification of avoiding solution construction pertains to the object's operator descriptions. We will stay away from procedural characterizations in favor of declarative ones.
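To illustrate the difference with a hypothetical example of our own (not taken from the requirements): a declarative characterization states what must hold before and after an operation, while remaining silent about the mechanism.

    // A declarative characterization: the effect of an operation is pinned
    // down by pre- and postconditions, with no procedural recipe.
    // All names are hypothetical.
    public interface Account {
        /**
         * Withdraw the given amount (in cents).
         *
         * Precondition:  amount > 0 and amount <= balance()
         * Postcondition: balance() == (balance() before the call) - amount
         *
         * How the debit is realized (ledger entries, locking, journaling)
         * is deliberately left to design and implementation.
         */
        void withdraw(long amount);

        long balance();
    }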
Some OO analysis methods have made the distinction between active and passive objects. For instance, Colbert [3] defines an object as active if it ``displays independent motive power'', while a passive object ``acts only under the motivation of an active object''.
We do not subscribe to these distinctions, at least at the analysis level. Our definition of objects makes them all active, as far as we can tell. The active versus passive distinction seems to be more relevant for the design phase (cf. Bailin [1]).
This notion of objects being active is motivated by the need to faithfully represent the autonomy of the entities in the ``world'', the domain of interest. For example, people, cars, accounts, banks, transactions, etc., all behave in a parallel, semi-independent fashion. Providing OO analysts with objects that have at least one thread of control gives them the means to stay close to a natural representation of the world. This should facilitate explanations of the analysis output to a customer. However, a price must be paid for this. Objects in the programming realm deviate from this computational model: they may share a single thread of control in a module. Consequently, bridging this gap is a major responsibility of the design phase.
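The contrast can be made concrete. Below is a minimal sketch, with invented names, of an analysis-style active object that owns a thread of control; a passive implementation object, by comparison, executes only on its caller's thread.

    // Sketch of an active object: it owns a thread and behaves
    // semi-independently, the way entities in the analysis ``world'' do.
    // The class name and its task are hypothetical.
    public class ActiveAccountMonitor implements Runnable {
        private final Thread thread = new Thread(this);
        private volatile boolean running = true;

        public void start() { thread.start(); }
        public void stop()  { running = false; thread.interrupt(); }

        @Override public void run() {
            while (running) {
                // autonomous behavior, e.g. periodically watching for overdrafts
                try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
            }
        }
    }
    // A passive object, by contrast, has no thread of its own:
    // its methods execute on whatever thread happens to call them.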
A representation of a system is based on a core vocabulary. The foundation of this vocabulary includes both static and dynamic dimensions. Each of these dimensions complements the other. Something becomes significant as a result of how it behaves in interaction with other things, while it is distinguished from those other things by more or less static features. This distinction between static and dynamic dimensions is one of the axes that we use to distinguish the models used in analysis.
Our other main distinction refers to whether a model concentrates on a single object or whether interobject connections are addressed. The composition of these two axes gives us the following matrix:

        | inside object               | between objects
static  | attribute, constraint       | relationship, acquaintanceship
dynamic | state net and/or interface  | interaction and/or causal connection
Detailed treatments of the cells in this matrix are presented in the following chapters.
The static row includes a disguised version of entity-relationship (ER) modeling. ER modeling was initially developed for database design. Entities correspond to objects, and relations occur between objects.(1) Entities are described using attributes. Constraints capture limitations on attribute-value combinations. Acquaintanceships represent the static connections among interacting objects.
(1) The terms ``relation'' and ``relationship'' are generally interchangeable; ``relationship'' emphasizes the notion as a noun phrase.
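To give these static notions concrete shape, here is a small illustrative sketch (every name is invented) showing attributes, a constraint over attribute-value combinations, and a relationship between objects.

    // Illustrative static model: attributes, a constraint, a relationship.
    public class Account {
        // attributes
        private long balance;              // in cents
        private final long overdraftLimit; // in cents

        // relationship: every account is owned by some customer
        private final Customer owner;

        public Account(Customer owner, long overdraftLimit) {
            this.owner = owner;
            this.overdraftLimit = overdraftLimit;
            checkConstraint();
        }

        public void debit(long amount) {
            balance -= amount;
            checkConstraint();
        }

        // constraint: a limitation on attribute-value combinations
        private void checkConstraint() {
            if (balance < -overdraftLimit)
                throw new IllegalStateException("overdraft limit exceeded");
        }
    }

    class Customer { /* details elided */ }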
The dynamic row indicates that some form of state machinery is employed to describe the behavior of a prototypical element of a class. Multiple versions of causal connections capture the ``social'' behavior of objects.
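A toy rendering of such a causal connection, again with hypothetical names: a state transition in one object induces behavior in an acquainted peer.

    // Illustrative causal connection: completing a delivery in the
    // dispenser induces a logging action in an acquainted printer.
    import java.util.ArrayList;
    import java.util.List;

    class Dispenser {
        private final List<Printer> acquaintances = new ArrayList<>();

        void register(Printer p) { acquaintances.add(p); }

        void deliver(long amount) {
            // ... the dispenser's own state transition ...
            for (Printer p : acquaintances)
                p.logDelivery(amount);   // induced behavior in a peer object
        }
    }

    class Printer {
        void logDelivery(long amount) {
            System.out.println("delivered: " + amount + " cents");
        }
    }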
Inheritance impacts all four cells by specifying relationships among classes. Inheritance allows the construction of compact descriptions by factoring out commonalities.
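A minimal sketch of such factoring, with invented names: the attribute and operation that two kinds of account share move into a common superclass, so each subclass describes only its differences.

    // Inheritance factors out what checking and savings accounts share.
    abstract class BankAccount {
        protected long balance;                           // common attribute
        void deposit(long amount) { balance += amount; }  // common operation
    }

    class CheckingAccount extends BankAccount {
        long overdraftLimit;                              // specific to checking
    }

    class SavingsAccount extends BankAccount {
        double interestRate;                              // specific to savings
    }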
The four models form a core. Additional models are commonly added to give summary views and/or to highlight a particular perspective. The core models are usually represented in graphic notations. Summary models are subgraphs that suppress certain kinds of detail.
For instance, a summary model in the static realm may remove all attributes and relationship interconnections in a class graph to highlight inheritance structures. Alternatively, we may want to show everything associated with a certain class C, for example, its attributes, relationships in which C plays a role, and inheritance links in which C plays a role.
An example in the dynamic realm is a class interaction graph, where a link between two classes signifies that an instance of one class can connect in some way or another with an instance of the other. Different interaction mechanisms can give rise to various interaction summary graphs. Another model component can capture prototypical interaction sequences between a target system and its context. Jacobson [7] has labeled this component use cases; the elaborated customer-ATM withdrawal sequence given earlier in this chapter is a typical example. Use cases are discussed in Chapters 10 and 12.
All of these different viewpoints culminate in the construction of a model of the intended system as discussed in Chapter 10.
Several factors prevent analysis from being performed according to a fixed regime. The input to the analysis phase varies not only in completeness but also in precision. Backtracking to earlier phases is required to the extent that the input is incomplete or fuzzy. Problem size, team size, corporate culture, etc., will influence the analysis process as well.
After factoring out these sources of variation, we may still wonder whether there is an underlying ``algorithm'' for the analysis process. Investigation of current OO analysis methods reveals that they prescribe at most a weak ordering of activities: each method shows a bias as to which models to develop first, but none fixes a rigid recipe.
We have similarly adopted a weak bias. Our presentation belongs to the cluster of methods that focuses on the static dimension first and, after having established the static models, gives attention to the dynamic aspects. However, this position is mutable if and when necessary. For instance, in developing large systems, we need top-down functional decompositions to get a grip on manageable subsystems. Such a decomposition requires a preliminary investigation of the dynamic realm. Chapter 9 (Ensembles) discusses these issues in more detail. A prescribed sequence for analysis is given via an example in Chapter 10 (Constructing a System Model). A formalization of this ``algorithm'' is given in Chapter 12 (The Analysis Process).
Analysis provides a description of what a system will do. Recasting requirements in the (semi) formalism of analysis notations may reveal incompleteness, ambiguities, and contradictions. Consistent and complete analysis models enable early detection and repair of errors in the requirements before they become too costly to revise.
Inputs to analysis may be diverse but are categorizable along the dimensions of functionality, resource constraints, performance constraints, and auxiliary constraints.
Four different core models are used to describe the functional aspects of a target system. These core models correspond to the cells of a matrix whose axes are static versus dynamic and inside an object versus between objects.
Analysis is intrinsically non-algorithmic. In an initial iteration we prefer to address first the static dimension and subsequently the behavioral dimension. However, large systems need decompositions that rely on early attention to functional, behavioral aspects.
We witness an explosion of analysis methods. A recent comparative study [5] describes ten methods; a publication half a year later [9] lists seventeen. Major sources include Shlaer and Mellor [11,12], Booch [2], Rumbaugh et al. [10], Wirfs-Brock et al. [13], and Jacobson et al. [8].