C++ as an Interface Definition Language

by Doug Lea.

An interface encapsulates a coherent set of services and attributes (broadly, a Role), without explicitly binding this functionality to that of any particular object. In general, one object may support several interfaces, and conversely, one interface may be implemented by several objects working in tandem. CORBA IDL is probably the best-known (but very impure) example of an IDL. IBM SOM is in part an IDL, but also contains additional features, many akin to the C++ variants described below.

OO interfaces may be described as structured collections of operations, where each operation has a name and a type. The type is described as a signature of arguments and results, perhaps along with some indication of semantics (e.g., preconditions and postconditions) and/or protocols (e.g., descriptions of client-visible events such as callbacks produced upon invocation -- see for example PSL).

The types appearing in interface signatures in a pure OO IDL must either be value types representing abstract values in an opaque fashion (e.g., integers, strings), or be handle types that represent capabilities or connections to entities providing the services described in an indicated interface.

One interface may be described as a subinterface of another if it extends its properties (normally by listing additional services). In some IDLs one interface need not explicitly list that it is a subinterface of another; if it contains all of the same features, plus possibly more, it is considered a subinterface (this is known as type conformance -- see OOSD). It is common when defining interface hierarchies to use fairly fine-grained interfaces at the top, each defining only a few operations, and to use (multiple) interface inheritance to define ``fatter'', more useful ones as various combinations of these basic bits of functionality.

C++ does not directly support all of these notions, but contains mechanisms that achieve some of the effect.

Abstract Classes

Abstract classes can be used to define the C++ version of interfaces. A C++ abstract class describes functionality shared by objects of all concrete classes that are explicitly listed (perhaps indirectly) as subclasses. An object may play multiple roles by inheriting and implementing multiple interfaces. (The language doesn't directly support notions that an object may play different roles at different times, or that a role is implemented by a collection of objects, but these effects can usually be had in one way or another.)

C++ interface-style abstract classes take an idiomatic form:

class AnInterface {
public:
  virtual T1   aService(T2, T3) = 0;
  ...
  virtual T4   anAttribute() const = 0; // get value
  virtual void anAttribute(T4) = 0;     // set value
  ...
  virtual T5   aReadOnlyAttribute() const = 0;

  virtual ~AnInterface() {}

protected:
  AnInterface() {}
};

Ideally, the types T should consist only of pass-by-value scalar types (int, float, enum, ...), collections of them (structs, ...) and/or pointers to objects of classes also defined via abstract classes (thus representing handles). The lack of native by-value string and array types in C++ is a problem here. One occasionally attractive alternative is to always contain fixed arrays in structs. But in practice, you can make ADT-style types work OK too. This way you can sometimes obtain simpler mechanics and also avoid value copying. ADT-style fake pointer classes may also be used instead of raw pointers for handle types, although there is no perfect way to do this.

Abstract classes sometimes lend themselves to parameterization over some type used in one or more signatures. (Some nice examples are described in Barton and Nackman's book.) Beyond the template prefix, nothing much changes except for the pragmatic problems of dealing with templates in C++. These include, for example, the fact that template instantiation errors are not usually reported until link time. This is best combatted by prefacing each template with a brief comment about what operations are assumed to be supported on the type (e.g., a < comparison).
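
As a minimal sketch of a parameterized interface with its assumptions stated up front, consider the following. The names Ordering, Ascending and smaller are hypothetical, introduced only for illustration.

```cpp
// Requirements on T (documented by convention, not enforced here):
//   - T is copy constructible
//   - a < b is defined for values of type T and yields a bool
template <class T>
class Ordering {
public:
  virtual bool precedes(const T& a, const T& b) const = 0;
  virtual ~Ordering() {}
protected:
  Ordering() {}
};

// One hypothetical implementation; this is where the < assumption is used.
template <class T>
class Ascending : public virtual Ordering<T> {
public:
  virtual bool precedes(const T& a, const T& b) const { return a < b; }
};

// A client written purely against the interface.
template <class T>
T smaller(const Ordering<T>& ord, const T& a, const T& b) {
  return ord.precedes(b, a) ? b : a;
}
```

A type lacking < would fail only when Ascending<T>::precedes is instantiated, which is why the leading comment is worth the trouble.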

Since interface classes cannot be directly instantiated, yet serve as virtual base classes for implementations, the constructors should take no arguments and should be listed as protected. Also, for similar reasons, abstract classes should have a no-op virtual destructor, not one listed as ... = 0. Depending on your compiler, you might need to define the no-op constructor and destructor operations outside the class declaration in a separate .C file.
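
A small sketch of these conventions in use follows. Counter, SimpleCounter, destroyedCount and useAndDestroy are hypothetical names chosen for the example.

```cpp
// Hypothetical interface following the idiom above.
class Counter {
public:
  virtual void increment() = 0;      // a service
  virtual int  value() const = 0;    // a read-only attribute
  virtual ~Counter() {}              // no-op virtual destructor, not "= 0"
protected:
  Counter() {}                       // argument-free, callable only by subclasses
};

// Counts destructor calls so that the effect of deletion is observable.
static int destroyedCount = 0;

class SimpleCounter : public virtual Counter {
public:
  SimpleCounter() : n_(0) {}
  virtual ~SimpleCounter() { ++destroyedCount; }
  virtual void increment() { ++n_; }
  virtual int  value() const { return n_; }
private:
  int n_;
};

int useAndDestroy() {
  Counter* c = new SimpleCounter;    // the client holds only the interface type
  c->increment();
  c->increment();
  int v = c->value();
  delete c;                          // the virtual destructor runs ~SimpleCounter
  return v;
}
```

Without the virtual destructor in Counter, deleting through the Counter* would not be guaranteed to run the SimpleCounter destructor.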

The use of const here and elsewhere has its ups and downs. Officially, it is a good idea, since it helps enforce some of the intended semantic guarantees. But often enough, pragmatic concerns get in the way -- generally, const-ness propagates through all code that any implementations of the services touch. And C++ is sometimes too literal-minded about it to enforce it in a useful way. Sometimes (but only sometimes) it is better just to enforce these semantics manually. (In ADT-style classes, on the other hand, C++ const support tends to work pretty well and should almost always be used.)

Similar remarks hold, but more so, for exceptions. It is hardly ever a good idea to annotate a signature of a C++ abstract class with an exception list -- doing so commits all implementations to raise only the ones listed, which is often impossible to live with. Not listing any says implicitly that any exception may occur, thus requiring manual documentation about the ones that are likely.

Public Virtual Inheritance

Officially, subinterfaces should be declared as public virtual subclasses of all of their direct ancestors. This form of (possibly multiple) inheritance prevents spurious strangenesses when the same operation is inherited along more than one path. The same holds true of the leaf concrete subclasses that implement interfaces. For most purposes public virtual inheritance ought to be considered the default subclassing mechanism.
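
The following sketch shows one operation reaching a concrete class along two paths. All class names (Named, Readable, Writable, ReadWritable, Cell) and the helper isCell are hypothetical.

```cpp
#include <cstring>

class Named {
public:
  virtual const char* name() const = 0;
  virtual ~Named() {}
protected:
  Named() {}
};

// Two fine-grained subinterfaces, each a public virtual subclass of Named.
class Readable : public virtual Named {
public:
  virtual int read() = 0;
};

class Writable : public virtual Named {
public:
  virtual void write(int v) = 0;
};

// A "fatter" interface combining both.
class ReadWritable : public virtual Readable, public virtual Writable {};

// The leaf concrete class also uses public virtual inheritance.
class Cell : public virtual ReadWritable {
public:
  Cell() : v_(0) {}
  virtual const char* name() const { return "cell"; }
  virtual int  read() { return v_; }
  virtual void write(int v) { v_ = v; }
private:
  int v_;
};

// Because Named is a virtual base, a Cell contains exactly one Named
// subobject, so this conversion and call are unambiguous.
bool isCell(const Named& n) { return std::strcmp(n.name(), "cell") == 0; }
```

If Readable and Writable inherited from Named non-virtually, passing a Cell to isCell would be rejected as an ambiguous conversion.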

People often break this rule however. With most compilers, programmers pay very noticeable performance penalties (both time and especially space) when they use virtual base classes. Instead, they stick with 100% single-inheritance designs, in which case regular public subclassing mechanics suffice. If you do this, it alters the way you tend to define base classes. And once you go this route you are normally stuck with it. As a rule of thumb, either use public virtual subclassing consistently for all subclasses or not at all. Doing otherwise leads to dark corners.

In general, try to avoid overloading the same operation name, with the same number of arguments but with different argument or result types in subinterface classes. If you do so, be prepared to study the resolution rules carefully.

Factories

To deal effectively with interface-based designs, you need a way to separate instantiation (construction) from classes. By the rules of interface-based design, a client should not care about which object, or which concrete class, it gets to perform a service. So invoking a constructor declared within a particular class is out. Instead, when doing interface-based design, for each abstract class, you need to define one or more generators (see OOSD) or factories (see DP) that declare methods that return instances of (subclasses of) that interface. By convention, the methods are often named newC for each class C.

Factories often have methods that produce instances of several related classes, but all in a compatible way. Factories should themselves be defined via interfaces, so the client need not know which concrete factory object it is using. Ideally, all such matters can be reduced to a single concrete call to construct the appropriate ``master'' concrete factory in a client application.
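
A minimal sketch of a factory interface using the newC naming convention is shown here. Window, WindowFactory and the concrete classes are hypothetical names used only for illustration.

```cpp
class Window {
public:
  virtual int borderWidth() const = 0;
  virtual ~Window() {}
protected:
  Window() {}
};

// The factory is itself described by an interface.
class WindowFactory {
public:
  virtual Window* newWindow() = 0;   // named newC for class C
  virtual ~WindowFactory() {}
protected:
  WindowFactory() {}
};

class PlainWindow : public virtual Window {
public:
  virtual int borderWidth() const { return 1; }
};

class FancyWindow : public virtual Window {
public:
  virtual int borderWidth() const { return 3; }
};

class PlainFactory : public virtual WindowFactory {
public:
  virtual Window* newWindow() { return new PlainWindow; }
};

class FancyFactory : public virtual WindowFactory {
public:
  virtual Window* newWindow() { return new FancyWindow; }
};

// Client code: no concrete class name appears anywhere in it.
int borderOfNewWindow(WindowFactory& f) {
  Window* w = f.newWindow();
  int b = w->borderWidth();
  delete w;
  return b;
}
```

Only the code that picks PlainFactory or FancyFactory needs to mention a concrete class.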

Factories often need to invoke special ``open'' constructors on the concrete classes they generate, that enable them to lay out all properties by initializing internals in any way they please. The basic form of an ``open'' constructor is to have an argument associated with each internal slot (member variable) and to bind the slot to the value of the argument. These kinds of ``open'' constructors are a little dangerous to have around in general, but it is hard to get C++ to agree about the access privileges surrounding them. Ideally, you'd like to have constructors listed as private but with the factories as friends. Unfortunately (in this case), friendship is neither inherited nor transitive, so you can't write something in class C saying that all CFactorys are friends of all SubCs. Often enough, the only alternative is to leave the constructors as public but to document their intended use.
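
Here is a sketch of the private-constructor-plus-friend arrangement for a single concrete class. Point, PointFactory, CartesianPoint and CartesianPointFactory are hypothetical names.

```cpp
class Point {
public:
  virtual int x() const = 0;
  virtual int y() const = 0;
  virtual ~Point() {}
protected:
  Point() {}
};

class PointFactory {
public:
  virtual Point* newPoint(int x, int y) = 0;
  virtual ~PointFactory() {}
protected:
  PointFactory() {}
};

class CartesianPoint : public virtual Point {
  // Friendship must name this one concrete factory. Naming PointFactory
  // here would not extend access to its subclasses.
  friend class CartesianPointFactory;
public:
  virtual int x() const { return x_; }
  virtual int y() const { return y_; }
private:
  // "Open" constructor: one argument per slot, bound directly.
  CartesianPoint(int x, int y) : x_(x), y_(y) {}
  int x_;
  int y_;
};

class CartesianPointFactory : public virtual PointFactory {
public:
  virtual Point* newPoint(int x, int y) { return new CartesianPoint(x, y); }
};
```

Each new subclass of CartesianPoint would need its own friend declaration, which is the bookkeeping burden that often pushes people back to public constructors.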

Name Spaces

It doesn't take too many classes before naming conventions for interfaces start becoming a problem. People tend to want to give the same names to different classes and operations (for example Node, put, etc.). But when clients use a class or operation name, they need to be sure about what they are getting. There is only one good solution here: language-based support of modules, packages, or namespaces that provide some kind of nested name prefixing scheme. The ANSI standard C++ contains a namespace construct usable for these purposes, but most compilers do not yet implement it.

Until then, you have to live with non-optimal solutions, for example manual naming conventions in which each class and/or operation name is given a standard prefix reflecting its module name. The least desirable workaround is to use C++ nested classes, which are hardly ever worth fighting with.
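
The two approaches can be sketched side by side, assuming a compiler that does implement namespaces. The module names graph and list, and all class names, are hypothetical.

```cpp
// With namespaces: two unrelated interfaces can both be called Node.
namespace graph {
  class Node {
  public:
    virtual int degree() const = 0;
    virtual ~Node() {}
  protected:
    Node() {}
  };

  class Isolated : public virtual Node {
  public:
    virtual int degree() const { return 0; }
  };
}

namespace list {
  class Node {
  public:
    virtual bool isLast() const = 0;
    virtual ~Node() {}
  protected:
    Node() {}
  };

  class Tail : public virtual Node {
  public:
    virtual bool isLast() const { return true; }
  };
}

// Without namespaces: the module name becomes a manual prefix.
class GraphNode {
public:
  virtual int degree() const = 0;
  virtual ~GraphNode() {}
protected:
  GraphNode() {}
};
```

Clients then write graph::Node or list::Node, which makes it explicit which Node they are getting.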

On the other hand, nesting typedefs and enums within classes is a simple way of avoiding name-clutter for symbolic type names, and should always be used when the scope of a symbolic type name can be restricted to implementations and clients of a particular interface. For example, an interface defining array-like operations for which all index arguments must be unsigned shorts might include:

class ArrayLikeThing {
public:
  typedef unsigned short Index;

  virtual Index firstElementIndex() = 0;
  ...
};

Note that a client would have to invoke this via something like:

void f(ArrayLikeThing* a) { 
   ArrayLikeThing::Index i = a->firstElementIndex();
   ...
}

Variants

C++ supports a number of variant definition styles that lie on the border between interface-based and OO techniques. Unlike most other sublanguage interactions, most of these are pretty straightforward and useful.

Utility Operations

Sometimes you'd like to add some miscellaneous utilities that conveniently package up a certain sequence or combination of invocations on the base operations defined in an interface. For a too-simple example, suppose you have an interface Coll for a collection of some sort with a put(int x) operation, and a lot of expected clients that will need to put in items in pairs. You'd like to have something like putPair(int a, int b), with the obvious implementation. There are at least four alternatives:

(1) Creating a new interface, say PairColl with method putPair, with the understanding that it could be implemented by a concrete subclass that holds a link to a Coll and sends it pairs of puts.
(2) Defining a concrete class as in (1) without bothering to establish an interface class.
(3) Writing a top-level void putPair(Coll* c, int x, int y) { c->put(x); c->put(y); }.
(4) Adding putPair directly into the Coll class as a non-abstract operation, with the series of puts as the default implementation code.

This is a version of the subclassing versus composition issue. In general, the best answers are either (1) or (4). The first provides clean layering of code that uses a set of base functionality to provide more complex functionality. It also allows you to come up with totally different implementations (for example here to rely on objects that somehow maintain all elements in pairs). The second and third are less abstract simplifications of (1), that are sometimes appropriate for one-shot use (for example, as local helpers in a client module). In contrast, the final option doesn't have the layering benefits but does make it easier for concrete subclasses to specialize the operation; it may be that for some implementations there is a faster way of putting two items than invoking put twice.
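
As a sketch, the layered first alternative and the top-level third alternative might look as follows. ForwardingPairColl and CountingColl are hypothetical names, and the size() attribute is added to Coll only so that the effect can be observed.

```cpp
class Coll {
public:
  virtual void put(int x) = 0;
  virtual int  size() const = 0;   // assumed attribute, for observation only
  virtual ~Coll() {}
protected:
  Coll() {}
};

// A separate interface for the pair-oriented service.
class PairColl {
public:
  virtual void putPair(int a, int b) = 0;
  virtual ~PairColl() {}
protected:
  PairColl() {}
};

// One implementation: hold a link to a Coll and send it pairs of puts.
class ForwardingPairColl : public virtual PairColl {
public:
  explicit ForwardingPairColl(Coll* c) : c_(c) {}
  virtual void putPair(int a, int b) { c_->put(a); c_->put(b); }
private:
  Coll* c_;
};

// A trivial Coll that merely counts the items it is given.
class CountingColl : public virtual Coll {
public:
  CountingColl() : n_(0) {}
  virtual void put(int) { ++n_; }
  virtual int  size() const { return n_; }
private:
  int n_;
};

// The top-level helper, as given in the list above.
void putPair(Coll* c, int x, int y) { c->put(x); c->put(y); }
```

A different PairColl implementation could store pairs natively without touching any client written against PairColl, which the top-level function cannot offer.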

Another case in which option (1) best applies is when you have a utility that operates on pairs of objects of some nominal type, and you may need to specialize that behavior on the basis of both types. You must either encapsulate this as a specializable interface or be prepared to handcraft multiple dispatching at the implementation level.
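
One common way to handcraft multiple dispatching is the double-dispatch idiom, sketched below under the assumption that the set of concrete types is small and known. Shape, Circle, Square and Pairing are hypothetical names.

```cpp
class Circle;
class Square;

// Identifies which combination of concrete types was selected.
enum Pairing { CircleCircle, CircleSquare, SquareSquare };

class Shape {
public:
  // First dispatch: selects on the dynamic type of *this.
  virtual Pairing pairWith(const Shape& other) const = 0;
  // Second dispatch: selects on the dynamic type of the other object.
  virtual Pairing pairWithCircle(const Circle& c) const = 0;
  virtual Pairing pairWithSquare(const Square& s) const = 0;
  virtual ~Shape() {}
protected:
  Shape() {}
};

class Circle : public virtual Shape {
public:
  virtual Pairing pairWith(const Shape& other) const {
    return other.pairWithCircle(*this);   // tell the other what we are
  }
  virtual Pairing pairWithCircle(const Circle&) const { return CircleCircle; }
  virtual Pairing pairWithSquare(const Square&) const { return CircleSquare; }
};

class Square : public virtual Shape {
public:
  virtual Pairing pairWith(const Shape& other) const {
    return other.pairWithSquare(*this);
  }
  virtual Pairing pairWithCircle(const Circle&) const { return CircleSquare; }
  virtual Pairing pairWithSquare(const Square&) const { return SquareSquare; }
};
```

The cost is visible in the interface: every new concrete type adds another pairWithX operation to Shape and to each existing subclass.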

Default Implementations

Sometimes, there is a reasonable default implementation for an operation defined in an interface, and this default can be coded in a way that does not introduce any internal representation mechanics. Pure interface-level defaults only work nicely when they do not introduce any internal representation constraints (but see next variant).

One common example is that during development, you might want to stub out operations by simply printing a message whenever they are invoked. This might as well be implemented as a default in the interface class itself. (Although the logistics are sometimes tricky for operations that are supposed to return something; you have to figure out some value to return.)

You can add additional scaffolding via protected methods. For example, if the printed message should take a particular form, you could declare and implement a protected operation printMsg(char*) (or whatever) and then invoke it in the default implementation of all the others.

Even further, you can set things up to rely on representations without actually declaring them. For example, suppose that this printMsg requires a C++ ostream* representing a log file, and that this log file might be different for different objects. You can still add the default without introducing any representational mechanics by adding a pure virtual protected attribute-style method logFile() that returns the current log file handle. This might be implemented in different ways in different subclasses -- some might just keep a pointer internally, while others might ask another object what the current log file is every time logFile is invoked. So all together, we'd have:

class X {
protected:
  virtual ostream* logFile() const = 0;
  virtual void printMsg(const char* m) { (*logFile()) << m << " called\n"; }
public:
  virtual void anOp() { printMsg("anOp"); }
  ...
};

Each subclass would have to implement logFile() itself (perhaps just as an accessor for a private: ostream* logFile_), but once this is done, the default mechanisms work as defined.
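
A sketch of one such subclass follows. The base class is renamed Logged here (it has the same shape as X above), and BufferLogged, which writes to an in-memory stream rather than a real file, is a hypothetical name.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class Logged {
protected:
  virtual std::ostream* logFile() const = 0;
  virtual void printMsg(const char* m) { (*logFile()) << m << " called\n"; }
public:
  virtual void anOp() { printMsg("anOp"); }
  virtual ~Logged() {}
};

// Keeps the pointer internally, as a plain accessor over a private slot.
class BufferLogged : public virtual Logged {
public:
  BufferLogged() : logFile_(&buffer_) {}
  std::string logged() const { return buffer_.str(); }
protected:
  virtual std::ostream* logFile() const { return logFile_; }
private:
  std::ostringstream buffer_;   // stands in for a real log file
  std::ostream* logFile_;
};
```

BufferLogged inherits the default anOp and printMsg unchanged; its only obligation is to say where the output goes.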

Partially Abstract Classes

Sometimes, there are good reasons for claiming that some aspect of an interface's functionality must be implemented in a certain way. For example, perhaps all implementations must be compatible with representation types and conventions of some legacy code. Or perhaps there is only one way that you can imagine ever implementing a subset of the attributes or operations described in an interface. All-in-all, this is fairly common.

Once you add slots (member variables) to a class in C++, it stops acting like an interface class -- all subclasses will contain the representations, which means that they should use them to implement functionality. (Having the subclass carry around the slots but not ever using them is just asking for trouble. This is not to say that it's never an option; it's just intrinsically dangerous.)

So the first question to ask is whether you can avoid introducing member variables. Variants of the pure virtual protected attribute idiom often suffice, although they add enough performance overhead to be unattractive in some cases. For example, if all instances are required to maintain an IDNumber, and this number must be maintained in a fixed known representation, and it is commonly accessed in performance-critical code, then it would probably be overkill to delegate maintenance of IDNumbers to some other IDNumberMaintainer object that each object accesses via a virtual attribute, which in turn would probably always be implemented via a direct pointer to the IDNumberMaintainer anyway. So in this case, probably the best option is just to introduce some representation and operations that maintain it in the interface class itself.

However, you can still avoid overcommitment by encasing these mechanics in a subclass of the main interface class, so other options still remain possible while still simplifying and regularizing subclasses that rely upon the same mechanism. This is among the best ways to introduce code-sharing in class hierarchies supporting standardized interfaces.

Usually the best approach in carrying this out is to write things almost as if you were embedding a little inner representation-maintainer class in the main class itself. Given this, the representation should be private, and manipulated only via methods usable by the ``outer'' objects and/or their clients. The maintenance methods typically include initialization via a non-default constructor (which is in turn problematic with virtual inheritance; you might instead need to define an explicit protected initialization method).

There are several ways to set this up, varying in flexibility. The most flexible option is to declare the representation manipulation methods as protected, non-virtual, and possibly inline, with slightly mangled names. Then the default virtual versions of those operations that are publicly exported can be written to just invoke the internal versions. For example:

class ThingWithID {
private:
  int IDRep_;
protected:
  inline int id_() const { return IDRep_; }
  ThingWithID(int ID) : IDRep_(ID) {}
public:
  virtual int id() { return id_(); }
  ...
};
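
The sketch below shows how subclasses might use this class, including the complication that arises with virtual inheritance. Account and OffsetAccount are hypothetical names, and a virtual destructor is added to ThingWithID.

```cpp
class ThingWithID {
private:
  int IDRep_;
protected:
  int id_() const { return IDRep_; }
  ThingWithID(int ID) : IDRep_(ID) {}
public:
  virtual int id() { return id_(); }
  virtual ~ThingWithID() {}
};

// Uses the default public id() unchanged.
class Account : public virtual ThingWithID {
public:
  explicit Account(int ID) : ThingWithID(ID) {}
};

// Because ThingWithID is a virtual base, the most-derived class must
// initialize it directly; Account's initializer for it is ignored here.
class OffsetAccount : public Account {
public:
  explicit OffsetAccount(int ID) : ThingWithID(ID), Account(ID) {}
  // Overrides the exported version while reusing the protected accessor.
  virtual int id() { return id_() + 1000; }
};
```

Having to repeat ThingWithID(ID) in every most-derived class is the burden that a protected initialization method would avoid.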

Copying

Whenever you enter the world of even partially concrete classes, you also have to make some policy about copying and assignment. If any subclasses will need to support a copy constructor and/or an assignment operator, then scaffolding for them must reside in all (semi-)concrete superclasses.

The vast majority of classes defined via interface-based design do not support any natural meaning for the notion of copying or assignment. In fact, this is true for many other classes as well. Unfortunately, C++ has a rule saying that the compiler will create these for you itself unless you explicitly list them. The best way to disable them entirely is to list them as private (so they are not callable) with no-op implementations; as in:

class ThingWithID {
  ...
private:
  ThingWithID(const ThingWithID& t) {}
  void operator=(const ThingWithID& t) {}
};

With some compilers, it is not even necessary to put in the no-op definitions; the compiler will then complain at link time if they are ever called. (Remember that even private methods may be called (by mistake in this case) internally.)

(These kinds of declarations are not needed in pure abstract class declarations since they are not instantiable to begin with.)

On the other hand, if you do need copyability in subclasses, then you need to add support for it in the base class. In this example, as in all others in which every slot is of native scalar type, the versions of these operations that the C++ compiler would automatically generate would be OK, so you wouldn't really have to code them here, but the form is:

class ThingWithID {
  ...
public:
  ThingWithID(const ThingWithID& t) :IDRep_(t.IDRep_){}
  ThingWithID& operator=(const ThingWithID& t) {
     IDRep_ = t.IDRep_; return *this; }
};

(When unimplemented, operator=() might as well be void, but when implemented, it should obey the usual C++ conventions for assignment operators, thus returning *this.)
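
As a sketch of how a subclass builds on this base-class scaffolding, consider a copyable subclass with one extra slot. Label and its tag_ member are hypothetical, and id() is declared const here for convenience.

```cpp
class ThingWithID {
private:
  int IDRep_;
protected:
  explicit ThingWithID(int ID) : IDRep_(ID) {}
public:
  ThingWithID(const ThingWithID& t) : IDRep_(t.IDRep_) {}
  ThingWithID& operator=(const ThingWithID& t) {
    IDRep_ = t.IDRep_;
    return *this;
  }
  virtual int id() const { return IDRep_; }
  virtual ~ThingWithID() {}
};

class Label : public ThingWithID {
public:
  Label(int ID, char tag) : ThingWithID(ID), tag_(tag) {}
  // The subclass copies its own slot and delegates the rest to the base.
  Label(const Label& other) : ThingWithID(other), tag_(other.tag_) {}
  Label& operator=(const Label& other) {
    ThingWithID::operator=(other);
    tag_ = other.tag_;
    return *this;   // usual convention, which allows chained assignment
  }
  char tag() const { return tag_; }
private:
  char tag_;
};
```

Had the base class listed its copy operations as private, neither of Label's copy operations would compile.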

Last update Mon Apr 17 11:55:02 1995 Doug Lea (dl at gee)