collections
Package
by Doug Lea.
This overview is intended to accompany version 0.96b of the package of code (tar.gz format) (zip format) and documentation for the java collection classes. Please send comments and questions to dl@cs.oswego.edu.
Collections are objects that hold other objects that
are accessed, placed, and maintained under some set of rules. There
are many different ways you could classify these different rules. This
package uses one of the simplest, splitting them into four main categories,
(Sets, Bags, Seqs, Maps), plus a few `mixin' variants.
Top-level functionality is described entirely via interfaces
,
not classes
. This allows some mixing
and matching of capabilities at the implementation level, and
also enables you to add your own alternative implementations
of this functionality that will interoperate with the versions
provided in this package.
As a user of the collection
package, you
do not need to know a lot about data structures, but you do
need to know what kinds of access and placement rules you
want in a particular situation. The basic choices are:
The base interface Collection
reflects commonalities
across all of these of kinds of rules. For reasons described below,
all of the interfaces in the following diagram describe
classes/objects that need not even be updatable (modifiable), so
restrict themselves to operations that report properties of
Collections or build new Collections with various properties.
size
and whether it isEmpty
.
canInclude
a given
value. In general Collections hold
Objects. Elements may be subclasses of Object, but
not builtin scalars like int.
You can, however, use numbers as elements via
the java.lang.Number class (or rather its subclasses) to
``wrap'' the numbers as number-holding objects first. For example:
void addFloat(UpdatableBag b, float f) { b.add(new Float(f)); }
Also, null is never a valid element, so canInclude(null) is always false. Operations screen for nulls, and raise exceptions upon attempts to add them to collections (but NOT when testing for their presence). This can be annoying, but is the most defensible policy. One reason that nulls are excluded is that some collection operations intrinsically rely upon operations Object.equals and Object.hashCode, neither of which apply to nulls.
Any given collection may make stronger demands on the
kinds of elements it will accept, and report this via
canInclude
.
Most implementation
classes support a constructor with an argument supplying an element
screener (an object of type
Predicate) that checks
whether a value is considered to be a potentially
legal element of the collection. For example, you
could supply a screener that requires
that all elements be Strings (using `instanceof').
Screeners provide a more flexible variant of
constraints found in languages with parameterized
types, although without any kind of compile time
guarantees that clients won't try to place inappropriate
elements into collections, thus causing exceptions.
includes
a given element.
Equality is always tested using Object.equals(Object other).
occurrencesOf
a given
element it holds.
elements
. The
enumeration returned is of type
CollectionEnumeration, a
stronger kind of enumeration than java.util.Enumeration.
checkImplementation
, that checks
the consistency and properties of the internal state of a
collection implementation object. Collection
is a
subinterface of
ImplementationCheckable, which is in turn a subinterface
of Assertable, the base
interface for classes with an assert
method that can
be used to help track down errors.
sameStructure
as
another Collection, c; that is whether c has the same
size and the same elements() arranged in a compatible way. This is
a form of equality test, but is not named equals
(and thus does not override Object.equals) since it is not
always the sense of equality that you would want to apply
to collections.
excluding
(i.e., removing all occurrencesOf
)
a given element. This does not
alter the Collection itself, it just makes a new one. This operation
is a lot like subtraction of numbers. When you
say a - b
for two numbers a
and
b
, you get a new number that represents
the difference between them without changing
a
or b
. The excluding
operation has the same kind of effect.
By convention, all such creational methods are named with gerunds
(e.g., including, excluding, adding).
If you want a collection
in which you actually change the contents of the current collection
rather than building a new one,
you'll need an UpdatableCollection with
an exclude
operation, described below. But even without using UpdatableCollections,
you can still create series of collections, each one
differing them the previous one by virtue of not including
a given element via code like:
Collection excludingElements(Collection c, Enumeration e) { while (e.hasMoreElements()) c = c.excluding(e.nextElement()); return c; }
For many implementations of Collection, this is a ridiculously slow way to do it. However, as described below, others (the Incr classes) are tuned to do this reasonably efficiently.
removingOneOf
(i.e., removing at most one
occurrence of) a given element.
replacingOneOf
(i.e., at most one
occurrence of) some old value with a new value.
replacingAllOf
(i.e., all
occurrences of) some old value with a new value.
duplicate
of self -- an independent
copy of the collection, but not
necessarily its elements. Elements themselves
are not necessarily cloned. (Note: duplicate
is simply the externally callable version of method
java.lang.Object.clone()
.)
java.io.Serializable
.
includes
to be true; i.e., adding an
element only if it is not present.
Maps also serve as a way to use a certain field of objects of some class as a key for purposes of storing in a collection. For example, supposing you want to use some accessor prop() defined in your class as the basis for entering and finding elements, you could use a Map, entering pairs of (x.prop(), x).
ElementSorted Collections provide a different way to use properties of objects as their access keys. You can define your class to implement interface Keyed, with a method key() that returns the key (for example it might just return prop()). Then use a Comparator that compares on keys(). The DefaultComparator class does this automatically before trying to perform comparsions, but you can write your own Comparator that does this in a more specialized and efficient way.)
Updatable collections are `normal' mutable collections. They:
insertElementsAt(int index, Enumeration e)
that adds all elements of e, in order, at position index.
UpdatableSeqs also support special (often more efficient)
methods operating on the front and back of the sequence,
Currently, there are no `wrapped' implementation types that allow one implementation to serve as another kind of collection. You can always build these yourself. For example, it's possible to have an RBTree serve as the underlying implementation for a Set by making, say, an RBSet class holding a reference to an RBTree, and forwarding include as addIfAbsent, and so on. Similarly, there are not any Seq-like classes (like Stack or Queue) that ONLY support operations on the front or back, since you can just use UpdatableSeqs for such purposes merely by not using those other operations; or if you like, making a special class holding an UpdatableSeq and forwarding it only selected messages.
The reason there are so many different implementation types it that, in the collections business, it's not the case that one size fits all. Different Implementations have different time/space tradeoffs, and different operations they are particularly good and bad at. The basic strategy for building a collection package is to force all of these implementations to share common interfaces, so that only the person doing the initial construction of a collection object that will be passed around and used polymorphically has to care about these issues. If you do not care, just use the ones handed out by DefaultImplementations, which makes pretty reasonable choices. If you do care, be sure to look over the time complexities listed in the documentation for each implementation class.
Forcing implementations to share common interfaces is a standard OO trick, but sometimes leads to classes providing methods that they'd almost rather not support. For example LinkedLists support indexed access of the elements of the list. This is pretty slow. You'd only use a LinkedList if you only rarely need to access by index.
All current implementations are subclasses of UpdatableImpl a convenient abstract class that provides default implementations of many UpdatableCollection operations.
Also, all current UpdatableCollection classes implement constructive
operations (excluding
, adding
, etc) by
cloning full copies, which is not very fast.
Current implementations include:
buckets
and the loadFactorThreshold
.
chunkSize
for buffers.
capacity
of the underlying arrays
elementComparator
or keyComparator
used for ordering.
Many of the implementation classes themselves rely on yet lower-level classes of basic data structures. These classes are also made public in this package, so that you can use them as the building-blocks of other Collection implementations. They currently include:
Immutable versions of collections are those that GUARANTEE not to support mutations. This is a stronger claim than is made by Collection, since any object known only to be a Collection MAY be coercible to Updatable. But an Immutable version is not. This makes Immutable classes exceedingly safe.
In Java, the concept of immutablity is best expressed as an in-the-small design pattern consisting of:
Collection
, Set
,
etc., interfaces.
final
, and support
only the operations described in the interface. The
use of final
means that when you think you have
an immutable object, you really do -- it's not of some
subclass that supports mutable operations as well.
Immutable
just
to give a name to this property, so that, by convention,
such classes declare implements Immutable
.
The actual interface Immutable
is a null interface,
listing no operations at all.
The collections
package currently contains four
implementation classes using this scheme.
IncrSets
IncrBags
IncrSeqs
IncrMaps
are all implementation subclasses of abstract
class
IncrImpl
that uses a `negative delta' algorithm to
do lazy copying of collections formed via creational
update methods. This algorithm can be applied to any
kind of underlying UdatableCollection, so the Incr classes
do not actually perform the underlying updates (they rely on
UpdatableCollections supplied within constructors), they
just arrange to do it with a minumum of actual copying.
They normally update in a `lazy' fashion,
making copies only when necessary. However, when you are using
a received Immutable Collection as the basis for a very extensive set of
update operations (none of which should be applied to the original),
it is sometimes more efficient
to build an UpdatableCollection, add all of the elements of the
received ImmutableCollection to it, and then work off that. (Also,
the benefits of lazy updates are not always worth their costs
for small-sized collections. For those with only a few elements,
it is probably better to just make copies.)
There are two standard usage patterns for Immutable Collections:
SomeClass.someMethod(Set s)
.
You can do this via c.someMethod(new IncrSet(hs))
.
Note, however, that constancy guarantees can only hold
so long as you don't modify hs during the course of
the call, as might happen if someMethod creates a new thread using hs
and then returns, so that you then continue on modifying hs.
If this is a potenital problem, use the next option.
One of the main reasons for supporting all of the operations
that make Immutable Collections possible is to provide tools
that help keep threads from stomping on each others collections.
This can be a serious problem in multithreaded Java programs.
Synchronization (which is supported by
all implementation classes) helps, but not enough to
avoid some common errors. Immutable collections help even more. They let
you pass around values that cannot be modified at all in
other threads without you having to make full copies of them
when you don't need to. This is the same reasoning
that led to Java having two classes for Strings: String
for pure immutable strings, and StringBuffer
for
updatable ones.
Except for operations that accept Enumerations as arguments (for example UpdatableSeq.addElementsAt), exception throws generated from Collection operations are always accompanied by state ``rollbacks''. That is, they leave the object in a behaviorally equivalent state as it was before the operation commenced. (Operations accepting Enumerations may rollback, but do not promise to.)
java.util.Enumeration
interface.
Because Collection methods use Enumerations as
the basic element access construct, classes in this
package are generally compatible with other existing Java collection
classes. (Although by necessity, there is some
duplication. For example, Dynamic array implementations are
very similar to java.util.Vector, but different enough that they
needed to be redone from scratch.)
All Collection
classes actually produce
CollectionEnumerations
to report elements.
CollectionEnumeration is a subinterface of the standard
java.util.Enumeration
that adds two methods:
corrupted
which reports true if the enumeration is
detectably corrupted because of mutative operations occurring during
traversals, and numberOfRemainingElements
, which
reports the number of elements yet to be traversed.
While they are handy, Enumerations sometimes require a bit of
care. Any CollectionEnumeration
produced by an
UpdatableCollection
can become corrupted
if
an update operation on the collection is performed during a traversal.
The standard implementation of CollectionEnumeration
classes is very conservative, and reports corruption if there is
any change in the elements() set (as reflected by
version
changes). Thus, any attempt to obtain a
nextElement
can fail, raising a
CorruptedEnumerationException. (Enumerations produced from
Immutable
collections can only fail in this way if they
were constructed around an UpdatableCollection that remains
asynchronously accessible via other means even after it has been
wrapped within an Immutable Collection. As discussed elsewhere, while
it is impossible to enforce, Immutable Collections should never be
used in this way to begin with.)
Corruption exceptions occurring during operations such as
Bag.addElements()
are particularly tricky to revover
from. Upon exceptions, these operations do not necessarily ``roll
back'' to the state they were in before the operation, so recovery is
hard to manage. (Although you can at least look at
numberOfRemainingElements
to find out how many elements
were not processed.) If you are worried about such problems, set up
your own code to manually traverse enumerations and perform the inner
operations, but recovering in special ways upon exception.
You sometimes need to adjust standard algorithms to avoid the
possibility of CorruptedEnumerationExceptions. For example, if
you want to traverse through s.elements() for Set s
and
add other new elements depending on what's there, you CANNOT do it
via code like:
void f(UpdatableSet s) { // DO NOT DO THIS Enumeration e = s.elements(); while (e.hasMoreElements()) if (someProperty(e.nextElement())) s.include(someNewObject()); // will cause exception for nextElement }
One alternative is to place new elements in an auxilliary collection during the traversal; for example:
void f(UpdatableSet s) { UpdatableBag aux = new LinkedBuffer(); Enumeration e = s.elements(); while (e.hasMoreElements()) if (someProperty(e.nextElement())) aux.add(someNewObject()); s.includeElements(aux.elements()); // add them all at once }
Or, you could instead use an ImmutableSet, returning the ultimate result:
Set f(Set s) { Enumeration e = s.elements(); while (e.hasMoreElements()) if (someProperty(e.nextElement())) s = s.including(someNewObject()); return s; }
Another alternative is to use a Seq
instead
of Sets or Bags when you need to use such algorithms,
and use indexing instead of element() traversals. For example:
void f(UpdatableSeq s) { int originalSize = s.size(); for (int i = 0; i < originalSize; ++i) if (someProperty(s.at(i)) s.insertLast(someNewObject()); }
Very few collection operations deal with other collections. Most
instead deal with Enumerations. For example, there is no such
operation as UpdatableSet.union(Set s)
to union in the
elements of another Set s
. Instead there is
UpdatableSet.includeElements(Enumeration e)
that can be
used to the same effect (via
includeElements(s.elements())
), but can also be used to
include any set of Objects traversable from an any kind of Enumeration.
It is probably good practice to design methods and classes that use
the collections
package to also take Enumerations as
arguments and return them as results instead of only accepting other
collections. This increases the range of application of your classes.
Several specialized Enumeration classes are available to make it easier to deal in Enumerations rather than Collections or other special data structures. They include:
Object arr[] = { new Float(1.0f), new Float(2.0f) } Dynarray s = ... s.insertElementsAt(0, new ArrayEnumeration(arr));
Enumeration e = coll.elements(); Enumeration panels = new FilteringEnumeration(e, new IsPanel()); while (panels.hasMoreElements()) doSomethingWith((Panel)(panels.nextElement()));To use this, you will also need to write a little class of the form:
class IsPanel implements Predicate { boolean predicate(Object v) { return (v instanceof Panel); } }
nextElement
access.
For example, if you want to process only the parent() fields
of java.awt.Component elements held by a collection coll
you could write something of the form:
Enumeration e = coll.elements(); Enumeration parents = new MappingEnumeration(e, new ParentFunction()); while (parents.hasMoreElements()) doSomethingWith((Container)(parents.nextElement()));To use this, you will also need to write a little class of the form:
class ParentFunction implements Function { Object function(Object v) { if (v instanceOf Component) return ((Component)v).getParent(); else return null; } }
(You might also interpose a FilteringEnumeration here to screen out the resulting nulls.)
All this requires too much set-up to be reasonable in most situations, but is invaluable in others.
The elements are revealed (via nextElement()) in a purely interleaved fashion, alternating between the first and second enumerations unless one of them has been exhausted, in which case all remaining elements of the other are revealed until it too is exhausted.
For example, if you want to process together the elements of two Collections a and b, you could write something of the form:
Enumeration items = new InterleavingEnumeration(a.elements(), b.elements()); while (items.hasMoreElements()) doSomethingWith(items.nextElement());
A semiformal notation is used to try to nail down interfaces. This is not an exercise in formal methods. (In fact the best way formally specify interfaces and operations in `open' systems like Java is not a completely solved issue.) It is instead an attempt to be as unambiguous as possible about the effects of operations, which is easier to do with a bit of formalism. In fact, the main reason for spelling these out is to form a basis for testing. Nearly all claims made about operations in interfaces are tested for in corresponding Checked classes. So you can be reasonably sure that any claim made has been verified to hold, at least in the cases actually tested.
Checked versions of each of the major Updatable interfaces exist -- CheckedSet CheckedBag CheckedSeq CheckedMap . Each one `wraps' a normal collection object, and for each operation, asks the inner collection to perform it, checks as best it can that the effects match the specification, and then returns any results. When you have any doubt about any implementation, you can wrap it in a Checked version and watch for ImplementationError exceptions. (This is likely to slow down your program by a few orders of magnitude!)
Implementation classes are for the most part just informally
documented, but contain links to the fuller descriptions of
the methods they implement. However, each implementation class also
supports a checkImplementation
operation, that
tests for internal representational consistency. While this
is usually required only for testing purposes, it can come in
handy in other situations as well. For example, it can help you discover
whether someone changed the ordering property of an element in
a SortedCollection
or changed it in a way that
no longer passes canInclude
for most other collections.
Constructs used in the class documentation include:
return condition
to describe
postconditions and effects true upon return of a method.
PREV(obj)
within a return condition to refer
to the state of an object at the onset of a method.
OUT(message)
to describe messages that are
sent to other objects as required aspects of functionality,
or referrred to in describing the effects of other methods.
And similarly IN(message)
for messages that are received.
foreach (int i in lo .. hi) predicate(i)
to
means that predicate holds for each value of i.
foreach (SomeClass x) predicate(x)
to
means that predicate holds for each instance of SomeClass
foreach (Object x in e) predicate(x)
to mean
that the predicate holds for each element obtained
via nextElement() for Enumeration e.
let v = expr in ...
to give a name to
an expression.
-->
to means `implies'.
{ code segment }
''
to document convenience or specialized methods that can be
defined in terms of the effects of more primitive methods.
equals
and hashCode
) on the elements
held in the collection.
Originally written by Doug Lea and released into the public domain. Thanks for the assistance and support of Sun Microsystems Labs, Agorics Inc, Loral, and everyone contributing, testing, and using this code.
This means that you can use it for any purpose whatsoever, with no restrictions.
Also, SimpleTest.java contains some very simple tests of most classes.
clone
conventions: Declared base classes to be Cloneable; added
public callable duplicate
method.