Orthogonal persistence
Many researchers have commented on the ideology of orthogonal persistence.
Orthogonal persistence allows the programmer to treat all objects in the same way - whether they
persist or not. An object is able to persist regardless of its type. In the context of OO systems
an object persists if it is reachable from another persistent object.
A system can provide orthogonal persistence to varying degrees. In the most extreme case, the
programmer should not need to write transactions to break up work into atomic pieces. Instead,
all threads are persistent so it can be assumed that on recovery a computation will carry on from
where it left off. This is far from easy to implement efficiently. To support persistent threads,
it is necessary for a system to look for a consistent cut on recovery. There can easily be a
domino effect where the last "consistent cut" occurred a long time in the past.
Persistence and concurrency control don't mix well. This is to be expected because adding
concurrency control to persistent objects implies that persistent objects are treated differently
to transient objects - in conflict with the principle of orthogonal persistence. When a class is
written without mutexes or similar mechanisms for concurrency control, it is understood that
objects of that class are not threadsafe. One would expect this to be independent of whether the
object is reachable from a persistent root.
Consider a system where accessing the state of a persistent object can throw an exception. If the
programmer doesn't distinguish between persistent and transient objects, then it follows that it
must be assumed that access to any object can throw an exception. This is unworkable - because it
becomes impossible to reason about the correctness of a program.
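The difficulty can be illustrated with a small sketch. It assumes a hypothetical FaultingInt type (not part of any real library) whose state may have to be fetched from a store, so that every access may throw. A transfer that should preserve a simple invariant can then fail half way through:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for an object whose state may need to be fetched
// from a persistent store; none of these names come from a real library.
struct FaultingInt {
    int value = 0;
    bool storeAvailable = true;  // assumption: models a failed disk read

    // Every access to the state goes through here and may throw.
    int& Access() {
        if (!storeAvailable) throw std::runtime_error("store unavailable");
        return value;
    }
};

// Intended invariant: from.value + to.value is unchanged by a transfer.
void Transfer(FaultingInt& from, FaultingInt& to, int amount) {
    from.Access() -= amount;  // succeeds
    to.Access() += amount;    // may throw after the debit has already happened
}

// Returns the combined balance after a transfer that fails half way.
int TotalAfterFailedTransfer() {
    FaultingInt a, b;
    a.value = 100;
    b.storeAvailable = false;  // the second access will throw
    try {
        Transfer(a, b, 50);
    } catch (const std::runtime_error&) {
        // Without knowing which access failed, there is no obvious repair.
    }
    return a.value + b.value;
}
```

TotalAfterFailedTransfer() returns 50 rather than 100: the debit happened but the credit did not. If any access to any object could fail in this way, every function would need this kind of analysis.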
Suppose that a process supports more than one independent persistent store - because of multiple
hard-disks or floppy disks. This leads to confusion - what does it mean for an object to be
reachable from two independent persistent stores? From which store is it loaded when it is next
accessed? What if the object exists in inconsistent states on different media? What if one medium
rolls back and another doesn't? The conclusion is that multiple independent persistent stores
within a given process are incompatible with orthogonal persistence.
A significant problem with the ideology of orthogonal persistence is the question of how to make
changes to the system. If state such as threads persist then it is very difficult to see how to
fix bugs, allow classes to add or remove members or implement new interfaces. It is as though we
are trying to change a system while it is still running.
The latter is such a difficult problem that it would appear the ideology of orthogonal persistence
(at least in its true form) is unworkable.
"In Praise of Manual Persistence" provides a good description of some of the problems with
orthogonal persistence.
A very amusing quote on that page:
Do you, Programmer,
take this Object to be part of the persistent state of your application,
to have and to hold,
through maintenance and iterations,
for past and future versions,
as long as the application shall live?
How CEDA breaks orthogonal persistence
The Ceda design recognises that the ideology of orthogonal persistence offers some advantages, but
intentionally breaks with the principle in certain respects.
- Persistence is type intrusive in the sense that only objects that implement the IPersistable
interface can directly persist as objects with identity in the persistent store.
- Persistence by reachability is broken to allow a persistent object to contain transient state.
This is useful for a number of reasons.
- In Ceda it is assumed that observers (in the sense of the so-called observer pattern) are
transient. Often a persistent object is an event source, and therefore needs to allow transient
observers to attach and detach in order to be notified of events. For example, persistent
documents need to notify their transient views of state changes to allow the views to redraw.
- A persistent object may contain transient state such as mutexes, socket connections,
file handles and worker threads. In all these cases, such state is not written to disk.
- Persistent objects are able to directly cache expensive calculations, without the cache
needing to persist. The decision to persist a cache is made by the programmer.
- Threads do not persist.
- All changes to the store are made transactional through explicitly declared CSpace transactions.
This allows the system to provide atomicity.
- Persistent objects need to be explicitly marked as dirty when they are modified.
- Ceda makes use of a smart pointer template class pref for all pointers to persistent
objects. A "raw pointer" can't be used by one persistent object to point at another persistent
object.
In the literature this is referred to as "software swizzling", as distinct from the "hardware
swizzling" used by systems like ObjectStore, in which persistent pointers depend on hardware
support for detecting page faults when pointers are dereferenced.
- Explicit Serialise() functions must be written to write the state of an object to disk.
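Several of the points above (a smart pointer in place of a raw pointer, explicit dirty marking, hand-written serialisation and unpersisted transient state) can be seen together in a toy sketch. This is not the Ceda API: ToyStore, ToyRef, ToyPersistable and Counter are hypothetical names invented for illustration, and the "disk" is just an in-memory map from OID to text.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// All names below are hypothetical; this is not the Ceda API.
using Oid = std::uint64_t;

struct ToyPersistable {
    bool dirty = false;
    virtual ~ToyPersistable() = default;
    virtual void Serialise(std::ostream& os) const = 0;  // written by hand
    virtual void Deserialise(std::istream& is) = 0;
    void MarkDirty() { dirty = true; }  // must be called explicitly
};

// Toy store: "disk" maps OID to serialised text; "resident" holds loaded objects.
struct ToyStore {
    std::map<Oid, std::string> disk;
    std::map<Oid, std::unique_ptr<ToyPersistable>> resident;
    int faults = 0;  // number of loads from "disk"

    template <class T>
    T* Fault(Oid oid) {
        auto it = resident.find(oid);
        if (it == resident.end()) {
            auto obj = std::make_unique<T>();
            std::istringstream is(disk.at(oid));
            obj->Deserialise(is);
            ++faults;
            it = resident.emplace(oid, std::move(obj)).first;
        }
        return static_cast<T*>(it->second.get());
    }

    // Writes back only the objects marked dirty; returns how many were written.
    int Flush() {
        int written = 0;
        for (auto& entry : resident) {
            ToyPersistable& obj = *entry.second;
            if (!obj.dirty) continue;
            std::ostringstream os;
            obj.Serialise(os);
            disk[entry.first] = os.str();
            obj.dirty = false;
            ++written;
        }
        return written;
    }
};

// Software swizzling: the reference stores an OID and converts it to a raw
// pointer in software on first dereference, with no help from page faults.
template <class T>
class ToyRef {
public:
    ToyRef(ToyStore& store, Oid oid) : store_(&store), oid_(oid) {}
    T* operator->() {
        if (!ptr_) ptr_ = store_->Fault<T>(oid_);
        return ptr_;
    }
private:
    ToyStore* store_;
    Oid oid_;
    T* ptr_ = nullptr;  // cached raw pointer, never written to disk
};

struct Counter : ToyPersistable {
    int value = 0;          // persistent state
    int cachedSquare = -1;  // transient cache, deliberately not serialised
    void Serialise(std::ostream& os) const override { os << value; }
    void Deserialise(std::istream& is) override { is >> value; }
    void Increment() { ++value; cachedSquare = -1; MarkDirty(); }
};
```

With a Counter stored as "41" under OID 1, constructing a ToyRef<Counter> loads nothing. The first call through operator-> faults the object in, Increment() marks it dirty, and Flush() then rewrites only that object. An object modified without a call to MarkDirty() would be silently skipped by Flush(), which is the cost of making dirty marking explicit.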
How CEDA supports orthogonal persistence
However, the principle of orthogonal persistence is adhered to in the following respects:
- Ceda models a single heap from which both transient and persistent objects are allocated.
The 'new' operator is used in the normal way to create all objects.
- If we limit ourselves to the objects that implement IPersistable, then Ceda implements
persistence by reachability. This is achieved by tracing outgoing IObject pointers (provided
by the IObject::VisitObjects() method).
When a persistent object is modified, a trace rooted in that object is performed to find any
IPersistable objects that have become reachable for the first time. These objects need to be
allocated OIDs and written to the persistent store.
- Persistence is orthogonal to type in the sense that objects of a given class (assumed to
derive from IPersistable) may be transient or persistent. It only depends on whether they are
reachable from a persistent root.
- Persistent objects are not treated specially from the perspective of concurrency control.
- There is no special locking mechanism to protect access to persistent objects.
- There is no "transaction framework" that will roll back persistent objects when a serious
error occurs. Instead, it is up to the programmer to deal with exceptions and other
errors and make sure objects continue to satisfy their class invariants etc.
- If we limit ourselves to pref smart pointers, we see that they are typesafe, can be compared
and can point at either transient or persistent objects. In that sense, Ceda provides
pointers that meet the requirements of orthogonal persistence.
Simply dereferencing a pref may cause a persistent object to be "faulted" off disk
into memory.
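The reachability trace described above (following outgoing pointers from a modified persistent object and allocating OIDs to objects that have just become reachable) might look roughly like the sketch below. The text does not give the signature of IObject::VisitObjects(), so the sketch uses a hypothetical ToyObject with a VisitChildren() callback instead. OidAllocator and PersistNewlyReachable are likewise invented names, and an OID of 0 is assumed to mean "still transient".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// All names below are hypothetical; this is not the Ceda API.
using Oid = std::uint64_t;

struct ToyObject {
    Oid oid = 0;                       // assumption: 0 means "transient, no OID yet"
    std::vector<ToyObject*> children;  // outgoing pointers to other objects

    // Stand-in for a visitor over outgoing object pointers.
    void VisitChildren(const std::function<void(ToyObject&)>& visit) {
        for (ToyObject* child : children) {
            if (child) visit(*child);
        }
    }
};

class OidAllocator {
public:
    Oid Allocate() { return next_++; }
private:
    Oid next_ = 1;
};

// Traces from an already persistent object that has just been modified and
// returns the objects that became persistent as a result.
std::vector<ToyObject*> PersistNewlyReachable(ToyObject& modified,
                                              OidAllocator& alloc) {
    assert(modified.oid != 0);  // the trace is rooted in a persistent object
    std::vector<ToyObject*> fresh;
    std::vector<ToyObject*> work{&modified};
    while (!work.empty()) {
        ToyObject* current = work.back();
        work.pop_back();
        current->VisitChildren([&](ToyObject& child) {
            // Already persistent: everything below it was traced earlier.
            if (child.oid != 0) return;
            child.oid = alloc.Allocate();  // assigning first also breaks cycles
            fresh.push_back(&child);       // these would be written to the store
            work.push_back(&child);
        });
    }
    return fresh;
}
```

Stopping at objects that already have an OID keeps the cost of the trace proportional to the newly reachable part of the graph rather than to the whole store. A second trace from the same root with no further changes finds nothing new.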