Nested relational databases

This page discusses a concept of nested relational databases. It is similar in nature to object-oriented data models, but (usually) has a straightforward mapping to a conventional relational representation. The nested form has some advantages, which can be formalised using the unique prime Cartesian factorisation of the extension of the declared constraint on each database.
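To make the idea of a Cartesian factorisation concrete, here is a minimal Python sketch. It assumes the extension of a constraint can be modelled as a small finite set of tuples (a list of dicts), and it finds the finest split of the attributes by brute force. The function names project, splits and prime_factors are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

def project(rows, attrs):
    """Project a relation (an iterable of dicts) onto attrs, dropping duplicates."""
    return {tuple(row[a] for a in attrs) for row in rows}

def splits(rows, attrs, part):
    """True if the relation over attrs equals project(part) x project(rest).

    The relation is always a subset of that product, so comparing sizes suffices.
    """
    rest = [a for a in attrs if a not in part]
    whole = project(rows, attrs)
    return len(whole) == len(project(rows, part)) * len(project(rows, rest))

def prime_factors(rows, attrs):
    """Return the finest partition of attrs into Cartesian factors (brute force)."""
    attrs = list(attrs)
    for size in range(1, len(attrs)):
        for part in combinations(attrs, size):
            if splits(rows, attrs, part):
                rest = [a for a in attrs if a not in part]
                return prime_factors(rows, part) + prime_factors(rows, rest)
    return [tuple(attrs)]  # no split found: this group of attributes is prime

# Extension of a made-up constraint: x is free, but y and z must satisfy y <= z.
rows = [{"x": x, "y": y, "z": z}
        for x in (0, 1) for y in (0, 1) for z in (0, 1) if y <= z]

print(prime_factors(rows, ["x", "y", "z"]))  # [('x',), ('y', 'z')]
```

Here x can be updated independently of y and z, whereas y and z cannot be separated because the constraint couples them.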

A constraint on a tuple is a recurring idea. For example, it is relevant to the dbvars of relational databases, to tuples in relations, to TTM (The Third Manifesto) possreps, and to the domains and images of functions.

The motivation is a notion of maximal partitioning of the information in a database into orthogonal parts. This means having as many variables as possible which can be updated independently. There is an emphasis on relation variables (relvars), encompassing both base and derived relvars.

For a more detailed example, see Factorisation of supplier and parts database schema.

Nested databases

Suppose there is a need to record a great many facts about Fred Flintstone, independently of other things in Bedrock. It makes sense to define an independent schema just for Fred Flintstone (though probably reusable for other characters), using predicates which implicitly concern Fred Flintstone and therefore don't need an identifier for a person! E.g. instead of

EyeColour(P,C) :- Person P has eye colour C.

the attribute P can be eliminated from the predicate because it is implicit:

EyeColour(C) :- Fred Flintstone has eye colour C.

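Note that eliminating identifiers from predicates means that the zero-attribute relations dee and dum (TABLE_DEE, which contains only the empty tuple, and TABLE_DUM, which contains no tuples) might be quite common, namely for boolean properties of things.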
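As a rough illustration, the two forms of the predicate can be sketched in Python with relations modelled as sets of tuples. The eye colours, the WearsTie predicate and the helper holds are made up for this example; nothing here is a real DBMS API.

```python
# The two relations with no attributes.
DEE = frozenset({()})  # contains only the empty tuple: "true"
DUM = frozenset()      # contains no tuples: "false"

# Conventional form: EyeColour(P, C), with the person identifier as an attribute.
eye_colour = {("fred", "blue"), ("wilma", "green")}

# Nested form: the outer database maps each identifier to an inner database.
# Inside an inner database the identifier does not appear at all.
bedrock = {
    "fred": {"EyeColour": {("blue",)}, "WearsTie": DEE},
    "wilma": {"EyeColour": {("green",)}, "WearsTie": DUM},
}

def holds(inner_db, predicate):
    """A zero-attribute predicate is true exactly when its relation is DEE."""
    return inner_db[predicate] == DEE

print(holds(bedrock["fred"], "WearsTie"))   # True
print(holds(bedrock["wilma"], "WearsTie"))  # False
```

A boolean property such as WearsTie would need a unary relation WearsTie(P) in the conventional form; once P is eliminated, only dee or dum remains.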

The person identifier is only needed in the outer or containing database (in order to make statements about that person in relation to other things), and as far as the DBMS is concerned, the identifier can (also) be regarded as a reference to an inner database. The inner database is deleted by the DBMS when it is no longer referenced. Within the inner database the person identifier is irrelevant. This immunises the inner database against updates to the value of the identifier, and also against schema changes when the format of the identifier changes. IMO these are striking advantages. For example, in a conventional RDB a change to the format of an identifier can require schema changes to dozens of relvars.
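A minimal Python sketch of this immunity, assuming the outer database is modelled as a dict from identifier to inner database and that each inner database has exactly one reference. The helpers rename and unreference and the identifier format "person:0001" are hypothetical; a real DBMS would have to track references itself.

```python
def rename(outer, old_id, new_id):
    """Change an identifier; only the outer database is touched."""
    outer[new_id] = outer.pop(old_id)

def unreference(outer, ident):
    """Drop the only reference, so the inner database disappears with it."""
    del outer[ident]

bedrock = {"fred": {"EyeColour": {("blue",)}}}
inner = bedrock["fred"]

# A change of identifier format affects the key in the outer database only.
rename(bedrock, "fred", "person:0001")

print(bedrock["person:0001"] is inner)  # True: the inner database is unchanged
print("fred" in bedrock)                # False
```

In the conventional form, the same rename would have to rewrite a tuple in every relvar that mentions the person.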

Since we have managed to pull P out of the predicate EyeColour, we have managed to create real Cartesian prime factors within the context of a representation of all the information which is exclusively about Fred Flintstone!

Evidently there's a similarity to an OO perspective, because a nested database resembles an object and a surrogate id is like an oid. However, I want to emphasise that this resemblance is superficial. The nested database is founded on logic, and the power of the relational approach is undiminished by nesting: the more conventional relational representation without nesting can be uniquely derived by simply adding the identifiers back again as attributes of global predicates (assuming predicates are globally uniquely named).
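The derivation of the conventional representation can be sketched in a few lines of Python. This assumes a single level of nesting, relations modelled as sets of tuples, and globally unique predicate names; the function name unnest and the sample data are made up for illustration.

```python
def unnest(outer):
    """Rebuild global relations by adding each identifier back as a leading attribute."""
    flat = {}
    for ident, inner in outer.items():
        for name, relation in inner.items():
            flat.setdefault(name, set()).update((ident,) + t for t in relation)
    return flat

DEE = frozenset({()})  # zero attributes, one (empty) tuple
DUM = frozenset()      # zero attributes, no tuples

bedrock = {
    "fred": {"EyeColour": {("blue",)}, "WearsTie": DEE},
    "wilma": {"EyeColour": {("green",)}, "WearsTie": DUM},
}

flat = unnest(bedrock)
# flat["EyeColour"] holds ("fred", "blue") and ("wilma", "green"), i.e. EyeColour(P, C)
# flat["WearsTie"] holds only ("fred",), i.e. a unary relation WearsTie(P)
```

Note how dee contributes one tuple to the global relation and dum contributes none, so the boolean property becomes an ordinary unary relation.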

Minor note: this can lead to multiple identifiers being added back again, depending on nesting depth. That complicates unnesting if the nesting depth varies for a given type of nested database. Global identity then involves a path into a tree structure of nested namespaces, so the appropriate way to unnest is to record each path in a single attribute.
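A sketch of unnesting with paths, in Python. It assumes each database is a dict with optional "relations" and "children" entries, which is only one possible encoding; the function name unnest_with_paths and the data are hypothetical.

```python
def unnest_with_paths(db, path=()):
    """Flatten nested databases, recording the whole path as one leading attribute."""
    flat = {}
    for name, relation in db.get("relations", {}).items():
        flat.setdefault(name, set()).update((path,) + t for t in relation)
    for ident, child in db.get("children", {}).items():
        nested = unnest_with_paths(child, path + (ident,))
        for name, rows in nested.items():
            flat.setdefault(name, set()).update(rows)
    return flat

# The same kind of database (something with a Colour) appears at two depths.
bedrock = {
    "children": {
        "fred": {
            "relations": {"EyeColour": {("blue",)}},
            "children": {"car": {"relations": {"Colour": {("red",)}}}},
        },
        "taxi": {"relations": {"Colour": {("yellow",)}}},
    },
}

flat = unnest_with_paths(bedrock)
# flat["Colour"] == {(("fred", "car"), "red"), (("taxi",), "yellow")}
```

Because the path is a single attribute, Colour keeps the same two-attribute heading whether the nested database sits one level down or two.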

Todo: it would be useful to understand how the usual concepts of functional dependencies and normal forms fit into the picture.

Concurrency

A significant motive for nested databases is concurrency. If users can update prime factors independently then concurrency is maximised. If the information in a database can be factorised into thousands or millions of (conditional) prime factors, each of which is very simple, like a person's eye colour, then it becomes reasonable to support multi-user editing where edits are applied to a local copy of the data without network latency. Even branching and merging of databases becomes feasible, in a similar fashion to version control systems that support branching and merging of text files. The premise is that users working on separate, long-lived tasks don't tend to edit the same very fine-grained prime factors, so the number of conflicts tends to be very small.
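The merging idea can be sketched as a three-way merge applied factor by factor, by analogy with text merging in version control. This is only an illustration under assumptions: each branch is modelled as a dict from a factor's path to its value, an absent key means the factor does not exist, and the function name merge is made up.

```python
def merge(base, ours, theirs):
    """Three-way merge per prime factor; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for key in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both branches agree (or neither changed it)
            value = o
        elif o == b:      # only theirs changed this factor
            value = t
        elif t == b:      # only ours changed this factor
            value = o
        else:             # both changed it differently: keep base, report it
            conflicts.append(key)
            value = b
        if value is not None:
            merged[key] = value
    return merged, sorted(conflicts)

base = {("fred", "EyeColour"): {("blue",)}, ("fred", "HairColour"): {("black",)}}
ours = {("fred", "EyeColour"): {("brown",)}, ("fred", "HairColour"): {("black",)}}
theirs = {("fred", "EyeColour"): {("blue",)}, ("fred", "HairColour"): {("grey",)}}

merged, conflicts = merge(base, ours, theirs)
# merged takes EyeColour from ours and HairColour from theirs; conflicts == []
```

A conflict arises only when two branches change the same factor to different values, which is the case the premise above expects to be rare.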