Factorisation of supplier and parts database schema

The supplier-and-parts database schema appears in Chris Date's book An introduction to Database Systems. There are three relvars in the schema, used to represent suppliers, parts and shipments. It is a conventional normalised relational database schema. Note that if one puts this schema into 6NF there are 8 instead of 3 relvars. Even in 6NF form none of the relvars are independently updatable.

We will now consider an alternative information equivalent schema using a nested relational database schema, meaning there are Database Valued Attributes (DVAs). In other words there are relational database values appearing as attribute values within a containing relational database. The "nesting" is a technique that can be used in a real schema definition to help organise the information in a way that highlights orthogonality. The advantages in terms of a separation of concerns, orthogonality of information and independent updates is discussed. See nested databases.

The key idea is to regard a relational database schema as a possrep defining a type (a dbtype).

Supplier Database

A SUPPLIER_DATABASE is a type and represents information about a given supplier. For example the SUPPLIER_DATABASE for supplier S1 has the following three relvars:

Relvar

Predicate

SNAME
Smith

the supplier has name [SNAME]

STATUS
20

the supplier has status [STATUS]

CITY
London

the supplier is located in city [CITY]

These three relvars have the constraint that they have exactly one tuple (which is equivalent to a constraint that the relvars are non-empty and have empty keys). They can be updated independently.

Note that these predicates have the form "the supplier ...". They refer to the supplier with the definite article. So these predicates presuppose the existence of the supplier, and do not make sense if the supplier doesn't exist. Putting it another way, the supplier is assumed to exist in every world situation in which the SUPPLIER_DATABASE has meaning.

Note that even though a supplier number (S#) identifies a supplier, it doesn't appear as an attribute in any of the relvars of the SUPPLIER_DATABASE.

There is a separation of concerns between identifying the supplier and representing facts about the supplier. There could be a hundred relvars for the latter and none of them are concerned with how a supplier is identified. This is handy when the key is composite or there are multiple candidate keys.

Part Database

A PART_DATABASE is a type and represents information about a given part. For example the PART_DATABASE for part P1 has the following four relvars:

Relvar

Predicate

PNAME
Nut

the part has name [PNAME]

COLOR
Red

the part has color [COLOR]

WEIGHT
12.0

the part has weight [WEIGHT]

CITY
London

the part is stored in city [CITY]

These four relvars have the constraint that they have exactly one tuple. They can be updated independently.

Supplier and parts database

SUPPLIER_DATABASE and PART_DATABASE values can appear as values of attributes within a SUPPLIER_AND_PARTS_DATABASE.

A SUPPLIER_AND_PARTS_DATABASE is a type and has three relvars:

Relvar name	Predicate	Key
S	there exists a supplier identified by [S#] described by supplier database [SDB]	{S#}
P	there exists a part identified by [P#] described by part database [PDB]	{P#}
SP	there exists a supplier identified by [S#] that ships quantity [QTY] of a part identified by [P#]	{S# P#}

(with the appropriate IND constraints, these three relvars only represent one Cartesian factor of the possrep components)

Note that these predicates are existentially quantified in concrete parts and suppliers.

In effect this leads to supplier database variables and part database variables for each concrete supplier and part that exists.

Note that the creation and destruction of the supplier databases and the part databases happens through INSERT/DELETEs on the these three relvars.

Unrolling of the supplier and parts database

In the S&P database where there are 5 suppliers, 6 parts and 12 shipments. One can regard that information as being equivalent to a representation involving 23 independently updatable relvars, using predicate unrolling on the supplier and part numbers:

there is a supplier S1 described by supplier database <SDB>
...
there is a supplier S5 described by supplier database <SDB>
there is a part P1 described by part database <PDB>
...
there is a part P6 described by part database <PDB>
supplier S1 ships quantity <QTY> of part P1
...
supplier S4 ships quantity <QTY> of part P5

In this representation, we can incorporate the factors in the nested databases to end up with 5x3 + 6x4 + 12 = 51 variables which can be updated independently.

Can we reasonably say these variables are independently updatable given that their existence is conditional? In general we can say: If the variable exists then it is independently updatable.

This idea is not expected to be controversial. E.g. in an dynamically resizable array the elements are independently updateable if they exist.

FDs and keys

FDs reflect business requirements. The original S&P schema is information equivalent to the nested schema representation. This implies that the FD’s have been captured!

In the nested schemas there are still FDs and keys. The original S&P database which is 5NF has been decomposed into three simpler distinct database schema which are each in 5NF. Within these simpler schema there are relvars which are independently updateable, even though there are no independently updatable relvars in the original schema.

Most of the original FDs are no longer applicable in the nested schemas. For example the FD {S#}->{CITY} in S is inapplicable because attributes S# and CITY don't even appear in the same database schema (never mind the same relvar).

Note that the FD {S#}->{CITY} essentially means a supplier is located in exactly one city. It doesn't imply that the city cannot be updated independently.

There's a mapping from a set of nested schemas to an information equivalent unnested schema. The FD {S#}->{CITY} is implied by the nested database schema because

There is an FD {S#}->{SDB}
There is an FD {SDB}->{CITY}
FDs are transitive

(where SDB is the attribute name for a supplier database - a database which records information about a single supplier)

(2) holds because

there is a declared constraint in the supplier database, that there is exactly one CITY tuple. i.e. The FD {}->{CITY} holds within the supplier database.
FDs must be adjusted when they are translated between databases. The FD {}->{CITY} within the supplier database becomes {SBD}->{CITY} in the "containing" suppliers and parts database.

Relaxing constraint on city of part

Consider relaxing the FD constraint {P#}->{CITY} - so that a given part can be stored in at least one city. For example part P1 might be stored at both London and Paris. In the nested design it is just a matter of relaxing the constraint on the relvar that records the cities where a given part is stored:

CITY
London
Paris

In the conventional supplier-and-parts database schema there is far more upheaval to the schema - it is necessary to decompose the parts relvar P into two relvars, one which records the name, colour and weight of each part:

P#	PNAME	COLOR	WEIGHT
P1	Nut	Red	12.0
P2	Bolt	Green	17.0
P3	Screw	Blue	17.0
P4	Screw	Red	14.0
P5	Cam	Blue	12.0
P6	Cog	Red	19.0

and another to record the cities (where we can relax the uniqueness constraint on P#):

P#	CITY
P1	London
P1	Paris
P2	Paris
P3	Oslo
P4	London
P5	Paris
P6	London

Also Inclusion Dependency (IND) constraints will need to be added to make sure the same set of part numbers appear in these two relvars.

Advantages of nested databases

The nested representation provides a foundation for defining what variables can be independently updated. This is needed to allow Operational Transformation to be used with relational data. In particular it's needed to get the long list of benefits of CEDA such as replication and synchronisation.
The factorisation of the information into different schema can reduce complexity because they can be designed, specified, documented and understood in isolation.
Having lots of FDs defined implicitly seems an advantage in complicated examples.
There tends to be less upheaval in the schema when FDs change or when primary keys change.
A nested physical representation will typically outperform an unnested representation, and given that today's DBMS products are quite limited with how well they optimise physical representations, this is useful.
A nested schema appears to be more flexible in cases where the DVAs are keys. In that case there doesn't appear to be a mapping to an unnested form without the introduction of artificial identifiers.