Factorisation of supplier and parts database schema

The supplier-and-parts database schema appears in Chris Date's book An introduction to Database Systems. There are three relvars in the schema, used to represent suppliers, parts and shipments. It is a conventional normalised relational database schema. Note that if one puts this schema into 6NF there are 8 instead of 3 relvars. Even in 6NF form none of the relvars are independently updatable.

We will now consider an alternative information equivalent schema using a nested relational database schema, meaning there are Database Valued Attributes (DVAs). In other words there are relational database values appearing as attribute values within a containing relational database. The "nesting" is a technique that can be used in a real schema definition to help organise the information in a way that highlights orthogonality. The advantages in terms of a separation of concerns, orthogonality of information and independent updates is discussed. See nested databases.

The key idea is to regard a relational database schema as a possrep defining a type (a dbtype).

Supplier Database

A SUPPLIER_DATABASE is a type and represents information about a given supplier. For example the SUPPLIER_DATABASE for supplier S1 has the following three relvars:

Relvar Predicate
SNAME
Smith
the supplier has name [SNAME]
STATUS
20
the supplier has status [STATUS]
CITY
London
the supplier is located in city [CITY]

These three relvars have the constraint that they have exactly one tuple (which is equivalent to a constraint that the relvars are non-empty and have empty keys). They can be updated independently.

Note that these predicates have the form "the supplier ...". They refer to the supplier with the definite article. So these predicates presuppose the existence of the supplier, and do not make sense if the supplier doesn't exist. Putting it another way, the supplier is assumed to exist in every world situation in which the SUPPLIER_DATABASE has meaning.

Note that even though a supplier number (S#) identifies a supplier, it doesn't appear as an attribute in any of the relvars of the SUPPLIER_DATABASE.

There is a separation of concerns between identifying the supplier and representing facts about the supplier. There could be a hundred relvars for the latter and none of them are concerned with how a supplier is identified. This is handy when the key is composite or there are multiple candidate keys.

Part Database

A PART_DATABASE is a type and represents information about a given part. For example the PART_DATABASE for part P1 has the following four relvars:

Relvar Predicate
PNAME
Nut
the part has name [PNAME]
COLOR
Red
the part has color [COLOR]
WEIGHT
12.0
the part has weight [WEIGHT]
CITY
London
the part is stored in city [CITY]

These four relvars have the constraint that they have exactly one tuple. They can be updated independently.

Supplier and parts database

SUPPLIER_DATABASE and PART_DATABASE values can appear as values of attributes within a SUPPLIER_AND_PARTS_DATABASE.

A SUPPLIER_AND_PARTS_DATABASE is a type and has three relvars:

Relvar namePredicateKey
Sthere exists a supplier identified by [S#] described by supplier database [SDB]{S#}
Pthere exists a part identified by [P#] described by part database [PDB]{P#}
SPthere exists a supplier identified by [S#] that ships quantity [QTY] of a part identified by [P#]{S# P#}

(with the appropriate IND constraints, these three relvars only represent one Cartesian factor of the possrep components)

Note that these predicates are existentially quantified in concrete parts and suppliers.

In effect this leads to supplier database variables and part database variables for each concrete supplier and part that exists.

Note that the creation and destruction of the supplier databases and the part databases happens through INSERT/DELETEs on the these three relvars.

Unrolling of the supplier and parts database

In the S&P database where there are 5 suppliers, 6 parts and 12 shipments. One can regard that information as being equivalent to a representation involving 23 independently updatable relvars, using predicate unrolling on the supplier and part numbers:

In this representation, we can incorporate the factors in the nested databases to end up with 5x3 + 6x4 + 12 = 51 variables which can be updated independently.

Can we reasonably say these variables are independently updatable given that their existence is conditional? In general we can say: If the variable exists then it is independently updatable.

This idea is not expected to be controversial. E.g. in an dynamically resizable array the elements are independently updateable if they exist.

FDs and keys

FDs reflect business requirements. The original S&P schema is information equivalent to the nested schema representation. This implies that the FD’s have been captured!

In the nested schemas there are still FDs and keys. The original S&P database which is 5NF has been decomposed into three simpler distinct database schema which are each in 5NF. Within these simpler schema there are relvars which are independently updateable, even though there are no independently updatable relvars in the original schema.

Most of the original FDs are no longer applicable in the nested schemas. For example the FD {S#}->{CITY} in S is inapplicable because attributes S# and CITY don't even appear in the same database schema (never mind the same relvar).

Note that the FD {S#}->{CITY} essentially means a supplier is located in exactly one city. It doesn't imply that the city cannot be updated independently.

There's a mapping from a set of nested schemas to an information equivalent unnested schema. The FD {S#}->{CITY} is implied by the nested database schema because

  1. There is an FD {S#}->{SDB}
  2. There is an FD {SDB}->{CITY}
  3. FDs are transitive

(where SDB is the attribute name for a supplier database - a database which records information about a single supplier)

(2) holds because

Relaxing constraint on city of part

Consider relaxing the FD constraint {P#}->{CITY} - so that a given part can be stored in at least one city. For example part P1 might be stored at both London and Paris. In the nested design it is just a matter of relaxing the constraint on the relvar that records the cities where a given part is stored:

CITY
London
Paris

In the conventional supplier-and-parts database schema there is far more upheaval to the schema - it is necessary to decompose the parts relvar P into two relvars, one which records the name, colour and weight of each part:

P#PNAMECOLORWEIGHT
P1NutRed12.0
P2BoltGreen17.0
P3ScrewBlue17.0
P4ScrewRed14.0
P5CamBlue12.0
P6CogRed19.0

and another to record the cities (where we can relax the uniqueness constraint on P#):

P#CITY
P1London
P1Paris
P2Paris
P3Oslo
P4London
P5Paris
P6London

Also Inclusion Dependency (IND) constraints will need to be added to make sure the same set of part numbers appear in these two relvars.

Advantages of nested databases