Maximal decomposition of information

Let S be a relational database schema.

C(S) (the extension of the database constraint) can be regarded as a non-empty relation so we can consider its unique prime Cartesian factorisation.

Claim: For any schema S there exists a schema S which is information equivalent to S where the cardinality of every finite prime Cartesian factor of C(S) is a prime number.

This implies the bag of integers which are the cardinalities of the prime Cartesian factors of C(S) equals the bag of prime number factors of |C(S)|.

Note that the bag of prime number factors of any positive integer is uniquely determined, by the Fundamental theorem of arithmetic [].

S represents a schema having maximal decomposition - i.e. into the most parts which can be updated independently.

Proof

The following is an outline of a constructive existential proof which involves relational expressions for each relvar in S that are defined in terms of the relvars of S:

Let F1,...,Fn be the prime number factors of N=|C(S)| (the order doesn't matter)

Let the N database values in C(S) be organised into an n dimensional array of size j=1nFj=N (again, the order doesn't matter).

For j in 1..n and i in 1..Fj, let Bij denote the subset of C(S) corresponding to the set of database values in the ith slice of the jth component of the n-dimensional array (a slice has n-1 dimensions).

For j in 1..n and i in 1..Fj, let Tij={matches(d)dBij} where matches(d) evaluates to DEE if the dbvar for schema S has value d, otherwise DUM. See how it can be defined here.

Note that Tij evaluates to DEE if the value of the dbvar for S is in slice Bij, otherwise DUM.

Let x be some attribute name. For j in 1..n, let schema S have relvar Rj which is defined in terms of the relvars of S using the expression

Rj={Tij REL { TUP { (x i) } }i1..Fj}

(this expression evaluates to REL { TUP { (x i) } } where the dbvar of S is in slice Bij)

Note that for every database value each relvar Rj of S has exactly one tuple with one attribute. This follows because any database value (i.e. element of C(S)) lies in exactly one slice of the jth coordinate. The UNION over the JOINs picks out exactly one of the TUP { (x i) }. There are Fj slices for the jth component and hence Fj possible values of Rj.

It is easy to see that a combination of the values of the Rj corresponds to one of the elements of the n-dimensional array and hence to one of the database values in C(S).

Example 1

Let there be three lights coloured red, green and blue which can be on or off. There are 23=8 world situations which need to be recorded.

Let schema S1 have a single relvar with the predicate: The light with colour [COLOUR] is on.

Let schema S2 have three relvars with the following predicates:

  1. The red light is on.
  2. The green light is on.
  3. The blue light is on.

S1 and S2 are information equivalent. S2 is an example of a schema with maximal decomposition.

Example 2

Let there be three lights coloured red, green and blue which can be on or off, except they cannot be all off or all on. There are 6 world situations which need to be recorded.

The prime factorisation of 6 is 2×3. A maximally decomposed schema can be achieved with the following two predicates:

  1. A single light is on.
  2. The light with colour [COLOUR] is the only light in its state.

Example 3

See maximal decompositon of two monadic relations with an IND constraint.