Two monadic relations with an IND constraint

This example illustrates the maximal decompositon of two monadic relations with an IND constraint.

Let \(T\) be a finite type where there are \(n\) values of type \(T\).

Let schema \(S\) have two relvars \(R_1\), \(R_2\) each having the same single attribute of type \(T\) and there's an Inclusion Dependency (IND) constraint defined: \(R_2⊆R_1\).

The number of ways we can have \(i\) tuples in \(R_1\) is \(\binom{n}{i}\).

For \(i\) tuples in \(R_1\) the number of subsets of those tuples which can appear in \(R_2\) is \(2^i\).

Therefore the number of possible database values is \(\displaystyle|C(S)| = \sum_{i=0}^{n} \binom{n}{i} 2^i\).

Example with n=2

With \(n=2\), we get \(|C(S)| = \binom{2}{0}2^0 + \binom{2}{1}2^1 + \binom{2}{2}2^2 = 1 + 4 + 4 = 9\).

Let the two values of \(T\) be \(a\) and \(b\). We can denote values of the relations \(R_1\) and \(R_2\) by subsets of \(\{a,b\}\) according to which tuples are present. The following table shows the \(9\) possible database values which satisfy the IND constraint:

R1R2
{}{}
{a}{}
{a}{a}
{b}{}
{b}{b}
{a,b}{}
{a,b}{a}
{a,b}{b}
{a,b}{a,b}


Cartesian factorisation of the information in the database

It just so happens that the expression for the number of possible database values can be simplified:

\(\displaystyle|C(S)| = \sum_{i=0}^{n} \binom{n}{i} 2^i = 3^n\).

(e.g. for \(n=2\) we have \(3^2 = 9\) database values. For \(n=3\) we get \(3^3=27\), For \(n=4\) we get \(3^4 = 81\) and so on). This can be proven with the binomial theorem. It gives us the prime number factorisation of \(|C(S)|\).

There is a corresponding schema \(S'\) which is information equivalent to \(S\) involving \(n\) relvars for the \(n\) values of type \(T\). These relvars can be updated independently.

Each relvar implicitly refers to one of the values of type \(T\) and has exactly \(3\) possible values:

  1. the value isn't present in either \(R_1\) or \(R_2\)
  2. the value is present in \(R_1\) but not \(R_2\)
  3. the value is present in both \(R_1\) and \(R_2\)

Note that these propositions treat the value as implied, in exactly the manner discussed with nested databases.

In practise \(n\) can be very large. For example \(n = 2^{64}\) if \(T\) is a 64 bit integer. Obviously we can't expect a system to physically represent \(2^{64}\) relvars. In fact a vast number of independently updatable variables is good. The maximal decomposition tells us what can be independently updatable in some logical representation, and more specifically for having expressions which denote variables which are the target of updates. It is not prescribing a physical implementation.