View update on projections

In the following updates to projection views is discussed. This is related to updates to join views.

Chris Date uses the following example, where there are relvars S, ST and SC, satisfying:

    S ⟼ (ST,SC)
        ST = S { SNO, STATUS }
        SC = S { SNO, CITY }

    (ST,SC) ⟼ S
        S = ST JOIN SC

Note there is a bijection between S and the pair (ST,SC). Therefore for any update on S there is an equivalent and uniquely determined update on (ST,SC) and vice versa.

For example if a tuple is deleted from S then a tuple must be deleted from both ST and SC. This can't be deduced from the fact that a tuple has been deleted from ST JOIN SC. However it can be deduced from the fact that ST and SC are each projections on S.

Supplier SNO is under contract and has status STATUS and is located in city CITY

Supplier SNO is under contract and has status STATUS

Supplier SNO is under contract and is located in city CITY

SNO	STATUS	CITY
S1	20	London
S2	10	Paris
S3	30	Paris
S4	20	London
S5	30	Athens

SNO	STATUS
S1	20
S2	10
S3	30
S4	20
S5	30

SNO	CITY
S1	London
S2	Paris
S3	Paris
S4	London
S5	Athens

S, ST, SC satisfy the following constraint:

    S {SNO} = ST {SNO} = SC {SNO}

Chris Date says there is a view update problem for a user that only sees ST:

Now what about a user who sees only (say) relvar ST? Well, such a user knows the predicate (see above) and knows also that {SNO} is a key for that relvar, but of course isn't aware of any constraints that mention either SC or S. Perhaps more to the point, that user isn't aware of any compensatory actions either. Clearly, then, that user can't be allowed to insert tuples into relvar ST, nor to update supplier numbers within that relvar, because such operations have the potential to violate constraints of which this user is, and must be, unaware.

Just as for insertions into a restriction, Chris Date doesn't allow views to be updated for users for which information is hidden

This is quite limiting. According to Chris Date, logical independence for writers is only available for information preserving views.

Avoiding the view update problem

Proposal:

generalise/simplify the mathematical model of a relation by dropping the constraint that all tuples are total functions on the set of attribute names in the relation heading. Call this a partial relation to distinguish from the conventional notion of a total relation.
the normal definition of projection on total relations generalises to partial relations
define the remaining RA operations on total relations as normal
introduce a monadic operator on partial relations which strips away the tuples which aren't total. This operator maps a partial relation to a total relation. Below I extend Tutorial D with a postfix operator * for this operation.
only relvar expressions giving total relations record the extension of some predicate

Consider there is a partial base relvar R with attributes X,M,V. The predicates of various relvar expressions are defined as follows:

Expression	Predicate
R	partial so no associated predicate is defined
R{X,M}	partial so no associated predicate is defined
R{X,M}*	It is known that object X has mass M
R{X,V}*	It is known that object X has volume V
R* = R{X,M}* JOIN R{X,V}*	It is known that object X has mass M and volume V
R*{X,M}	Exists V such that it is known that object X has mass M and volume V
R{X,M}* MINUS R*{X,M}	It is known that object X has mass M and the volume of X is unknown

This seems related to the no-information interpretation of NULL (see Database Relations with Null Values 1982 by Carlo Zaniolo)

Note that * and projection don't commute, and this is illustrated with the distinction between the predicates of R{X,M}* and R*{X,M}. R*{X,M} is a subset of R{X,M}*.

R{X,M}* MINUS R*{X,M} gives the tuples in R for which a value for V (and only V) is missing. This expression gives the extension of the predicate: It is known that object X has mass M and there doesn't exist any V such that it is known that X has volume V i.e. It is known that object X has mass M and the volume of X is unknown.

This solves the view update problem for updating a projection, because updates target an underlying partial relvar which allows projections on it to be updated independently without violating constraints.

Example : impossibility of updating a projection

Let it be assumed that every employee is identified by an employee number and has both a name and address. Consider a schema with the following relation:

EMP{E#, NAME, ADDRESS}

with key {E#}

having predicate there is an employee with id E# having name NAME and address ADDRESS

Consider that we define views which are projections on EMP as follows:

V1 = EMP{E#,NAME}

V2 = EMP{E#,ADDRESS}

Question: Are these views independently updatable?

Obviously they aren't, because there is a constraint connecting them:

V1{E#} = V2{E#}

This screws up independent inserts/deletes on V1,V2. It is not possible to insert or delete a tuple from V1 without also inserting or deleting a tuple from V2.

Solution

Consider the following predicates:

P1: It is known there exists an employee E# that has name NAME

P2: It is known there exists an employee E# that has address ADDRESS

P3: It is known there exists an employee E# that has name NAME and address ADDRESS

Note that P1 = P1 AND P2

Let the corresponding relvars be R1,R2,R3. Then R3 = R1 JOIN R2. Note that R1,R2 are not projections of R3.

Forget about updating the join R3. Also forget about updating the projections of the join. It's not needed.

Instead select orthogonal base relvars R1,R2 in the first place, and calculate read only R3 from them. This represents a factorisation of the (known) information into two base relvars which are independently updatable (i.e. assuming there is a Cartesian factorisation of the relation which is the extension of the db constraint on the dbval where a dbval is a tuple of relations).

Since R1 and R2 are base relvars it is assumed a change to one base relvar doesn't silently change the other.

For a database view, each Cartesian prime factor can be one of the following:

unavailable
read-only; or
read-write.

That gives 3² = 9 combinations (i.e. 9 different logical views) in this case because there are two prime factors.

R3 isn't atomic in this sense so forget about making it the target of updates. Also the projections of R3 are coupled and can't be updated independently because the extension of the constraint over them can't be Cartesian factorised. They have comparatively subtle predicates:

The predicate of R3{E#,NAME} is: there exists ADDRESS such that it is known there exists an employee E# that has name NAME and address ADDRESS

predicate of R3{E#,ADDRESS} is: there exists NAME such that it is known there exists an employee E# that has name NAME and address ADDRESS

Bijection to another representation

A bijection between a set of base relvars and another set of relvars means we have an alternative fully updatable representation of the same information. A change to a single relvar in one representation might change more than one relvar in the other.

For example there is a bijection between {R1,R2} and {S1,S2,S3} as follows:

S1 = R1 MINUS (R1 JOIN R2){E#,NAME}

S2 = R2 MINUS (R1 JOIN R2){E#,ADDRESS}

S3 = R1 JOIN R2

(with certain constraints on S1,S2,S3 which I won't bother to state).

{R1,R2} is better for writers because the constraint is Cartesian factorisable into two prime factors, whereas {S1,S2,S3} only gives one prime factor.

The representation {S1,S2,S3} may be better for readers, and more appropriate for the physical representation.

Example again

Let R be a base relvar with attributes named x,y,z. Let {x} be a candidate key of R.

Let an alternative representation of R involve a pair of relvars (R1,R2), where R1 has attributes x,y and R2 has attributes x,z. Assume {x} is a candidate key of both R1 and R2. In order to achieve a bijection R <--> (R1,R2) a constraint R1{x} = R2{x} must be imposed. The bijection involves

    R = R1 JOIN R2
    R1 = R{x,y}
    R2 = R{x,z}

With the constraint R1{x} = R2{x} it won't generally be possible to perform updates on just R1 without constraint violations. So apparently there's a view update problem. However, I think it's really a non-problem. The actual problem is asynchronous updates. The solution is:

Drop the constraint on R1 and R2 (since apparently there is a requirement to be able to update R1 and R2 independently, so the constraint cannot reflect the actual requirements - evidently because updates to R1,R2 (by different users) are not synchronised);
At least conceptually (and perhaps even physically) split the database into at least two separate databases that are updated independently (i.e. asynchronously). R1,R2 appear as base relvars in different databases. Reap the performance benefits (scalability, concurrency, locality, autonomy and lower latency);
Provide the means to form read only views over multiple databases that are updated asynchronously (e.g. master-slave replication using multiple asynchronous unidirectional message queues makes this easy); and
Consider R to be a non-updatable relvar satisfying R = R1 JOIN R2.

The effect is that the constraint is enforced in a non-updatable calculated relvar, so an application that cares about a self consistent overall picture (say for producing a report) simply uses R (or R1' = R{x,y} and R2' = R{x,z}) instead of R1,R2 directly.

Other approaches are possible as well (e.g. users explicitly mark valid snapshots of the overall data, analogously to the way programmers only commit a successfully compiling working copy of source code into a version control system).

Asynchronous updates versus modal logic

Saying "it is known that p" means "it is necessary that p". Similarly "it is not known that p" means "it is possible that not p".

Of course the RM is orthogonal to how the external predicates are defined, so as far as the dbms is concerned it's just conventional FOL.

I'm wondering whether this modal logic approach is related to my ideas about dropping constraints in the presence of asynchronous updates.

For example, going back to my original example where we had R(x,y,z) with key {x} and an alternative representation using two relvars R1(x,y), R2(x,z) each with key {x} and constraint R1{x} = R2{x}.

There is a bijection involving

    R = R1 JOIN R2
    R1 = R{x,y}
    R2 = R{x,z}.

Let the original predicates be:

    for R:   P(x,y,z)
    for R1: exists z such that P(x,y,z)
    for R2: exists y such that P(x,y,z)

It seems we can "fiddle with the predicates" as follows

    for R:   it is known that P(x,y,z)
    for R1: it is known that (exists z such that P(x,y,z))
    for R2: it is known that (exists y such that P(x,y,z))

In that case we still expect R = R1 JOIN R2, but we should drop the constraint R1{x} = R2{x}.

I note that

    R{x,y} = exists z such that (it is known that P(x,y,z))
    R{x,z} = exists y such that (it is known that P(x,y,z))

so it is clear that R{x,y} may not equal R1, and R{x,z} may not equal R2.

The conclusion is that R can be calculated from R1,R2 but not vice versa. Therefore R1,R2 should be the base relvars and R should be a read-only derived relvar.

I find it very interesting that this is exactly what I thought was appropriate when we assume R1,R2 have the original predicates but are updated asynchronously.

It seems that the two situations are "isomorphic".

Example

We should question the need to enforce consistency more often than we do (e.g. in regard to the view update problem).

For example, in certain applications I have no problem with a multi-user system which allows for shared editing on a database with only a very weak notion of consistency.

Stronger forms of consistency that go beyond what's practical to enforce are regarded as just a "status" that the database may or may not be in at a given time.

Such notions of consistency are associated with boolean valued properties calculated on the database state. The users collaboratively edit the data in order to bring it into a state that satisfies the stronger notions of integrity.

This approach seems most important where constraints are complex (an extreme example would be a civil engineering CAD package where validation of a model of a bridge depends on finite element analysis).

I've also wondered about an approach to achieving consistency by using derived variables which always enforce constraints by always dropping information that doesn't conform.

For example, consider that base relvars have no key or referential integrity constraints.

A key constraint is met in a derived relvar by removing the duplicates in the corresponding base relvar. A referential constraint is met in a derived relvar by removing tuples which break the referential constraint in the corresponding base relvar.

If we consider a contentious case for view updates, such as insertions on projection views, I see that a large part of the problem is caused by an integrity constraint of no missing information which is implicit in the schema that defines the base relvar on which the projection view is defined. That problem goes away if we allow for an underlying representation of something that is a set of tuples but isn't in fact necessarily a relation because its tuples may have inconsistent headings (think of it as a way of allowing for missing information without needing to explicitly formalise any concept of NULL).

Let's call it an m-relation (where m reminds us that is allows for missing information).

Be careful with that potentially confusing terminology - a relation is a kind of m-relation! Very importantly, don't make assumptions about the meaning of information that isn't present (e.g. don't assume it means "inapplicable" or "applicable but not recorded" or any other possible reason why the information isn't present).

There is a well defined concept of a projection operator on an m-relation that returns a normal relation.

This is achieved by ignoring those tuples with missing information with respect to the set of attributes being projected. It is possible to relate these projections (which give normal relations) to extensions of certain predicates (I won't formalise it here, but it's not complicated - and you can probably see how it works from my example below).

In particular it is assumed that the "base relvar" specified in the schema is in fact a derived variable (by simply throwing out the tuples from the underlying m-relation without the full heading).

Not surprisingly it is important to distinguish between projections on the m-relation, and projections on the base relvar. Indeed they have slightly different predicates. E.g. compare p1(X,M) and p2(X,M) in the following example, where the "base relvar" specified in the schema is assumed to have predicate p3(X,M,V):

    p1(X,M) 
        = X is known to have mass M

    p2(X,V) 
        = X is known to have volume V

    p3(X,M,V) 
        = X is known to have mass M and volume V 
        = p1(X,M) and p2(X,V)

    p4(X,M) 
        = X is known to have mass M and the volume of X is known
        = p1(X,M) and (exists V p2(X,V))
        = exists V p3(X,M,V)

Let the m-relation for p3(X,M,V) be Rm and for any subset A of {X,M,V} let Rm{A} denote the projection operator applied to the m-relation in the manner described above.

Then it is assumed:

    p1(X,M) has extension R1 =  Rm{X,M}
    p2(X,V) has extension R2 =  Rm{X,V}
    p3(X,M,V) has extension R3 =  Rm{X,V.M) = (R1 JOIN R2)
    p4(X,M) has extension R4 = R3{X,M}

I think the solution to the view update problem in this scenario comes from the fact that a user can satisfactorily update the extension of p1(X.M) without knowing the extension of p2(X,V).

By contrast, in order for a user to know the extension of p4(X,M) they would normally need to know the extension of p3(X,M,V) which flies in the face of the idea that they can usefully update a subset of the overall information.

I note that the latter represents the original intractable problem and isn't solved but instead simply avoided (because in truth the extension of p1(X,M) is not a projection on the extension of p3(X,M,V)).

In case you are wondering about the CWA, I consider it to be upheld here.

For example, if TUP{X=x,M=m} doesn't appear in the extension of p1(X,M) then it can be assumed it is not known that x has mass m.

By the way, my understanding is that this treatment of missing information is equivalent to Carlo Zaniolo's approach to nulls in relations.