View update problem - insert into a restriction

In the following we discuss an example that appears in chapter 1 of Chris Date's book View Updating and Relational Theory - Solving the View Update Problem.

Let S be a base relvar recording suppliers under contract. Each supplier has one supplier number (SNO), unique to that supplier, one name (SNAME), one status value (STATUS) and one location (CITY).

SNO	SNAME	STATUS	CITY
S1	Smith	20	London
S2	Jones	10	Paris
S3	Blake	30	Paris
S4	Clark	20	London
S5	Adams	30	Athens

Let LS be a view for the London suppliers defined as a restriction on S where the CITY value is London:

SNO	SNAME	STATUS	CITY
S1	Smith	20	London
S4	Clark	20	London

Let NLS be a view for the non-London suppliers defined as a restriction on S where the CITY value isn't London:

SNO	SNAME	STATUS	CITY
S2	Jones	10	Paris
S3	Blake	30	Paris
S5	Adams	30	Athens

{SNO} is a key for each of the tables. {SNO} in each of tables LS and NLS is a foreign key, referencing the key {SNO} in table S.

S and the pair (LS,NLS) are information equivalent meaning there is a bijective mapping between them.

    S ⟼ (LS,NLS)
        LS = (S WHERE CITY = 'London')
        NLS = (S WHERE CITY ≠ 'London')

    (LS,NLS) ⟼ S
        S = LS UNION NLS

Therefore for any update on S there is an equivalent and uniquely determined update on (LS,NLS) and vice versa.

This is consistent with the following compensating actions defined by Chris Date:

    ON DELETE d FROM LS : DELETE d FROM S ;

    ON DELETE d FROM NLS : DELETE d FROM S ; 

    ON DELETE d FROM S : DELETE ( d WHERE CITY = ‘London’ ) FROM LS ,
    DELETE ( d WHERE CITY <> ‘London’ ) FROM NLS ; 

    ON INSERT i INTO LS : INSERT i INTO S ;

    ON INSERT i INTO NLS : INSERT i INTO S ;

    ON INSERT i INTO S : INSERT ( i WHERE CITY = ‘London’ ) INTO LS ,
    INSERT ( i WHERE CITY <> ‘London’ ) INTO NLS ;

It is often pointed out that insert into a union is ambiguous, and yet in this example an insert into S isn't ambiguous even though S = LS UNION NLS. This is of course because we have more information than that - i.e. that LS and NLS are defined as certain restrictions on S.

The mapping from S to LS is lossy (obviously because it doesn't include the information in NLS). In other words this mapping is non-injective.

There is a view update problem for a user that sees only view LS (i.e., not view NLS and not base table S): An INSERT into LS can fail because of a key constraint on S.

As Chris Date says:

A user who only sees view LS and thinks of it as a base relvar mustn't be allowed to INSERT into LS because such operations might violate constraints of which this user is and must be unaware

He claims this doesn't constitute a violation of the Principle of Interchangeability, pointing out the user who sees only LS is seeing something that isn’t information equivalent to the original table S, and so it’s only to be expected that there’ll be certain operations that he or she can’t be allowed to do.

In other words he only upholds the Principle of Interchangeability under information preserving views - i.e. where there are bijections defined between alternative representations.

That's a very reasonable point of view. However it is quite limiting, one would hope to (where possible) support updatable logical views which represent only part of the information contained in the base relvars.

Indeed how is Chris Date's solution useful in practise? There is no advantage in defining updatable views which have more complicated constraints than the base vars. The better solution is to relax constraints on the base vars in the first place, and then the views are also unconstrained and hence independently updatable.

Allowing updatable non-injective views (in some cases)

A common opinion is that the view update problem is both inevitable and unsolvable in the example of a user that only sees view LS.

But consider the idea of applying integrity constraints on derived relvars, and allowing base relvars to be unconstrained (see the discussion on constraints).

Let's now assume LS and NLS are base relvars that can be updated independently. We don't impose a key constraint on their union.

Indeed there might be a business requirement for the London and non-London suppliers to be managed independently - perhaps with different databases at different physical locations. There may also be a requirement they that can operate autonomously - despite network partitions. The users that update LS don't know about the users that update NLS and vice versa. This independence means LS and NLS are updated asynchronously - i.e. without distributed transactions. The separate databases record information about distinct microworlds.

But now suppose there's an additional business requirement to have the full set of suppliers, and the systems that need this data cannot tolerate repeated supplier identifiers.

Suppose this requirement is met by using a read-only view named S which takes the union of LS and NLS but removes the records where the supplier number is duplicated. S can be expressed in terms of the RemoveDuplicates(R,K) function:

    S = RemoveDuplicates( LS UNION NLS, { SNO } )

S has the following properties

S satisfies the key constraint
If LS and NLS are accurate then S is accurate (i.e. records the full list of suppliers)

That's all that we need.

The duplicates in LS UNION NLS can be highlighted to the users of the system in real time - as potential issues with the data.

This is an asynchronous kind of validation, that might become unavailable when there's a network partition, because it's no longer possible to check for conflicting information between the systems. However it doesn't make the participating databases unavailable.

The important thing is that Chris Date's objection is overcome: the user that only sees LS is free to update LS.

In terms of the CAP theorem, this approach emphasises partition tolerance and availability of the particiating databases.