Object caching in CEDA

In-memory data caching is one of the most effective strategies to improve application performance.

Unfortunately, a fundamental problem is cache coherency.

  1. Much of the time a client-server model is used: multiple clients access a single database on a server.
  2. Caching data on the clients tends to be very important for performance.
  3. Even if only one client is the writer, all the readers have to deal with a stale cache.

The cache coherency problem is so serious that many applications either don't use local caches at all, or throw them away a fraction of a second after they are used! This is because caches can quickly become stale.
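The problem in miniature (all names in this sketch are illustrative, not taken from any particular product):

    database = {"price": 100}   # the shared server-side store

    class CachingClient:
        """A client that caches whatever it reads from the shared database."""
        def __init__(self):
            self.cache = {}

        def read(self, key):
            if key not in self.cache:
                self.cache[key] = database[key]   # one "network round trip"
            return self.cache[key]

        def write(self, key, value):
            database[key] = value                 # nobody tells the other caches

    reader, writer = CachingClient(), CachingClient()
    reader.read("price")          # 100, now cached locally by the reader
    writer.write("price", 120)    # the single writer updates the database...
    reader.read("price")          # ...but the reader still sees 100: a stale cache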

Cache invalidation refers to invalidating stale cache items. Martin Fowler says one of his favorite sayings is "There are only two hard things in Computer Science: cache invalidation and naming things". See Two Hard Things.

In the ORM product Hibernate, the first-level cache is associated with the Session object, which is a short-lived object. As soon as you close the session, all the information held in the cache is lost. There is also a second-level cache, which is associated with a SessionFactory. It isn't enabled by default, probably because it is dangerous: you need to understand the potential for stale objects before you enable the second-level cache.

The NHibernate documentation on caches says:

  Be careful. Most caches are never aware of changes made to the persistent store by another process (though they may be configured to regularly expire cached data).

Django comes with a cache system. Since a cached value can be stale, there is a TIMEOUT parameter which defaults to 5 minutes.
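As a sketch, a Django project might configure and use its cache as follows; the backend choice and the key/function names here are illustrative:

    # settings.py: a local-memory cache with the default 5-minute TIMEOUT
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
            "TIMEOUT": 300,  # seconds; after this the entry silently expires
        }
    }

    # application code
    from django.core.cache import cache

    def get_report(build_report):
        # build_report is a hypothetical expensive operation (e.g. a big query)
        report = cache.get("report")
        if report is None:
            report = build_report()
            cache.set("report", report, timeout=300)
        return report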

The Python package django-cacheback is a caching library that refreshes stale cache items asynchronously using a Celery or rq task (utilizing django-rq). The key idea is that it's better to serve a stale item (and repopulate the cache asynchronously) than to block the response in order to populate the cache synchronously.
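The underlying "serve stale, refresh in the background" pattern can be sketched without the library (django-cacheback itself hands the refresh to a Celery/rq worker rather than a thread; all names below are illustrative):

    import threading
    import time

    LIFETIME = 300  # seconds after which a cached entry counts as stale
    _cache = {}     # key -> (value, fetched_at)

    def get(key, fetch):
        """Serve from the cache, refreshing stale entries asynchronously."""
        entry = _cache.get(key)
        if entry is None:
            # Nothing to serve yet, so this first fetch must be synchronous.
            _cache[key] = (fetch(), time.monotonic())
            return _cache[key][0]
        value, fetched_at = entry
        if time.monotonic() - fetched_at > LIFETIME:
            # Stale: serve the old value now and refresh in the background.
            def refresh():
                _cache[key] = (fetch(), time.monotonic())
            threading.Thread(target=refresh).start()
        return value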

The overview for the product memcached says this:

  Memcached is, by default, a Least Recently Used cache. Items expire after a specified amount of time. Both of these are elegant solutions to many problems; Expire items after a minute to limit stale data being returned, or flush unused data in an effort to retain frequently requested information.
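For example, with the pymemcache client a value can be stored with an expiry time (the server address and key are assumptions):

    from pymemcache.client.base import Client

    client = Client(("localhost", 11211))

    # Expire the item after 60 seconds to limit how stale it can get.
    client.set("user:42", b"cached-profile-bytes", expire=60)

    client.get("user:42")  # returns None once expired or evicted (LRU)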

Redis is an open source (BSD licensed) in-memory data structure store, used as a database, cache and message broker. Redis has a concept of expiring keys, which is yet another timeout approach to dealing with stale data. The blog post A Key Expired In Redis, You Won't Believe What Happened Next by Karan Kamath (2017) describes data remaining stale for unexpectedly long periods: the blog states it was expected that the data could be stale by 11 minutes, and that it would be nice to use cache invalidation instead. Of course, cache invalidation is hard and greatly complicates application code; it's little wonder it's often not implemented.
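With the redis-py client an expiry can be attached when a key is set (the connection details and key names are assumptions):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Ask Redis to expire the key after 60 seconds.
    r.set("session:abc", "payload", ex=60)
    # Equivalent: r.setex("session:abc", 60, "payload")

    r.ttl("session:abc")  # remaining time to live in seconds; -2 once gone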

Davide Mauri warns about the downsides of caching in his blog post Caching is not what you think it is.

There is a stackoverflow question Dealing with stale data in in-memory caches. The accepted answer is to version the data - i.e. add a version column to the relevant tables; clients then first query the database for the current version number and only then talk to the cache. However, in a comment someone asks: if you are querying the DB every time in order to get the cache key, is the cache layer still beneficial?
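A sketch of that versioning scheme, using an in-memory SQLite database and a plain dict in place of a real cache (the table and key names are hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, version INTEGER, status TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, 7, 'placed')")

    cache = {}  # stands in for memcached/Redis

    def get_order(order_id):
        # Always ask the database for the current version number first...
        (version,) = conn.execute(
            "SELECT version FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        # ...and bake it into the cache key, so stale entries simply miss.
        key = f"order:{order_id}:v{version}"
        if key not in cache:
            cache[key] = conn.execute(
                "SELECT * FROM orders WHERE id = ?", (order_id,)
            ).fetchone()
        return cache[key]

Note that the version lookup itself still costs a database round trip on every read, which is exactly the commenter's objection.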

Note that typically these client-side caching solutions not only have problems with stale data, but also end up caching items read from the server at different times; i.e. the cache is not just stale, it doesn't even represent a consistent snapshot of the database.

This is a really unfortunate situation: in-memory data caching is vital for good performance, and yet both existing DBMS and ORM products push the cache coherency problem onto application programmers.

CEDA: a database per client

In CEDA, rather than have multiple clients accessing a single database on a server (i.e. a client-server model), each client ("peer") has its own copy of the data in its own database, typically on secondary storage on the same physical machine. This is a peer-to-peer model.

The data is automatically replicated and synchronised through the interchange of operations. This is implemented by the CEDA DBMS.
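This is not CEDA's actual implementation, but replication by interchange of operations can be sketched in miniature: each peer applies operations to its local database, records them in a log, and replays logs received from other peers (all names below are illustrative):

    class Peer:
        """A toy peer that replicates by exchanging operations."""

        def __init__(self):
            self.db = {}    # the peer's local database (a simple key-value map)
            self.log = []   # operations applied locally, shared with other peers

        def write(self, key, value):
            # A local write is applied immediately and recorded for replication.
            self.db[key] = value
            self.log.append(("set", key, value))

        def sync_from(self, other):
            # Replay the other peer's operations against our local database.
            for op, key, value in other.log:
                if op == "set":
                    self.db[key] = value

    a, b = Peer(), Peer()
    a.write("colour", "red")
    b.sync_from(a)
    assert b.db["colour"] == "red"   # b's database now reflects a's write

A real system must also deal with operation ordering, conflicts and durability; that is precisely the work the CEDA DBMS takes off the application's hands.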

Applications/services have no need to be concerned with:

  1. replicating or synchronising data between peers;
  2. invalidating stale cache entries;
  3. the cache coherency problem.

There is something wonderful going on here: CEDA provides local in-memory data caches (i.e. in the same process as the application or service) without the cache coherency problem. The data in the cache is always valid; there is no concept of stale data in the cache. That in turn means there is no need for optimistic concurrency control.

This represents a vast improvement over conventional DBMS and ORM technology.

CEDA provides very high performance caching, replication and synchronisation. This leads to a significant simplification of applications/services, and significant increases in performance.

For this reason, when using CEDA, application developers don't need to be concerned with network latency or the poor performance that arises from "chatty" messaging.

Antipattern: service statelessness principle

See the Wikipedia article on the Service statelessness principle.

  Service statelessness is a design principle that is applied within the service-orientation design paradigm, in order to design scalable services by separating them from their state data whenever possible.

It's not clear what this even means. The whole point of services is to implement business processes, and processes must be stateful.

If it means avoiding cached data, then it's an anti-pattern.

Using cached data to amortise network or disk I/O is one of the most important principles for achieving high-performance processing.
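The idea in a nutshell: a read-through cache turns N remote reads of the same item into one round trip plus N-1 local reads (fetch_from_server below is a hypothetical remote call):

    def make_cached_reader(fetch_from_server):
        cache = {}

        def read(key):
            if key not in cache:
                cache[key] = fetch_from_server(key)  # one network round trip
            return cache[key]                        # later reads are local

        return read

    # 1000 reads of the same key cost one round trip instead of 1000.
    read = make_cached_reader(lambda key: f"value-of-{key}")
    for _ in range(1000):
        read("config")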

Service Oriented Architectures: chatty messaging

Consider the blog post Service-Oriented Composition (with video) (2014) by Udi Dahan.

He outlines approaches for achieving loosely-coupled composition of services, without excessive client to server chit-chat, making use of the following picture:

[Figure: service-oriented-composition]

Note there are only two application-defined over-the-wire messages (shown in green). So what's the problem here?

It's better to build a system with an emphasis on data, not messaging.

In CEDA there is no need for application-level messaging. Instead there are just applications/services which read and write their local databases, all of which have data caching. In this case the component from service A on the client raises the event by updating its local database. This operation propagates to the server. Components B and C "handle" the event by writing to their local databases. These changes propagate back to the client (which, for example, might show the change in a UI). This also involves only two "hops".
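A schematic of that flow (everything here is illustrative; a dict stands in for each local database, and propagate stands in for CEDA's automatic replication):

    import copy

    client_db = {}   # service A's local database, on the client
    server_db = {}   # the server-side database seen by services B and C

    def propagate(src, dst):
        # Stands in for CEDA's replication of operations between databases.
        dst.clear()
        dst.update(copy.deepcopy(src))

    # Service A "raises the event" by writing to its local database.
    client_db["order:1"] = {"status": "placed"}
    propagate(client_db, server_db)              # hop 1: client -> server

    # Services B and C "handle" the event by writing to their local database.
    server_db["order:1"]["status"] = "confirmed"
    propagate(server_db, client_db)              # hop 2: server -> client

    assert client_db["order:1"]["status"] == "confirmed"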