Anti-pattern: message types are public interfaces

Assuming that message types in a distributed system represent public interfaces between modules is an anti-pattern.

Let a module mean a unit of testing. Sometimes a module involves multiple classes because it doesn't make sense to test them in isolation. Coupling refers to the strength of the relationships between modules. Cohesion refers to the degree to which the elements inside a module belong together.

Generally speaking, a software developer should aim to write lots of simple modules which have high cohesion and low coupling.

Some developers believe messages should always represent a public API, and that producers of messages should be tested independently of consumers. They partition their code accordingly - i.e. they let the way the system is deployed and distributed dictate the partitioning of the code into modules, rather than leaving it to the code's inherent coupling and cohesion.

But the partitioning into modules, and the associated units of testing, should typically treat the IPC as an internal implementation detail. The code should be structured as it would have been had it been written without IPC in mind.

That typically means the code defining the message, the producer and the consumer of the message will all tend to live in the same module, because they work together cohesively to implement an "external" function on the underlying data type.

There are certain "scalable" data structures that are used in large database systems. Usually they are tree structures, for example B-trees [], BSP trees [], quadtrees [] and octrees [].

Since trees are inductive types [], the operations on tree structures are invariably recursive.

If a tree data type fits in memory and doesn't need to scale, such algorithms use "internal", purely in-process method calls to recurse. By "internal" I mean that they are only invoked as part of an algorithm that implements a useful "external" function on the tree, one that begins at the root node, such as an insert into a B-tree.
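As a minimal sketch of the "internal" versus "external" distinction, the example below uses a plain binary search tree rather than a B-tree, purely to keep it short. The names `Node`, `Tree` and `_insert_into` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _insert_into(node: Optional[Node], key: int) -> Node:
    """Internal: the recursive step, only ever reached via Tree.insert()."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = _insert_into(node.left, key)
    elif key > node.key:
        node.right = _insert_into(node.right, key)
    return node  # an equal key is already present, so nothing changes


class Tree:
    def __init__(self) -> None:
        self._root: Optional[Node] = None

    def insert(self, key: int) -> None:
        """External: the public entry point, always starting at the root."""
        self._root = _insert_into(self._root, key)

    def keys(self) -> list[int]:
        """External: in-order traversal, again driven by an internal recursion."""
        out: list[int] = []

        def walk(node: Optional[Node]) -> None:
            if node is not None:
                walk(node.left)
                out.append(node.key)
                walk(node.right)

        walk(self._root)
        return out
```

Nothing outside this module has any reason to call `_insert_into` directly, so a test would naturally exercise it only through `Tree.insert`.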

In a distributed database, a tree may be scaled up to the point that it's partitioned across multiple databases.

To allow the algorithms to be distributed, message types are introduced for the inputs and outputs of these "internal" functions, turning them into IPC calls across process boundaries.

Typically these messages are relatively complicated and contain state which is highly dependent on how the datatype has been implemented.
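To illustrate how such messages end up carrying implementation-specific state, here is a hedged sketch of a two-level, B-tree-like structure whose leaves live in another process. Everything in it is an assumption made for the example: the names `InsertIntoLeaf`, `InsertIntoLeafReply`, `LeafProcess` and `RootProcess`, the `CAPACITY` constant, and the direct method call standing in for real IPC.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass
from typing import Optional

CAPACITY = 4  # hypothetical maximum number of keys per leaf


@dataclass(frozen=True)
class InsertIntoLeaf:
    """Request: the input of the internal 'insert into leaf' step."""
    leaf_id: int
    key: int


@dataclass(frozen=True)
class InsertIntoLeafReply:
    """Reply: the output of that step, exposing how the tree splits leaves."""
    split_key: Optional[int] = None    # separator for the parent, if the leaf split
    new_leaf_id: Optional[int] = None  # new right-hand sibling, if the leaf split


class LeafProcess:
    """Consumer: stands in for a remote process that stores the leaves."""

    def __init__(self) -> None:
        self.leaves: dict[int, list[int]] = {0: []}
        self._next_id = 1

    def handle(self, msg: InsertIntoLeaf) -> InsertIntoLeafReply:
        keys = self.leaves[msg.leaf_id]
        if msg.key not in keys:
            insort(keys, msg.key)
        if len(keys) <= CAPACITY:
            return InsertIntoLeafReply()
        # Overflow: move the upper half into a new leaf and report the split.
        mid = len(keys) // 2
        new_id = self._next_id
        self._next_id += 1
        self.leaves[new_id] = keys[mid:]
        del keys[mid:]
        return InsertIntoLeafReply(split_key=self.leaves[new_id][0], new_leaf_id=new_id)


class RootProcess:
    """Producer: owns the root node and exposes the external insert()."""

    def __init__(self, remote: LeafProcess) -> None:
        self.remote = remote
        self.separators: list[int] = []
        self.children: list[int] = [0]

    def insert(self, key: int) -> None:
        i = bisect_right(self.separators, key)
        # A direct call stands in for the IPC round trip.
        reply = self.remote.handle(InsertIntoLeaf(self.children[i], key))
        if reply.new_leaf_id is not None:
            self.separators.insert(i, reply.split_key)
            self.children.insert(i + 1, reply.new_leaf_id)

    def keys(self) -> list[int]:
        # Reads the leaves directly; a convenience for this sketch only.
        return [k for c in self.children for k in self.remote.leaves[c]]
```

The reply is only meaningful to a caller that knows leaves split at `CAPACITY` and that the separator is the first key of the new sibling, which is the kind of knowledge a public interface would not normally expose.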

By any reasonable metric of coupling and cohesion, it isn't appropriate to partition the code in a way that separates callers and callees of these internal functions into separate modules - i.e. where the message types represent an interface between those modules.

It doesn't make sense to test the callers and callees of the internal functions in isolation, because the only way to test one is to mock the other. Complicated unit tests, including the need to mock things, are a symptom of poor design.
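One possible alternative, sketched below under stated assumptions, is to keep the message types, the producer and the consumer in one module and test them together through the external function, with only the transport replaced by an in-process router. The names `SubtreeSizeRequest`, `SubtreeSizeReply`, `Shard`, `InProcessCluster` and `tree_size` are hypothetical, and the subtree-size operation is chosen only because it is short.

```python
import unittest
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubtreeSizeRequest:  # input of the internal recursive step
    node_id: int


@dataclass(frozen=True)
class SubtreeSizeReply:  # output of the internal recursive step
    size: int


Send = Callable[[SubtreeSizeRequest], SubtreeSizeReply]


class Shard:
    """Owns some nodes; a node's children may live on other shards."""

    def __init__(self, nodes: dict[int, list[int]], send: Send) -> None:
        self.nodes = nodes  # node id -> ids of its children
        self.send = send

    def handle(self, msg: SubtreeSizeRequest) -> SubtreeSizeReply:
        children = self.nodes[msg.node_id]
        size = 1 + sum(self.send(SubtreeSizeRequest(c)).size for c in children)
        return SubtreeSizeReply(size)


class InProcessCluster:
    """Routes each request to the shard owning the node; replaces real IPC."""

    def __init__(self, shards: list[dict[int, list[int]]]) -> None:
        self.owner: dict[int, Shard] = {}
        for nodes in shards:
            shard = Shard(nodes, self.send)
            for node_id in nodes:
                self.owner[node_id] = shard

    def send(self, msg: SubtreeSizeRequest) -> SubtreeSizeReply:
        return self.owner[msg.node_id].handle(msg)


def tree_size(cluster: InProcessCluster, root_id: int = 0) -> int:
    """External function: the only thing the test needs to call."""
    return cluster.send(SubtreeSizeRequest(root_id)).size


class TreeSizeTest(unittest.TestCase):
    def test_size_spans_shards(self) -> None:
        cluster = InProcessCluster([
            {0: [1, 2], 1: []},  # shard A: the root and one leaf
            {2: [3], 3: []},     # shard B: the other subtree
        ])
        self.assertEqual(tree_size(cluster), 4)
```

The test never constructs or inspects a message, so the message layout can change freely as long as `tree_size` keeps returning the right answer. If saved as a module, it could be run with `python -m unittest`.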