Serialisation

Serialisation refers to the representation of a value of a type as a stream of octets, for the purpose of persistence in a database and for data exchange (such as between computers). In CEDA an OutputArchive (or an Archive but that is deprecated) is used for serialisation.

Deserialisation is the reverse: converting a stream of octets back to a value of a type. In CEDA an InputArchive is used for deserialisation.

Advantages of user-defined serialise functions

CEDA allows for user-defined serialise/deserialise functions to be written on user-defined types. This can have some advantages over automatically generated serialisation functions:

Serialisation functions provide a natural place to include schema numbers, which makes schema evolution straightforward, even for complex schema changes. Schema evolution is:
- modular : each class independently manages its own schema.
- lazy : an old schema is only converted to a new schema when the object is next written back to disk because it has been modified.
It is fairly straightforward to remove members, add members and even to modify the types of members. In the latter case simply deserialise into one or more variables declared on the frame, then write the code to translate the values into the member variables.
Serialisation functions allow the programmer to write only a subset of the members. For example, cached results of calculations may not need to persist.
Schemas remain unaffected by many changes to objects that upset systems like ObjectStore. The following changes have no impact on the schema whatsoever:
- renaming classes
- renaming member variables
- changing the order of base classes
- changing the order of member variables
- implementing additional interfaces
- adding/removing non-persistent members such as
  - data cache members
  - mutexes
  - pointers to non-persistable (i.e. transient) objects such as observers
  - sockets
  - file handles
  - GUI elements
  - worker threads
Sometimes a much more compact format is desirable on disk than in memory. For example, a text component may choose to compress the text when writing the byte stream.
Serialisation is the ideal abstraction to support communication between processes or between computers. It allows for an efficient implementation of a middleware layer supporting efficient transmission of serialisable objects, i.e. it allows for marshalling of complex objects by value over the wire.
C++ objects in memory that have poor localisation are brought together into a single contiguous byte stream. This provides significant clustering opportunities that help to bridge the disparity in the random access time between main memory and secondary storage. This is particularly beneficial for strings, linked lists, maps etc.
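
As an illustration of the schema-number approach described above, here is a minimal sketch. It does not use the CEDA API: ByteWriter, ByteReader, Shape and kShapeSchema are hypothetical names invented for this example, standing in for an output archive, an input archive and a user-defined type. The sketch assumes a type whose schema 1 stored a single side length in centimetres as a 16-bit value, and whose schema 2 stores a width and a height in millimetres as 32-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for an output and an input archive (not the CEDA API).
struct ByteWriter {
    std::vector<std::uint8_t> bytes;

    // Append the low `octets` octets of v, least significant octet first.
    void writeUnsigned(std::uint64_t v, int octets) {
        for (int i = 0; i < octets; ++i)
            bytes.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
};

struct ByteReader {
    const std::vector<std::uint8_t>& bytes;
    std::size_t pos = 0;

    explicit ByteReader(const std::vector<std::uint8_t>& b) : bytes(b) {}

    // Read `octets` octets, least significant octet first.
    std::uint64_t readUnsigned(int octets) {
        std::uint64_t v = 0;
        for (int i = 0; i < octets; ++i) {
            assert(pos < bytes.size());
            v |= static_cast<std::uint64_t>(bytes[pos++]) << (8 * i);
        }
        return v;
    }
};

// Schema number written by the current version of Shape (assumed for this sketch).
constexpr std::uint8_t kShapeSchema = 2;

struct Shape {
    std::uint32_t widthMm = 0;     // schema 2: width in millimetres
    std::uint32_t heightMm = 0;    // schema 2: member added since schema 1
    std::uint64_t cachedArea = 0;  // transient cache, deliberately not persisted

    // Always writes the current schema; old data is upgraded lazily on the next write.
    void serialise(ByteWriter& w) const {
        w.writeUnsigned(kShapeSchema, 1);
        w.writeUnsigned(widthMm, 4);
        w.writeUnsigned(heightMm, 4);
    }

    void deserialise(ByteReader& r) {
        const std::uint64_t schema = r.readUnsigned(1);
        if (schema == 1) {
            // Old layout: read into a local variable, then translate into the members.
            const std::uint64_t sideCm = r.readUnsigned(2);
            widthMm = heightMm = static_cast<std::uint32_t>(sideCm * 10);
        } else {
            widthMm = static_cast<std::uint32_t>(r.readUnsigned(4));
            heightMm = static_cast<std::uint32_t>(r.readUnsigned(4));
        }
        cachedArea = 0;  // the cache is rebuilt on demand, never read from the stream
    }
};
```

In this sketch the schema number is the first octet of the object's byte stream, so each class can decide for itself how to read older layouts, and the renamed or reordered members never appear in the stream at all.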

Binary format

A boolean value is serialised as a single octet equal to 0x00 or 0x01, representing the values false and true respectively.

Signed integer types are serialised with a two's complement representation.

A float32 is serialised using the binary32 format of the IEEE 754-2008 standard (see Single-precision floating-point format).

A float64 is serialised using the binary64 format of the IEEE 754-2008 standard (see Double-precision floating-point format).

All basic integer and floating point types are serialised with the least significant byte coming first in the sequence of octets. This representation favours little-endian architectures (see Endianness).

A variable length serialisation is available for (unsigned) integers, using fewer octets for smaller numbers.
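
The exact variable length format is not specified here. Purely as an illustration of how such an encoding can use fewer octets for smaller numbers, the following sketch implements one common scheme, a base-128 encoding in the style of LEB128. This is an assumption made for the example and is not necessarily the format CEDA uses; the function names writeVarUint and readVarUint are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append v using 7 payload bits per octet, least significant group first.
// The top bit of each octet is set when more octets follow.
void writeVarUint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Decode a value written by writeVarUint, advancing pos past the octets consumed.
std::uint64_t readVarUint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t v = 0;
    int shift = 0;
    while (true) {
        if (shift > 63) throw std::runtime_error("varint too long");
        const std::uint8_t octet = in.at(pos++);  // at() throws on truncated input
        v |= static_cast<std::uint64_t>(octet & 0x7F) << shift;
        if ((octet & 0x80) == 0) return v;
        shift += 7;
    }
}
```

Under this scheme values below 128 occupy a single octet, and a full 64-bit value occupies at most ten.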

Examples

Type	Value	Serialisation
bool	false	[ 0x00 ]
bool	true	[ 0x01 ]
uint16	0x1234	[ 0x34, 0x12 ]
int16	-2	[ 0xFE, 0xFF ]
int16	32767	[ 0xFF, 0x7F ]
float32	0.0f	[ 0x00, 0x00, 0x00, 0x00 ]
float32	1.0f	[ 0x00, 0x00, 0x80, 0x3f ]
float64	0.0	[ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ]
float64	1.0	[ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f ]
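
The rows above can be reproduced with a short sketch of the fixed-width rules: one octet for a bool, two's complement for signed integers, IEEE 754 bit patterns for floating point values, and least significant octet first throughout. The function names (littleEndian, serialiseBool and so on) and the Octets alias are hypothetical and are not part of the CEDA API. The sketch assumes the host's float and double are IEEE 754 binary32 and binary64, which holds on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Octets = std::vector<std::uint8_t>;

// Emit the low `count` octets of v, least significant octet first.
Octets littleEndian(std::uint64_t v, std::size_t count) {
    Octets out;
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    return out;
}

Octets serialiseBool(bool b) {
    return Octets{static_cast<std::uint8_t>(b ? 0x01 : 0x00)};
}

Octets serialiseUint16(std::uint16_t v) { return littleEndian(v, 2); }

// Converting to the unsigned type of the same width yields the two's complement bit pattern.
Octets serialiseInt16(std::int16_t v) {
    return littleEndian(static_cast<std::uint16_t>(v), 2);
}

Octets serialiseFloat32(float f) {
    static_assert(sizeof(float) == 4, "sketch assumes IEEE 754 binary32 float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bit pattern without aliasing issues
    return littleEndian(bits, 4);
}

Octets serialiseFloat64(double d) {
    static_assert(sizeof(double) == 8, "sketch assumes IEEE 754 binary64 double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return littleEndian(bits, 8);
}
```

Because the octets are produced by shifting rather than by copying memory directly, the output is the same on big-endian and little-endian hosts.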