Serialisation
Serialisation refers to the representation of a value of a type as a stream of octets,
for the purpose of persistence in a database and for data exchange (such as between computers).
In CEDA an OutputArchive is used for serialisation
(an Archive can also be used, but it is deprecated).
Deserialisation is the reverse: converting a stream of octets back to a value of a type.
In CEDA an InputArchive is used for deserialisation.
Advantages of user-defined serialise functions
CEDA allows for user-defined serialise/deserialise functions to be written on user-defined types.
This can have some advantages over automatically generated serialisation functions:
- Serialisation functions provide a natural place to include schema numbers, which makes
schema evolution straightforward, even for complex schema changes.
Schema evolution is
  - modular: each class independently manages its own schema.
  - lazy: an object stored with an old schema is only converted to the new schema when it
    is next written back to disk because it has been modified.
It is fairly straightforward to remove members, add members and even to modify the types of
members. In the latter case, simply deserialise into one or more local variables declared on
the stack frame, then write the code to translate the values into the member variables.
- Serialisation functions allow the programmer to write only a subset of the members. For
example, cached results of calculations may not need to persist.
- Schemas remain unaffected by many changes to objects which upset systems like ObjectStore.
The following changes have no impact on the schema whatsoever:
  - renaming classes
  - renaming member variables
  - changing the order of base classes
  - changing the order of member variables
  - implementing additional interfaces
  - adding/removing non-persistent members such as
    - data cache members
    - mutexes
    - pointers to non-persistable (i.e. transient) objects such as observers
    - sockets
    - file handles
    - GUI elements
    - worker threads
- Sometimes a much more compact format is desirable on disk than in memory. For example,
a text component may choose to compress the text when writing the byte stream.
- Serialisation is the ideal abstraction to support communication between processes or
between computers. It allows for an efficient implementation of a middleware layer
supporting efficient transmission of serialisable objects, i.e. it allows for marshalling of
complex objects by value over the wire.
- C++ objects that have poor locality in memory are brought together into a single
contiguous byte stream. This provides significant clustering opportunities that help to
bridge the disparity in the random access time between main memory and secondary storage.
This is particularly beneficial for strings, linked lists, maps etc.
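The schema-number technique described in the list above can be sketched as follows. This is only an illustration under stated assumptions: ByteWriter, ByteReader and Shape are hypothetical stand-ins invented for this example and are not the CEDA OutputArchive and InputArchive classes, whose actual interfaces are not shown here. The sketch assumes schema 1 stored size as a single octet, and that schema 2 widened it to 32 bits and added a colour member.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for an output and input archive (NOT the CEDA classes).
struct ByteWriter {
    std::vector<std::uint8_t> bytes;
    void put8(std::uint8_t v) { bytes.push_back(v); }
    void put32(std::uint32_t v) {  // least significant octet first
        for (int i = 0; i < 4; ++i) put8(static_cast<std::uint8_t>(v >> (8 * i)));
    }
};

struct ByteReader {
    const std::vector<std::uint8_t>& bytes;
    std::size_t pos;
    explicit ByteReader(const std::vector<std::uint8_t>& b) : bytes(b), pos(0) {}
    std::uint8_t get8() {
        if (pos >= bytes.size()) throw std::runtime_error("unexpected end of stream");
        return bytes[pos++];
    }
    std::uint32_t get32() {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(get8()) << (8 * i);
        return v;
    }
};

// Hypothetical persistent type with two schema versions.
struct Shape {
    std::uint32_t size = 0;        // schema 1 stored this as a single octet
    std::uint8_t colour = 0;       // member added in schema 2
    std::uint64_t cachedArea = 0;  // cached result, deliberately never written

    static constexpr std::uint8_t kSchema = 2;

    void serialise(ByteWriter& out) const {
        out.put8(kSchema);  // the newest schema is always the one written
        out.put32(size);
        out.put8(colour);
    }

    void deserialise(ByteReader& in) {
        const std::uint8_t schema = in.get8();
        if (schema == 1) {
            // The member type changed: read the old type into a local
            // variable, then translate it into the current member.
            const std::uint8_t oldSize = in.get8();
            size = oldSize;
            colour = 0;  // default value for the member added later
        } else if (schema == 2) {
            size = in.get32();
            colour = in.get8();
        } else {
            throw std::runtime_error("unknown schema number");
        }
        // Rebuild the transient cache instead of persisting it.
        cachedArea = static_cast<std::uint64_t>(size) * size;
    }
};
```

In this sketch an object stored under schema 1 is readable by the current code, and it is only rewritten under schema 2 the next time serialise is called on it, which corresponds to the lazy conversion described above.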
Binary format
A boolean value is serialised as a single octet equal to 0x00 or 0x01,
representing the values false and true respectively.
Signed integer types are serialised with a two's complement representation.
A float32 is serialised using the binary32 format of the IEEE 754-2008 standard
(see Single-precision floating-point format).
A float64 is serialised using the binary64 format of the IEEE 754-2008 standard
(see Double-precision floating-point format).
All basic integer and floating point types are serialised with the least significant byte coming first in the sequence of octets.
This representation favours little endian architectures (see Endianness).
A variable length serialisation is available for (unsigned) integers,
using fewer octets for smaller numbers.
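The exact variable length format used by CEDA is not specified in this section. As an illustration of the general idea only, the following sketch uses a common base-128 scheme (often called LEB128), which is an assumption and may differ from the CEDA encoding. The function names encodeVarUint and decodeVarUint are hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed scheme (LEB128-style, not confirmed to be the CEDA format):
// seven payload bits per octet, least significant group first, with the
// top bit set on every octet except the last.
std::vector<std::uint8_t> encodeVarUint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}

// Decode one value from the start of `bytes`.
std::uint64_t decodeVarUint(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t value = 0;
    int shift = 0;
    for (std::uint8_t b : bytes) {
        if (shift >= 64) throw std::runtime_error("variable length integer too long");
        value |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return value;  // top bit clear marks the last octet
        shift += 7;
    }
    throw std::runtime_error("truncated variable length integer");
}
```

Under this assumed scheme, values below 128 occupy a single octet, while a full 64 bit value needs ten octets, so the encoding pays off when most values are small.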
Examples
| Type | Value | Serialisation |
|---|---|---|
| bool | false | [ 0x00 ] |
| bool | true | [ 0x01 ] |
| uint16 | 0x1234 | [ 0x34, 0x12 ] |
| int16 | -2 | [ 0xFE, 0xFF ] |
| int16 | 32767 | [ 0xFF, 0x7F ] |
| float32 | 0.0f | [ 0x00, 0x00, 0x00, 0x00 ] |
| float32 | 1.0f | [ 0x00, 0x00, 0x80, 0x3f ] |
| float64 | 0.0 | [ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ] |
| float64 | 1.0 | [ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f ] |
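The examples above can be reproduced with a few lines of portable C++. The following is a minimal sketch, not the CEDA implementation: the Octets alias and the serialise... function names are hypothetical, and the code assumes that float and double use the IEEE 754 binary32 and binary64 formats, which holds on mainstream platforms.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

using Octets = std::vector<std::uint8_t>;

// Append the low `count` octets of `bits`, least significant octet first.
void putLittleEndian(Octets& out, std::uint64_t bits, int count) {
    for (int i = 0; i < count; ++i)
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
}

Octets serialiseBool(bool v) {
    return Octets{static_cast<std::uint8_t>(v ? 0x01 : 0x00)};
}

Octets serialiseUint16(std::uint16_t v) {
    Octets out;
    putLittleEndian(out, v, 2);
    return out;
}

Octets serialiseInt16(std::int16_t v) {
    // Conversion to unsigned is modulo 2^16, which yields the
    // two's complement bit pattern of a negative value.
    return serialiseUint16(static_cast<std::uint16_t>(v));
}

Octets serialiseFloat32(float v) {
    static_assert(sizeof(float) == 4, "assumes float is IEEE 754 binary32");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);  // reinterpret the bits without undefined behaviour
    Octets out;
    putLittleEndian(out, bits, 4);
    return out;
}

Octets serialiseFloat64(double v) {
    static_assert(sizeof(double) == 8, "assumes double is IEEE 754 binary64");
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    Octets out;
    putLittleEndian(out, bits, 8);
    return out;
}
```

Because the octets are extracted with shifts rather than by copying memory directly into the output, the sketch produces the same least-significant-first order on big endian and little endian hosts.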