InputArchive

An InputArchive is used to deserialise a sequence of octets back into a value of some type.

An efficient binary format is used (i.e. not using printable text). See serialisation.

InputArchive is designed for very high performance: it is a thin wrapper around a pointer, and individual reads are not bounds checked.

Deserialisation is the inverse of serialisation. Serialisation typically uses an OutputArchive and deserialisation uses an InputArchive.

An InputArchive is essentially a const octet_t* which represents a pointer to the next octet to be read. There are implicit conversions to and from a const octet_t*.


class InputArchive
{
public:
    // Allow implicit conversions to/from a const octet_t*
    InputArchive(const octet_t* p) : p_(p) {}
    operator const octet_t*() const { return p_; }
private:
    const octet_t* p_;
};
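
As a minimal sketch of how these two conversions let a read be expressed, the hypothetical helper ReadRaw below (not part of the ceda API) copies the octets of a trivially copyable value and then advances the archive. It assumes that octet_t is an unsigned 8-bit type and that values are stored in the host's native byte order; the encoding actually used by ceda may differ.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

using octet_t = std::uint8_t;  // assumption: octet_t is an unsigned 8-bit type

class InputArchive
{
public:
    InputArchive(const octet_t* p) : p_(p) {}
    operator const octet_t*() const { return p_; }
private:
    const octet_t* p_;
};

// Hypothetical helper, for illustration only.
template<typename T>
void ReadRaw(InputArchive& ar, T& x)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "a raw copy requires a trivially copyable type");
    const octet_t* p = ar;           // implicit conversion to const octet_t*
    std::memcpy(&x, p, sizeof(T));   // no check against the end of the buffer
    ar = p + sizeof(T);              // implicit conversion from const octet_t*
}
```

The archive carries no length, so nothing in ReadRaw can tell whether the sizeof(T) octets it copies actually lie inside the buffer.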

An InputArchive can only be used to read from a contiguous block of memory.

This approach eliminates the need for buffer underflow checks, which would otherwise be needed on every single read operation. That is a significant performance benefit.

It is unsafe to use an InputArchive to read a buffer that cannot be trusted: with no bounds checks, malformed input can cause reads past the end of the buffer, which can easily result in a read access violation. When the data is received over a network, deserialisation with an InputArchive must only be used with a trusted source and adequate error checking, for example over a connection secured with Transport Layer Security.

Since an InputArchive is not associated with some kind of input stream, deserialisation is "pure CPU". That provides better performance, and reduces the amount of code that can throw exceptions - see the interleaving computation and I/O Anti-pattern.

In some cases it may be possible to read directly from a buffer in the underlying input device (since the const octet_t* pointer can be made to point wherever we like). For example, we might be able to read directly from a segment in the LSS segment cache, since most objects are much smaller than the LSS segment size (e.g. 4 MB). This reduces memory consumption, avoids memory allocations, avoids memcpy calls and improves CPU memory cache utilisation.

Deserialisation of user defined types

For every type T that supports deserialisation, the following function is implemented.


void Deserialise(InputArchive& ar, T& x)

Therefore there can be many overloads of Deserialise (i.e. ad hoc polymorphism).
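
The following sketch shows what an overload for a user defined type might look like. The type Reading is hypothetical, and the two primitive overloads are simplified stand-ins (native byte order, raw memcpy) for the ones cxUtils provides, included only so that the example is self-contained.

```cpp
#include <cstdint>
#include <cstring>

using octet_t = std::uint8_t;  // assumption: octet_t is an unsigned 8-bit type

class InputArchive
{
public:
    InputArchive(const octet_t* p) : p_(p) {}
    operator const octet_t*() const { return p_; }
private:
    const octet_t* p_;
};

// Stand-ins for the primitive overloads (not the cxUtils implementations).
inline void Deserialise(InputArchive& ar, std::int32_t& x)
{
    const octet_t* p = ar;
    std::memcpy(&x, p, sizeof x);
    ar = p + sizeof x;
}

inline void Deserialise(InputArchive& ar, double& x)
{
    const octet_t* p = ar;
    std::memcpy(&x, p, sizeof x);
    ar = p + sizeof x;
}

// Hypothetical user defined type.
struct Reading
{
    std::int32_t sensorId;
    double value;
};

// Deserialise each member in the same order in which it was serialised.
inline void Deserialise(InputArchive& ar, Reading& r)
{
    Deserialise(ar, r.sensorId);
    Deserialise(ar, r.value);
}
```

Note that the second parameter is a non-const reference, because deserialisation assigns to the object.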

The cxUtils library implements the Deserialise function for the following types:

bool, 
char, signed char, unsigned char, char16_t, char32_t, wchar_t, 
short, unsigned short,
int, unsigned int,
long, unsigned long,
long long, unsigned long long,
float, double, long double,
std::pair, std::array, std::basic_string, std::vector, std::deque,
std::forward_list, std::list,
std::map, std::multimap, std::unordered_map, std::unordered_multimap,
std::set, std::multiset, std::unordered_set, std::unordered_multiset,
ceda::xdeque, ceda::VectorOfByte, ceda::xvector, ceda::CompressedInt, ceda::schema_t,
ceda::HPTime, ceda::HPTimeStamp, ceda::HPTimeSpan

Extraction operation on an InputArchive

For convenience, clients can use operator>> to deserialise variables or objects. For example,


ar >> x >> y >> z;

is shorthand for


Deserialise(ar,x);
Deserialise(ar,y);
Deserialise(ar,z);

This is achieved with a single implementation of operator>> for an InputArchive:


template<typename T>
inline InputArchive& operator>>(InputArchive& ar, T& x)
{ 
    Deserialise(ar,x);
    return ar; 
}
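
A self-contained sketch of the chained form is given below. The Header type and the ReadHeader function are hypothetical, and the two primitive overloads of Deserialise are simplified stand-ins (native byte order) for the cxUtils ones.

```cpp
#include <cstdint>
#include <cstring>

using octet_t = std::uint8_t;  // assumption: octet_t is an unsigned 8-bit type

class InputArchive
{
public:
    InputArchive(const octet_t* p) : p_(p) {}
    operator const octet_t*() const { return p_; }
private:
    const octet_t* p_;
};

// Stand-ins for the primitive overloads (not the cxUtils implementations).
inline void Deserialise(InputArchive& ar, std::uint8_t& x)
{
    const octet_t* p = ar;
    x = *p;
    ar = p + 1;
}

inline void Deserialise(InputArchive& ar, std::uint16_t& x)
{
    const octet_t* p = ar;
    std::memcpy(&x, p, sizeof x);
    ar = p + sizeof x;
}

// One operator>> forwards to whichever Deserialise overload matches T.
template<typename T>
inline InputArchive& operator>>(InputArchive& ar, T& x)
{
    Deserialise(ar, x);
    return ar;
}

// Hypothetical message header.
struct Header
{
    std::uint8_t version;
    std::uint16_t length;
    std::uint8_t flags;
};

inline Header ReadHeader(const octet_t* buffer)
{
    InputArchive ar(buffer);
    Header h{};
    ar >> h.version >> h.length >> h.flags;  // evaluated left to right
    return h;
}
```

Because operator>> returns the archive by reference, each extraction in the chain continues from the position where the previous one stopped.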

Verification by the ceda framework

A possible concern is the lack of buffer underflow checks, since such checks (at least in a debug build) can be very useful for tracking down errors in code.

The ceda framework will normally verify, once deserialisation has completed, that the InputArchive has not read past the end of the buffer, so such errors are unlikely to go undetected.

In fact the ceda framework will normally ensure that application defined deserialise code reads the message, the whole message and nothing but the message. Therefore simply exercising the code serves as a reasonably thorough unit test.
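
The kind of check described above could look something like the following sketch. It is not the ceda implementation: DeserialiseWholeMessage and the Pair type are hypothetical, and the primitive overload is a simplified stand-in. The check runs after deserialisation, so it detects code that reads too much or too little but does not prevent the out-of-range read itself.

```cpp
#include <cstdint>
#include <cstring>

using octet_t = std::uint8_t;  // assumption: octet_t is an unsigned 8-bit type

class InputArchive
{
public:
    InputArchive(const octet_t* p) : p_(p) {}
    operator const octet_t*() const { return p_; }
private:
    const octet_t* p_;
};

// Stand-in for a primitive overload (not the cxUtils implementation).
inline void Deserialise(InputArchive& ar, std::uint32_t& x)
{
    const octet_t* p = ar;
    std::memcpy(&x, p, sizeof x);
    ar = p + sizeof x;
}

// Hypothetical application defined type and its overload.
struct Pair
{
    std::uint32_t first;
    std::uint32_t second;
};

inline void Deserialise(InputArchive& ar, Pair& x)
{
    Deserialise(ar, x.first);
    Deserialise(ar, x.second);
}

// Hypothetical framework-side check: returns true only if deserialising x
// consumed exactly the octets in [begin, end).
template<typename T>
bool DeserialiseWholeMessage(const octet_t* begin, const octet_t* end, T& x)
{
    InputArchive ar(begin);
    Deserialise(ar, x);
    const octet_t* p = ar;
    return p == end;
}
```

A result of false means the deserialise code and the message disagree about the message length, which usually points to a mismatch between the Serialise and Deserialise implementations for the type.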