Serialisation performance

Performance on basic types

Serialisation and deserialisation rates (in MByte/s) using OutputArchive and InputArchive were measured for arrays of basic types, with array sizes of 1MB, 8MB, 64MB and 512MB.

These measurements were obtained on an i7-7700HQ laptop with 16GB RAM running Windows 10 Pro.

The serialisation and deserialisation rates were equal to within the accuracy of the measurements.

Type                        Rate (MByte/s) by size of data
                            1MB       8MB       64MB      512MB
bool, int8, uint8           3200      3100      2900      2700
int16, uint16               5900      5700      4800      4500
int32, uint32, float32      12000     11000     7700      6700
int64, uint64, float64      22000     20000     9800      7600
128 bit guid                36000     27000     10000     7800

Serialisation of smaller arrays achieves higher rates because the data fits in the L1/L2/L3 caches. For larger buffers the rate is limited by the sustained bandwidth to main memory, which is less than 8GB/s on this machine.
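The effect of buffer size on throughput can be explored with a plain memory copy. The following is a minimal sketch, not the benchmark used for the table above: it times std::memcpy rather than OutputArchive, and the function name measure_copy_rate_mib_per_s is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies `bytes` bytes between two buffers `repeats` times and returns the
// best observed rate in MiB/s (1 MiB = 2^20 bytes). This stands in for
// serialising an array of a basic type; it does not use the CEDA archives.
double measure_copy_rate_mib_per_s(std::size_t bytes, int repeats)
{
    assert(bytes > 0 && repeats > 0);
    std::vector<unsigned char> src(bytes, 0x5A);
    std::vector<unsigned char> dst(bytes, 0);

    double best_seconds = 0.0;
    for (int i = 0; i < repeats; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), bytes);
        const auto stop = std::chrono::steady_clock::now();
        const double seconds =
            std::chrono::duration<double>(stop - start).count();
        if (seconds > 0.0 && (best_seconds == 0.0 || seconds < best_seconds))
            best_seconds = seconds;
    }

    // Read back one byte through a volatile so the destination is observed.
    volatile unsigned char sink = dst[bytes - 1];
    (void)sink;

    if (best_seconds == 0.0)
        return 0.0; // timer too coarse to measure this copy
    return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / best_seconds;
}
```

Calling this with sizes of 1MB, 8MB, 64MB and 512MB should show the rate dropping once the buffers no longer fit in cache, although the absolute numbers depend on the machine and compiler settings.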

Example: serialisation of a TIN

Consider a triangulated irregular network (TIN), implemented with the following data types:


#include <vector>

struct Triangle
{
    int v0, v1, v2;
};

struct Point
{
    double x,y,z;
};

struct Tin
{
    std::vector<Point> vertices;
    std::vector<Triangle> triangles;
};
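
Because Point and Triangle are trivially copyable, each array can be written as a count followed by one bulk copy of the elements. The following is a minimal sketch of that idea, not the CEDA OutputArchive/InputArchive API: the functions write_array, read_array, serialise and deserialise are hypothetical, the 64-bit count prefix is an assumption, and the bytes are in host byte order, so the format is not portable across architectures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct Triangle { int v0, v1, v2; };
struct Point { double x, y, z; };
struct Tin
{
    std::vector<Point> vertices;
    std::vector<Triangle> triangles;
};

static_assert(std::is_trivially_copyable<Point>::value, "bulk copy needs this");
static_assert(std::is_trivially_copyable<Triangle>::value, "bulk copy needs this");

// Appends a 64-bit element count followed by the raw element bytes.
template <typename T>
void write_array(std::vector<unsigned char>& out, const std::vector<T>& items)
{
    const std::uint64_t count = items.size();
    const std::size_t payload = items.size() * sizeof(T);
    const std::size_t offset = out.size();
    out.resize(offset + sizeof(count) + payload);
    std::memcpy(out.data() + offset, &count, sizeof(count));
    if (payload != 0)
        std::memcpy(out.data() + offset + sizeof(count), items.data(), payload);
}

// Reads an array written by write_array; returns the offset just past it.
template <typename T>
std::size_t read_array(const std::vector<unsigned char>& in, std::size_t offset,
                       std::vector<T>& items)
{
    std::uint64_t count = 0;
    assert(offset + sizeof(count) <= in.size());
    std::memcpy(&count, in.data() + offset, sizeof(count));
    offset += sizeof(count);
    const std::size_t payload = static_cast<std::size_t>(count) * sizeof(T);
    assert(payload <= in.size() - offset);
    items.resize(static_cast<std::size_t>(count));
    if (payload != 0)
        std::memcpy(items.data(), in.data() + offset, payload);
    return offset + payload;
}

std::vector<unsigned char> serialise(const Tin& tin)
{
    std::vector<unsigned char> out;
    write_array(out, tin.vertices);
    write_array(out, tin.triangles);
    return out;
}

Tin deserialise(const std::vector<unsigned char>& bytes)
{
    Tin tin;
    const std::size_t next = read_array(bytes, 0, tin.vertices);
    read_array(bytes, next, tin.triangles);
    return tin;
}
```

With this layout the cost of serialising a TIN is dominated by two memory copies, which is why rates close to memory bandwidth are plausible for this kind of data.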

Serialisation with Google Protobuf involves defining corresponding message types in a .proto file:


syntax = "proto3";

package Geometry;

message M_Triangle
{
	int32 v0 = 1;
	int32 v1 = 2;
	int32 v2 = 3;
}

message M_Point 
{
	double x = 1;
	double y = 2;
	double z = 3;
}

message M_Tin 
{
	repeated M_Point vertices = 1;
	repeated M_Triangle triangles = 2;
}

The following results were obtained serialising and deserialising a TIN with 718490 vertices and 1430166 triangles (about 35MB of data) on an i7-7700HQ laptop with 16GB RAM running Windows 10 Pro:

Function                              Time (milliseconds)
Serialise with Google Protobuf        359
Deserialise with Google Protobuf      348
Serialise with CEDA OutputArchive     4.7
Deserialise with CEDA InputArchive    4.8

In this example CEDA is more than 70x faster than Google Protobuf at both serialisation (359/4.7, about 76x) and deserialisation (348/4.8, about 73x).

The Protobuf serialisation/deserialisation rate is about 110 MB/s. The CEDA serialisation/deserialisation rate is about 6.8 GB/s.
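
The quoted rates can be reproduced from the measured times and the output sizes reported for each format (40764955 bytes for Protobuf, 34405758 bytes for CEDA). The sketch below assumes that each rate is the format's own output size divided by the elapsed time, with 1 MB taken as 2^20 bytes; that reading is an inference from the numbers, and the function names are hypothetical.

```cpp
#include <cassert>

// Throughput in MiB/s (1 MiB = 2^20 bytes) for `bytes` processed in
// `milliseconds`.
double rate_mib_per_s(double bytes, double milliseconds)
{
    assert(milliseconds > 0.0);
    return (bytes / (1024.0 * 1024.0)) / (milliseconds / 1000.0);
}

// How many times faster the operation taking `fast_ms` is than the one
// taking `slow_ms`.
double speedup(double slow_ms, double fast_ms)
{
    assert(fast_ms > 0.0);
    return slow_ms / fast_ms;
}
```

Under these assumptions rate_mib_per_s(40764955, 359) is roughly 108 MiB/s and rate_mib_per_s(34405758, 4.7) is roughly 6980 MiB/s, or about 6.8 GiB/s, in line with the figures above.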

CEDA generated a smaller output:

Format             Size (bytes)
Google Protobuf    40764955
CEDA               34405758

The Google Protobuf format is about 18% larger than CEDA.
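
The size difference is consistent with the per-field overhead of the Protobuf wire format. The following sketch estimates the encoded size of M_Tin from the standard proto3 encoding rules: one tag byte per field for field numbers 1 to 15, 8 bytes per double, a varint per int32, and a tag plus length prefix for each embedded message. It is an estimate under stated assumptions rather than a reproduction of the measurement: it assumes no coordinate is exactly 0.0 (proto3 omits zero-valued scalars), sizes every triangle index like the largest vertex index, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes Protobuf uses to encode `value` as a base-128 varint.
std::uint64_t varint_size(std::uint64_t value)
{
    std::uint64_t size = 1;
    while (value >= 128)
    {
        value >>= 7;
        ++size;
    }
    return size;
}

// One element of `repeated M_Point vertices`: tag + length + three doubles,
// each double being 1 tag byte + 8 data bytes.
std::uint64_t point_size()
{
    const std::uint64_t payload = 3 * (1 + 8);
    return 1 + varint_size(payload) + payload;
}

// One element of `repeated M_Triangle triangles`. An index of 0 is the
// proto3 default value and is not written at all.
std::uint64_t triangle_size(std::uint32_t v0, std::uint32_t v1, std::uint32_t v2)
{
    const std::uint32_t indices[3] = {v0, v1, v2};
    std::uint64_t payload = 0;
    for (std::uint32_t v : indices)
        if (v != 0)
            payload += 1 + varint_size(v);
    return 1 + varint_size(payload) + payload;
}

// Upper estimate of the encoded M_Tin size: every index is sized like the
// largest possible index, vertex_count - 1.
std::uint64_t estimate_tin_size(std::uint64_t vertex_count,
                                std::uint64_t triangle_count)
{
    assert(vertex_count > 0);
    const std::uint32_t max_index = static_cast<std::uint32_t>(vertex_count - 1);
    return vertex_count * point_size() +
           triangle_count * triangle_size(max_index, max_index, max_index);
}

// Size of the raw arrays alone: 24 bytes per Point and 12 bytes per
// Triangle, assuming a 4-byte int and no padding.
std::uint64_t raw_tin_size(std::uint64_t vertex_count,
                           std::uint64_t triangle_count)
{
    return vertex_count * 24 + triangle_count * 12;
}
```

For 718490 vertices and 1430166 triangles the estimate is 40858534 bytes (29 bytes per point, 14 bytes per triangle), within 0.3% of the measured 40764955 bytes; the shortfall is what one would expect if some triangles reference low-numbered vertices, whose indices need shorter varints. The raw arrays occupy 34405752 bytes, 6 bytes less than the CEDA output.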

Summary of disadvantages in using protobuf messages in your C++ application

As someone said on Hacker News:

Protobuf's abysmal performance, questionable integration into the C++ type system, append-only expandability, and annoying naming conventions and default values are why I usually try and steer away from it.

and also:

But yes, once you want real high performance, protobuf will disappoint you when you benchmark and find it responsible for all the CPU use.

There are a number of reasons why one might not want to build a large C++ application on top of protobuf message types: