Serialisation and deserialisation rates (in MByte/s) using OutputArchive and InputArchive were measured for arrays of basic types where the size of the array is 1MB, 8MB, 64MB or 512MB.
These measurements were obtained on an i7-7700HQ laptop with 16GB RAM running Windows 10 Pro.
The serialisation and deserialisation rates were equal to within the accuracy of the measurements.
Type | Rate for 1MB of data (MByte/s) | Rate for 8MB of data (MByte/s) | Rate for 64MB of data (MByte/s) | Rate for 512MB of data (MByte/s) |
---|---|---|---|---|
bool, int8, uint8 | 3200 | 3100 | 2900 | 2700 |
int16, uint16 | 5900 | 5700 | 4800 | 4500 |
int32, uint32, float32 | 12000 | 11000 | 7700 | 6700 |
int64, uint64, float64 | 22000 | 20000 | 9800 | 7600 |
128 bit guid | 36000 | 27000 | 10000 | 7800 |
Serialisation of smaller arrays achieves higher rates because the data fits in the L1/L2/L3 caches. For larger buffers the rate is limited by the sustained bandwidth to main memory, which is less than 8GB/sec on this machine.
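The cache effect can be observed independently of any serialisation library with a plain memory copy. The following is a minimal sketch, not the harness used to produce the table above: the function name `copyRateMBps` is hypothetical, one MByte is taken to be 2^20 bytes, and the best of several runs is reported to reduce timing noise.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `bytes` bytes `repeats` times with memcpy and
// returns the best observed rate in MByte/s (1 MByte = 2^20 bytes).
// Requires bytes > 0 and repeats > 0.
double copyRateMBps(std::size_t bytes, int repeats)
{
    std::vector<unsigned char> src(bytes, 1), dst(bytes, 0);
    double bestSeconds = 1e300;
    volatile unsigned char sink = 0;   // keeps the copy from being optimised away

    for (int i = 0; i < repeats; ++i)
    {
        src[0] = static_cast<unsigned char>(i);   // vary the input between runs
        const auto t0 = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), bytes);
        const auto t1 = std::chrono::steady_clock::now();
        sink = dst[bytes - 1];

        const double seconds = std::chrono::duration<double>(t1 - t0).count();
        if (seconds > 0.0 && seconds < bestSeconds)
            bestSeconds = seconds;
    }
    (void)sink;
    return (bytes / (1024.0 * 1024.0)) / bestSeconds;
}
```

Calling `copyRateMBps` with buffer sizes of 1MB, 8MB, 64MB and 512MB would be expected to show the rate falling as the buffer outgrows the caches, although the absolute numbers depend on the machine and compiler settings.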
Consider a triangulated irregular network (TIN), a data type implemented as follows:
#include <vector>

// Indices of the three vertices that form the triangle
struct Triangle
{
    int v0, v1, v2;
};

// A vertex position
struct Point
{
    double x, y, z;
};

struct Tin
{
    std::vector<Point> vertices;
    std::vector<Triangle> triangles;
};
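Because `Point` and `Triangle` are plain structs, both vectors can be written and read as contiguous blocks of bytes. The sketch below illustrates that idea with a length-prefixed `memcpy` round trip. It is not the CEDA OutputArchive/InputArchive API: the names `rawPayloadSize`, `appendArray`, `extractArray`, `writeTin` and `readTin` are hypothetical, and the code assumes a 32 bit `int`, no padding in either struct, and that the reader runs on a machine with the same byte order as the writer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Triangle { int v0, v1, v2; };
struct Point { double x, y, z; };
struct Tin
{
    std::vector<Point> vertices;
    std::vector<Triangle> triangles;
};

// Assumption: 32 bit int, 64 bit double, no padding inside either struct.
static_assert(sizeof(Triangle) == 12, "expected three packed 32 bit ints");
static_assert(sizeof(Point) == 24, "expected three packed 64 bit doubles");

// Bytes of element data only, excluding any length prefixes.
std::size_t rawPayloadSize(std::size_t numVertices, std::size_t numTriangles)
{
    return numVertices * sizeof(Point) + numTriangles * sizeof(Triangle);
}

// Hypothetical helper: append a 64 bit element count, then the raw element bytes.
template <typename T>
void appendArray(std::vector<unsigned char>& out, const std::vector<T>& v)
{
    const std::uint64_t count = v.size();
    const std::size_t offset = out.size();
    out.resize(offset + sizeof(count) + v.size() * sizeof(T));
    std::memcpy(out.data() + offset, &count, sizeof(count));
    if (!v.empty())
        std::memcpy(out.data() + offset + sizeof(count), v.data(), v.size() * sizeof(T));
}

// Hypothetical helper: read back what appendArray wrote, starting at `pos`.
// Returns the position just past the array.
template <typename T>
std::size_t extractArray(const std::vector<unsigned char>& in, std::size_t pos, std::vector<T>& v)
{
    std::uint64_t count = 0;
    assert(in.size() - pos >= sizeof(count));          // room for the count
    std::memcpy(&count, in.data() + pos, sizeof(count));
    pos += sizeof(count);
    assert((in.size() - pos) / sizeof(T) >= count);    // room for the elements
    v.resize(static_cast<std::size_t>(count));
    if (count != 0)
        std::memcpy(v.data(), in.data() + pos, v.size() * sizeof(T));
    return pos + v.size() * sizeof(T);
}

std::vector<unsigned char> writeTin(const Tin& tin)
{
    std::vector<unsigned char> out;
    appendArray(out, tin.vertices);
    appendArray(out, tin.triangles);
    return out;
}

Tin readTin(const std::vector<unsigned char>& in)
{
    Tin tin;
    const std::size_t pos = extractArray(in, 0, tin.vertices);
    extractArray(in, pos, tin.triangles);
    return tin;
}
```

Under these assumptions `rawPayloadSize(718490, 1430166)` gives 34405752 bytes for the mesh measured later in this section, which is the "about 35MB of data" quoted there.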
Serialisation with Google Protobuf involves defining corresponding message types in a .proto file:
syntax = "proto3";
package Geometry;
message M_Triangle
{
    int32 v0 = 1;
    int32 v1 = 2;
    int32 v2 = 3;
}
message M_Point
{
    double x = 1;
    double y = 2;
    double z = 3;
}
message M_Tin
{
    repeated M_Point vertices = 1;
    repeated M_Triangle triangles = 2;
}
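To get a feel for how these messages are laid out on the wire, the following sketch estimates the encoded size of a single `M_Point` and `M_Triangle` when each is stored as an element of a repeated field. It follows the published proto3 encoding rules (one tag byte for field numbers below 16, fixed 8 byte doubles, varint-encoded `int32` values, default-valued fields omitted). It does not use the Protobuf library, all function names are hypothetical, and as a simplification any double that compares equal to zero is treated as omitted.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a base 128 varint needs for an unsigned value.
std::size_t varintSize(std::uint64_t value)
{
    std::size_t n = 1;
    while (value >= 0x80) { value >>= 7; ++n; }
    return n;
}

// proto3 int32 field with a field number below 16: 1 tag byte + varint.
// Zero is the default and is not written. Negative values are sign
// extended to 64 bits and therefore take 10 varint bytes.
std::size_t int32FieldSize(std::int32_t v)
{
    if (v == 0) return 0;
    return 1 + varintSize(static_cast<std::uint64_t>(static_cast<std::int64_t>(v)));
}

// proto3 double field: 1 tag byte + 8 fixed bytes, omitted when zero.
std::size_t doubleFieldSize(double v)
{
    return v == 0.0 ? 0 : 1 + 8;
}

// A sub-message held in a repeated field: 1 tag byte + varint length + payload.
std::size_t embeddedSize(std::size_t payload)
{
    return 1 + varintSize(payload) + payload;
}

// Estimated wire size of one M_Point inside M_Tin.vertices.
std::size_t pointSize(double x, double y, double z)
{
    return embeddedSize(doubleFieldSize(x) + doubleFieldSize(y) + doubleFieldSize(z));
}

// Estimated wire size of one M_Triangle inside M_Tin.triangles.
std::size_t triangleSize(std::int32_t v0, std::int32_t v1, std::int32_t v2)
{
    return embeddedSize(int32FieldSize(v0) + int32FieldSize(v1) + int32FieldSize(v2));
}
```

Under these rules a vertex with three non-zero coordinates costs 29 bytes on the wire against 24 bytes for the raw struct, while the cost of a triangle depends on the magnitude of its vertex indices.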
The following results were obtained serialising and deserialising a TIN with 718490 vertices and 1430166 triangles (about 35MB of data) on an i7-7700HQ laptop with 16GB RAM running Windows 10 Pro:
Function | Time (milliseconds) |
---|---|
Serialise with Google Protobuf | 359 |
Deserialise with Google Protobuf | 348 |
Serialise with CEDA OutputArchive | 4.7 |
Deserialise with CEDA InputArchive | 4.8 |
In this example CEDA is more than 70x faster than Google Protobuf at both serialisation and deserialisation.
The Protobuf serialisation/deserialisation rate is about 110 MB/s. The CEDA serialisation/deserialisation rate is about 6.8 GB/s.
CEDA generated a smaller output:
Format | Size (bytes) |
---|---|
Google Protobuf | 40764955 |
CEDA | 34405758 |
The Google Protobuf format is about 18% larger than CEDA.
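The figures quoted above can be reproduced from the two tables with a few lines of arithmetic. The sketch below assumes that each rate is computed from that format's own output size and that MB and GB are binary units (2^20 and 2^30 bytes), since this is what matches the quoted values. The function names are hypothetical.

```cpp
#include <cstddef>

// How many times faster the fast timing is than the slow one.
double speedup(double slowMs, double fastMs)
{
    return slowMs / fastMs;
}

// Throughput in MByte/s, taking 1 MByte = 2^20 bytes.
double rateMBps(std::size_t bytes, double milliseconds)
{
    return (bytes / (1024.0 * 1024.0)) / (milliseconds / 1000.0);
}

// How much larger `a` is than `b`, as a percentage of `b`.
double percentLarger(std::size_t a, std::size_t b)
{
    return 100.0 * (static_cast<double>(a) - static_cast<double>(b)) / static_cast<double>(b);
}
```

With the measured values, `speedup(359, 4.7)` is about 76 and `speedup(348, 4.8)` is about 72, `rateMBps(40764955, 359)` is about 108, `rateMBps(34405758, 4.7) / 1024` is about 6.8, and `percentLarger(40764955, 34405758)` is about 18.5.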
As someone said on Hacker News:
"Protobuf's abysmal performance, questionable integration into the C++ type system, append-only expandability, and annoying naming conventions and default values are why I usually try and steer away from it."
and also:
"But yes, once you want real high performance, protobuf will disappoint you when you benchmark and find it responsible for all the CPU use."
There are a number of reasons why one might not want to build a large C++ application on top of protobuf message types:
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.