Variable length serialisation of integers

A variable length serialisation is available for (unsigned) integers, using fewer octets for smaller numbers. This has two important benefits:

Although uncompressed integers are serialised in little endian order, the variable length serialisation is big endian (i.e. the most significant octet appears first in the sequence).
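
As a small illustration of the two byte orders, the sketch below packs the same value both ways using Python's standard struct module. The 4-octet width used for the fixed-size form is an assumption made for the example; the text above does not state the width of an uncompressed integer.

```python
import struct

value = 0x12345678

# Fixed-width little endian: least significant octet first.
# (The 4-octet width is an assumption for this illustration.)
little = struct.pack("<I", value)

# Big endian: most significant octet first, which is the order
# used by the variable length serialisation.
big = struct.pack(">I", value)

print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78
```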

The following table shows the serialisation format for each range of integer values. Each format is written as a list of octets in square brackets, with every octet shown as 8 binary digits. Each x marks a bit taken from the binary representation of the value, with the most significant bit on the left. For example, the format [110xxxxx, xxxxxxxx, xxxxxxxx] has three octets and 21 x's, so it carries a 21 bit representation of the value.

Range of integer    Num serialised octets    Serialisation format
0 to 2^7-1          1                        [0xxxxxxx]
2^7 to 2^14-1       2                        [10xxxxxx, xxxxxxxx]
2^14 to 2^21-1      3                        [110xxxxx, xxxxxxxx, xxxxxxxx]
2^21 to 2^28-1      4                        [1110xxxx, xxxxxxxx, xxxxxxxx, xxxxxxxx]
2^28 to 2^35-1      5                        [11110xxx, xxxxxxxx, xxxxxxxx, xxxxxxxx, xxxxxxxx]
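
The table can be turned into an encoder along the following lines. This is a minimal sketch and not a reference implementation: the function name encode_varint is hypothetical, and it assumes that a value is always written in the shortest format whose range contains it, as the ranges in the table suggest.

```python
def encode_varint(value: int) -> bytes:
    """Serialise an unsigned integer in the variable length format (sketch)."""
    if value < 0:
        raise ValueError("only unsigned integers are supported")
    for num_octets in range(1, 6):
        # A format with n octets carries 7 * n value bits (7, 14, 21, 28, 35).
        if value < (1 << (7 * num_octets)):
            # Prefix of the first octet: (n - 1) one bits followed by a zero bit,
            # i.e. 0x00, 0x80, 0xC0, 0xE0, 0xF0 for n = 1..5.
            prefix = (0xFF << (9 - num_octets)) & 0xFF
            # Place the prefix in the top octet and the value below it,
            # then emit the octets most significant first.
            combined = (prefix << (8 * (num_octets - 1))) | value
            return combined.to_bytes(num_octets, "big")
    raise ValueError("value does not fit in 35 bits")


print(encode_varint(0x4005).hex(" "))  # c0 40 05
```

Deriving the prefix from the octet count keeps the five formats in a single loop; a lookup table of the five prefixes would work equally well.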

Examples

The following table provides some examples, using hexadecimal notation.

Value         Serialisation
0x00          [ 0x00 ]
0x05          [ 0x05 ]
0x7F          [ 0x7F ]
0x80          [ 0x80, 0x80 ]
0x85          [ 0x80, 0x85 ]
0x3FFF        [ 0xBF, 0xFF ]
0x4000        [ 0xC0, 0x40, 0x00 ]
0x4005        [ 0xC0, 0x40, 0x05 ]
0x1FFFFF      [ 0xDF, 0xFF, 0xFF ]
0x200000      [ 0xE0, 0x20, 0x00, 0x00 ]
0x212345      [ 0xE0, 0x21, 0x23, 0x45 ]
0xFFFFFFF     [ 0xEF, 0xFF, 0xFF, 0xFF ]
0x10000000    [ 0xF0, 0x10, 0x00, 0x00, 0x00 ]
0x12345678    [ 0xF0, 0x12, 0x34, 0x56, 0x78 ]
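
The examples above can be read back with a decoder that counts the leading one bits of the first octet to find the length. The sketch below is illustrative only: the name decode_varint, the offset parameter and the (value, next offset) return convention are assumptions made for the example, not part of the format description.

```python
def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Deserialise one variable length integer; return (value, next offset)."""
    first = data[offset]

    # The number of leading one bits in the first octet, plus one,
    # gives the total number of octets in the serialisation.
    num_octets = 1
    mask = 0x80
    while first & mask:
        num_octets += 1
        mask >>= 1
    if num_octets > 5:
        raise ValueError("unsupported length prefix")
    if offset + num_octets > len(data):
        raise ValueError("truncated input")

    # The first octet contributes its low (8 - n) bits to the value.
    value = first & ((1 << (8 - num_octets)) - 1)
    # The remaining octets follow in big endian order.
    for octet in data[offset + 1 : offset + num_octets]:
        value = (value << 8) | octet
    return value, offset + num_octets


value, next_offset = decode_varint(bytes([0xE0, 0x21, 0x23, 0x45]))
print(hex(value), next_offset)  # 0x212345 4
```

Returning the next offset lets a caller decode several integers packed back to back in one buffer.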

Serialisation and deserialisation rate

The serialisation and deserialisation rate for a uint32 was measured for values in the range from 0 to 10000000, on a laptop with an i7-4700MQ 2.4GHz processor and 16GB RAM.

A rate of 750 MByte/sec was obtained for both serialisation and deserialisation.