CEDA LSS versus Oracle BerkeleyDB

CEDA has a high performance storage engine which features atomic transactions, automatic crash recovery, incremental backup, slave mirroring, optional transaction durability and concurrent reading with writing. It is well suited to being used as an embedded key-value database, as an alternative to products like the Oracle BerkeleyDB.

The following are the results of a comparison of the write performance of the CEDA LSS against the Oracle BerkeleyDB (C++ version).

The test involved writing a million (key,value) pairs using 1000 atomic transactions (not flushed) on a laptop which is an x64 Windows 7 platform with a pair of SSDs in RAID0 (striped). In both cases the database used a 512MB cache. Note that certainly for the CEDA LSS the file was unbuffered (i.e. not using the Windows file system cache) (see the code used for the test).

Results with 4096 byte mapped values

With 4096 byte mapped values the total amount of data to be written to disk is about 4GB. For both databases this is 8 times the size of the cache, so the test is dominated by the writing of data to non-volatile storage.

BerkeleyDB was not space efficient. It wrote over 12GB to disk (more that 3 times the amount of actual data). By contrast the CEDA space overhead was tiny (about 0.3%).

BerkeleyDB performance was very poor. It took 2 minutes and 20 seconds using over 1 million write operations while CEDA only took 4 seconds usign less than 1000 write operations.

BerkeleyDB used over a million write operations while the CEDA LSS used less than a thousand.

These results were discussed on an Oracle BerkeleyDB forum here.

Results for other sizes of the mapped values are tabulated below. Also shown are results for a B+Tree implemented on top of the CEDA LSS.

BerkeleyDB

    Mapped       Time         Database        Log             __db       Effective  Disk space
    value       (sec)           size         files            files        Rate       wastage
    size                                      size            size       (MB/sec)      factor
----------------------------------------------------------------------------------------------
      4         3.78          30416896      178257920       550969344      3.03        63.30
      8         3.80          34021376      188743680       550969344      4.02        48.36
     16         3.83          45162496      199229440       550969344      5.98        33.14
     32         4.21          72228864      251658240       550969344      9.06        21.87
     64         4.44         102146048      314572800       550969344     15.46        13.44
    128         5.49         230932480      503316480       550969344     23.62        9.45
    256         7.36         421306368      828375040       550969344     34.21        6.82
    512        11.9          678215680     1342177280       550969344     41.67        4.94
   1024        30.9         1756733440     2967470080       550969344     31.85        5.11
   2048        66.4         8226021376     2380267520       550969344     29.53        5.43
   4096       140           8226021376     4424990720       550969344     27.96        3.22
   8192       212          16418021376     8671723520       550969344     36.89        3.13

CEDA LSS

    Mapped       Time         Database       Effective   Disk space
    value       (sec)           size           Rate       wastage
    size                                     (MB/sec)     factor
------------------------------------------------------------------------
      4         0.66          25690112        17.3        2.1408
      8         0.66          29360128        23.1        1.8350
     16         0.66          37748736        34.7        1.5729
     32         0.67          53477376        56.9        1.3369
     64         0.69          85458944        99.5        1.1869
    128         0.74         149422080       175.3        1.0987
    256         0.81         277348352       310.8        1.0506
    512         1.01         533725184       491.0        1.0264
   1024         1.09        1045430272       902.9        1.0130
   2048         1.64        2069364736      1195.6        1.0065
   4096         3.99        4117757952       980.9        1.0034
   8192        10.1         8213495808       774.3        1.0016

CEDA B+Tree implemented on top of the LSS

    Mapped     Time      Database    Effective   Disk space
    Value      (sec)       Size      date rate   wastage
    Size                              (MB/s)     factor
    ----------------------------------------------------
    4          0.26      12582912      44.0      1.0486
    8          0.26      16777216      58.7      1.0486
    16         0.27      24641536      84.8      1.0267
    32         0.30      40370176     127.2      1.0093
    64         0.38      72876032     180.7      1.0122
    128        0.50     136839168     259.4      1.0062
    256        0.76     264765440     331.3      1.0029
    512        1.35     520617984     367.3      1.0012
    1024       2.24    1032847360     439.4      1.0008
    2048       3.86    2057830400     508.0      1.0009
    4096       6.87    4106747904     569.7      1.0007
    8192      14.7     8217165824     532.0      1.0021

BerkeleyDB Test Code


// Test code minus error handling and timing:
void BerkeleyTest(int objectSize)
{
    const char* environPath = "env";
    const char* dbPath = "my_db.db";
    DbEnv env(0);
    env.open(environPath, DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN, 0);
    Db database(&env, 0);
    database.open(NULL,dbPath,NULL,DB_BTREE,DB_CREATE | DB_AUTO_COMMIT,0);
    __int64 keyid = 0;
    std::vector buffer(objectSize);

    // note: only this for loop is being timed
    for (int i=0 ; i < 1000 ; ++i)
    {
        DbTxn* txn = NULL;
        env.txn_begin(NULL, &txn, 0);
        for (int j=0 ; j < 1000 ; ++j)
        {
            Dbt key(&keyid, sizeof(keyid));
            Dbt data(buffer.data(),objectSize);
            database.put(txn, &key, &data, DB_NOOVERWRITE);
            ++keyid;
        }
        txn->commit(0);
    }

    database.close(0);
    env.close(0);
}

# DB_CONFIG
set_cachesize   0       536870912        0
set_flags       DB_TXN_NOSYNC
set_lg_regionmax        1048576
set_lg_max              10485760
set_lg_bsize            2097152

Comparison of I/O

The total number of I/O operations and total I/O bytes for the process were recorded using the Windows Task Manager. The numbers in both the following tables seem repeatable down to the last digit.

BerkeleyDB

    Mapped        BDB          BDB        BDB       BDB
    value         num          num        read      write
    size          reads       writes      bytes     bytes
-------------------------------------------------------------
      4            25          3816       19770    201576287
      8            26          4261       19770    212867996
     16            27          5631       19770    243405945
     32            32          8961       19770    314159978
     64            38         12649       19770    406681096
    128            56         28481       19770    731210030
    256            87         51905       19770   1244327637
    512         23979        107405   195341626   2215365406
   1024        360135        575797  2947861818   7660111973
   2048          3071       1008353    23252282  10624106569
   4096          3266       1009525    23252282  12672112224
   8192         30287       2038569   241290554  25322162241

These numbers reveal inherrent inefficiencies in BDB, particularly for 1024 byte mapped values. It is reading 3x the amount of data it is supposed to be writing!

CEDA LSS

    Mapped         LSS         LSS        LSS       LSS
    value          num         num        read      write
    size          reads       writes      bytes     bytes
-----------------------------------------------------------------
      4             0           11         0        25229824
      8             0           12         0        29229568
     16             0           14         0        37230080
     32             0           18         0        53230592
     64             0           26         0        85231104
    128             0           41         0       149232128
    256             0           72         0       277233664
    512             0          133         0       533237760
   1024             0          255         0      1045244928
   2048             0          499         0      2069260288
   4096             0          988         0      4117327872
   8192             0         1968         0      8213450240