CEDA has a high-performance storage engine featuring atomic transactions, automatic crash recovery, incremental backup, slave mirroring, optional transaction durability, and concurrent reading while writing. It is well suited for use as an embedded key-value database, as an alternative to products like Oracle BerkeleyDB.
The following are the results of comparing the write performance of the CEDA LSS against Oracle BerkeleyDB (C++ version).
The test wrote a million (key,value) pairs using 1000 atomic transactions (not flushed) on a laptop running x64 Windows 7 with a pair of SSDs in RAID0 (striped). In both cases the database used a 512MB cache. Note that, at least for the CEDA LSS, the file was unbuffered, i.e. it bypassed the Windows file system cache (see the code used for the test below).
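For reference, "unbuffered" means the file is opened so that reads and writes bypass the Windows file system cache. A minimal sketch of how such a handle is obtained with the Win32 API (illustrative only, not the actual CEDA LSS code):

// Opens a file for unbuffered I/O, bypassing the Windows file system cache.
// Note: with FILE_FLAG_NO_BUFFERING, file offsets and transfer sizes must be
// multiples of the volume sector size.
#include <windows.h>

HANDLE OpenUnbuffered(const wchar_t* path)
{
    return CreateFileW(
        path,
        GENERIC_READ | GENERIC_WRITE,
        0,                       // no sharing
        NULL,                    // default security
        OPEN_ALWAYS,             // create the file if it doesn't exist
        FILE_FLAG_NO_BUFFERING,
        NULL);
}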
With 4096 byte mapped values the total amount of data to be written to disk is about 4GB (10^6 pairs of an 8-byte key plus a 4096-byte value comes to about 4.1GB). For both databases this is 8 times the size of the 512MB cache, so the test is dominated by the writing of data to non-volatile storage.
BerkeleyDB was not space efficient. It wrote over 12GB to disk (more than 3 times the amount of actual data). By contrast the CEDA space overhead was tiny (about 0.3%).
BerkeleyDB performance was very poor: it took 2 minutes and 20 seconds and issued over 1 million write operations, whereas the CEDA LSS took only 4 seconds and issued fewer than 1000 write operations.
These results were discussed on an Oracle BerkeleyDB forum here.
Results for other sizes of the mapped values are tabulated below. Also shown are results for a B+Tree implemented on top of the CEDA LSS.
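In the tables that follow, the effective rate and the disk space wastage factor are derived columns: the effective rate divides the raw (key,value) payload by the elapsed time, and the wastage factor divides the total on-disk footprint by that payload. The following sketch reproduces the BerkeleyDB row for 4096-byte values (assuming 10^6 pairs and the 8-byte __int64 key from the test code):

// Derivation of the "effective rate" and "wastage factor" columns,
// using the BerkeleyDB row for 4096-byte mapped values as the example.
#include <cstdio>

int main()
{
    const double numPairs  = 1e6;   // one million (key,value) pairs
    const double keySize   = 8;     // sizeof(__int64) key used in the test
    const double valueSize = 4096;
    const double seconds   = 140;

    // Total on-disk footprint: database + log files + __db files.
    const double diskBytes = 8226021376.0 + 4424990720.0 + 550969344.0;

    const double payloadBytes = numPairs * (keySize + valueSize);
    const double rateMBperSec = payloadBytes / (1024.0 * 1024.0) / seconds;
    const double wastage      = diskBytes / payloadBytes;

    // Prints: effective rate = 27.96 MB/sec, wastage factor = 3.22
    printf("effective rate = %.2f MB/sec, wastage factor = %.2f\n",
           rateMBperSec, wastage);
    return 0;
}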
BerkeleyDB:

Mapped   Time    Database      Log files     __db files   Effective   Disk space
value    (sec)   size          size          size         rate        wastage
size                                                      (MB/sec)    factor
---------------------------------------------------------------------------------
    4     3.78      30416896     178257920    550969344       3.03       63.30
    8     3.80      34021376     188743680    550969344       4.02       48.36
   16     3.83      45162496     199229440    550969344       5.98       33.14
   32     4.21      72228864     251658240    550969344       9.06       21.87
   64     4.44     102146048     314572800    550969344      15.46       13.44
  128     5.49     230932480     503316480    550969344      23.62        9.45
  256     7.36     421306368     828375040    550969344      34.21        6.82
  512    11.9      678215680    1342177280    550969344      41.67        4.94
 1024    30.9     1756733440    2967470080    550969344      31.85        5.11
 2048    66.4     8226021376    2380267520    550969344      29.53        5.43
 4096   140       8226021376    4424990720    550969344      27.96        3.22
 8192   212      16418021376    8671723520    550969344      36.89        3.13
CEDA LSS:

Mapped   Time    Database      Effective   Disk space
value    (sec)   size          rate        wastage
size                           (MB/sec)    factor
------------------------------------------------------
    4     0.66      25690112      17.3       2.1408
    8     0.66      29360128      23.1       1.8350
   16     0.66      37748736      34.7       1.5729
   32     0.67      53477376      56.9       1.3369
   64     0.69      85458944      99.5       1.1869
  128     0.74     149422080     175.3       1.0987
  256     0.81     277348352     310.8       1.0506
  512     1.01     533725184     491.0       1.0264
 1024     1.09    1045430272     902.9       1.0130
 2048     1.64    2069364736    1195.6       1.0065
 4096     3.99    4117757952     980.9       1.0034
 8192    10.1     8213495808     774.3       1.0016
B+Tree on the CEDA LSS:

Mapped   Time    Database      Effective   Disk space
value    (sec)   size          rate        wastage
size                           (MB/sec)    factor
------------------------------------------------------
    4     0.26      12582912      44.0       1.0486
    8     0.26      16777216      58.7       1.0486
   16     0.27      24641536      84.8       1.0267
   32     0.30      40370176     127.2       1.0093
   64     0.38      72876032     180.7       1.0122
  128     0.50     136839168     259.4       1.0062
  256     0.76     264765440     331.3       1.0029
  512     1.35     520617984     367.3       1.0012
 1024     2.24    1032847360     439.4       1.0008
 2048     3.86    2057830400     508.0       1.0009
 4096     6.87    4106747904     569.7       1.0007
 8192    14.7     8217165824     532.0       1.0021
// Test code minus error handling and timing:
#include <db_cxx.h>
#include <vector>

void BerkeleyTest(int objectSize)
{
    const char* environPath = "env";
    const char* dbPath = "my_db.db";

    // Create a transactional environment with locking, logging and a memory pool.
    DbEnv env(0);
    env.open(environPath, DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN, 0);

    Db database(&env, 0);
    database.open(NULL, dbPath, NULL, DB_BTREE, DB_CREATE | DB_AUTO_COMMIT, 0);

    __int64 keyid = 0;
    std::vector<char> buffer(objectSize);   // zero-initialised value payload

    // note: only this for loop is being timed
    for (int i = 0; i < 1000; ++i)
    {
        // One atomic transaction per batch of 1000 writes.
        DbTxn* txn = NULL;
        env.txn_begin(NULL, &txn, 0);
        for (int j = 0; j < 1000; ++j)
        {
            Dbt key(&keyid, sizeof(keyid));
            Dbt data(buffer.data(), objectSize);
            // Keys are strictly increasing, so DB_NOOVERWRITE never rejects a put.
            database.put(txn, &key, &data, DB_NOOVERWRITE);
            ++keyid;
        }
        txn->commit(0);
    }
    database.close(0);
    env.close(0);
}
# DB_CONFIG
# 512MB cache in a single region
set_cachesize 0 536870912 0
# do not flush the log on transaction commit
set_flags DB_TXN_NOSYNC
# 1MB log region, 10MB maximum log file size, 2MB in-memory log buffer
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097152
The total number of I/O operations and total I/O bytes for the process were recorded using the Windows Task Manager. The numbers in both the following tables seem repeatable down to the last digit.
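Incidentally, the same per-process counters that Task Manager displays can be read programmatically with the Win32 GetProcessIoCounters call; a minimal sketch (not how the numbers below were gathered):

// Reads the per-process I/O counters that Task Manager displays.
#include <windows.h>
#include <cstdio>

int main()
{
    IO_COUNTERS io = {};
    if (GetProcessIoCounters(GetCurrentProcess(), &io))
    {
        printf("reads:  %llu ops, %llu bytes\n",
               (unsigned long long)io.ReadOperationCount,
               (unsigned long long)io.ReadTransferCount);
        printf("writes: %llu ops, %llu bytes\n",
               (unsigned long long)io.WriteOperationCount,
               (unsigned long long)io.WriteTransferCount);
    }
    return 0;
}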
BerkeleyDB:

Mapped    BDB      BDB         BDB           BDB
value     num      num         read          write
size      reads    writes      bytes         bytes
---------------------------------------------------------
    4        25       3816         19770      201576287
    8        26       4261         19770      212867996
   16        27       5631         19770      243405945
   32        32       8961         19770      314159978
   64        38      12649         19770      406681096
  128        56      28481         19770      731210030
  256        87      51905         19770     1244327637
  512     23979     107405     195341626     2215365406
 1024    360135     575797    2947861818     7660111973
 2048      3071    1008353      23252282    10624106569
 4096      3266    1009525      23252282    12672112224
 8192     30287    2038569     241290554    25322162241
These numbers reveal inherent inefficiencies in BDB, particularly for 1024 byte mapped values, where it reads nearly 3 times the amount of data it is supposed to be writing!
CEDA LSS:

Mapped    LSS      LSS         LSS         LSS
value     num      num         read        write
size      reads    writes      bytes       bytes
---------------------------------------------------
    4       0        11          0         25229824
    8       0        12          0         29229568
   16       0        14          0         37230080
   32       0        18          0         53230592
   64       0        26          0         85231104
  128       0        41          0        149232128
  256       0        72          0        277233664
  512       0       133          0        533237760
 1024       0       255          0       1045244928
 2048       0       499          0       2069260288
 4096       0       988          0       4117327872
 8192       0      1968          0       8213450240