CEDA LSS API

The LSS (Log Structured Store) is a persistent store for arbitrarily sized binary objects, referred to as serial elements.

The LSS is supported on all flavors of 32-bit and 64-bit Windows from Windows 95 onwards. The store is written to the hard disk (which could be FAT32 or NTFS) as a single file. This file grows as required to accommodate new data written to the store.

Public header

The API is defined in the header ILogStructuredStore.h. It is reproduced here in slightly simplified form, without the comments and other distractions:


namespace ceda
{
    typedef uint64 Seid;
    typedef uint32 SeidPart;
    typedef SeidPart SeidHigh;
    typedef SeidPart SeidLow;
    const Seid ROOT_SEID = 1;
    
    enum EOpenMode
    {
        OM_CREATE_NEW,
        OM_CREATE_ALWAYS,
        OM_OPEN_EXISTING,
        OM_OPEN_EXISTING_READ_ONLY,
        OM_OPEN_EXISTING_SHARED_READ,
        OM_OPEN_ALWAYS,
        OM_DELETE_EXISTING,
    };

    struct LssSettings
    {
        int flushTimeMilliSec;
        double cleanerUtilisationPercent;
        bool enableFileBuffering;
        bool enableWriteThrough;
        int maxNumSegmentsInCache;
        int numSegmentsPerCheckPoint;
        int segmentSize;
        bool forceIncrementMSSN;
        bool validateSUTDuringCheckPoint;
    };

    struct IInputStream
    {
        virtual ssize_t ReadStream(void* buffer, ssize_t numBytesRequested) = 0;
        virtual void Close() = 0;
    };

    struct IOutputStream
    {
        virtual void WriteStream(const void* buffer, ssize_t numBytes) = 0;
        virtual void FlushStream() = 0;
        virtual void Close() = 0;
    };

    struct ReadOnlyBuffer
    {
        const octet_t* buffer;
        ssize_t size;
    };

    struct IContiguousSerialElement
    {
        virtual void Close() = 0;
        virtual ReadOnlyBuffer GetBuffer() const = 0;
    };

    struct ILssTransaction
    {
        virtual void Close() = 0;
        virtual void FlushWhenClose() = 0;
        virtual IOutputStream* WriteSerialElement(Seid seid) = 0;
        virtual bool DeleteSerialElement(Seid seid) = 0;
        virtual void DeleteSeidSpace(SeidHigh seidHigh) = 0;
    };

    struct ILogStructuredStore
    {
        virtual void Close() = 0;
        virtual void SetMaxNumSegmentsInCache(int n) = 0;
        virtual SeidHigh CreateSeidSpace() = 0;
        virtual Seid AllocateSeid(SeidHigh seidHigh) = 0;
        virtual bool ReserveSeid(Seid seid) = 0;
        virtual SeidLow PeekNextSeidLow(SeidHigh seidHigh) = 0;
        virtual bool AllocateAffiliateSeid(Seid& seid) = 0;
        virtual void GetSeidsInSeidSpace(xvector<SeidLow>& seidLows, SeidHigh seidHigh) const = 0;
        virtual bool SerialElementExists(Seid seid) const = 0;
        virtual IInputStream* ReadSerialElement(Seid seid) const = 0;
        virtual IContiguousSerialElement* ReadContiguousSerialElement(Seid seid) const = 0;
        virtual ILssTransaction* OpenTransaction() = 0;
    };

    struct ApplyDeltasInfo
    {
        int cpsn1;
        int cpsn2;
        int64 txsn1;
        int64 txsn2;
    };

    ILogStructuredStore* CreateOrOpenLSS(
        const char* lssPath,
        const char* deltasDirPath,
        bool& createdNew,
        EOpenMode openMode,
        const LssSettings& settings);

    void ApplyDeltasToLSS(
        const LssSettings& settings,
        const char* level0Path,
        const char* deltasDirPath,
        ApplyDeltasInfo& info,
        int cpsn2 = -1);

    void LssCopy(const LssSettings& settings, const char* lssPathSrc, const char* lssPathTgt);
}

Simple example


const char* path = "mydatabase.lss";
LssSettings settings;
bool createdNew;

// OM_OPEN_ALWAYS: open the store if it exists, otherwise create it.
// createdNew reports which of the two happened.
ILogStructuredStore* lss = CreateOrOpenLSS(path, nullptr, createdNew, OM_OPEN_ALWAYS, settings);
const int BUFSIZE = 20;
if (createdNew)
{
    // New store: allocate a Seid and write one serial element within a transaction.
    SeidHigh seidHigh = lss->CreateSeidSpace();
    Seid seid = lss->AllocateSeid(seidHigh);
    ILssTransaction* txn = lss->OpenTransaction();
    octet_t buffer[BUFSIZE];
    for (int i = 0; i < BUFSIZE; ++i) buffer[i] = (octet_t) i;
    IOutputStream* os = txn->WriteSerialElement(seid);
    os->WriteStream(buffer, BUFSIZE);
    os->Close();
    txn->Close();
}
else
{
    // Existing store: read the element back. This relies on the Seid
    // allocated in the other branch being ROOT_SEID.
    Seid seid = ROOT_SEID;
    IInputStream* is = lss->ReadSerialElement(seid);
    octet_t buffer[BUFSIZE];
    ssize_t numBytesRead = is->ReadStream(buffer, BUFSIZE);
    is->Close();
}
lss->Close();

Serial elements are written to the store within a transaction, and the LSS ensures the atomicity of each transaction: either all changes made to the store by a transaction are applied, or none are. For example, a transaction could fail to commit because of a power failure. The next time the store is opened, any uncommitted transactions are rolled back. This "recovery scan" is performed automatically whenever the store is opened. The time for a recovery scan is bounded, and on typical hardware it will never take longer than a few seconds.
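The all-or-nothing behaviour described above can be illustrated with a toy in-memory model. This is only a sketch of the visible semantics, not how the LSS is implemented: ToyStore, ToyTransaction and Commit() are hypothetical names that do not appear in ILogStructuredStore.h, and nothing here models the on-disk log or the recovery scan.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

namespace sketch
{
    using Seid = std::uint64_t;
    using Bytes = std::vector<unsigned char>;

    // Hypothetical stand-in for the committed state of a store.
    struct ToyStore
    {
        std::map<Seid, Bytes> elements;
    };

    // Hypothetical transaction: changes are staged privately and only
    // reach the store in Commit(). A transaction that is destroyed
    // without committing leaves the store untouched.
    class ToyTransaction
    {
    public:
        explicit ToyTransaction(ToyStore& store) : store_(store) {}

        void Write(Seid seid, Bytes data)
        {
            staged_[seid] = Change{true, std::move(data)};
        }

        void Delete(Seid seid)
        {
            staged_[seid] = Change{false, Bytes{}};
        }

        void Commit()
        {
            assert(!committed_);  // this toy allows one commit per transaction
            for (auto& entry : staged_)
            {
                if (entry.second.isWrite)
                    store_.elements[entry.first] = std::move(entry.second.data);
                else
                    store_.elements.erase(entry.first);
            }
            staged_.clear();
            committed_ = true;
        }

    private:
        struct Change
        {
            bool isWrite;  // true = write, false = delete
            Bytes data;
        };

        ToyStore& store_;
        std::map<Seid, Change> staged_;
        bool committed_ = false;
    };
}
```

In this model an abandoned transaction plays the role of one interrupted by a power failure: none of its staged writes are visible afterwards.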

Serial elements are read or written as a byte stream, in a manner similar to the C functions fread() and fwrite(). The store can deal efficiently with both very small and very large serial elements. Provided the store is well compacted, the storage overhead is on the order of 20 bytes per serial element.
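Because reading is stream based, a serial element of unknown size can be consumed in fixed-size chunks. The following sketch shows such a loop against a local copy of the IInputStream interface. It assumes that ReadStream() returns the number of bytes actually read and returns 0 once the stream is exhausted, by analogy with fread(). MemoryInputStream and ReadAll are hypothetical helpers introduced for illustration, and the local ssize_t and octet_t aliases are stand-ins for the real types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

namespace sketch
{
    using ssize_t = std::ptrdiff_t;  // stand-in for the header's ssize_t
    using octet_t = unsigned char;   // assumed to be an 8-bit byte type

    // Same shape as IInputStream in the header.
    struct IInputStream
    {
        virtual ssize_t ReadStream(void* buffer, ssize_t numBytesRequested) = 0;
        virtual void Close() = 0;
        virtual ~IInputStream() = default;
    };

    // Hypothetical in-memory stream standing in for the stream that
    // ReadSerialElement() would return.
    class MemoryInputStream : public IInputStream
    {
    public:
        explicit MemoryInputStream(std::vector<octet_t> data) : data_(std::move(data)) {}

        ssize_t ReadStream(void* buffer, ssize_t numBytesRequested) override
        {
            const ssize_t remaining = static_cast<ssize_t>(data_.size()) - pos_;
            const ssize_t n = std::max<ssize_t>(0, std::min(numBytesRequested, remaining));
            if (n > 0)
            {
                std::memcpy(buffer, data_.data() + pos_, static_cast<std::size_t>(n));
                pos_ += n;
            }
            return n;
        }

        void Close() override { closed_ = true; }
        bool closed() const { return closed_; }

    private:
        std::vector<octet_t> data_;
        ssize_t pos_ = 0;
        bool closed_ = false;
    };

    // Reads the whole stream in chunks, then closes it.
    std::vector<octet_t> ReadAll(IInputStream& is, ssize_t chunkSize = 4096)
    {
        assert(chunkSize > 0);
        std::vector<octet_t> result;
        std::vector<octet_t> chunk(static_cast<std::size_t>(chunkSize));
        for (;;)
        {
            const ssize_t n = is.ReadStream(chunk.data(), chunkSize);
            if (n <= 0) break;  // assumed end-of-stream convention
            result.insert(result.end(), chunk.begin(), chunk.begin() + n);
        }
        is.Close();
        return result;
    }
}
```

The chunk size only affects how many ReadStream() calls are made, not the result.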

The LSS achieves excellent write performance, typically limited only by the maximum transfer rate of the hard disk. Disk head seeks during writes are kept to a minimum by appending new data to the end of the log in large segments. By default, segments are 512 KB.

When the LSS is opened, a background thread is automatically started that cleans segments with a poor utilisation (i.e. below a preset threshold). The data on a segment to be cleaned is written to the end of the log, allowing the segment to be returned to an internal free segment pool. Because of this, users of the LSS never need to concern themselves with "fragmentation" of the store.
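The idea of a utilisation threshold can be made concrete with a small calculation. The sketch below assumes that utilisation means live bytes divided by segment size and that any segment strictly below the threshold is a cleaning candidate. The actual cleaner policy is not described here, so SegmentInfo, UtilisationPercent and SelectSegmentsToClean are hypothetical names, and the link to the cleanerUtilisationPercent setting is a guess based on its name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
    // Hypothetical bookkeeping for one segment of the log.
    struct SegmentInfo
    {
        int index;              // position of the segment in the file
        std::size_t liveBytes;  // bytes still referenced by current serial elements
    };

    // Percentage of the segment occupied by live data.
    double UtilisationPercent(std::size_t liveBytes, std::size_t segmentSize)
    {
        assert(segmentSize > 0);
        return 100.0 * static_cast<double>(liveBytes) / static_cast<double>(segmentSize);
    }

    // Returns the indices of segments whose utilisation is below the threshold.
    std::vector<int> SelectSegmentsToClean(
        const std::vector<SegmentInfo>& segments,
        std::size_t segmentSize,
        double thresholdPercent)
    {
        std::vector<int> toClean;
        for (const SegmentInfo& s : segments)
        {
            if (UtilisationPercent(s.liveBytes, segmentSize) < thresholdPercent)
                toClean.push_back(s.index);
        }
        return toClean;
    }
}
```

With 512 KB segments and a 25 percent threshold, a segment holding 50 KB of live data would be selected, while a half-full segment would be left alone.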

However, a user of the LSS does need to be concerned with clustering related data together in order to achieve maximum read performance. This is essentially achieved by writing related data close together in time, so that related serial elements tend to be written to the same segments. Note that rewriting individual serial elements over time tends to upset the clustering. Reclustering simply involves rewriting a collection of related serial elements to the end of the log. The background cleaner thread will then automatically defragment the store.

It is important to note that the LSS is not concerned with concurrency control on access to the serial elements. It certainly doesn't provide strict two phase locking, or any other locking protocol to enforce serialisation of transactions. Instead, it assumes that a layer above the LSS is responsible for concurrency control.

Serial elements are identified by a 64-bit Seid (serial element identifier). The LSS provides a mechanism for allocating new, unused Seids as required. In practice, the number of serial elements is limited by the maximum size of the store, which is about 500 TB, rather than by the size of the 64-bit Seid space.
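The header's SeidHigh and SeidLow typedefs suggest that a Seid is made up of two 32-bit parts. The sketch below assumes the obvious layout, with SeidHigh in the upper 32 bits and SeidLow in the lower 32 bits. That layout is an assumption rather than something stated above, and MakeSeid, GetSeidHigh and GetSeidLow are hypothetical helpers, not functions from ILogStructuredStore.h.

```cpp
#include <cstdint>

namespace sketch
{
    using Seid = std::uint64_t;
    using SeidPart = std::uint32_t;
    using SeidHigh = SeidPart;
    using SeidLow = SeidPart;

    // Assumed layout: [ SeidHigh : 32 bits | SeidLow : 32 bits ]
    constexpr Seid MakeSeid(SeidHigh high, SeidLow low)
    {
        return (static_cast<Seid>(high) << 32) | static_cast<Seid>(low);
    }

    constexpr SeidHigh GetSeidHigh(Seid seid)
    {
        return static_cast<SeidHigh>(seid >> 32);
    }

    constexpr SeidLow GetSeidLow(Seid seid)
    {
        return static_cast<SeidLow>(seid & 0xFFFFFFFFu);
    }

    // Round-trip checks evaluated at compile time.
    static_assert(GetSeidHigh(MakeSeid(3, 7)) == 3, "high part survives packing");
    static_assert(GetSeidLow(MakeSeid(3, 7)) == 7, "low part survives packing");
}
```

Under this assumed layout, a Seid with SeidHigh 0 and SeidLow 1 has the same numeric value as ROOT_SEID.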

Links

API

The following document the CEDA LSS API: