C++ classes for High Performance Extensible Storage Engine Development
The Harlinn.Common.Core library includes C++ classes that makes it easy to use MicroSofts’ Extensible Storage Engine (ESE) in your apps. The classes are designed to be used in performant, highly concurrent, server development.
Background
Several years ago, while working on a system that stored, retrieved, analyzed large amounts of timeseries data in an Oracle database, we found that it was impossible to meet the performance requirements set for the project using the existing database schema as deployed on an Oracle RDBMS. Changing the database schema was not an option since that would also require changes to several large and critical systems.
After trying out several NoSQL database engines we found that MicroSofts’ Extensible Storage Engine (ESE) was a good fit for our needs.
Our main reasons for choosing ESE was that:
-
It’s installed with Windows, so it gets updates and bugfixes along with the rest of the OS, and it is used by well known Microsoft products such as:
- Microsoft Exchange
- Active Directory
- Windows Search
- It is reliable.
- It’s fast - inserting more than 100 000 records per second is not unusual.
- There are no additional costs.
This article shows how to:
- Initialize the database engine and create a session
- Create a database
- Open an existing database
- Create a table
- Open an existing table
- Add columns to table
- Create a primary index for a table
- Insert rows into a table
- Read column data
- Search for data
- Iterate over a table, or a range of rows, forwards and backwards
- Updating a Row in a Table
- Deleting a Row in a Table
- Use transactions
The Extensible Storage Engine has a C based API, and after a while we decided that it would be beneficial for the progress of the project the create a C++ wrapper around the C API to simplify the code base and reduce programming errors. This turned out to be an excellent idea, as it really simplified and reduced the size of the ESE related code.
The C++ wrappers that Harlinn.Common.Core provides for the ESE C API, are a complete rewrite and an improvement compared to the wrappers I developed for that project, since:
- They work nicely with the standard C++ libraries.
- Use move semantics to safely and efficiently manage the ESE resources.
To use the C++ wrappers for Extensible Storage Engine you need to:
- Include the
HCCEse.h
header file and link with theHarlinn.Common.Core.lib
library. - Initialize and configure an instance.
- Create a session.
- Create or open a database in this session.
Here is an example that does this, and:
- Creates a table.
- Adds a boolean column.
- Inserts two rows of data
- Reads the two rows of data from the table and verifies that the retrieved data is as expected.
// Initialize the Extensible Storage Engine
Ese::Instance instance( L"TestInstance" );
// Set the path to the directory that will contain
// the checkpoint file for the instance
instance.SetSystemPath( DatabaseSystemPath );
// Set the path to the directory that will contain
// the transaction log files for the instance
instance.SetLogFilePath( DatabaseLogfilePath );
// Create the session
auto session = instance.BeginSession( );
// Create or open the database overwriting any
// existing data
auto database = session.CreateDatabase( DatabasePath,
Ese::CreateDatabaseFlags::OverwriteExisting );
// Create a table
auto table1 = database.CreateTable( "table1" );
// Add a column
auto columnId = table1.AddBoolean( "fieldName" );
// A lambda funtion that inserts one row in the table.
auto DoInsert = [ &table1, columnId ]( )
{
// Insert a new row
table1.Insert( );
// Assign a value to the column
table1.SetColumn( columnId, true );
// Write the data to the table
table1.Update( );
};
// Insert two rows of data
DoInsert( );
DoInsert( );
// Move the table cursor to the first row
if ( table1.MoveFirst( ) )
{
// Retrieve the column value as a std::optional<bool>
auto value1 = table1.As<bool>( columnId );
// Move the table cursor to the next row
if ( table1.MoveNext( ) )
{
auto value2 = false;
// Retrieve the column value by passing a reference
// to the variable that will be assigned the column
// value
auto columnNotNull = table1.Read( columnId, value2 );
// Verify that the data is as expected
assert( columnNotNull == value1.has_value( ) == true );
assert( value2 == value1.value( ) == true );
// Verify that there are no more rows
assert( table1.MoveNext( ) == false );
}
}
If anything goes wrong, the library will throw an exception, so there is no need for elaborate error checking at each call site, but those exceptions must be handled.
The test code, in EseTests.cpp
, which is included with the source
code, demonstrates how to store and retrieve all datatypes supported
by ESE.
There is also an example that demonstrates the performance of ESE, on my development laptop it:
- Inserts
44 640 000
rows in 286 seconds, at a rate of156 005
rows per second. This done by first searching for a row that matches the primary key. If found, the row is updated, otherwise a new row is inserted into the table. - Reads
44 640 000
rows in 30 seconds, at a rate of1 479 087
rows per second. - Searches for, and retrieves
1 000
subsets of the data, for a total of21 600 000
rows, in27
seconds, at a rate of794 690
rows per second.
The above are excellent numbers, and to see how this works out on your own system, build and execute:
ExampleCoreESE01.exe -c -r -t -d F:\<Path to database directory>\Database.edb
where
- -c tells the program to create a new database
- -d provides the path to the database file
- -r tells the program to replace any existing database at that location
- -t tells the program to execute the performance tests
This requires about 3GB of free space on your hard drive.
The library allows you to use both UTF16, and char based strings by providing overloads that use the Unicode version and the ANSI version of the ESE C API, and selects one or the other based on the string/character type of the arguments were this is applicable.
Harlinn::Common::Core::Ese
There are four major classes in the library that wraps the ESE C API handle types. The classes are move assignable and move constructible, but not copy assignable, and not copy constructible. This design is used to ensure that the lifetime of the handles is convenient to manage in an appropriate way, ensuring that the handles are always closed when the owning objects go out of scope.
Instance
Instance is the root object, and all use of the library should start by creating an instance of this type.
The Instance object holds and manages the lifetime of an ESE instance handle.
Session
A Session object holds and manages the lifetime of an ESE session handle. Sessions provide the transactional context of ESE, and all ESE database operations are performed through a session.
When an instance is used by more than one thread, each thread should have its own session.
Database
A Database object holds and manages the lifetime of a database handle. ESE database handles are used to manage the schema of the database and to manage tables inside the database. Database handles can only be used with the session that was used to create them, and the library helps to facilitate this.
Table
A Table object holds and manages the lifetime of a database cursor handle. ESE cursor handles are used to read row data, search rows; or create, update and delete rows. It is also used to define the columns and indexes for a table. Like database handles, the database cursor handles can only be used with the session that was used to create them, and the library helps to facilitate this too.
First steps
Normally, an application will have a single Ese::Instance
object, and
multiple sessions with one or more open databases in them. To keep
the code for the unit tests simple, I bundled them into a single class called
Engine
that is used for most of them:
class Engine
{
public:
Ese::Instance instance;
Ese::Session session;
Ese::Database database;
Engine( bool createNewDatabase = true )
{
instance = Ese::Instance( "TestInstance" );
// Configure ESE to create any required directory that is
// missing in a file system silently
instance.SetCreatePathIfNotExist( );
// Configure ESE to not display a dialog box in
case of an error.
instance.SetExceptionAction( Ese::ExceptionAction::None );
// Set the path to the directory that will contain
// the checkpoint file for the instance
instance.SetSystemPath( DatabaseSystemPath );
// Set the path to the directory that will contain
// the transaction log files for the instance
instance.SetLogFilePath( DatabaseLogfilePath );
// Create the session
session = instance.BeginSession( );
if ( createNewDatabase )
{
// Create a new database, overwriting any existing
// database.
database =
session.CreateDatabase( DatabasePath,
Ese::CreateDatabaseFlags::OverwriteExisting );
}
else
{
// Attach the database to the session
session.AttachDatabase( DatabasePath );
// Open the database
database = session.OpenDatabase( DatabasePath );
}
}
};
If createNewDatabase
is false
, an instance of the Engine
class will open an
existing database.
The call to instance.SetCreatePathIfNotExist( )
configures ESE to create
any required directory that is missing in a file system silently.
instance.SetExceptionAction( Ese::ExceptionAction::None )
configures ESE
to not display a dialog box in case of error.
instance.SetSystemPath( DatabaseSystemPath )
configures the path to the directory
that will contain the checkpoint file for the instance.
instance.SetLogFilePath( DatabaseLogfilePath )
configures the path to the directory
that will contain the transaction logs for the instance.
When the instance is configured, we’re ready to create the Session object:
session = instance.BeginSession( );
Once we have a Session object, we can create new databases by calling Session::CreateDatabase, or we can attach and open an existing database, as shown above.
With this in place, creating a new table is as easy as:
BOOST_AUTO_TEST_CASE( CreateTableTest1 )
{
Engine engine;
auto& database = engine.database;
auto table1 = database.CreateTable( "table1" );
BOOST_TEST( table1.IsValid( ) );
}
Trivial, so lets look at something useful. Lets say you have the following struct and want to store the data in a table:
struct SensorValue
{
// Identifies the sensor
boost::uuids::uuid Sensor;
// The timestamp for the sensor value
std::chrono::system_clock::time_point Timestamp;
// The flags value
int64_t Flags = 0;
// The sensor value
double Value = 0.0;
};
To keep everything in one place, we extend the Engine class:
class SensorEngine : public Engine
{
public:
using Base = Engine;
When working with ESE I like to keep names short, particularly the column names as this simplifies index creation.
We need a name for the table:
static constexpr char SensorValueTableName[] = "SV";
A name for the primary index:
static constexpr char SensorValueIndexName[] = "ISV";
A name for the sensor column:
static constexpr char SensorColumnName[] = "S";
A name for the timestamp column
static constexpr char TimestampColumnName[] = "T";
A name for the flags column
static constexpr char FlagsColumnName[] = "F";
And, finally a name for the value column:
static constexpr char ValueColumnName[] = "V";
When working with columns in ESE we identity them by their column ids, so we need four column ids, one for each column:
JET_COLUMNID SensorColumnId = 0;
JET_COLUMNID TimestampColumnId = 0;
JET_COLUMNID FlagsColumnId = 0;
JET_COLUMNID ValueColumnId = 0;
We also need a table object:
Ese::Table SensorValues;
The default constructor for Ese::Table
creates an object that is not connected
to an ESE table.
We could skip writing the constructor, since it doesn’t add anything, but I think implementing it makes the intent of the code clearer:
SensorEngine( bool createDatabase = true )
: Base( createDatabase )
{
}
Create a Table
Creating a table is rather straight forward:
void CreateSensorValueTable( )
{
First we need to start a transaction, which will be rolled back automatically
unless we call transaction.Commit( )
when we are done:
auto transaction = session.StartTransaction( );
Then we create the table:
SensorValues = database.CreateTable( SensorValueTableName );
Before adding the four columns:
SensorColumnId = SensorValues.AddGuid( SensorColumnName );
TimestampColumnId = SensorValues.AddDateTime( TimestampColumnName );
FlagsColumnId = SensorValues.AddUInt64( FlagsColumnName );
ValueColumnId = SensorValues.AddDouble( ValueColumnName );
The various AddColumnType functions return the column id for the columns they create.
Now that we have created the table, it’s time to create the primary key, which
is an index with the Ese::IndexFlags::Primary
flag set. Every table should
have a primary key, and if we don’t create one, ESE will create one for us, but
then we don’t have control over its definition.
This "+S\0+T\0"
is the definition we would like to have for the primary key of
the table. Here S
and T
are the column names, and the +
prefix before
the column names tells ESE that want the index in ascending order for that column.
The index field definition for each column is required to be terminated by a \0
value,
and the last field definition must be terminated by a double \0
.
The last parameter to the CreateIndex
function 7
is the number of characters in
the index definition, including the final \0
that C++ adds to zero terminated strings.
SensorValues.CreateIndex( SensorValueIndexName, Ese::IndexFlags::Primary, "+S\0+T\0", 7 );
We are done, and can commit the transaction.
transaction.Commit( );
}
Opening a Table
Opening a Table is easy:
void OpenSensorValueTable( )
{
First we call Database::OpenTable:
SensorValues = database.OpenTable( SensorValueTableName );
Then we call Table::GetColumnId to get the column ids for each of the four columns:
SensorColumnId = SensorValues.GetColumnId( SensorColumnName );
TimestampColumnId = SensorValues.GetColumnId( TimestampColumnName );
FlagsColumnId = SensorValues.GetColumnId( FlagsColumnName );
ValueColumnId = SensorValues.GetColumnId( ValueColumnName );
}
Create, Retrieve, Update and Delete (CRUD)
Now that we have a table to work with, we can implement functions to create, retrieve, update and delete rows from the table.
Creating a new row
Inserting a new row starts with calling the Table::Insert function, before calling one of the Table::SetColumn overloads for each of the columns, and finalizing the operation with a call to Table::Store.
void Insert( const SensorValue& value )
{
SensorValues.Insert( );
SensorValues.SetColumn( SensorColumnId, value.Sensor );
SensorValues.SetColumn( TimestampColumnId, value.Timestamp );
SensorValues.SetColumn( FlagsColumnId, value.Flags );
SensorValues.SetColumn( ValueColumnId, value.Value );
SensorValues.Store( );
}
Read column data
Reading column values is performed by calling one of the
Table::Read overloads. The Table::Read overloads returns
false if the column value is NULL, but since none of the
columns can be set to NULL, the return value can be safely
ignored. First we create a function that reads the value
stored in the S
column:
boost::uuids::uuid Sensor( ) const
{
boost::uuids::uuid result;
SensorValues.Read( SensorColumnId, result );
return result;
}
Then we create a function that reads the value
stored in the T
column:
std::chrono::system_clock::time_point Timestamp( ) const
{
std::chrono::system_clock::time_point result;
SensorValues.Read( TimestampColumnId, result );
return result;
}
Afterwards we create a function that reads the value
stored in the F
column:
int64_t Flags( ) const
{
int64_t result;
SensorValues.Read( FlagsColumnId, result );
return result;
}
Finally we create a function that reads the value
stored in the V
column:
double Value( ) const
{
double result;
SensorValues.Read( ValueColumnId, result );
return result;
}
To bring it all together we can wrap it up by creating
a Retrieve
function that returns our SensorValue
object.
SensorValue Retrieve( )
{
SensorValue result{ Sensor( ), Timestamp( ), Flags( ), Value( ) };
return result;
}
Searching
To place the table cursor on the first row that contains data for a sensor,
which is the one with the earliest timestamp, we first create a search key
using the one of the Table::MakeKey
function overloads. Search keys are used
to search for a row of data matching an index. When we have one or more secondary
indexes defined for a table, we use the Table::SetCurrentIndex
function to
select the active search index for the table. Since we have only created
a primary index for the table, it will always be the search index. The S
column, which identifies the sensor, is the first member of the index, and
the T
column is the second and final member of the index.
By passing the sensor id to Table::MakeKey
along with Ese::KeyFlags::NewKey
to indicate that this is the first value of a new sear key, and
Ese::KeyFlags::FullColumnStartLimit
indicates this is the last value
we will specify for the search key, and that it will be used to match the
index entry closest to the start of the index that satisfies the key.
In other words, we create a that will only look at the first column of
the index when looking for matches with Table::Seek
or
Table::SetIndexRange
.
bool MoveTo( const boost::uuids::uuid& sensorId ) const
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey |
Ese::KeyFlags::FullColumnStartLimit );
Next we call Table::Seek
passing Ese::SeekFlags::GreaterOrEqual
as its
argument because this will place the cursor on the row closest to the beginning
of the active search index that matches the current search key.
auto rc = SensorValues.Seek( Ese::SeekFlags::GreaterOrEqual );
return rc >= Ese::Result::Success;
}
The MoveTo
function can be used to iterate over the stored sensor
values for a sensor like this:
success = engine.MoveTo( firstSensor );
if ( success )
{
do
{
auto sensorId = engine.Sensor( );
if ( sensorId != firstSensor )
{
break;
}
} while ( sensorValues.MoveNext( ) );
}
Next we create a function that locates the sensor value, for a sensor, with the greatest timestamp less or equal to the argument timestamp. In other words, the current sensor value at the requested time.
bool MoveTo( const boost::uuids::uuid& sensorId,
const std::chrono::system_clock::time_point& timestamp,
bool requireFullMatch = false ) const
{
This time we need to create a search key with two entries, one for the sensor id, and one for the timestamp.
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey );
SensorValues.MakeKey( timestamp );
By default we pass Ese::SeekFlags::LessOrEqual
to Table::Seek
because this
will place the cursor on the row that compares less of equal to
to the search key that is closest to the end of the search index. When
requireFullMatch
is true, it’s because we want to move to a record that
exactly matches the search key.
auto rc = SensorValues.Seek( requireFullMatch ?
Ese::SeekFlags::Equal : Ese::SeekFlags::LessOrEqual );
return rc >= Ese::Result::Success;
}
Implementing a function that places the cursor on the record with
the latest timestamp, is similar to creating the
bool MoveTo( const boost::uuids::uuid& sensorId ) const
function:
bool MoveToLast( const boost::uuids::uuid& sensorId ) const
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey |
Ese::KeyFlags::FullColumnEndLimit );
auto rc = SensorValues.Seek( Ese::SeekFlags::LessOrEqual );
return rc >= Ese::Result::Success;
}
It would be nice if we could simplify the code used to iterate over the sensor values, like this:
success = engine.Filter( secondSensor );
if ( success )
{
do
{
} while ( sensorValues.MoveNext( ) );
}
This can be accomplished using the Table::SetIndexRange
function.
bool Filter( const boost::uuids::uuid& sensorId ) const
{
auto success = MoveTo( sensorId );
if ( success )
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey |
Ese::KeyFlags::FullColumnEndLimit );
Calling Table::SetIndexRange
with Ese::IndexRangeFlags::UpperLimit
creates a limit for the records that can be navigated to using Table::MoveNext
or
Table::Move( JET_MoveNext )
, while Ese::IndexRangeFlags::Inclusive
indicates that
the row matching the search key is included in the set of rows that the cursor
can navigate to when this index range is active.
SensorValues.SetIndexRange( Ese::IndexRangeFlags::Inclusive |
Ese::IndexRangeFlags::UpperLimit );
return true;
}
else
{
return false;
}
}
Similarly, it would be nice to be able to move backwards over the index:
success = engine.ReverseFilter( secondSensor );
if ( success )
{
do
{
} while ( sensorValues.MovePrevious( ) );
}
ReverseFilter
can be implemented like this:
bool ReverseFilter( const boost::uuids::uuid& sensorId ) const
{
auto success = MoveToLast( sensorId );
if ( success )
{
SensorValues.MakeKey( sensorId, Ese::KeyFlags::NewKey | Ese::KeyFlags::FullColumnStartLimit );
SensorValues.SetIndexRange( Ese::IndexRangeFlags::Inclusive );
return true;
}
else
{
return false;
}
}
Updating a Row in a Table
To update a row of data in the table we must first navigate to the
row that we want to update, which is why we have the requireFullMatch
parameter for
bool MoveTo( const boost::uuids::uuid& sensorId,
const std::chrono::system_clock::time_point& timestamp,
bool requireFullMatch = false ) const
This overload of MoveTo
will return false if it cannot find
a row where S
is equal to sensorId
and T
is equal to timestamp
when
true
is passed for the requireFullMatch
parameter.
bool Update( const SensorValue& value )
{
So first we locate the row:
if ( MoveTo( value.Sensor, value.Timestamp, true ) )
{
Then call Table::Replace
to prepare the table for an update
to the columns that are not part of the primary index.
SensorValues.Replace( );
Assign the values that we want to write to the columns of the row:
SensorValues.SetColumn( FlagsColumnId, value.Flags );
SensorValues.SetColumn( ValueColumnId, value.Value );
And finalize the update with a call to Table::Store
:
SensorValues.Store( );
return true;
}
return false;
}
Deleting a Row in a Table
To delete a row in a table we must first position the cursor on the
row we want to delete, and then call Table::Delete
bool Delete( const SensorValue& value )
{
if ( MoveTo( value.Sensor, value.Timestamp, true ) )
{
SensorValues.Delete( );
return true;
}
return false;
}
That’s it, end of class :-)
};