"To be 70 years young is sometimes far more cheerful and hopeful than to be 40 years old." - Oliver Wendell Holmes, Jr.

Table of Contents [Hide/Show]


Edit

Overview

When designing code that needs to access data, we often run into performance problems. While modern database engines can be very fast, it's still much faster to work with memory than it is to request data from the database. The following technique is aimed a specific type of problem: Situations where you need to lookup individual records from a database (key based searches) and you may need to lookup the same record more than once during the lifetime of your code. Typically this applies primarily to reference or lookup tables.

There are 2 extremes possible:

  1. Do each lookup as a database query as needed - if the number of repeat lookups is very low, this is a simple way to handle the situation.
  2. Pre-load the data you might need and do in-memory lookups. If the number of records in the lookup table is small, this is a reasonable way to handle it. But, if the number of records is potentially large, you risk memory issues.

The following technique attempts to take a middle road between these 2: Do each lookup from the database the first time, but then store the result in an in-memory cache for subsequent lookups. The technique includes the ability to limit the memory usage to a certain number of records.

Edit

Specific example

For this example, Assume we are building a system that processes phone call records and we need to lookup information about the extension the call was made from. Since we may be dealing with calls from multiple locations, we'll be using a table that is keyed on 2 fields: SiteID (integer) and Extension (string). In addition to telling me who is responsible for that extension (LastName, FirstName), this table will give us a CostCenter field (string) that we can use to determine how to allocate the costs of this phone call.

So, a typical query to this table would look like this:

SELECT LastName, FirstName, CostCenter
    FROM Extensions
    WHERE SiteID = 1 AND Extension = '4444'

First, we create a class for our cache. We'll define this as a singleton so we only keep one copy of the cache:

Public Class EmployeeCache
    Private Sub New()
    End Sub

Public Shared Instance As New EmployeeCache End Class

The rest of this code goes within the cache class, including a couple of additional class definitions. First, we'll define a data record class to hold each element:

Public Class DataRecord
    Implements IComparable(Of DataRecord)

'key fields Public SiteID As Integer Public Extension As String

Public NotFound As Boolean = True Public DateAddedToCache As Date = Date.Now

'data fields Public LastName As String Public FirstName As String Public CostCenter As String

Public Function CompareTo(ByVal other As DataRecord) As Integer _ Implements IComparable(Of DataRecord).CompareTo

Dim n As Integer = SiteID.CompareTo(other.SiteID) If n = 0 Then n = String.Compare(Extension, other.Extension, True) Return n End Function

Private Function HandleDBNullString(ByVal dbValue As Object) As String If Convert.IsDBNull(dbValue) Then Return "" Else Return CStr(dbValue) End If End Function

Public Sub LoadData(ByVal cn As SqlClient.SqlConnection, ByVal siteID As Integer, ByVal Extension As String) Dim sql As String = _ "SELECT LastName, FirstName, CostCenter " & _ " FROM Extensions " & _ " WHERE SiteID = @SiteID AND Extension = @Ext"

Using cmd As New SqlClient.SqlCommand(sql, cn) cmd.Parameters.Add(New SqlClient.SqlParameter("@SiteID", siteID)) cmd.Parameters.Add(New SqlClient.SqlParameter("@Ext", Extension)) Using rdr As SqlClient.SqlDataReader = cmd.ExecuteReader Me.SiteID = siteID Me.Extension = Extension If rdr.Read Then Me.LastName = HandleDBNullString(rdr("LastName")) Me.FirstName = HandleDBNullString(rdr("FirstName")) Me.CostCenter = HandleDBNullString(rdr("CostCenter")) Me.NotFound = False Else Me.LastName = "" Me.FirstName = "" Me.CostCenter = "" Me.NotFound = True End If End Using End Using End Sub End Class

Key features of this class:

  • definition of the primary key using the IComparable interface
  • 2 non-data fields:
    • NotFound allows us to cache a "not found" result just as easily as caching an actual data record
    • DateAddedToCache allows us to track when we created this entry and use that to purge older entries
  • LoadData method loads a specific record from the database (or sets the record as a "not found" condition)

To take advantage of the DateAddedToCache value, we'll need the ability sort the internal list by the date instead of it's natural sort order. Create the following class:

Private Class SortByTimeStamp
    Implements IComparer(Of DataRecord)

Public Shared Instance As New SortByTimeStamp

Public Function Compare _ (ByVal x As DataRecord, _ ByVal y As DataRecord) _ As Integer _ Implements IComparer(Of DataRecord).Compare

Return x.DateAddedToCache.CompareTo(y.DateAddedToCache) End Function End Class

The actual data for this cache will be held in these private variables:

Private _lastDatePurge As Date = Now
Private _Cache As New List(Of DataRecord)

These new 2 constants are used to purge old cache records. Basically, if a record has been sitting in the cache for more than MaxAge minutes or if there are more than MaxSize records in the cache, the oldest records will be purged.

Const MaxAge As Integer = 60 'minutes
Const MaxSize As Integer = 10000 'records

A couple of support methods:

Public Function CacheCount() As Integer
    Return _Cache.Count
End Function

Public Sub Clear() _Cache.Clear() lastDatePurge = Now End Sub

The routine that purges old data:

Private Sub AutoPurge()
    If CacheCount() = 0 Then Exit Sub

Dim isSorted As Boolean = True

If MaxAge > 0 AndAlso _ DateDiff(DateInterval.Second, _lastDatePurge, Now) > (MaxAge * 60) Then lastDatePurge = Now isSorted = False _Cache.Sort(SortByTimeStamp.Instance)

If CacheCount() > 0 Then 'if the newest item is stale, kill the whole cache Dim lastNdx As Integer = CacheCount() - 1 If DateDiff(DateInterval.Second, _Cache.Item(lastNdx).DateAddedToCache, Now) > (MaxAge * 60) Then _Cache.Clear() isSorted = True Exit Sub End If End If

Do While CacheCount() > 0 _ AndAlso DateDiff(DateInterval.Second, _Cache.Item(0).DateAddedToCache, Now) > (MaxAge * 60) _Cache.RemoveAt(0) Loop End If

If MaxSize > 0 AndAlso CacheCount() > MaxSize Then _Cache.Sort(SortByTimeStamp.Instance) isSorted = False

Do Until CacheCount() <= CInt(MaxSize * 0.9) _Cache.RemoveAt(0) Loop End If

If isSorted = False Then _Cache.Sort() End Sub

Technically, data can sit in the cache longer than the MaxAge value because we only check to see if we want to purge once every MaxAge minutes. We could check more often, but that cause us to sort and re-sort the internal list an excessive number of times.

And, finally, the main method used to interface with this class:

Public Function GetResult _
    (ByVal Conn As SqlClient.SqlConnection, _
    ByVal SiteID As Integer, ByVal Extension As String) _
        As DataRecord

AutoPurge() 'keeps the list from getting too big or stale; also calls initial sort if needed

Dim tmp As New DataRecord tmp.SiteID = SiteID tmp.Extension = Extension Dim CacheNdx As Integer = _Cache.BinarySearch(tmp) If CacheNdx < 0 Then 'not found, load the object from the source db tmp.LoadData(Conn, SiteID, Extension)

CacheNdx = Not CacheNdx _Cache.Insert(CacheNdx, tmp) End If Return _Cache.Item(CacheNdx) End Function

Using this cache is a simple matter of calling the GetResult method:

Dim Data As EmployeeCache.DataRecord = EmployeeCache.Instance.GetResult(cn.BaseSQLConnection, 3, "4444")
If Data.NotFound Then
    Console.WriteLine("Record not found")
Else
    Console.WriteLine(String.Format("Name={0}, {1}", Data.LastName, Data.FirstName))
End If

Edit

Generic method

The following class is intended as a more generic solution to the above problem. We'll still need to define specific classes to match our data record, but they'll inherit from the following:

Public MustInherit Class AdHocCache(Of TKey As {IComparable(Of TKey), New}, TValue As {BaseDataRecord, New})
    Public MustInherit Class BaseDataRecord
        Public MustOverride Sub LoadData(ByVal Conn As ADODB.Connection, ByVal key As TKey)

Public NotFound As Boolean = True End Class

Public Class CombinedRecord Implements IComparable(Of CombinedRecord)

Private ConnStrLen As Integer = 0 Private ConnStrCRC As Integer = 0 Public Key As TKey Public Value As TValue Public DateAddedToCache As Date = Now

Public Function CompareTo _ (ByVal other As AdHocCache(Of TKey, TValue).CombinedRecord) As Integer _ Implements System.IComparable(Of AdHocCache(Of TKey, TValue).CombinedRecord).CompareTo

Dim n As Integer = Me.ConnStrCRC.CompareTo(other.ConnStrCRC) If n = 0 Then n = Me.ConnStrLen.CompareTo(other.ConnStrLen) If n = 0 Then If Me.Key.GetType.FullName = "System.String" Then 'forcing case-insensitivity if key is a stand-alone string. 'If key is a class that contains strings, it'll be up to 'the creator of that class to implement case-insensitivity 'in the CompareTo method. Dim s1 As String = "" Dim s2 As String = "" If Me.Key IsNot Nothing Then s1 = Me.Key.ToString If other.Key IsNot Nothing Then s2 = other.Key.ToString n = StrComp(s1, s2, CompareMethod.Text) Else n = Me.Key.CompareTo(other.Key) End If End If Return n End Function

Public WriteOnly Property ConnStr() As String Set(ByVal value As String) Me.ConnStrLen = Len(value) If Me.ConnStrLen > 0 Then Using ms As IO.Stream = New IO.MemoryStream(UTF8.GetBytes(value)) Me.ConnStrCRC = CRC32Class.GetInstance.GetCrc32(ms) End Using Else Me.ConnStrCRC = 0 End If End Set End Property

Public Sub New() ConnStr = Nothing Key = New TKey Value = New TValue End Sub

Public Sub New(ByVal ConnStr As String, ByVal _Key As TKey) Me.ConnStr = ConnStr Key = _Key Value = New TValue End Sub

Public Sub New(ByVal ConnStr As String, ByVal _Key As TKey, ByVal _Value As TValue) Me.ConnStr = ConnStr Key = _Key Value = _Value End Sub End Class

Private Class SortByChangeDate Implements IComparer(Of CombinedRecord)

Public Function Compare _ (ByVal x As CombinedRecord, _ ByVal y As CombinedRecord) _ As Integer _ Implements IComparer(Of AdHocCache(Of TKey, TValue).CombinedRecord).Compare

Return x.DateAddedToCache.CompareTo(y.DateAddedToCache) End Function End Class

Private lastDatePurge As Date = Now Protected MaxAge As Integer 'minutes Protected MaxSize As Integer 'records Private arBase As New Generic.List(Of CombinedRecord) Private cntTotal As Integer = 0 Private cntHit As Integer = 0 Private cntPurge As Integer = 0

Private tmrPurge As Long = 0 Private tmrGetResult As Long = 0 Protected tmrLoad As Long = 0

Public Sub New(ByVal MaxAgeMinutes As Integer, ByVal MaxRecords As Integer) 'zero or negative values in either of these 2 parms will turn off 'that part of the purge feature. MaxAge = MaxAgeMinutes MaxSize = MaxRecords End Sub Public Sub New() MaxAge = 60 MaxSize = 1000 End Sub

Public Overrides Function ToString() As String Dim r As Double = 0 If cntTotal > 0 Then r = (cntHit / cntTotal)

Return _ String.Format("AdHocCache(of {0}, {1}): Hit Ratio: {2:#,##0} / {3:#,##0} = {4:P1}, Purge Events = {5:#,##0}, Current Cache Size = {9:#,##0} (Timers: GetResult={7:#,##0}ms, GetResult.Load={8:#,##0}ms, Purge={6:#,##0}ms)", _ GetType(TKey).Name, GetType(TValue).Name, cntHit, cntTotal, r, cntPurge, tmrPurge, tmrGetResult, tmrLoad, Me.CacheCount) End Function

Public ReadOnly Property ItemKey(ByVal ndx As Integer) As TKey Get Return arBase(ndx).Key End Get End Property

Public ReadOnly Property ItemData(ByVal ndx As Integer) As TValue Get Return arBase(ndx).Value End Get End Property

Protected Sub AutoPurge() If CacheCount() = 0 Then Exit Sub

Dim tmr As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew Dim SomethingGotPurged As Boolean = False Dim isSorted As Boolean = True

If MaxAge > 0 AndAlso _ DateDiff(DateInterval.Second, lastDatePurge, Now) > (MaxAge * 60) Then lastDatePurge = Now isSorted = False arBase.Sort(New SortByChangeDate())

If CacheCount() > 0 Then 'if the newest item is stale, kill the whole cache Dim lastNdx As Integer = CacheCount() - 1 If DateDiff(DateInterval.Second, arBase.Item(lastNdx).DateAddedToCache, Now) > (MaxAge * 60) Then arBase.Clear() isSorted = True SomethingGotPurged = True End If End If

Do While CacheCount() > 0 _ AndAlso DateDiff(DateInterval.Second, arBase.Item(0).DateAddedToCache, Now) > (MaxAge * 60) arBase.RemoveAt(0) SomethingGotPurged = True Loop End If

If MaxSize > 0 AndAlso CacheCount() > MaxSize Then SomethingGotPurged = True arBase.Sort(New SortByChangeDate()) isSorted = False

Do Until CacheCount() <= CInt(MaxSize * 0.9) arBase.RemoveAt(0) Loop End If

If SomethingGotPurged Then cntPurge += 1

If isSorted = False Then arBase.Sort() End If

tmrPurge += tmr.ElapsedMilliseconds End Sub

Public Sub Clear() arBase.Clear() cntTotal = 0 cntHit = 0 cntPurge = 0 tmrGetResult = 0 tmrLoad = 0 tmrPurge = 0 lastDatePurge = Now End Sub

Public Function CacheCount() As Integer Return arBase.Count End Function

Protected Function Add(ByVal data As CombinedRecord) As Integer Dim ndx As Integer = arBase.BinarySearch(data) If ndx >= 0 Then arBase(ndx) = data Return ndx End If

data.DateAddedToCache = Now

ndx = Not ndx If ndx >= arBase.Count() Then arBase.Add(data) Return arBase.Count - 1 Else arBase.Insert(ndx, data) Return ndx End If End Function

Protected Function GetResult _ (ByVal Conn As ADODB.Connection, _ ByVal _key As TKey) _ As TValue

AutoPurge() 'keeps the list from getting too big or stale; also calls initial sort if needed Dim watch As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew

Dim tmp As New CombinedRecord(Conn.ConnectionString, _key) Dim CacheNdx As Integer = arBase.BinarySearch(tmp) cntTotal += 1 If CacheNdx >= 0 Then cntHit += 1 Else 'not found, load the object from the source db Dim watch2 As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew tmp.Value.LoadData(Conn, _key)

CacheNdx = Add(tmp)

watch2.Stop() tmrLoad += watch2.ElapsedMilliseconds End If watch.Stop() tmrGetResult += watch.ElapsedMilliseconds Return arBase.Item(CacheNdx).Value End Function End Class

TODO: The above code uses an object called ADODB.Connection instead of SQLClient.SQLConnection, but the basics are the same. There are calls to a couple of additional general functions I used regularly. I'll update this article later to use SQLConnection and to document those general functions.

Additional features in the above code:

  • tracks time spent in various sections of the code and publishes those performance numbers in the ToString method.
  • considers the database connection string a part of the key; therefore, if you use multiple db's in the same program that have similar tables, the cache can cover each database without requiring a new instance for each one.

To use this abstract set of classes, we must first define classes that inherit from them and add the specifics. First a key class. The only rule for this type is that it must implement IComparable. If you have a data record that has a single value as a key, you can skip creation of this class and just use that type where you see ExtensionKeyRecord below:

Public Class ExtensionKeyRecord
    Implements IComparable(Of ExtensionKeyRecord)

Public SiteID As Integer Public Extension As String

Public Function CompareTo(ByVal other As ExtensionKeyRecord) As Integer _ Implements System.IComparable(Of ExtensionKeyRecord).CompareTo Dim n As Integer = SiteID.CompareTo(other.SiteID) If n = 0 Then n = StrComp(Extension, other.Extension, CompareMethod.Text) Return n End Function End Class

Next is the data record. This must inherit from BaseDataRecord and override the LoadData method.

Public Class ExtensionDataRecord
    Inherits AdHocCache(Of ExtensionKeyRecord, ExtensionDataRecord).BaseDataRecord

Public LastName As String = "" Public FirstName As String = "" Public CostCenter as String = ""

Private Function GetSQL(ByVal Conn As ADODB.Connection) As String Return "SELECT LastName, FirstName, CostCenter " & _ " FROM Extensions " & _ " WHERE SiteID = {0} AND Extension = {1}" End Function

Public Overrides Sub LoadData(ByVal Conn As ADODB.Connection, ByVal key As ExtensionKeyRecord)

Dim sql As String = GetSQL(Conn) sql = String.Format(sql, key.SiteID, StrToSQL(key.Extension))

Using rs As ADODB.Recordset = Conn.Execute(sql) If Not rs.EOF Then Me.LastName = varToStr(rs.Fields("LastName").Value) Me.FirstName = varToStr(rs.Fields("FirstName").Value) Me.CostCenter= varToStr(rs.Fields("CostCenter").Value) Me.NotFound = False End If End Using End Sub End Class

And finally, the specific cache itself. Technically you could just instantiate a new AdHocCache of the appropriate types, but this way you can define the cache as a singleton and overload the GetResult method to use specific key fields instead of a key class:

Public Class ExtensionCache
    Inherits AdHocCache(Of ExtensionKeyRecord, ExtensionDataRecord)

Private Shared _Instance As ExtensionCache = Nothing Private Shared _MaxAgeMinutes As Integer = 60 Private Shared _MaxRecords As Integer = 2000

Private Sub New(ByVal MaxAgeMinutes As Integer, ByVal MaxRecords As Integer) MyBase.New(MaxAgeMinutes, MaxRecords) End Sub

Public Shared Function GetInstance() As ExtensionCache Static _lock As New System.Object SyncLock _lock If _Instance Is Nothing Then _Instance = New ExtensionCache(_MaxAgeMinutes, _MaxRecords) End If Return _Instance End SyncLock End Function

Public Overloads Function GetResult _ (ByVal Conn As ADODB.Connection, ByVal SiteID as Integer, ByVal Extension As String) _ As ExtensionDataRecord

Dim tmp As New ExtensionKeyRecord tmp.SiteID = SiteID tmp.Extension = Extension Return Me.GetResult(Conn, tmp) End Function End Class

Then, we can use the ExtensionCache.GetInstance in the same way as the specific example above.

ScrewTurn Wiki version 2.0.33. Current Page Count: 23. Some of the icons created by FamFamFam.