EditOverview
When designing code that needs to access data, we often run into performance problems. While modern database engines can be very fast, it's still much faster to work with memory than it is to request data from the database. The following technique is aimed a specific type of problem: Situations where you need to lookup individual records from a database (key based searches) and you may need to lookup the same record more than once during the lifetime of your code. Typically this applies primarily to reference or lookup tables.
There are 2 extremes possible:
- Do each lookup as a database query as needed - if the number of repeat lookups is very low, this is a simple way to handle the situation.
- Pre-load the data you might need and do in-memory lookups. If the number of records in the lookup table is small, this is a reasonable way to handle it. But, if the number of records is potentially large, you risk memory issues.
The following technique attempts to take a middle road between these 2: Do each lookup from the database the first time, but then store the result in an in-memory cache for subsequent lookups. The technique includes the ability to limit the memory usage to a certain number of records.
EditSpecific example
For this example, Assume we are building a system that processes phone call records and we need to lookup information about the extension the call was made from. Since we may be dealing with calls from multiple locations, we'll be using a table that is keyed on 2 fields: SiteID (integer) and Extension (string). In addition to telling me who is responsible for that extension (LastName, FirstName), this table will give us a CostCenter field (string) that we can use to determine how to allocate the costs of this phone call.
So, a typical query to this table would look like this:
SELECT LastName, FirstName, CostCenter
FROM Extensions
WHERE SiteID = 1 AND Extension = '4444'
First, we create a class for our cache. We'll define this as a singleton so we only keep one copy of the cache:
Public Class EmployeeCache
Private Sub New()
End Sub
Public Shared Instance As New EmployeeCache
End Class
The rest of this code goes within the cache class, including a couple of additional class definitions. First, we'll define a data record class to hold each element:
Public Class DataRecord
Implements IComparable(Of DataRecord)
'key fields
Public SiteID As Integer
Public Extension As String
Public NotFound As Boolean = True
Public DateAddedToCache As Date = Date.Now
'data fields
Public LastName As String
Public FirstName As String
Public CostCenter As String
Public Function CompareTo(ByVal other As DataRecord) As Integer _
Implements IComparable(Of DataRecord).CompareTo
Dim n As Integer = SiteID.CompareTo(other.SiteID)
If n = 0 Then n = String.Compare(Extension, other.Extension, True)
Return n
End Function
Private Function HandleDBNullString(ByVal dbValue As Object) As String
If Convert.IsDBNull(dbValue) Then
Return ""
Else
Return CStr(dbValue)
End If
End Function
Public Sub LoadData(ByVal cn As SqlClient.SqlConnection, ByVal siteID As Integer, ByVal Extension As String)
Dim sql As String = _
"SELECT LastName, FirstName, CostCenter " & _
" FROM Extensions " & _
" WHERE SiteID = @SiteID AND Extension = @Ext"
Using cmd As New SqlClient.SqlCommand(sql, cn)
cmd.Parameters.Add(New SqlClient.SqlParameter("@SiteID", siteID))
cmd.Parameters.Add(New SqlClient.SqlParameter("@Ext", Extension))
Using rdr As SqlClient.SqlDataReader = cmd.ExecuteReader
Me.SiteID = siteID
Me.Extension = Extension
If rdr.Read Then
Me.LastName = HandleDBNullString(rdr("LastName"))
Me.FirstName = HandleDBNullString(rdr("FirstName"))
Me.CostCenter = HandleDBNullString(rdr("CostCenter"))
Me.NotFound = False
Else
Me.LastName = ""
Me.FirstName = ""
Me.CostCenter = ""
Me.NotFound = True
End If
End Using
End Using
End Sub
End Class
Key features of this class:
- definition of the primary key using the
IComparable interface - 2 non-data fields:
- NotFound allows us to cache a "not found" result just as easily as caching an actual data record
- DateAddedToCache allows us to track when we created this entry and use that to purge older entries
- LoadData method loads a specific record from the database (or sets the record as a "not found" condition)
To take advantage of the DateAddedToCache value, we'll need the ability sort the internal list by the date instead of it's natural sort order. Create the following class:
Private Class SortByTimeStamp
Implements IComparer(Of DataRecord)
Public Shared Instance As New SortByTimeStamp
Public Function Compare _
(ByVal x As DataRecord, _
ByVal y As DataRecord) _
As Integer _
Implements IComparer(Of DataRecord).Compare
Return x.DateAddedToCache.CompareTo(y.DateAddedToCache)
End Function
End Class
The actual data for this cache will be held in these private variables:
Private _lastDatePurge As Date = Now
Private _Cache As New List(Of DataRecord)
These new 2 constants are used to purge old cache records. Basically, if a record has been sitting in the cache for more than
MaxAge minutes or if there are more than
MaxSize records in the cache, the oldest records will be purged.
Const MaxAge As Integer = 60 'minutes
Const MaxSize As Integer = 10000 'records
A couple of support methods:
Public Function CacheCount() As Integer
Return _Cache.Count
End Function
Public Sub Clear()
_Cache.Clear()
lastDatePurge = Now
End Sub
The routine that purges old data:
Private Sub AutoPurge()
If CacheCount() = 0 Then Exit Sub
Dim isSorted As Boolean = True
If MaxAge > 0 AndAlso _
DateDiff(DateInterval.Second, _lastDatePurge, Now) > (MaxAge * 60) Then
lastDatePurge = Now
isSorted = False
_Cache.Sort(SortByTimeStamp.Instance)
If CacheCount() > 0 Then
'if the newest item is stale, kill the whole cache
Dim lastNdx As Integer = CacheCount() - 1
If DateDiff(DateInterval.Second, _Cache.Item(lastNdx).DateAddedToCache, Now) > (MaxAge * 60) Then
_Cache.Clear()
isSorted = True
Exit Sub
End If
End If
Do While CacheCount() > 0 _
AndAlso DateDiff(DateInterval.Second, _Cache.Item(0).DateAddedToCache, Now) > (MaxAge * 60)
_Cache.RemoveAt(0)
Loop
End If
If MaxSize > 0 AndAlso CacheCount() > MaxSize Then
_Cache.Sort(SortByTimeStamp.Instance)
isSorted = False
Do Until CacheCount() <= CInt(MaxSize * 0.9)
_Cache.RemoveAt(0)
Loop
End If
If isSorted = False Then _Cache.Sort()
End Sub
Technically, data can sit in the cache longer than the
MaxAge value because we only check to see if we want to purge once every
MaxAge minutes. We could check more often, but that cause us to sort and re-sort the internal list an excessive number of times.
And, finally, the main method used to interface with this class:
Public Function GetResult _
(ByVal Conn As SqlClient.SqlConnection, _
ByVal SiteID As Integer, ByVal Extension As String) _
As DataRecord
AutoPurge() 'keeps the list from getting too big or stale; also calls initial sort if needed
Dim tmp As New DataRecord
tmp.SiteID = SiteID
tmp.Extension = Extension
Dim CacheNdx As Integer = _Cache.BinarySearch(tmp)
If CacheNdx < 0 Then
'not found, load the object from the source db
tmp.LoadData(Conn, SiteID, Extension)
CacheNdx = Not CacheNdx
_Cache.Insert(CacheNdx, tmp)
End If
Return _Cache.Item(CacheNdx)
End Function
Using this cache is a simple matter of calling the GetResult method:
Dim Data As EmployeeCache.DataRecord = EmployeeCache.Instance.GetResult(cn.BaseSQLConnection, 3, "4444")
If Data.NotFound Then
Console.WriteLine("Record not found")
Else
Console.WriteLine(String.Format("Name={0}, {1}", Data.LastName, Data.FirstName))
End If
EditGeneric method
The following class is intended as a more generic solution to the above problem. We'll still need to define specific classes to match our data record, but they'll inherit from the following:
Public MustInherit Class AdHocCache(Of TKey As {IComparable(Of TKey), New}, TValue As {BaseDataRecord, New})
Public MustInherit Class BaseDataRecord
Public MustOverride Sub LoadData(ByVal Conn As ADODB.Connection, ByVal key As TKey)
Public NotFound As Boolean = True
End Class
Public Class CombinedRecord
Implements IComparable(Of CombinedRecord)
Private ConnStrLen As Integer = 0
Private ConnStrCRC As Integer = 0
Public Key As TKey
Public Value As TValue
Public DateAddedToCache As Date = Now
Public Function CompareTo _
(ByVal other As AdHocCache(Of TKey, TValue).CombinedRecord) As Integer _
Implements System.IComparable(Of AdHocCache(Of TKey, TValue).CombinedRecord).CompareTo
Dim n As Integer = Me.ConnStrCRC.CompareTo(other.ConnStrCRC)
If n = 0 Then n = Me.ConnStrLen.CompareTo(other.ConnStrLen)
If n = 0 Then
If Me.Key.GetType.FullName = "System.String" Then
'forcing case-insensitivity if key is a stand-alone string.
'If key is a class that contains strings, it'll be up to
'the creator of that class to implement case-insensitivity
'in the CompareTo method.
Dim s1 As String = ""
Dim s2 As String = ""
If Me.Key IsNot Nothing Then s1 = Me.Key.ToString
If other.Key IsNot Nothing Then s2 = other.Key.ToString
n = StrComp(s1, s2, CompareMethod.Text)
Else
n = Me.Key.CompareTo(other.Key)
End If
End If
Return n
End Function
Public WriteOnly Property ConnStr() As String
Set(ByVal value As String)
Me.ConnStrLen = Len(value)
If Me.ConnStrLen > 0 Then
Using ms As IO.Stream = New IO.MemoryStream(UTF8.GetBytes(value))
Me.ConnStrCRC = CRC32Class.GetInstance.GetCrc32(ms)
End Using
Else
Me.ConnStrCRC = 0
End If
End Set
End Property
Public Sub New()
ConnStr = Nothing
Key = New TKey
Value = New TValue
End Sub
Public Sub New(ByVal ConnStr As String, ByVal _Key As TKey)
Me.ConnStr = ConnStr
Key = _Key
Value = New TValue
End Sub
Public Sub New(ByVal ConnStr As String, ByVal _Key As TKey, ByVal _Value As TValue)
Me.ConnStr = ConnStr
Key = _Key
Value = _Value
End Sub
End Class
Private Class SortByChangeDate
Implements IComparer(Of CombinedRecord)
Public Function Compare _
(ByVal x As CombinedRecord, _
ByVal y As CombinedRecord) _
As Integer _
Implements IComparer(Of AdHocCache(Of TKey, TValue).CombinedRecord).Compare
Return x.DateAddedToCache.CompareTo(y.DateAddedToCache)
End Function
End Class
Private lastDatePurge As Date = Now
Protected MaxAge As Integer 'minutes
Protected MaxSize As Integer 'records
Private arBase As New Generic.List(Of CombinedRecord)
Private cntTotal As Integer = 0
Private cntHit As Integer = 0
Private cntPurge As Integer = 0
Private tmrPurge As Long = 0
Private tmrGetResult As Long = 0
Protected tmrLoad As Long = 0
Public Sub New(ByVal MaxAgeMinutes As Integer, ByVal MaxRecords As Integer)
'zero or negative values in either of these 2 parms will turn off
'that part of the purge feature.
MaxAge = MaxAgeMinutes
MaxSize = MaxRecords
End Sub
Public Sub New()
MaxAge = 60
MaxSize = 1000
End Sub
Public Overrides Function ToString() As String
Dim r As Double = 0
If cntTotal > 0 Then r = (cntHit / cntTotal)
Return _
String.Format("AdHocCache(of {0}, {1}): Hit Ratio: {2:#,##0} / {3:#,##0} = {4:P1}, Purge Events = {5:#,##0}, Current Cache Size = {9:#,##0} (Timers: GetResult={7:#,##0}ms, GetResult.Load={8:#,##0}ms, Purge={6:#,##0}ms)", _
GetType(TKey).Name, GetType(TValue).Name, cntHit, cntTotal, r, cntPurge, tmrPurge, tmrGetResult, tmrLoad, Me.CacheCount)
End Function
Public ReadOnly Property ItemKey(ByVal ndx As Integer) As TKey
Get
Return arBase(ndx).Key
End Get
End Property
Public ReadOnly Property ItemData(ByVal ndx As Integer) As TValue
Get
Return arBase(ndx).Value
End Get
End Property
Protected Sub AutoPurge()
If CacheCount() = 0 Then Exit Sub
Dim tmr As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew
Dim SomethingGotPurged As Boolean = False
Dim isSorted As Boolean = True
If MaxAge > 0 AndAlso _
DateDiff(DateInterval.Second, lastDatePurge, Now) > (MaxAge * 60) Then
lastDatePurge = Now
isSorted = False
arBase.Sort(New SortByChangeDate())
If CacheCount() > 0 Then
'if the newest item is stale, kill the whole cache
Dim lastNdx As Integer = CacheCount() - 1
If DateDiff(DateInterval.Second, arBase.Item(lastNdx).DateAddedToCache, Now) > (MaxAge * 60) Then
arBase.Clear()
isSorted = True
SomethingGotPurged = True
End If
End If
Do While CacheCount() > 0 _
AndAlso DateDiff(DateInterval.Second, arBase.Item(0).DateAddedToCache, Now) > (MaxAge * 60)
arBase.RemoveAt(0)
SomethingGotPurged = True
Loop
End If
If MaxSize > 0 AndAlso CacheCount() > MaxSize Then
SomethingGotPurged = True
arBase.Sort(New SortByChangeDate())
isSorted = False
Do Until CacheCount() <= CInt(MaxSize * 0.9)
arBase.RemoveAt(0)
Loop
End If
If SomethingGotPurged Then cntPurge += 1
If isSorted = False Then
arBase.Sort()
End If
tmrPurge += tmr.ElapsedMilliseconds
End Sub
Public Sub Clear()
arBase.Clear()
cntTotal = 0
cntHit = 0
cntPurge = 0
tmrGetResult = 0
tmrLoad = 0
tmrPurge = 0
lastDatePurge = Now
End Sub
Public Function CacheCount() As Integer
Return arBase.Count
End Function
Protected Function Add(ByVal data As CombinedRecord) As Integer
Dim ndx As Integer = arBase.BinarySearch(data)
If ndx >= 0 Then
arBase(ndx) = data
Return ndx
End If
data.DateAddedToCache = Now
ndx = Not ndx
If ndx >= arBase.Count() Then
arBase.Add(data)
Return arBase.Count - 1
Else
arBase.Insert(ndx, data)
Return ndx
End If
End Function
Protected Function GetResult _
(ByVal Conn As ADODB.Connection, _
ByVal _key As TKey) _
As TValue
AutoPurge() 'keeps the list from getting too big or stale; also calls initial sort if needed
Dim watch As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew
Dim tmp As New CombinedRecord(Conn.ConnectionString, _key)
Dim CacheNdx As Integer = arBase.BinarySearch(tmp)
cntTotal += 1
If CacheNdx >= 0 Then
cntHit += 1
Else
'not found, load the object from the source db
Dim watch2 As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew
tmp.Value.LoadData(Conn, _key)
CacheNdx = Add(tmp)
watch2.Stop()
tmrLoad += watch2.ElapsedMilliseconds
End If
watch.Stop()
tmrGetResult += watch.ElapsedMilliseconds
Return arBase.Item(CacheNdx).Value
End Function
End Class
TODO: The above code uses an object called ADODB.Connection instead of SQLClient.SQLConnection, but the basics are the same. There are calls to a couple of additional general functions I used regularly. I'll update this article later to use SQLConnection and to document those general functions.
Additional features in the above code:
- tracks time spent in various sections of the code and publishes those performance numbers in the ToString method.
- considers the database connection string a part of the key; therefore, if you use multiple db's in the same program that have similar tables, the cache can cover each database without requiring a new instance for each one.
To use this abstract set of classes, we must first define classes that inherit from them and add the specifics. First a key class. The only rule for this type is that it must implement IComparable. If you have a data record that has a single value as a key, you can skip creation of this class and just use that type where you see ExtensionKeyRecord below:
Public Class ExtensionKeyRecord
Implements IComparable(Of ExtensionKeyRecord)
Public SiteID As Integer
Public Extension As String
Public Function CompareTo(ByVal other As ExtensionKeyRecord) As Integer _
Implements System.IComparable(Of ExtensionKeyRecord).CompareTo
Dim n As Integer = SiteID.CompareTo(other.SiteID)
If n = 0 Then n = StrComp(Extension, other.Extension, CompareMethod.Text)
Return n
End Function
End Class
Next is the data record. This must inherit from BaseDataRecord and override the LoadData method.
Public Class ExtensionDataRecord
Inherits AdHocCache(Of ExtensionKeyRecord, ExtensionDataRecord).BaseDataRecord
Public LastName As String = ""
Public FirstName As String = ""
Public CostCenter as String = ""
Private Function GetSQL(ByVal Conn As ADODB.Connection) As String
Return "SELECT LastName, FirstName, CostCenter " & _
" FROM Extensions " & _
" WHERE SiteID = {0} AND Extension = {1}"
End Function
Public Overrides Sub LoadData(ByVal Conn As ADODB.Connection, ByVal key As ExtensionKeyRecord)
Dim sql As String = GetSQL(Conn)
sql = String.Format(sql, key.SiteID, StrToSQL(key.Extension))
Using rs As ADODB.Recordset = Conn.Execute(sql)
If Not rs.EOF Then
Me.LastName = varToStr(rs.Fields("LastName").Value)
Me.FirstName = varToStr(rs.Fields("FirstName").Value)
Me.CostCenter= varToStr(rs.Fields("CostCenter").Value)
Me.NotFound = False
End If
End Using
End Sub
End Class
And finally, the specific cache itself. Technically you could just instantiate a new AdHocCache of the appropriate types, but this way you can define the cache as a singleton and overload the GetResult method to use specific key fields instead of a key class:
Public Class ExtensionCache
Inherits AdHocCache(Of ExtensionKeyRecord, ExtensionDataRecord)
Private Shared _Instance As ExtensionCache = Nothing
Private Shared _MaxAgeMinutes As Integer = 60
Private Shared _MaxRecords As Integer = 2000
Private Sub New(ByVal MaxAgeMinutes As Integer, ByVal MaxRecords As Integer)
MyBase.New(MaxAgeMinutes, MaxRecords)
End Sub
Public Shared Function GetInstance() As ExtensionCache
Static _lock As New System.Object
SyncLock _lock
If _Instance Is Nothing Then
_Instance = New ExtensionCache(_MaxAgeMinutes, _MaxRecords)
End If
Return _Instance
End SyncLock
End Function
Public Overloads Function GetResult _
(ByVal Conn As ADODB.Connection, ByVal SiteID as Integer, ByVal Extension As String) _
As ExtensionDataRecord
Dim tmp As New ExtensionKeyRecord
tmp.SiteID = SiteID
tmp.Extension = Extension
Return Me.GetResult(Conn, tmp)
End Function
End Class
Then, we can use the ExtensionCache.GetInstance in the same way as the specific example above.