Robin's Ramblings

Friday, August 25, 2006

Dealing with Deadlock

Sometimes it seems like in a software project of any size or complexity, there is a never-ending stream of issues to consider: things you didn’t think about before, or things you knew about but never got around to developing strategies against. My latest problem is deadlock. Now, since I scored top of the class in my Concurrent Programming module at University, you might think this would be one of the first things I’d consider when writing a multi-user database application with VB.NET and SQL Server. Well, you guessed it: I thought I’d wing it and see what happened.

My testing strategy involves a “random” module that executes a single random operation against the data source every 10 seconds. Operations are queued and then executed sequentially on a separate thread. Every 5 seconds or so, the thread also queues a “Coherence Check” (yes, I feel a blog post coming on about this too), which is purely read-only. The coherence check fetches bits of information from the database to make sure the client-side cache (a kind of homebrew DataSet) stays as up-to-date as possible, so the client UI also matches the database state in real-time (real-ish).
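
For the curious, here’s a stripped-down sketch of the shape of that harness. The class and operation names are made up for illustration (the real module dispatches to my stored procedure wrappers), but the queue plus worker thread arrangement is the same:

Imports System.Threading
Imports System.Collections.Generic

Public Class RandomTester

    ' Pending operations, guarded by a lock object.
    Private m_Queue As New Queue(Of String)
    Private m_Lock As New Object
    Private m_Random As New Random

    ' Illustrative operation names only.
    Private m_Operations() As String = New String() {"Insert", "Update", "Delete"}

    Public Sub Start()
        ' The worker drains the queue sequentially.
        Dim worker As New Thread(AddressOf ProcessQueue)
        worker.IsBackground = True
        worker.Start()

        ' The producer queues a coherence check every 5 seconds
        ' and a random operation every 10 seconds.
        Dim producer As New Thread(AddressOf EnqueueLoop)
        producer.IsBackground = True
        producer.Start()
    End Sub

    Private Sub EnqueueLoop()
        Do
            Thread.Sleep(5000)
            Enqueue("CoherenceCheck")
            Thread.Sleep(5000)
            Enqueue(m_Operations(m_Random.Next(m_Operations.Length)))
            Enqueue("CoherenceCheck")
        Loop
    End Sub

    Private Sub Enqueue(ByVal op As String)
        SyncLock m_Lock
            m_Queue.Enqueue(op)
            Monitor.Pulse(m_Lock)   ' wake the worker
        End SyncLock
    End Sub

    Private Sub ProcessQueue()
        Do
            Dim op As String
            SyncLock m_Lock
                Do While m_Queue.Count = 0
                    Monitor.Wait(m_Lock)
                Loop
                op = m_Queue.Dequeue()
            End SyncLock
            ' Dispatch to the real data access layer here.
            Console.WriteLine("Executing {0}", op)
        Loop
    End Sub

End Class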

Effectively I’m executing around 3 operations against the database every 10 seconds (one random operation plus two coherence checks). This is all fine and dandy in the single-user case, but when you launch 4 processes that are each executing 3 operations every 10 seconds, you start to get some interesting results. Naturally, on my quite powerful home PC I didn’t see any deadlocking at all. As soon as I transferred the code to my slower test PC at work, I hit frequent deadlock exceptions, and by frequent I mean enough to kill one process every three hours or so. That just isn’t acceptable from a software quality point of view, so I’ve been thinking about strategies for handling these exceptions.

My first thought was just to tell the user that the operation deadlocked and invite him to retry it. Generally, though, that’s a bit of a cop-out (lazy programmer). The real solution is of course to automatically retry the operation after a suitable delay, and only fail with an error to the user if, despite our retry efforts, we still can’t commit the transaction.

Below is a sample showing my proposed solution. Each operation against the data source is wrapped in the deadlock retry loop as follows:



' Requires Imports System.Threading for Thread.Sleep.

Public Function DoSomething() As <some error class>

    Dim bTrying As Boolean
    Dim nDeadlocked As Integer = 0
    Dim Result As <some error class>

    Do
        ' Reset each attempt; otherwise a successful retry would
        ' still satisfy the loop condition and run the operation again.
        bTrying = False

        Try

            ' ...
            ' ... Set up stored procedure execution
            ' ...

            DataReader = Command.ExecuteReader()

            ' ... No exception

            Result = <some error class, success>

        Catch sqlEx As SqlException

            Result = <some error class, failed>

            ' 1205 is SQL Server's "chosen as deadlock victim" error.
            If sqlEx.Number = 1205 Then
                nDeadlocked += 1
                bTrying = True
                Thread.Sleep(100)   ' brief pause before retrying
            End If

        Catch Ex As Exception

            Result = <some error class, failed>

        Finally

            ' ... clean up resources

        End Try

    Loop While bTrying And (nDeadlocked < m_RetryCount)

    Return Result

End Function

The idea is simply to loop up to "n" times (m_RetryCount), introducing a short delay (100ms here) between retries when we detect a deadlock. If we still fail after that, then we are quite justified in returning an error to the user. Note that this code elides my custom error class for brevity: whenever an operation completes I construct a "SUCCESS" error result, and if it fails I construct a FAILED error result (with appropriate information about why it failed). The point is that we don't return FAILED on a deadlock unless we've deadlocked "n" times.
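
Since every data operation needs this same wrapper, the obvious refinement is to hoist the loop into a shared helper that takes a delegate, so the retry policy lives in one place. Here’s a rough sketch of what I have in mind; the names and the Boolean return are illustrative (my real version would return the error class described above):

Imports System.Threading
Imports System.Data.SqlClient

Public Class DeadlockRetry

    ' The actual database work; expected to throw SqlException
    ' on failure, just as ExecuteReader does.
    Public Delegate Sub Operation()

    ' SQL Server raises error 1205 when it picks a transaction
    ' as the deadlock victim.
    Private Const DEADLOCK_VICTIM As Integer = 1205

    ' Runs op, allowing up to retryCount attempts when we are
    ' chosen as the deadlock victim. Returns True on success,
    ' False if every attempt deadlocked. Any other exception
    ' propagates to the caller unchanged.
    Public Shared Function Execute(ByVal op As Operation, _
                                   ByVal retryCount As Integer, _
                                   ByVal delayMs As Integer) As Boolean
        Dim nDeadlocked As Integer = 0
        Do
            Try
                op.Invoke()
                Return True
            Catch sqlEx As SqlException When sqlEx.Number = DEADLOCK_VICTIM
                nDeadlocked += 1
                Thread.Sleep(delayMs)   ' back off before the next attempt
            End Try
        Loop While nDeadlocked < retryCount
        Return False
    End Function

End Class

A call site then collapses to something like this (SaveWidget is a hypothetical Sub wrapping one stored procedure call):

If Not DeadlockRetry.Execute(AddressOf SaveWidget, 5, 100) Then
    ' Report the failure to the user; we gave it our best shot.
End If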


