At the end of last week I was called out to look over a poorly performing BizTalk environment. Basically the performance had degraded to the point where BizTalk was processing a few messages a minute even though resource usage on the BizTalk Server was negligible. Immediately this screams BizTalk Host database throttling. Below are the basic steps I undertook to debug and analyse the problem.
- Fire up the BizTalk admin console and look at the running and suspended service instances. I found 41,000 suspended and another 9,000 either active, dehydrated, or ready to run. After refreshing the running service instances view a few times I could see messages weren’t processing quickly at all.
- Open perfmon and monitor the publishing throttling state and delivery throttling state performance counters from the BizTalk Messaging Agent category. Add these counters for all instances. Also add the spool table and tracking data table counters from the BizTalk General Message Box category. An optimal spool count for an empty message box is 4. If it's larger than four, either your BizTalk SQL cleanup jobs aren’t running or you have a heap of running and suspended service instances in the message box. The spool count on this environment was 2.5+ million. I scrolled through the throttling counters and sure enough most were in throttling state 6, which is database throttling.
- As a temporary fix to get some throughput back we tweaked the “Message count in database” throttling setting, which you can find under the advanced tab of your BizTalk host in the BizTalk admin console. The objective is to get those BizTalk hosts out of throttling state 6, but when tweaking the throttling settings keep a CLOSE eye on resource usage. Often you will tweak one setting only to encounter throttling somewhere else. Read more in How BizTalk Implements Host Throttling.
- So now we had our message throughput back and the BizTalk hosts were no longer throttling. The client was happy as they had a nice normal level of performance back. BizTalk still wasn’t entirely happy though, with a large spool. We watched that spool increase to over 5 million records over the following hours whilst a backlog that had been queued up ready to send to BizTalk was processed. We noted periods where the spool would drop that coincided with lulls in activity. Obviously the BizTalk Server was past the point of sustainable throughput: more data was going into the spool than the cleanup jobs could remove whilst processing a constant backlog at this rate. Really this environment needs message box scaling. However in our case we had a weekend of zero activity coming up and the BizTalk jobs eventually caught up.
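The checks in the steps above boil down to a few comparisons, which can be sketched in Python. This is only an illustration of the reasoning, not a monitoring tool: the readings are typed in by hand from perfmon, and the function name `triage`, the host names, and the sample values are all made up for the example. The only facts it encodes are the ones from this post (an idle spool of about 4, and state 6 meaning database throttling).

```python
# Values taken from the post: an idle message box spool sits around 4 rows,
# and throttling state 6 means database throttling.
IDLE_SPOOL_BASELINE = 4
DATABASE_THROTTLING_STATE = 6


def triage(spool_samples, throttling_states):
    """Summarise hand-copied perfmon readings.

    spool_samples: spool counts in the order they were read (oldest first).
    throttling_states: mapping of host instance name -> throttling state value.
    """
    if not spool_samples:
        raise ValueError("need at least one spool sample")

    latest = spool_samples[-1]
    # Hosts whose throttling counter currently reads 6.
    db_throttled = sorted(
        host for host, state in throttling_states.items()
        if state == DATABASE_THROTTLING_STATE
    )
    return {
        "spool_above_baseline": latest > IDLE_SPOOL_BASELINE,
        # Growing spool: more rows are arriving than the cleanup jobs remove.
        "spool_growing": len(spool_samples) > 1 and latest > spool_samples[0],
        "database_throttled_hosts": db_throttled,
    }


# Hypothetical readings; host names are invented for illustration.
report = triage(
    spool_samples=[2_500_000, 3_800_000, 5_000_000],
    throttling_states={"SendHost": 6, "ReceiveHost": 6, "TrackingHost": 0},
)
print(report)
```

A spool that keeps growing between samples while the backlog is being processed is the sign that the environment is past its sustainable throughput.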
Another, rather more drastic, option is running the bts_CleanupMsgBox stored proc on the message box database. This will clean out the spool and tracking tables in an instant. It does work but is NOT supported for production environments, so it should only be a last resort. Make sure you back up the message box database before running this stored proc if you go this way. If bts_CleanupMsgBox is run the wrong way it WILL wipe all subscriptions and break your environment, so be wary. The correct way to run this stored proc is “exec bts_CleanupMsgBox”. The parameter for this stored proc defaults internally to 1. Do not send 0 or false as a parameter as it WILL remove your subscriptions.
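If the call is ever scripted, one way to make the warning above hard to ignore is a small guard around the statement. The Python sketch below is purely illustrative: `build_cleanup_statement` and its `leave_subscriptions` argument are hypothetical names of my own, and the function only returns the statement text quoted above. It does not connect to SQL Server or run anything.

```python
def build_cleanup_statement(leave_subscriptions=True):
    """Return the only invocation this post describes as safe.

    The stored proc's parameter defaults internally to 1, so the statement
    is built without arguments. Any request to drop subscriptions
    (0 or False) is refused rather than turned into SQL.
    """
    if not leave_subscriptions:
        raise ValueError("refusing: passing 0/false would remove subscriptions")
    return "exec bts_CleanupMsgBox"


# Hypothetical usage: print the statement so it can be reviewed before anyone runs it.
statement = build_cleanup_statement()
print(statement)
```

The point of the design is that the dangerous variant cannot be produced by accident. The database backup still has to happen first.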
After we had the spool back down and the backlog had been processed we took a closer look at some of the throttling settings to at least delay the point where BizTalk Host database throttling kicks in. Indications are that the BizTalk applications have been developed in a less than optimal way: they have way too many persistence points, which is evidenced by the 2.5M spool : 50,000 service instance ratio. With the BizTalk apps in their current state we needed to increase the “message count in database” throttling setting on a number of hosts to delay that throttling point. We have BizTalk Server resources to burn so this is feasible in this scenario. However, longer term the client needs to take a closer look at the efficiency of those BizTalk applications to make better use of resources and delay the point where they MUST scale.
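For what it's worth, the ratio mentioned above works out as follows. This is just the arithmetic on the rounded figures from this post (the spool was "2.5+ million", so the real number is a little higher), written as a quick Python sketch. The function name `spool_rows_per_instance` is my own.

```python
def spool_rows_per_instance(spool_rows, service_instances):
    """Average number of spool rows held per service instance."""
    if service_instances <= 0:
        raise ValueError("service_instances must be positive")
    return spool_rows / service_instances


# Rounded figures from the post.
spool_rows = 2_500_000
service_instances = 41_000 + 9_000  # suspended + active/dehydrated/ready to run

rows_per_instance = spool_rows_per_instance(spool_rows, service_instances)
print(rows_per_instance)  # 50.0
```

Roughly 50 spool rows for every service instance is what points at the applications rather than at the volume of work alone.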