Over-initiation: When More Isn’t Better
When we push our full cart to the front of the grocery store, we all want to see a lot of open registers. We feel the same when crossing a toll bridge; seeing a lot of empty toll lanes can make our day. And, though it happens far more rarely, don’t we just grin when the flight attendant closes the door and we have an empty row all to ourselves?
From our perspective, more is better. But in each case, a company that always offers more capacity than is needed will have to raise prices to afford this largesse. Empty airplane seats, extra staff and any other unused capacity are bad in the business world.
And while a batch job may “enjoy” being initiated as soon as it hits the system, we don’t do capacity planning like that. The goal in every system is to minimize waits for important work while keeping the system running near 100-percent busy.
No grocery store really wants people abandoning carts because the wait seems endless. Airlines don’t want to bump people any more than they want to fly empty. Our goal as systems managers is like the goal in every other business: to run as close to capacity as possible, without missing service levels. We want our jobs to be “happy,” but not too happy.
The Dangers of Over-initiation
And yet, even knowledgeable performance experts make the mistake of throwing more resources at work. If 10 buffers are good, 15 would be even better, right? If we have more batch work, throw initiators at it; it will get things moving faster. But it turns out that over-initiation is very similar to putting up more toll collectors on a bridge.
Immediately after the toll is paid, the lanes have to shrink to the number that fits on the bridge. If you have too many toll takers, the merge following the toll gets crazy, accidents can happen and no one moves fast. When more people wait in the lanes to pay the toll, the merge is smoother and the flow across the bridge works better. Getting to the bridge earlier isn’t as good as getting across it faster. But you have to strike a balance.
The same is true with job initiation. Over-initiation doesn’t necessarily translate to better job throughput. For one thing, initiators take up resources, resources that could be better used actually doing work. Next, batch jobs aren’t created equal. Some have heavy demand for I/O, others for CPU and others fly by quickly without much demand on anything.
Ideally, just like the aisles at supermarkets, you want to be able to efficiently manage the load, not having people with one or two items waiting behind the person who is shopping for the long winter.
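The point that initiators themselves consume resources can be made concrete with a toy model. The sketch below is only an illustration: the capacity, per-job demand and per-initiator overhead figures are made-up assumptions, not measurements from any real system.

```python
def useful_throughput(
    initiators: int,
    capacity: float = 100.0,               # total resource units (assumed)
    demand_per_job: float = 15.0,          # units one running job can use (assumed)
    overhead_per_initiator: float = 2.0,   # units each initiator costs (assumed)
) -> float:
    """Resource units spent on real work with a given number of initiators."""
    # Every initiator takes a slice of capacity before any work is done.
    available = max(capacity - initiators * overhead_per_initiator, 0.0)
    # Work is limited either by what the jobs ask for or by what is left over.
    return min(initiators * demand_per_job, available)


# Find the initiator count that gives the most useful work in this toy model.
best = max(range(1, 21), key=useful_throughput)

for n in (5, best, 10, 20):
    print(f"{n:2d} initiators -> {useful_throughput(n):.0f} units of useful work")
```

In this model, useful work climbs while the jobs are the limiting factor, then falls once the overhead of extra initiators starts eating into the capacity left for the jobs themselves.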
The Workload Manager Approach
WLM gets it. It starts initiators based on:
- Queue of jobs awaiting execution in WLM-managed classes
- Performance goals and relative importance of this work
- Success of meeting these goals
- Capacity of each system to do more work
But this doesn’t address the resource demands of each job. If you want true efficiency, resulting in better throughput, you have to look at the entire equation.
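The four factors in the list above can be sketched as a simple decision function. This is a hypothetical illustration of how such factors might combine, not WLM's actual algorithm; the `ServiceClass` fields, the `goal_attainment` ratio and the `min_spare` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceClass:
    name: str
    queued_jobs: int        # jobs awaiting execution in this class
    importance: int         # 1 = most important (hypothetical scale)
    goal_attainment: float  # achieved / target; below 1.0 means the goal is missed


def pick_class_for_new_initiator(
    classes: List[ServiceClass],
    spare_capacity: float,   # fraction of the system still free, 0.0 to 1.0
    min_spare: float = 0.10, # assumed threshold below which nothing is started
) -> Optional[str]:
    """Return the class that should get the next initiator, or None."""
    # No room on the system: starting another initiator only adds contention.
    if spare_capacity < min_spare:
        return None
    # Only classes with waiting work that are missing their goals qualify.
    candidates = [
        c for c in classes if c.queued_jobs > 0 and c.goal_attainment < 1.0
    ]
    if not candidates:
        return None
    # Prefer the most important class, then the one furthest from its goal.
    chosen = min(candidates, key=lambda c: (c.importance, c.goal_attainment))
    return chosen.name


classes = [
    ServiceClass("PRODBAT", queued_jobs=12, importance=2, goal_attainment=0.7),
    ServiceClass("TESTBAT", queued_jobs=40, importance=4, goal_attainment=0.4),
    ServiceClass("HOTBAT", queued_jobs=0, importance=1, goal_attainment=1.2),
]
print(pick_class_for_new_initiator(classes, spare_capacity=0.25))
print(pick_class_for_new_initiator(classes, spare_capacity=0.05))
```

Note that nothing in this decision looks at what each queued job will demand once it runs, which is the gap the paragraph above points out.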
The ThruPut Manager Approach
ThruPut Manager addresses this aspect, adding value to the management functions that WLM already has in place.
- Initiators are started and stopped based on system load, datacenter controls and workload performance against service targets
- Workload is distributed across systems
- The datacenter can control which systems can process which jobs
- Initiators always select the ‘best’ job to execute next
- Queue time is used to reduce initiator delays
- Execution time delays are reduced to increase initiator availability
Our batch automation solution understands that wait time grows steeply and nonlinearly as utilization approaches 100 percent; when you think you are using the system efficiently (running at 100 percent CPU busy), you’re actually causing a huge amount of wait. As a result, it designs job initiation around minimizing wait, not maximizing CPU utilization. Less wait is better.
So next time you start to think that more is better, go through the entire scenario. If you have more of a good thing, will this result in more of something you don’t want, like waiting? Keep the entire picture in mind when solving performance problems. Otherwise, you just move the bottleneck without eliminating it. Look for solutions that understand this complex balance, like ThruPut Manager.
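The link between utilization and wait can be illustrated with the textbook M/M/1 single-server queue. That model is an assumption chosen here for simplicity; a real batch system is more complicated, and nothing in this sketch describes how ThruPut Manager itself calculates anything.

```python
def mean_queue_wait(utilization: float, service_time: float = 1.0) -> float:
    """Mean time a job waits before service starts in an M/M/1 queue.

    utilization  -- fraction of time the server is busy, 0 <= value < 1
    service_time -- average time to process one job (any time unit)
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be at least 0 and below 1")
    # Standard M/M/1 result: Wq = service_time * rho / (1 - rho)
    return service_time * utilization / (1.0 - utilization)


for busy in (0.50, 0.80, 0.90, 0.95, 0.99):
    wait = mean_queue_wait(busy)
    print(f"{busy:.0%} busy -> average wait of about {wait:.0f} service times")
```

In this model, going from 50 to 90 percent busy multiplies the average wait roughly ninefold, and the wait grows without bound as utilization approaches 100 percent.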
By Denise Kalm