Here we are again: this is my second T-SQL Tuesday, and I’m already feeling the excitement of getting back to the keyboard as a blogger for the second month in a row (my personal record). T-SQL Tuesday is a fantastic initiative, not necessarily because of my humble contributions (like my post last month) but because there is a ton of great reading each month after the blogs are posted.
This month is hosted by Mala Mahadevan (here is the Invitation). The guidance for participating this month is to write about “a few things would you consider as gifts, and why?” in the context of SQL Server technology. The invitation goes on to give some examples, such as getting to know someone in the community better, getting to speak at an event, landing a job you never thought you would, and so on.
Here is the thing for me: I didn’t get to know someone in the community better (shame on me), I did not speak at an event (shame again), nor did I land a job I never thought I would (shame times three). This year I basically interacted with the SQL Server community exclusively online (so there was no way I could “get to know someone better”), and that means no speaking anywhere. I did not switch jobs (I’m still here at RapidRTC), so cross off the third option as well.
Going further, there was a suggestion I liked, and I thought I could use it for my post.
A new cool feature that you never even thought possible is now there…
So here it is, my blog post this month is now decided: Batch Mode on Rowstore, a great gift from the Microsoft SQL Server team to all of us lazy DBAs looking for performance improvements without lifting a finger!
What’s so good about this gift?
From time to time, Microsoft releases SQL Server features that are not very well received (and, in all fairness, if they don’t fly too far, they are normally out of the product after a release or two). Then there are features that are not widespread but serve the purpose of a niche customer base. And then there are those features or improvements (I’m not sure calling them features is accurate) that affect almost all of us: some incorporate that goodness knowingly, and some swallow the pill without even knowing what’s in it, but they like it all the same.
Examples of this third category abound, but one most of us can talk about is the improvements to the Cardinality Estimator in SQL Server 2014. This improvement got a lot of attention (for good reasons), and it was a huge differentiator and a major reason for some people to want to migrate off SQL Server 2008 R2 and SQL Server 2012.
I can think of other examples (more recently we had things like Adaptive Joins in SQL Server 2017), but the truth of the matter is that when these things happen (improvements to the product that “just work” without any infrastructure change) we all feel like we have received a gift from SQL Server!
I brought up Adaptive Joins on purpose, because it is part of a larger set of enhancements that received the name Intelligent Query Processing. Between SQL Server 2017 and SQL Server 2019, a dozen or so improvements have been introduced. This chart summarizes the features pretty well.
One feature in green here (the one I want to talk about) is Batch Mode on RowStore.
Batch Mode on Rowstore is nothing but the ability of the query plan operators to “pass” data from one step to another in chunks (or batches, typically of up to about 900 rows) as opposed to row by row.
Since T-SQL is a set-based language, one common misconception is that, physically, when the query runs, it is also executed using a set-based approach. Until SQL Server 2019, this was generally not the case (yeah… I know what you ColumnStore people are thinking… but I’m generalizing here).
Now, when you issue a query, based on MANY conditions that would make this post too long to write (and honestly, it is all written already, and better than I could ever do, by the SQL Server CAT people out there), the query is evaluated, and if those conditions are met (one example is that the query must NOT be so trivial that it doesn’t deserve batch mode; another is that the database compatibility level has to be 150), then Batch Mode is incorporated.
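The compatibility level is the one prerequisite you can check and control directly. A minimal sketch (assuming you are testing against WideWorldImporters, as I do later in this post):

```sql
-- Check the current compatibility level of the database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'WideWorldImporters';

-- Batch Mode on Rowstore requires compatibility level 150 (SQL Server 2019).
ALTER DATABASE WideWorldImporters
SET COMPATIBILITY_LEVEL = 150;
```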
How do you know that batch mode is used? Easy: just look into your Execution Plan.
What is this operator telling us?
The Storage is RowStore, so no ColumnStore index magic here; we are scanning a normal, good old Clustered Index.
The plan was created under the assumption that the scan would use Batch execution mode (that’s where the Estimated Execution Mode comes from).
When the plan was actually executed, Batch was a go, so the execution ended up using Batch mode (the Actual Execution Mode). I have read that in almost all cases the Estimated and Actual modes never differ, but I can’t confirm that.
There is also the Actual Number of Batches, which tells us how many batches the operation was broken into. I guess you could estimate the number of rows per batch if you know the rows in the Index Scan and divide by the number of batches; why on earth that could have a practical use… I dunno…
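If clicking through plans one by one gets old, you can also search the plan cache for plans that contain at least one batch-mode operator. This is just a sketch based on the showplan XML schema (the `EstimatedExecutionMode` attribute on `RelOp` is what I am filtering on), so treat it as a starting point rather than a production query:

```sql
-- Find cached plans with at least one operator estimated to run in Batch mode.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT cp.usecounts,
       cp.objtype,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp[@EstimatedExecutionMode="Batch"]') = 1;
```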
Batch Execution Mode in RowStore in Action with WideWorldImporters queries
Let’s talk for a minute about the database I’m going to use for all of this testing…
It took some time: at first I thought I would never get over the fact that AdventureWorks was gone, but after a few months of disorientation I became friends with WideWorldImporters.
If you don’t know what I’m talking about, then I don’t know where you have been for the last 3 years (which is more or less when Microsoft decided to put an end to selling bicycles and started importing cheap stuff from God knows where…). Here is my not-so-new-anymore friend WideWorldImporters in all its ugliness (or maybe parts of it).
So, I decided to test this Batch Mode execution against RowStore using WideWorldImporters queries. I randomly grabbed 13 queries against this database of varying complexity: some as simple as showing columns of a single table (no joins), others with more complicated JOINs, and some even using JSON and PIVOT (ouch!). I just called my queries Query#1 to Query#13; the full script is here.
Once I had assembled my sample T-SQL scripts, I started running my queries using SQLQueryStress (10 iterations per query on 1 thread) in 2 different ways. The first time I just ran them the way they appear in my script, but the second time around I added this line at the bottom of each query to prevent batch mode from being used at all (so, in a sense, reverting to the old way of doing things):
OPTION(USE HINT('DISALLOW_BATCH_MODE'));
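To make the before/after setup concrete, here is the shape of an “after” version. This is a made-up stand-in, not one of my actual 13 queries, though the table and column names are standard WideWorldImporters ones:

```sql
USE WideWorldImporters;
GO
-- A simple aggregation over a large table: with compatibility level 150 the
-- scan and aggregate may run in batch mode; the hint below forces row mode.
SELECT si.StockItemName,
       SUM(il.Quantity)      AS TotalQuantity,
       SUM(il.ExtendedPrice) AS TotalSales
FROM Sales.InvoiceLines AS il
INNER JOIN Warehouse.StockItems AS si
        ON si.StockItemID = il.StockItemID
GROUP BY si.StockItemName
OPTION (USE HINT('DISALLOW_BATCH_MODE'));
```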
So, without further ado here are the results
The verdict: Awesome!
As you can see, in 4 out of 13 queries I found some operator where batch mode on RowStore was indeed used. That’s almost a third of my randomly selected queries.
Why were these ones considered by the optimization engine while the others weren’t? I have no idea, and an investigation into each and every query would require an extensive blog post that would be too much for today. Suffice it to say that a considerable portion of my queries were chosen and improved their performance, which is great.
When batch mode was used, I noticed that the differences were terrific. Consider Query#2, for example (from 523 ms to 11 ms), Query#3 (from 21,593 ms to 15,701 ms) or Query#5 (from 1,015 ms to 336 ms). The most dramatic case was the last query (from 147 ms to only 8 ms)!
When you look at % improvement rather than raw milliseconds, the differences for these 4 queries (before and after batch mode is incorporated in the plan) are telling: improvements of 4587%, 138%, 302% and 1791% for queries 2, 3, 5 and 13 respectively.
I don’t know about you, but I’m pretty happy with these results. I would love to hear what you have found in the wild and whether the initial numbers I got (a third of the workload being impacted and, in those cases, dramatic reductions in response time) are confirmed in your environments.
Have a great and peaceful end of the year and a happy start of 2020!