Top 10 query optimizing tips for MS SQL Server

Databases

tips-and-tricks

During my 15 years of experience and working on various projects that involved using MS SQL Server in combination with C# programming language, and also gathering knowledge from different conferences, especially from the last one that occured today at SQL Saturday by ApexSQL where Miloš Radivojević showed some tips, I tried to summarize this knowledge in a small list of recommendations that you should be aware of when writing the Transact SQL queries…

1. Using local defined variables

When possible always try to use direct values instead of variables (this is the rare case in real life because you need variables more often then static values). This is the fastest way to execute the query. When you deal with store procedures I found out that you can benefit enormously in speed execution if you just copy your store procedure parameters into local variables and then use only these local variables in your store procedure queries. I had relatively big tables with 20 million records with proper indexes but doing this trick really boost the performance of my store procedure.
If you don’t have uniform distribution you will have problems with local variables and to optimize this you can use OPTION(RECOMPILE) at the end of your queries.

--Use direct values
SELECT	c.CustomerID,
c.TerritoryID,
c.CustomerType,
ca.AddressID,
soh.SalesOrderID,
soh.OrderDate,
soh.DueDate,
soh.ShipDate,
soh.SubTotal,
soh.TaxAmt,
soh.TotalDue
FROM		Sales.Customer c LEFT JOIN
Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
WHERE		c.TerritoryID = 5 AND c.CustomerType = 'S' AND soh.SubTotal > 1000

--Now use variables instead of direct values
DECLARE @TerritoryID int, @CustomerType nchar(1), @SubTotal money
SET @TerritoryID = 5
SET @CustomerType = 'S'
SET @SubTotal = 1000

SELECT	c.CustomerID,
c.TerritoryID,
c.CustomerType,
ca.AddressID,
soh.SalesOrderID,
soh.OrderDate,
soh.DueDate,
soh.ShipDate,
soh.SubTotal,
soh.TaxAmt,
soh.TotalDue
FROM		Sales.Customer c LEFT JOIN
Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
WHERE		c.TerritoryID = @TerritoryID AND c.CustomerType = @CustomerType AND soh.SubTotal > @SubTotal

--Create a store procedure with these parameters and run it
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[DemoProcedure]
@TerritoryID int,
@CustomerType nchar(1),
@SubTotal money
AS
BEGIN
SELECT	c.CustomerID,
c.TerritoryID,
c.CustomerType,
ca.AddressID,
soh.SalesOrderID,
soh.OrderDate,
soh.DueDate,
soh.ShipDate,
soh.SubTotal,
soh.TaxAmt,
soh.TotalDue
FROM		Sales.Customer c LEFT JOIN
Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
WHERE		c.TerritoryID = @TerritoryID AND c.CustomerType = @CustomerType AND soh.SubTotal > @SubTotal
END

--Now run the store procedure
EXEC [dbo].[DemoProcedure]
@TerritoryID = 5,
@CustomerType = 'S',
@SubTotal = 1000

--Now modife this procedure by using local variables and assigning parameters to them
ALTER PROCEDURE [dbo].[DemoProcedure]
@TerritoryID int,
@CustomerType nchar(1),
@SubTotal money
AS
BEGIN
DECLARE @aTerritoryID int, @aCustomerType nchar(1), @aSubTotal money
SET @aTerritoryID = @TerritoryID
SET @aCustomerType = @CustomerType
SET @aSubTotal = @SubTotal

SELECT	c.CustomerID,
c.TerritoryID,
c.CustomerType,
ca.AddressID,
soh.SalesOrderID,
soh.OrderDate,
soh.DueDate,
soh.ShipDate,
soh.SubTotal,
soh.TaxAmt,
soh.TotalDue
FROM		Sales.Customer c LEFT JOIN
Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
WHERE		c.TerritoryID = @aTerritoryID AND c.CustomerType = @aCustomerType AND soh.SubTotal > @aSubTotal
END

--Now run the store procedure again
EXEC [dbo].[DemoProcedure]
@TerritoryID = 5,
@CustomerType = 'S',
@SubTotal = 1000

2. Index on XML column data type

Don’t put an index on XML column data type. It just don’t work well and you can get really strange results from MS SQL Server that can slower the query execution rapidly. it should work by theory OK, but it doesn’t.

3. Do not use functions in WHERE clauses

This is slowing down enormously query execution. Try always to do the same logic with normal query operators, try to rethink your strategy to avoid this costly scenario.

4. Do not user UPPER sting function if Transact SQL queries if the database is Case Insensitive

Logical request. Developers are thinking “You know… I just want to be sure that left side and right side are uppercased…” 🙂 Well… Don’t do this, just trust the product and you will gain good performance boost.

SELECT Field1 FROM MyTable WHERE UPPER(Field2) = 'MyName'

5. Never do the calculation on your columns (if not needed)

For example, if we have this query

SELECT Field1 FROM MyTable WHERE Field2 * 2 = 10000

SELECT Field1 FROM MyTable WHERE Field2= 10000 / 2

it is obvious that second query will be much faster because in first query MS SQL Server need to multiply all values by 2 for column “Field2” in the table.

6. Comparing non-unicode columns (varchar) with unicode pattern

Don’t do this if your column is type of varchar because the performance will be de degradated. So, in following example second call is bad if Field2 is type of varchar. Second call is OK only if Field2 is nvarchar.

SELECT Field1 FROM MyTable WHERE Field2 = 'MyProperty'
SELECT Field1 FROM MyTable WHERE Field2 = N'MyProperty'

7. Time saving tip for developers

Very often when you are writting your classes you need to be sure that you don’t make typing errors in the variable names. So, usually we do the same property names as are our column names in some table. In order to make our life easier we usually do this:

  1. Open MS SQL Management Studio
  2. Right click on wished table and choose Script Table as… > SELECT To > New Query Window

After this you get SELECT query but all column names are with these parentheses [] and there are commas, spaces… You need to get rid of all this. I usually created some MACRO that do this for me every time but then I was told great SAVING TIME TIP!

  1. Open MS SQL Management Studio
  2. Open New Query Window
  3. Expand to your table and click on expand icon to see the columns
  4. Just drag and drop to your New Query Window

You got all your columns. Super nice time saver for developers!

8. Be careful when using NOT IN on NULL-able columns

For example you should not use this if inner SELECT can return you NULL as one of the values because your whole query will fail – you can not use NOT IN (NNLL, ‘Product1’, ‘Product2’)

BTW, use always EXISTS or NOT EXISTS instead of IN or NOT IN clauses.

SELECT Field1 FROM MyTable WHERE Field2 NOT IN (SELECT ProductName FROM Product)

On the other hand it is completely safe to use IN operator

SELECT Field1 FROM MyTable WHERE Field2 IN (SELECT ProductName FROM Product)

To be sure you can use NOT EXIST clause

SELECT Field1 FROM MyTable WHERE Field2 NOT EXISTS (SELECT 1 FROM Product WHERE ProductName = Field2 )

9. Don’t use ORDER BY (if you don’t really needed)

Developers use very often ORDER BY at the end of their queries “because data should be ordered by some criterion”. But they don’t actually need an ORDER BY, they just think “it looks better”.

Well, don’t do this because it is costly operation. Rather then that, do ordering in your application.

SELECT Field1, Field2, Field3
FROM Table1
ORDER BY Field2

10. Performance cost of different operators and other small tips

When we compare these two queries then second is better because it is faster (natural operator versus more complex IN operator). IN operator is good for discrete values i.e. if we have 9, 99, 224, 435,… but now for sequential values 99, 100, 101, 102,… so we pay performance cost if we use it in this way.

SELECT Field1 FROM MyTable WHERE Field2 IN (1000, 2000)
SELECT Field1 FROM MyTable WHERE 1000 <= Field2 AND 2000 <= Field2

Don’t use SELECT * FROM MyTable
This is how lazy developer return the data and he use an excuse “I will maybe need everything later, so better to have everything right now”. This is not good, better think twice and return only the columns that you will really need.

UNION vs UNION ALL operator
UNION ALL is faster and if you know that two sets don’t have intersection (or you just don’t care) then UNION ALL is the right choice. UNION operator do DISTINCT sorting and this is very costly operation.

CONSTRAINTS
For example, grades at faculty exams can be between 5 and 10 and if you put constraints on your column named ‘Grade’ that is between 5 and 10 you can optimize the execution of your queries because SQL Server will not execute the plan after he check the constraints first.

CURSORS and TRIGGERS
Avoid them at all cost.

INDEXES
Do them of course, on columns you do table joins and on colums you do your data search. Do not overuse the indexes (put them on every table column).

DELETING all rows from big table
Use TRUNCATE TABLE statement instead of DELETE.