MS SQL Server 2014 “Hybrid” to attack MongoDB

On April 1st, 2014, MS SQL Server 2014 was released internally to a small group of big customers such as bwin. The biggest improvement is the in-memory OLTP engine, which is meant to deliver breakthrough performance for mission-critical applications. bwin scaled its applications to 250K requests per second, a 16x increase over its previous throughput, and now provides a faster and smoother playing experience for its customers.

The new In-Memory OLTP engine (formerly code-named Hekaton) provides significantly improved OLTP performance by moving selected tables and stored procedures into memory. Hekaton lets you use both disk-based tables and memory-optimized tables together in the same queries and stored procedures.

The In-Memory OLTP engine works with standard x64 hardware. It uses a completely new optimistic concurrency design, free of locks and latches, that’s optimized for in-memory data operations. In addition, stored procedures can be compiled into native Win64 code. The end result is far faster application performance.

Microsoft recommends that you provide an amount of memory that’s two times the on-disk size of the memory-optimized tables and indexes.
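As a rough illustration of the two building blocks described above, here is a minimal sketch of a memory-optimized table and a natively compiled stored procedure in SQL Server 2014 syntax. The object names (dbo.SessionState, dbo.TouchSession) and the BUCKET_COUNT value are made up for this example, and the script assumes that the database already has a MEMORY_OPTIMIZED_DATA filegroup.

```sql
-- Hypothetical example: assumes the current database already contains
-- a filegroup created with CONTAINS MEMORY_OPTIMIZED_DATA.

-- Memory-optimized table with a hash index on the primary key.
-- BUCKET_COUNT is an arbitrary value chosen for illustration.
CREATE TABLE dbo.SessionState
(
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId    INT NOT NULL,
    LastSeen  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Natively compiled stored procedure that touches only the
-- memory-optimized table.
CREATE PROCEDURE dbo.TouchSession
    @SessionId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    UPDATE dbo.SessionState
    SET    LastSeen = SYSDATETIME()
    WHERE  SessionId = @SessionId;
END
GO
```

Regular (interpreted) Transact-SQL can still query dbo.SessionState and join it with disk-based tables; only the natively compiled procedure is restricted to memory-optimized tables.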

Why the comparison with MongoDB? We have used MongoDB as well: if you have enough RAM, it keeps all indexes in memory and is very quick, and it also benefits from SSDs. SQL Server shines compared to MongoDB when you need to group and analyze large amounts of data. Before SQL Server 2014 (which we have not tested yet), MongoDB was superior at handling a large number of insert requests, while SQL Server was better at analyzing large data sets. With SQL Server 2014, Microsoft comes closer to MongoDB’s main advantage: processing data in memory and, if configured, taking advantage of SSDs.

[Figure: In-Memory OLTP. Source: sqlmag.com]

The following data types aren’t supported by memory-optimized tables:

  • datetimeoffset
  • geography
  • hierarchyid
  • image
  • ntext
  • sql_variant
  • text
  • varchar(max)
  • User-defined types (UDTs)

In addition, there are a number of database features that aren’t supported. Here are some of the most important database and table limitations:

  • Database mirroring isn’t supported.
  • The AUTO_CLOSE database option isn’t supported.
  • Database snapshots aren’t supported.
  • DBCC CHECKDB and DBCC CHECKTABLE don’t work.
  • Computed columns aren’t supported.
  • Triggers aren’t supported.
  • FOREIGN KEY, CHECK, and UNIQUE constraints aren’t supported.
  • IDENTITY columns aren’t supported.
  • FILESTREAM storage isn’t supported.
  • ROWGUIDCOL isn’t supported.
  • Clustered indexes aren’t supported.
  • Memory-optimized tables support a maximum of eight indexes.
  • COLUMNSTORE indexes aren’t supported.
  • ALTER TABLE isn’t supported. In-Memory OLTP tables must be dropped and re-created.
  • Data compression isn’t supported.
  • Multiple Active Result Sets (MARS) aren’t supported.
  • Change Data Capture (CDC) isn’t supported.

Other SQL Server 2014 improvements

  • Buffer pool extension to SSDs (new buffer pool enhancements increase performance by extending SQL Server’s in-memory buffer pool to SSDs for faster paging)
  • AlwaysOn Availability Groups now support up to eight secondary replicas
  • Business Intelligence enhancements (Power View can work against multidimensional cube data; Power Query is a new data discovery tool; Power Map is a new visual data mapping feature)
  • Encrypted backups are now supported
  • SQL Server 2014 can scale up to 640 logical processors and 4 TB of memory in a physical environment, and up to 64 virtual processors and 1 TB of memory when running in a VM
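Two of the features in the list above are switched on with plain Transact-SQL. The sketch below shows roughly what that looks like; the file paths, the size, the database name and the certificate name (BackupCert) are placeholders invented for this example, and the certificate is assumed to already exist in the master database.

```sql
-- Hypothetical example: path and size are placeholders.
-- Extends the buffer pool onto a file that lives on an SSD volume.
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'F:\SSDCACHE\Example.BPE', SIZE = 50 GB);
GO

-- Hypothetical example: MyDatabase and BackupCert are made-up names.
-- Writes an encrypted backup using an existing server certificate.
BACKUP DATABASE [MyDatabase]
TO DISK = N'D:\Backup\MyDatabase.bak'
WITH ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);
GO
```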


Top 10 query optimizing tips for MS SQL Server


Over my 15 years of working on various projects that combined MS SQL Server with the C# programming language, I have gathered knowledge from my own experience and from different conferences. The most recent one was today’s SQL Saturday by ApexSQL, where Miloš Radivojević showed some tips. I have tried to summarize this knowledge in a short list of recommendations to keep in mind when writing Transact-SQL queries…

1. Using locally defined variables

Whenever possible, use literal values instead of variables. This is the fastest way to execute a query, because the optimizer can use the column statistics for that exact value, but it is rarely possible in real life, since you need variables more often than static values. With stored procedures, I found that you can gain a lot of execution speed by copying the procedure parameters into local variables and using only those local variables in the procedure’s queries. I had relatively big tables (20 million records) with proper indexes, and this trick really boosted the performance of my stored procedure.
If the data is not uniformly distributed, local variables cause problems of their own, because the optimizer cannot see their values at compile time and falls back on an average estimate. In that case you can add OPTION (RECOMPILE) at the end of your queries.

--Use direct values
SELECT	c.CustomerID,
				c.TerritoryID,
				c.CustomerType,
				ca.AddressID,
				soh.SalesOrderID,
				soh.OrderDate,
				soh.DueDate,
				soh.ShipDate,
				soh.SubTotal,
				soh.TaxAmt,
				soh.TotalDue
FROM		Sales.Customer c LEFT JOIN
				Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
				Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
WHERE		c.TerritoryID = 5 AND c.CustomerType = 'S' AND soh.SubTotal > 1000

--Now use variables instead of direct values
DECLARE @TerritoryID int, @CustomerType nchar(1), @SubTotal money
SET @TerritoryID = 5
SET @CustomerType = 'S'
SET @SubTotal = 1000

SELECT	c.CustomerID,
				c.TerritoryID,
				c.CustomerType,
				ca.AddressID,
				soh.SalesOrderID,
				soh.OrderDate,
				soh.DueDate,
				soh.ShipDate,
				soh.SubTotal,
				soh.TaxAmt,
				soh.TotalDue
FROM		Sales.Customer c LEFT JOIN
				Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
				Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
WHERE		c.TerritoryID = @TerritoryID AND c.CustomerType = @CustomerType AND soh.SubTotal > @SubTotal

--Create a stored procedure with these parameters and run it
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[DemoProcedure]
	@TerritoryID int,
	@CustomerType nchar(1),
	@SubTotal money
AS
BEGIN
	SELECT	c.CustomerID,
					c.TerritoryID,
					c.CustomerType,
					ca.AddressID,
					soh.SalesOrderID,
					soh.OrderDate,
					soh.DueDate,
					soh.ShipDate,
					soh.SubTotal,
					soh.TaxAmt,
					soh.TotalDue
	FROM		Sales.Customer c LEFT JOIN
					Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
					Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
	WHERE		c.TerritoryID = @TerritoryID AND c.CustomerType = @CustomerType AND soh.SubTotal > @SubTotal
END
GO

--Now run the stored procedure
EXEC [dbo].[DemoProcedure]
	@TerritoryID = 5,
	@CustomerType = 'S',
	@SubTotal = 1000
GO

--Now modify this procedure by using local variables and assigning parameters to them
ALTER PROCEDURE [dbo].[DemoProcedure]
	@TerritoryID int,
	@CustomerType nchar(1),
	@SubTotal money
AS
BEGIN
	DECLARE @aTerritoryID int, @aCustomerType nchar(1), @aSubTotal money
	SET @aTerritoryID = @TerritoryID
	SET @aCustomerType = @CustomerType
	SET @aSubTotal = @SubTotal

	SELECT	c.CustomerID,
					c.TerritoryID,
					c.CustomerType,
					ca.AddressID,
					soh.SalesOrderID,
					soh.OrderDate,
					soh.DueDate,
					soh.ShipDate,
					soh.SubTotal,
					soh.TaxAmt,
					soh.TotalDue
	FROM		Sales.Customer c LEFT JOIN
					Sales.CustomerAddress ca ON ca.CustomerID = c.CustomerID LEFT JOIN
					Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
	WHERE		c.TerritoryID = @aTerritoryID AND c.CustomerType = @aCustomerType AND soh.SubTotal > @aSubTotal
END
GO

--Now run the stored procedure again
EXEC [dbo].[DemoProcedure]
	@TerritoryID = 5,
	@CustomerType = 'S',
	@SubTotal = 1000
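
The OPTION (RECOMPILE) hint mentioned in this tip is not shown in the script above. A minimal sketch of how it could be applied to a shortened version of the same query follows; the variable value is just an example.

```sql
-- Sketch: shortened version of the query above with OPTION (RECOMPILE).
-- The hint makes SQL Server compile a fresh plan on every execution,
-- using the actual value of @TerritoryID at that moment.
DECLARE @TerritoryID int = 5;

SELECT  c.CustomerID,
        soh.SalesOrderID,
        soh.TotalDue
FROM    Sales.Customer c
        LEFT JOIN Sales.SalesOrderHeader soh ON soh.CustomerID = c.CustomerID
WHERE   c.TerritoryID = @TerritoryID
OPTION  (RECOMPILE);
```

The trade-off is extra CPU for compiling the query on every call, so this fits queries that run rarely or whose best plan really depends on the value.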

2. Index on XML data type column

Don’t put an index on a column of the XML data type. In my experience it just doesn’t work well: you can get really strange behavior from MS SQL Server that slows query execution down considerably. In theory it should work fine, but in practice it doesn’t.

3. Do not use functions in WHERE clauses

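This slows query execution down enormously, because a function wrapped around a column usually prevents SQL Server from using an index seek on that column. Always try to express the same logic with plain comparison operators, and rethink your strategy to avoid this costly scenario.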
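
To make this concrete, here is a small sketch that reuses the Sales.SalesOrderHeader table from tip 1. It assumes there is an index on OrderDate, which is an assumption made for this example; the year is arbitrary.

```sql
-- Slow: YEAR() has to be evaluated for every row, so an index
-- on OrderDate (assumed to exist) cannot be used for a seek.
SELECT soh.SalesOrderID
FROM   Sales.SalesOrderHeader soh
WHERE  YEAR(soh.OrderDate) = 2004;

-- Better: the same logic as a plain range on the bare column,
-- which allows an index seek on OrderDate.
SELECT soh.SalesOrderID
FROM   Sales.SalesOrderHeader soh
WHERE  soh.OrderDate >= '20040101'
  AND  soh.OrderDate <  '20050101';
```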

4. Do not use the UPPER string function in Transact-SQL queries if the database is case-insensitive

It sounds logical. Developers think, “You know… I just want to be sure that the left side and the right side are both uppercased…” 🙂 Well, don’t do this. Just trust the product and you will get a good performance boost.

SELECT Field1 FROM MyTable WHERE UPPER(Field2) = 'MyName'

5. Never do the calculation on your columns (if not needed)

For example, compare these two queries:

SELECT Field1 FROM MyTable WHERE Field2 * 2 = 10000

SELECT Field1 FROM MyTable WHERE Field2 = 10000 / 2

It is obvious that the second query will be much faster, because in the first query MS SQL Server needs to multiply every value in column “Field2” by 2 and cannot seek on an index on that column.

6. Comparing non-Unicode columns (varchar) with a Unicode pattern

Don’t do this if your column is of type varchar, because performance will degrade: nvarchar has higher data type precedence, so SQL Server implicitly converts the column values to nvarchar, which can prevent an efficient index seek. In the following example, the second query is bad if Field2 is varchar; it is fine only if Field2 is nvarchar.

SELECT Field1 FROM MyTable WHERE Field2 = 'MyProperty'
SELECT Field1 FROM MyTable WHERE Field2 = N'MyProperty'

7. Time saving tip for developers

Very often when you are writing your classes, you need to be sure that you don’t make typing errors in the property names, since we usually give properties the same names as the columns of some table. To make life easier, we usually do this:

  1. Open MS SQL Management Studio
  2. Right-click the desired table and choose Script Table as… > SELECT To > New Query Window

After this you get a SELECT query, but all column names are wrapped in square brackets [] and there are commas, spaces… You need to get rid of all this. I used to create a macro that did this for me every time, but then I was told a great time-saving tip!

  1. Open MS SQL Management Studio
  2. Open New Query Window
  3. Expand your table in Object Explorer so that you can see its Columns folder
  4. Drag the Columns folder and drop it into your New Query Window

You get all your column names as a comma-separated list. A super nice time saver for developers!

8. Be careful when using NOT IN on nullable columns

For example, don’t use NOT IN if the inner SELECT can return NULL as one of its values: the comparison with NULL evaluates to UNKNOWN, so the whole query returns no rows. It behaves like NOT IN (NULL, 'Product1', 'Product2').

By the way, prefer EXISTS or NOT EXISTS over IN or NOT IN.

SELECT Field1 FROM MyTable WHERE Field2 NOT IN (SELECT ProductName FROM Product)

On the other hand, it is completely safe to use the IN operator:

SELECT Field1 FROM MyTable WHERE Field2 IN (SELECT ProductName FROM Product)

To be safe, you can use a NOT EXISTS clause:

SELECT t.Field1 FROM MyTable t WHERE NOT EXISTS (SELECT 1 FROM Product p WHERE p.ProductName = t.Field2)
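
The NULL behavior is easy to reproduce. The sketch below uses two table variables with made-up sample rows, named after the tables in the queries above, so it can be run in any database.

```sql
-- Made-up sample data for illustration only.
DECLARE @MyTable TABLE (Field1 int, Field2 varchar(20));
DECLARE @Product TABLE (ProductName varchar(20) NULL);

INSERT INTO @MyTable VALUES (1, 'Product1'), (2, 'Product3');
INSERT INTO @Product VALUES ('Product1'), ('Product2'), (NULL);

-- Returns no rows: 'Product3' <> NULL is UNKNOWN, so the NOT IN
-- predicate is never TRUE, not even for the row with Field1 = 2.
SELECT Field1
FROM   @MyTable
WHERE  Field2 NOT IN (SELECT ProductName FROM @Product);

-- Returns the row with Field1 = 2: NOT EXISTS only checks whether
-- a matching row exists, so the NULL in @Product does no harm.
SELECT t.Field1
FROM   @MyTable t
WHERE  NOT EXISTS (SELECT 1 FROM @Product p WHERE p.ProductName = t.Field2);
```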

9. Don’t use ORDER BY (if you don’t really need it)

Developers very often put ORDER BY at the end of their queries “because data should be ordered by some criterion”. But they don’t actually need an ORDER BY; they just think “it looks better”.

Well, don’t do this, because sorting is a costly operation. Rather than that, do the ordering in your application.

SELECT Field1, Field2, Field3
FROM Table1
ORDER BY Field2

10. Performance cost of different operators and other small tips

Compare the two queries below. When the values form a contiguous range, the second query is better because it is faster (a simple range comparison versus the more complex IN operator). The IN operator is good for discrete values such as 9, 99, 224, 435,… but not for sequential values such as 99, 100, 101, 102,… where we pay a performance cost for using it. Note that the two queries are equivalent only when Field2 is an integer column.

SELECT Field1 FROM MyTable WHERE Field2 IN (99, 100, 101, 102)
SELECT Field1 FROM MyTable WHERE Field2 >= 99 AND Field2 <= 102

Don’t use SELECT * FROM MyTable
This is how a lazy developer returns data, with the excuse “I might need everything later, so it is better to have everything right now”. This is not good; think twice and return only the columns that you really need.

UNION vs UNION ALL operator
UNION ALL is faster, and if you know that the two sets don’t overlap (or you just don’t care about duplicates), then UNION ALL is the right choice. The UNION operator removes duplicates, which requires a DISTINCT sort or hash, and this is a very costly operation.
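
A small sketch with two table variables (made-up sample data) shows the difference in the result:

```sql
-- Made-up sample data: the value 2 appears in both sets.
DECLARE @A TABLE (Id int);
DECLARE @B TABLE (Id int);

INSERT INTO @A VALUES (1), (2);
INSERT INTO @B VALUES (2), (3);

-- UNION removes the duplicate, so this returns three rows (1, 2, 3).
SELECT Id FROM @A
UNION
SELECT Id FROM @B;

-- UNION ALL just appends the second set, so this returns four rows
-- (1, 2, 2, 3) and skips the duplicate-removal step.
SELECT Id FROM @A
UNION ALL
SELECT Id FROM @B;
```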

CONSTRAINTS
For example, grades in university exams can range from 5 to 10. If you put a CHECK constraint on your ‘Grade’ column that enforces this range, SQL Server can optimize some queries: it checks the constraint first, and when the predicate contradicts it (for example Grade = 11), it does not need to read the table at all.
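
A minimal sketch of this idea follows. The table and constraint names (dbo.ExamResult, CK_ExamResult_Grade) are invented for the example, and whether the optimizer really skips the table depends on the constraint being trusted and on how the query is compiled, so treat it as something to verify in the execution plan rather than a guarantee.

```sql
-- Hypothetical table: grades are restricted to the range 5..10.
CREATE TABLE dbo.ExamResult
(
    ExamResultID int IDENTITY(1,1) PRIMARY KEY,
    StudentID    int NOT NULL,
    Grade        tinyint NOT NULL
        CONSTRAINT CK_ExamResult_Grade CHECK (Grade BETWEEN 5 AND 10)
);
GO

-- The predicate contradicts the CHECK constraint, so no row can match.
-- With a trusted constraint the plan may show a Constant Scan instead
-- of any access to dbo.ExamResult. OPTION (RECOMPILE) is added here so
-- that the literal 11 stays visible to the optimizer.
SELECT StudentID
FROM   dbo.ExamResult
WHERE  Grade = 11
OPTION (RECOMPILE);
```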

CURSORS and TRIGGERS
Avoid them at all costs.

INDEXES
Use them, of course, on the columns you join tables on and on the columns you search by. But do not overuse indexes by putting them on every table column.

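DELETING all rows from a big table
Use the TRUNCATE TABLE statement instead of DELETE. Keep in mind that TRUNCATE TABLE cannot be used on a table that is referenced by a FOREIGN KEY constraint, and that it resets the IDENTITY counter.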
