Data Think

Monday, June 05, 2006

Interesting Performance Problem Part II

In my last post I discussed a problem with a slow running query. In this post I will explain why.

The answer to the problem can be seen if you look at the execution plan for the query. To do this from query analyzer choose the query --> Show Execution Plan from the top menu. Alternatively execute 'set showplan_Text ON'. This will cause the output to display the execution plan not the results.

Once this is done you will see the following:
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Spitfire].[dbo].[Users]))
|--Index Scan(OBJECT:([Spitfire].[dbo].[Users].[IX_Users_2]), WHERE:(Convert([Users].[IsValid])=1))

See the problem. The compiler is casting the [Users].[IsValid] to an integer to perform the comparison to the value of 1. What this means is that, for each and every row in the table the value has to be read, converted an int and then evaluated against the integer 1. The sort of the index is useless and performance will be slow. A simple change to the query will resolve this.

SELECT * from Users WHERE IsValid = CONVERT(bit,1)

This will make the compiler convert the integer value to a bit first, and then compare it to the column in the table. This means that only one conversion is executed and the index may be used. The resulting execution plan is:
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Spitfire].[dbo].[Users]))
|--Index Seek(OBJECT:([Spitfire].[dbo].[Users].[IX_Users_2]), SEEK:([Users].[IsValid]=1) ORDERED FORWARD)

This will use an index seek, not an index scan. A much faster and much more efficient plan for our purposes. Now, the same query will be virtually instantaneous about 1ms. MUCH BETTER.

Interesting Performance Problem:
I recently had an interesting problem that stemed from an implicit conversion doing something really stupid.

Here is a reproduction of the scenerio.

I have a table Users with the following columns
ID int identity
FirstName varchar(50)
MiddleName varchar(50)
LastName varchar(50)
IsValid bit

The table has around 2,000,000 records in it, three of which have the IsValid set to true. Each column has an index on it.

When I do the following query performance is poor

SELECT * from Users WHERE IsValid = 1

It takes about 200ms to execute. This is way too slow.

I will discuss why and how to fix in my next post (which I promise will be much sooner.)

Note: Tests were run using SQL Server 2000, SQL Server 2005 does much better.

Tuesday, May 16, 2006

.Net Data Access
If you are going to be using SQL Server, you are probably using it with Microsoft .net development tools. The data access components in .Net grew up from their VB roots and are powerfull, flexible and easy to use. Unfortunatly, like most other things, they can be abused as well.

Fortunatly, Microsoft and it's community have developed some seriously idiot proof tools. The Microsoft Data Access Application Block and the Enterprise Library make calling stored procedures as easy as pie. If you are writting data access routines in C# or Visual Basic and are not using these then you should really have a good look. I see to many people creating data adaptors, commands and connections in code when they are all wrapped up in these libraries. They can be found at Application Blocks.

The other tool of note is the built in dataset designer in Visual Studio. By simply dragging tables from there server explorer to a dataset it creates a nice, type-safe dataset including table adaptors. Table adaptors as generated by Visual Studio are great although they are much more geared to Client-Server 2-tier development and not proper 3-tier or web development. Having said that they can still be used for this and are quite neat.

Monday, May 01, 2006

Optional items in a Where clause

Here is a common situation. You have a query that has several possible but optional filters. For example you may filter on the first name or last name of both. Now to do this there are several possible technics. You could
1. assemble a dynamic SQL statement, you could
2. use the if statment to execute a different select statement or
3. use conditions in the where clause to handle it in one statement

Dynamic SQL
CREATE PROCEDURE Dynamic_SQL @FirstName nvarchar(100), @Lastname nvarchar(100)

AS

DECLARE @sql nvarchar(1000)

SELECT @sql = 'SELECT * FROM Users u WHERE 1=1 '

IF @FirstName IS NOT NULL

BEGIN

SELECT @sql = @sql + ' AND FirstName =''' + @Firstname + ''''

END

IF @LastName IS NOT NULL

BEGIN

SELECT @sql = @sql + ' AND LastName =''' + @Lastname + ''''

END

print @SQL

EXEC sp_executesql @sql

The above proc does what it is supposed to do but it has a couple of drawbacks. Firstly it lacks elegance. Secondly it totally defeats the whole purpose of using a stored procedure. The dynamic SQL requires a compile, there is a significant security risk and lastly it is really hard to follow. If you are going to do this you might as well just do it in the compiled VB/C# code.

Use a Where Clause
CREATE PROCEDURE IF_SQL @FirstName nvarchar(100), @Lastname nvarchar(100)

AS

IF @FirstName IS NOT NULL AND @LastName IS NOT NULL

BEGIN

SELECT * FROM Users u WHERE FirstName = @FirstName AND LastName = @LastName

END

IF @FirstName IS NULL AND @LastName IS NOT NULL

BEGIN

SELECT * FROM Users u WHERE LastName = @LastName

END

IF @FirstName IS NOT NULL AND @LastName IS NULL

BEGIN

SELECT * FROM Users u WHERE FirstName = @FirstName

END

IF @FirstName IS NULL AND @LastName IS NULL

BEGIN

SELECT * FROM Users u

END

Again this works but, as you can imagine, any more than two paramaters and it get unwieldy. Also, the if clause can force a recompile.

WHERE Clause
ALTER PROCEDURE Where_SQL @FirstName nvarchar(100), @Lastname nvarchar(100)

AS

SELECT * FROM Users u

WHERE (FirstName = @FirstName OR @FirstName IS NULL)

AND (LastName = @LastName OR @LastName IS NULL)

Again this works and is a much more elagant solution. The code is tight, easy to follow and performs well. The main problem with this is that the compiler will factor in all of the where conditions into it's execution plan. That is, if there are tables needed only for the where clause, they will be processed even if that where condition is not needed. However, IMHO this is the way to do it for 90% of cases.

Thursday, April 20, 2006

Temporary Tables vs. Table Variables

Here is an interesting behavior with SQL Server. A lot of developers use table variables and temporary tables interchangeably without understanding that they are different entities, each with slightly different behaviors. Here is one of the more interesting differences

Transaction Rollback

If you have a transaction that affects a temporary table, and roll it back, the temporary table is restored to it's original state. For example

SET NOCOUNT ON
CREATE TABLE #TempTable (TempTableColumn int)
DECLARE @TableVar TABLE (TableVarColumn int)
BEGIN TRANSACTION
INSERT INTO #TempTable SELECT 1
INSERT INTO @TableVar SELECT 1
ROLLBACK TRANSACTION
SELECT * FROM #TempTable
SELECT * FROM @TableVar

Results in

TempTableColumn
---------------
TableVarColumn
--------------
1

As you can see the temporary table was rolled back but the table variable was not.

This has several possible uses. An example I have used is, when I wanted to rollback a transaction, I copied the relevant records that were changed previously to a table variable and then rolled back the transaction. That way I had a table available with the old data after the rollback.

Sunday, April 16, 2006

Performance Implications of Scalar User Defined Functions (UDF)

One of the major new features of SQL Server 2000 was user defined functions (UDF). This new feature allowed procedures to encapsulate logic in a function.

For example:
In the Northwind Database 'Order Details' table the total sale can be calculated using the following:

SELECT CONVERT(Money,UnitPrice * Quantity * (1-Discount))FROM [Order Details]

It is possible to encapsulate this in a scalar UDF below:

CREATE FUNCTION dbo.Func_GetSale (@Price MONEY, @Quantity INT, @Discount FLOAT)
RETURNS MONEY AS
BEGIN
RETURN @Price * @Quantity * ( 1 - @Discount)
END

Then the above query could be simplified to:
SELECT dbo.Func_GetSale(UnitPrice, Quantity, Discount)FROM [Order Details]

Obviously this is simpler to understand and better from a maintenance point of view. But what about performance? If I ran the first query 5x on my computer it takes 63ms. If I run the query using the UDF it takes 140ms, over twice as long.

This above simple example shows the penalty of using UDF. I think that UDFs are great BUT they are not free. I use them frequently but keep them away from performance sensitive tasks. In a later post I will analyze and comment on the performance implications of table-valued UDF.

Monday, April 10, 2006

UNION vs. UNION ALL

One very common mistake that people make is in using the union operator. For reasons that are entirely non-obvious to someone new to the language, the union operator automatically applies distinctness to the result. That is all duplicate results are removed. Obviously this is entirely unexpected behaviour.

If you want to get all results, including duplicates, use UNION ALL instead. This does not apply distinctness.

Why would you care?

Well for one thing, by applying distinctness to the resultset a significant performance hit is created. The server has to create internal temporary tables to look at the entire result before sending it to the caller.

Anyways, in general, you want to use UNION ALL instead of UNION 99% of the time. Pretty much the only time you should use UNION instead is if you were going to use SELECT DISTINCT in the queries. Then, if you use UNION you do not need to use SELECT DISTICT and duplicate values from the whole resultset will be removed.

Sunday, April 09, 2006

What is Transact-SQL (T-SQL)

In short, T-SQL is a variation of SQL implemented by Microsoft and Sybase for thier SQL Server products. Originally it developed at Sybase. Microsoft came into the picture when they did a partnership with Sybase to create a database server for OS/2. Since then Microsoft and Sybase have pretty much parted ways each developing their products in slightly different directions. Microsoft has made MS SQL Server one of thier key strategic products and has positioned it against Oracle, DB2 and MySQL.

T-SQL is somewhat different in syntax than PL-SQL used by Oracle. 90% of the basics are the same but much of the performance and optimization is different. That said, the basic principals behind T-SQL and PL-SQL are very similar.

Pure ANSI SQL is a language that is essentially not used by anyone due to it's intentially incomplete specification. The SQL specification for example never discusses indexes which is a very important feature of SQL Server. This is only one way in which SQL is a very diffent language than lower level languages such as C++ and VB.

SQL is considered a 4th generation language (4GL). This compares to C++ and VB which are considered 3GL languages. What this means SQL describes to the computer what the programmer wants done, but leaves the actual implementation and algorithm up to the Relational Database Management System (RDBMS). In conventional 3GL languages it is the programmer who developes the algorithm.

Obviously there is no solid line here. In reality there is much poor SQL code that is written in a 3GL style, and there are some object libraries (.Net Framework for example) that have given 3GL languages some 4GL like functionality (data-binding would be an example).

The long and short of it is... if you write your SQL code the same way you write in a 3GL language, your performance will be poor.

FAQ
Q-What is this blog about?

A-This blog is about all things T-SQL. I feel really strongly about the importance of knowing this language well in order to develop modern applications that use SQL Server. Hopefully this blog will educate and give usefull information about what this language can do and how to do it.

Q-Why do you feel the need to create this blog?

A-I think that knowledge of SQL in general and T-SQL in particular is absolutly essential for modern programming on the Microsoft Platform. Unfortunatly most programmers treat SQL as just another language and do not really understand that SQL is really a different kind of language than Java, C#, C++, VB, PHP etc. As a result I see lots of really bad SQL out there. This poor SQL has a massive negative impact on performance and greatly reduces what people can do with their applications. This blog will address this issue, or, at the very least, let me vent a little bit about the problem.