Data Think: April 2006

Thursday, April 20, 2006

Temporary Tables vs. Table Variables

Here is an interesting behavior in SQL Server. A lot of developers use table variables and temporary tables interchangeably without realizing that they are different entities, each with slightly different behavior. Here is one of the more interesting differences.

Transaction Rollback

If you have a transaction that affects a temporary table and you roll it back, the temporary table is restored to its original state. A table variable is not. For example:

SET NOCOUNT ON
CREATE TABLE #TempTable (TempTableColumn int)
DECLARE @TableVar TABLE (TableVarColumn int)
BEGIN TRANSACTION
INSERT INTO #TempTable SELECT 1
INSERT INTO @TableVar SELECT 1
ROLLBACK TRANSACTION
SELECT * FROM #TempTable
SELECT * FROM @TableVar

Results in

TempTableColumn
---------------

TableVarColumn
--------------
1

As you can see, the insert into the temporary table was rolled back, but the insert into the table variable was not.

This has several possible uses. For example, when I wanted to roll back a transaction, I first copied the relevant changed records to a table variable and then rolled back the transaction. That way I still had a table with that data available after the rollback.
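A minimal sketch of that pattern might look like the following. The #Orders table, its columns, and the @Changed table variable are hypothetical names made up for illustration; they are not from the original scenario.

```sql
SET NOCOUNT ON;

-- Hypothetical working table with one row in its starting state
CREATE TABLE #Orders (OrderId int, Status varchar(20));
INSERT INTO #Orders (OrderId, Status) VALUES (1, 'Open');

-- Table variable that will survive the rollback
DECLARE @Changed TABLE (OrderId int, Status varchar(20));

BEGIN TRANSACTION;

UPDATE #Orders SET Status = 'Closed' WHERE OrderId = 1;

-- Capture the rows as they look inside the transaction
INSERT INTO @Changed (OrderId, Status)
SELECT OrderId, Status FROM #Orders WHERE OrderId = 1;

ROLLBACK TRANSACTION;

SELECT OrderId, Status FROM #Orders;   -- Status is back to 'Open'
SELECT OrderId, Status FROM @Changed;  -- still shows 'Closed'

DROP TABLE #Orders;
```

The rollback undoes the UPDATE on #Orders, while the copy held in @Changed is left alone, so it can be logged or inspected afterwards.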

Sunday, April 16, 2006

Performance Implications of Scalar User Defined Functions (UDF)

One of the major new features of SQL Server 2000 was user defined functions (UDFs). This feature allows developers to encapsulate logic in a reusable function.

For example:
In the Northwind Database 'Order Details' table the total sale can be calculated using the following:

SELECT CONVERT(Money, UnitPrice * Quantity * (1 - Discount)) FROM [Order Details]

It is possible to encapsulate this in a scalar UDF as follows:

CREATE FUNCTION dbo.Func_GetSale (@Price MONEY, @Quantity INT, @Discount FLOAT)
RETURNS MONEY AS
BEGIN
RETURN @Price * @Quantity * ( 1 - @Discount)
END

Then the above query could be simplified to:

SELECT dbo.Func_GetSale(UnitPrice, Quantity, Discount) FROM [Order Details]

Obviously this is simpler to understand and better from a maintenance point of view. But what about performance? When I run the first query five times on my computer, it takes 63 ms. When I run the UDF version five times, it takes 140 ms, more than twice as long.
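One way to reproduce a comparison like this is sketched below. The post does not say how the timings were taken, so the method here is an assumption: each query runs five times in a loop, the result is assigned to a throwaway variable (@Sink, a made-up name) so that client display time is excluded, and elapsed time is read with GETDATE(). It also assumes dbo.Func_GetSale has already been created in Northwind.

```sql
DECLARE @Start datetime, @i int, @Sink money;

-- Inline expression, five runs
SET @i = 0;
SET @Start = GETDATE();
WHILE @i < 5
BEGIN
    SELECT @Sink = CONVERT(money, UnitPrice * Quantity * (1 - Discount))
    FROM [Order Details];
    SET @i = @i + 1;
END
SELECT DATEDIFF(ms, @Start, GETDATE()) AS InlineMs;

-- Scalar UDF, five runs
SET @i = 0;
SET @Start = GETDATE();
WHILE @i < 5
BEGIN
    SELECT @Sink = dbo.Func_GetSale(UnitPrice, Quantity, Discount)
    FROM [Order Details];
    SET @i = @i + 1;
END
SELECT DATEDIFF(ms, @Start, GETDATE()) AS UdfMs;
```

The datetime type is only accurate to about 3 ms, so small differences are noise. Absolute numbers will vary by machine and server version; the ratio between the two runs is the interesting part.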

This simple example shows the penalty of using a scalar UDF: the function is invoked once for every row, and that per-row overhead adds up. I think that UDFs are great, BUT they are not free. I use them frequently but keep them away from performance-sensitive tasks. In a later post I will analyze and comment on the performance implications of table-valued UDFs.

Monday, April 10, 2006

UNION vs. UNION ALL

One very common mistake that people make is in using the UNION operator. For reasons that are not at all obvious to someone new to the language, the UNION operator automatically applies distinctness to the result. That is, all duplicate rows are removed. This is often entirely unexpected behaviour.

If you want to get all results, including duplicates, use UNION ALL instead. This does not apply distinctness.
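The difference is easy to see with a pair of constant queries. This is only an illustrative sketch; the column alias Val is made up.

```sql
-- UNION removes the duplicate: returns a single row containing 1
SELECT 1 AS Val
UNION
SELECT 1;

-- UNION ALL keeps everything: returns two rows, each containing 1
SELECT 1 AS Val
UNION ALL
SELECT 1;
```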

Why would you care?

Well, for one thing, applying distinctness to the result set creates a significant performance hit. The server has to examine the entire combined result to find duplicates, typically with a sort or hash step that may use internal work tables, before it can send the result to the caller.

Anyway, in general, you want to use UNION ALL instead of UNION 99% of the time. Pretty much the only time you should use UNION is when you were going to use SELECT DISTINCT in the queries anyway. In that case, if you use UNION you do not need SELECT DISTINCT, and duplicate values will be removed from the whole result set.
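As a rough illustration of that equivalence, the two queries below should return the same set of cities. They assume the Northwind Customers and Suppliers tables, which both have a City column; the derived table alias AllCities is made up.

```sql
-- UNION de-duplicates across both inputs on its own
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers;

-- Same set of rows, written as DISTINCT over UNION ALL
SELECT DISTINCT City
FROM (
    SELECT City FROM Customers
    UNION ALL
    SELECT City FROM Suppliers
) AS AllCities;
```

If the duplicates are wanted, or the inputs cannot overlap in the first place, plain UNION ALL avoids the extra work.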

Sunday, April 09, 2006

What is Transact-SQL (T-SQL)

In short, T-SQL is a variation of SQL implemented by Microsoft and Sybase for their SQL Server products. It was originally developed at Sybase. Microsoft came into the picture when it partnered with Sybase to create a database server for OS/2. Since then Microsoft and Sybase have pretty much parted ways, each developing its product in a slightly different direction. Microsoft has made MS SQL Server one of its key strategic products and has positioned it against Oracle, DB2 and MySQL.

T-SQL is somewhat different in syntax from the PL/SQL used by Oracle. 90% of the basics are the same, but much of the performance and optimization work is different. That said, the basic principles behind T-SQL and PL/SQL are very similar.

Pure ANSI SQL is a language that is essentially not used by anyone because its specification is intentionally incomplete. For example, the SQL specification never discusses indexes, which are a very important feature of SQL Server. This is only one way in which SQL is a very different language from lower-level languages such as C++ and VB.

SQL is considered a 4th generation language (4GL). This compares to C++ and VB, which are considered 3GLs. What this means is that SQL describes to the computer what the programmer wants done, but leaves the actual implementation and algorithm up to the Relational Database Management System (RDBMS). In a conventional 3GL it is the programmer who develops the algorithm.

Obviously there is no solid line here. In reality there is much poor SQL code that is written in a 3GL style, and there are some object libraries (the .NET Framework, for example) that have given 3GLs some 4GL-like functionality (data binding would be an example).

The long and short of it is... if you write your SQL code the same way you write in a 3GL language, your performance will be poor.
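To make the contrast concrete, here is a sketch of the same task written both ways against the Northwind 'Order Details' table: totalling the sale amounts. The cursor and variable names (SaleCursor, @Total, @Sale) are made up for illustration.

```sql
-- 3GL style: walk the rows one at a time and accumulate by hand
DECLARE @Total money, @Sale money;
SET @Total = 0;

DECLARE SaleCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT CONVERT(money, UnitPrice * Quantity * (1 - Discount))
    FROM [Order Details];

OPEN SaleCursor;
FETCH NEXT FROM SaleCursor INTO @Sale;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Total = @Total + @Sale;
    FETCH NEXT FROM SaleCursor INTO @Sale;
END
CLOSE SaleCursor;
DEALLOCATE SaleCursor;

SELECT @Total AS TotalSales;

-- 4GL style: say what is wanted and let the server choose the algorithm
SELECT SUM(CONVERT(money, UnitPrice * Quantity * (1 - Discount))) AS TotalSales
FROM [Order Details];
```

Both produce the same total, but the second form leaves the query optimizer free to pick how to scan and aggregate, while the first forces row-by-row processing.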

FAQ
Q-What is this blog about?

A-This blog is about all things T-SQL. I feel really strongly about the importance of knowing this language well in order to develop modern applications that use SQL Server. Hopefully this blog will educate and give useful information about what this language can do and how to do it.

Q-Why do you feel the need to create this blog?

A-I think that knowledge of SQL in general and T-SQL in particular is absolutely essential for modern programming on the Microsoft platform. Unfortunately, most programmers treat SQL as just another language and do not really understand that SQL is a different kind of language than Java, C#, C++, VB, PHP etc. As a result I see lots of really bad SQL out there. This poor SQL has a massive negative impact on performance and greatly reduces what people can do with their applications. This blog will address this issue, or, at the very least, let me vent a little bit about the problem.
