Dec 02 2013

Advancing DB LSN and Neat T-SQL

Published under SQL Server

Recently I have been working on a new backup strategy package, and as part of that I needed to be able to advance the LSN of a database between full backups and transaction log backups to simulate a real-life environment with a near constant transactional volume. I am certain there are far simpler ways of handling this scenario, but it led me to develop a fun little script designed to generate some randomized activity against a made-up table containing a single UNIQUEIDENTIFIER (Boo!) column.

The first part of this randomization sequence is to create an in-memory numbers table. This table holds a list of sequential, unique numbers which I will use to generate sets of unique identifiers to insert into my random table. There are a number of interesting ways to build a numbers table like this, but I chose this particular method both because it's a neat use of a recursive CTE and because the number of values I need to generate in this case is relatively low.

DECLARE @Numbers TABLE (Number BIGINT);

;WITH cteNumberList AS (
    SELECT 1 [Number]
    
    UNION ALL
    
    SELECT Number + 1
    FROM cteNumberList
    WHERE Number < 1000
)

INSERT INTO @Numbers
SELECT Number
FROM cteNumberList
OPTION (MAXRECURSION 0);

The next step in my process is to make sure the table that I want to play with exists. Including this means that I can run this script against any given database in any given environment and still produce the LSN-forwarding effect that I am looking for.

IF OBJECT_ID(N'Randomizer') IS NULL
CREATE TABLE Randomizer (RandomGUID UNIQUEIDENTIFIER)
GO

Finally we come to the fun part of this sequence, where we begin to produce our randomized results. In this case, I am randomly choosing both an action and a random number of actions to perform. As part of this, I also want to save the current number of rows to avoid growing my table too large or updating and deleting non-existent records.

DECLARE @CUD INT, @RowCount BIGINT, @RowsToChange INT

SELECT @RowCount = COUNT(*)
    ,@CUD = ABS(CHECKSUM(NEWID())) % 3 + 1
    ,@RowsToChange = ABS(CHECKSUM(NEWID())) % 1000 + 1
FROM Randomizer

Now that we know what we're going to do, have the objects in place to do it, and have the supporting table to make quick inserts, we can perform some magic and produce our randomized results. In this script I also collect some information on the number of affected rows because, hey, why not?


IF (@CUD = 1 AND @RowCount < 50000)
BEGIN --Insert
    INSERT INTO Randomizer SELECT NEWID()
    FROM @Numbers WHERE Number BETWEEN 1 AND @RowsToChange

    SET @Created = @Created + @@ROWCOUNT -- rows actually affected by the statement above
END
ELSE IF (@CUD = 2 AND @RowCount > 0)
BEGIN --Update
    UPDATE TOP (@RowsToChange) R SET RandomGUID = NEWID()
    FROM Randomizer R

    SET @Updated = @Updated + @@ROWCOUNT
END
ELSE IF (@CUD = 3 AND @RowCount > 0)
BEGIN --Delete
    DELETE TOP (@RowsToChange) FROM Randomizer

    SET @Deleted = @Deleted + @@ROWCOUNT
END

Bringing this all together into one script, I now have a fairly effective and efficient script that will advance the LSN of my database for use in my testing. There are also some neat techniques here which can be applied elsewhere and in other ways, which is really what I wanted to share with everyone.

IF OBJECT_ID(N'Randomizer') IS NULL
CREATE TABLE Randomizer (RandomGUID UNIQUEIDENTIFIER)
GO

SET NOCOUNT ON;
DECLARE @Numbers TABLE (Number BIGINT);
DECLARE @CUD INT, @RowCount BIGINT, @RowsToChange INT, @Counter INT = 0;
DECLARE @Created INT = 0, @Updated INT = 0, @Deleted INT = 0;

;WITH cteNumberList AS (
    SELECT 1 [Number]
    
    UNION ALL
    
    SELECT Number + 1
    FROM cteNumberList
    WHERE Number < 1000
)

INSERT INTO @Numbers
SELECT Number
FROM cteNumberList
OPTION (MAXRECURSION 0);

WHILE @Counter < 100
BEGIN
SELECT @RowCount = COUNT(*)
    ,@CUD = ABS(CHECKSUM(NEWID())) % 3 + 1
    ,@RowsToChange = ABS(CHECKSUM(NEWID())) % 1000 + 1
FROM Randomizer

IF (@CUD = 1 AND @RowCount < 50000)
BEGIN --Insert
    INSERT INTO Randomizer SELECT NEWID()
    FROM @Numbers WHERE Number BETWEEN 1 AND @RowsToChange

    SET @Created = @Created + @@ROWCOUNT -- rows actually affected by the statement above
END
ELSE IF (@CUD = 2 AND @RowCount > 0)
BEGIN --Update
    UPDATE TOP (@RowsToChange) R SET RandomGUID = NEWID()
    FROM Randomizer R

    SET @Updated = @Updated + @@ROWCOUNT
END
ELSE IF (@CUD = 3 AND @RowCount > 0)
BEGIN --Delete
    DELETE TOP (@RowsToChange) FROM Randomizer

    SET @Deleted = @Deleted + @@ROWCOUNT
END

    SET @Counter = @Counter + 1
END

PRINT 'Inserted ' + CONVERT(VARCHAR, @Created) +
    ', Updated ' + CONVERT(VARCHAR, @Updated) +
    ' and Deleted ' + CONVERT(VARCHAR, @Deleted) +
    ' rows.'
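
If you want to double-check that the log really is moving forward between runs and backups, here is a quick sanity check. Note that fn_dblog is an undocumented function and the [Current LSN] column name reflects observed behavior, so treat this purely as a diagnostic convenience rather than a supported API.

-- Highest LSN currently in the active portion of the log (undocumented fn_dblog).
SELECT MAX([Current LSN]) AS MaxLsn
FROM fn_dblog(NULL, NULL);

-- Or compare the LSN ranges recorded by recent transaction log backups.
SELECT TOP (5) backup_finish_date, first_lsn, last_lsn
FROM msdb.dbo.backupset
WHERE database_name = DB_NAME()
  AND type = 'L' -- transaction log backups
ORDER BY backup_finish_date DESC;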


Nov 26 2013

Variable Assignment Behavior

Published under SQL Server

One of the neater little pieces of T-SQL functionality that comes with the native engine is the ability to assign values to a variable from a SELECT statement. Usually this is done very simply against a single-record result set to assign one or more variables in one go as part of some process.

DECLARE @TimeStart DATETIME;
SELECT @TimeStart = GETDATE();

However, there are some additional behaviors to this functionality which are not frequently explored. For example, by initializing a variable and then selecting a series of string values into it, it becomes possible to produce a concatenated variable value without using FOR XML query options.
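
As a minimal sketch of that behavior (worth noting: the documentation only defines the result of a multi-row assignment as the last row returned, so this concatenation trick relies on commonly observed rather than guaranteed behavior):

DECLARE @DatabaseList NVARCHAR(MAX) = N'';

-- Each row returned by the SELECT appends its value onto the variable.
SELECT @DatabaseList = @DatabaseList + name + N', '
FROM sys.databases;

PRINT @DatabaseList;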

In this example, I am searching a particular server for a table which I lost. This particular table was added to allow me to increment LSN values as part of testing a backup strategy and is named Randomizer. Therefore, I want to search all databases on my server for tables named Randomizer. I have also updated this particular SQL construction to produce formatted output that is easy to read in the print buffer.

DECLARE @Commands TABLE (CommandText NVARCHAR(MAX), DBName NVARCHAR(128));
DECLARE @SQL NVARCHAR(MAX);

SET @SQL = N'SELECT ''master'' DBName, name TableName FROM sys.tables WHERE 1 = 0'

SELECT @SQL = @SQL + CHAR (10) + CHAR (10) +
    N'UNION ALL' + CHAR(10) + CHAR (10) +
    N'SELECT' + CHAR(10) +
    N'  ''' + D.name + ''',' + CHAR(10) +
    N'  name' + CHAR(10) +
    N'FROM ' + D.name + '.sys.tables' + CHAR(10) +
    N'WHERE name = ''Randomizer'''
FROM sys.databases D
WHERE D.name NOT IN ('master','model','msdb','tempdb')

PRINT @SQL
EXEC (@SQL)

This code then produces the much prettier output shown below.

SELECT 'master' DBName, name TableName FROM sys.tables WHERE 1 = 0

UNION ALL

SELECT
  'Database1',
  name
FROM Database1.sys.tables
WHERE name = 'Randomizer'

UNION ALL

SELECT
  'Database2',
  name
FROM Database2.sys.tables
WHERE name = 'Randomizer'

UNION ALL

SELECT
  'Database3',
  name
FROM Database3.sys.tables
WHERE name = 'Randomizer'


Sep 14 2013

Converting contextually aware DST offsets

Published under SQL Server

It's been quite a while since I have posted any blogs, and I want to get back into doing this to share my experiences, so here is one of the small challenges I came across recently.

In one of the data warehouses that I worked on recently, some of the data was coming in as UTC. In the user-facing system, all times were to be displayed in CST or CDT, and the displayed information had to be contextually aware of whether the time that was recorded fell within Daylight Saving Time. In order to do this, we had to determine whether the date fell between the second Sunday in March and the first Sunday in November, when the clocks change.

There were a few different ways that this could be accomplished, but after tooling around on Google for a while I found a great means of computing whether or not the time fell between these two values.

CREATE FUNCTION [dbo].[fn_GetDaylightSavingsTime]
(
    @UTC_Date DATETIME,
    @ST_Offset INT, -- CST = -6, EST = -5
    @DT_Offset INT  -- CDT = -5, EDT = -4
)
RETURNS DATETIME
AS
BEGIN
    RETURN DATEADD(hh,
        CASE WHEN YEAR(@UTC_Date) <= 2006 THEN
            -- Old US rules: first Sunday in April through last Sunday in October
            CASE WHEN @UTC_Date >= '4/' + CAST(ABS(8 - DATEPART(WEEKDAY, '4/1/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR))) % 7 + 1 AS VARCHAR) + '/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR) + ' 2:00'
                  AND @UTC_Date < '10/' + CAST(32 - DATEPART(WEEKDAY, '10/31/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR)) AS VARCHAR) + '/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR) + ' 2:00'
            THEN @DT_Offset ELSE @ST_Offset END
        ELSE
            -- Current US rules (2007+): second Sunday in March through first Sunday in November
            CASE WHEN @UTC_Date >= '3/' + CAST(ABS(8 - DATEPART(WEEKDAY, '3/1/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR))) % 7 + 8 AS VARCHAR) + '/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR) + ' 2:00'
                  AND @UTC_Date < '11/' + CAST(ABS(8 - DATEPART(WEEKDAY, '11/1/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR))) % 7 + 1 AS VARCHAR) + '/'
                      + CAST(YEAR(@UTC_Date) AS VARCHAR) + ' 2:00'
            THEN @DT_Offset ELSE @ST_Offset END
        END,
        @UTC_Date)
END
GO
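
A quick usage sketch, assuming the function has been created in the current database and passing the Central time zone offsets from the parameter comments:

-- July 4th falls inside DST, so the CDT offset (-5) is applied.
SELECT dbo.fn_GetDaylightSavingsTime('2013-07-04 18:00', -6, -5) AS CentralTime;

-- January 4th falls outside DST, so the CST offset (-6) is applied.
SELECT dbo.fn_GetDaylightSavingsTime('2013-01-04 18:00', -6, -5) AS CentralTime;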

In my case, I would not have any dates prior to 2007, so I used this much simpler expression to calculate the offset inline for me.

DATEADD(HH,
    (CASE
        WHEN CreateDate >= '3/' + CAST(ABS(8 - DATEPART(WEEKDAY, '3/1/'
            + DATENAME(YEAR, CreateDate))) % 7 + 8 AS VARCHAR) + '/'
            + DATENAME(YEAR, CreateDate) + ' 2:00' AND
            CreateDate <
            '11/' + CAST(ABS(8 - DATEPART(WEEKDAY, '11/1/'
            + DATENAME(YEAR, CreateDate))) % 7 + 1 AS VARCHAR) + '/'
            + DATENAME(YEAR, CreateDate) + ' 2:00'
        THEN -5
        ELSE -6
        END),
    CreateDate) [CSDTTime]

Credit to: Vadivel


Mar 22 2013

What is Big Data?

Published under Big Data, SQL Server

A very common question that is increasingly coming up during conversations with other technology professionals is a very simple one, and yet one that carries with it a whole range of complexity. Given that it's part of the name of my blog, I thought it was a good idea to address the question: “What is Big Data?”

It's not a small question either. Microsoft has an entire color brochure prepared just to talk about the topic of Big Data and how their solutions can help manage it. There are professionals in the industry who specialize in nothing but managing increasingly large volumes of information.

Traditionally, big data is defined as a volume of data so large that traditional approaches to handling it become untenable, such as relational database systems (SQL Server, Oracle, …) and standard integration approaches (SSIS). Even the idea of traditional approaches is vague and intangible to different professionals, as the common way to approach these problems depends largely upon experience.

The truth of the matter is that the definition of “Big Data” changes from person to person based on their past experience, the companies they have worked for, and the practices and methods that they have been exposed to. If you're working for a small start-up, a million records in a table might seem like a lot of data. To other professionals, a million rows are insignificantly small.

These days Big Data is increasingly being used to refer to terabytes and petabytes of unstructured data storing transactional information or statistics, for example credit card transactions or RFID hits. As storage and technology grow, it becomes increasingly easy to store that information, and now companies are looking for ways to analyze it, which has made Big Data a buzzword. More specifically, the term is being used less to describe the data and more to describe the technological solutions that are being used to handle the extreme amount of information. Methods such as MapReduce and new technologies such as Hadoop, HDInsight (the Microsoft Azure distribution of Hadoop), Hive, Pig, and other amusing animal-centric technologies are being developed specifically to address this.

My advice, when you are speaking with other industry professionals, is to steer away from vague terms and give them as much precise information as you can about the volume you are working with, even if it's as indeterminate as “about 50 gigs” or “terabyte plus”. This will help the person you are speaking with get on even terms with you and will help your conversation go farther faster.

For more information on Hadoop and related technologies, Lara Rubbelke has assembled a great list of links related to each technology set: http://sqlblog.com/blogs/lara_rubbelke/archive/2012/09/10/big-data-learning-resources.aspx

Additionally, Microsoft has several tutorials related to Hadoop and HDInsight specifically here: http://www.windowsazure.com/en-us/manage/services/hdinsight/


Mar 15 2013

A Day of Data Warehousing

Published under Uncategorized

Yesterday I was fortunate to be able to attend a day of data warehousing with the Data Chix. I was not only able to learn, but also to contribute to a great cause, as all proceeds from the training seminar went to the Cloverleaf School, a school for children with special needs.

We also got this great picture outside of the American Legion which provided meeting space!

A Day of Data Warehousing group photo


Mar 08 2013

Scripting Create Table Statements from Temp Table Definitions

Published under SQL Server

Using temp tables is a pretty common practice (though, as with all database development, one that should be carefully considered). One of the great little things that makes them nifty is that you don't have to declare their structure beforehand. However, sometimes it's useful to manually declare the definition of one of these tables. Perhaps you want to add a default value, a constraint, or an index.

Pulling up all the necessary definitions, though, can be a pain, and even using DMVs to query the table definition only gets you part of the way there. With the help of one of my co-workers, I constructed the script below to automatically generate the column definitions from a temp table that I have already created (perhaps with SELECT … INTO), so that all I need to do is put a simple CREATE TABLE around it and move on with my life.

I thought others might enjoy having this script handy, and if I blog post it I won’t have to re-write it later!

Update: I've updated this script to include a column that generates a sample SELECT, useful when embedding temp tables in stored procedures to be executed by SSIS.

SELECT
    TABLE_NAME
    ,COLUMN_NAME
    ,DATA_TYPE
    ,NUMERIC_PRECISION
    ,IS_NULLABLE
    ,COLUMN_NAME + ' ' + DATA_TYPE +
        (CASE
            WHEN C.DATA_TYPE LIKE '%CHAR%'
            THEN ' (' + CONVERT(VARCHAR(255),COALESCE(CHARACTER_MAXIMUM_LENGTH,0)) + ')'
            WHEN (C.DATA_TYPE LIKE '%DECIMAL%' OR C.DATA_TYPE LIKE '%NUMERIC%')
            THEN ' (' + CONVERT(VARCHAR(255),COALESCE(NUMERIC_PRECISION,0)) + ',' + CONVERT(VARCHAR(255),COALESCE(NUMERIC_SCALE,0)) + ')'
            ELSE ''
            END
        ) + ' ' +
        (CASE WHEN IS_NULLABLE = 'NO' THEN 'NOT NULL' ELSE 'NULL' END) [Definition]
    ,(CASE
        WHEN DATA_TYPE LIKE '%CHAR%' THEN 'CAST('''' AS ' + DATA_TYPE + '(' + CONVERT(VARCHAR(255),CHARACTER_MAXIMUM_LENGTH) + ')) [' + COLUMN_NAME + '],'
        WHEN DATA_TYPE LIKE '%INT' THEN 'CAST(0 AS ' + DATA_TYPE + ') [' + COLUMN_NAME + '],'
        WHEN DATA_TYPE LIKE '%DATE%' THEN 'CAST(0 AS ' + DATA_TYPE + ') [' + COLUMN_NAME + '],'
        WHEN DATA_TYPE LIKE 'DECIMAL' THEN 'CAST(0 AS ' + DATA_TYPE + ') [' + COLUMN_NAME + '],'
        WHEN DATA_TYPE LIKE 'FLOAT' THEN 'CAST(0 AS ' + DATA_TYPE + '(' + CONVERT(VARCHAR(255),NUMERIC_PRECISION) + ')) [' + COLUMN_NAME + '],'
        WHEN DATA_TYPE LIKE 'UNIQUEIDENTIFIER' THEN 'newid() [' + COLUMN_NAME + '],'
        ELSE NULL END) [SampleSelect]
    ,*
FROM tempdb.INFORMATION_SCHEMA.COLUMNS C
WHERE TABLE_NAME LIKE '%#TempTableName%'
ORDER BY C.ORDINAL_POSITION
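
For illustration, here is a hypothetical temp table to point the query at; swap the #TempTableName filter in the WHERE clause above for whatever table you actually create:

-- Build a throwaway temp table with SELECT ... INTO, then run the scripting query above.
SELECT TOP (0)
    name,
    object_id,
    create_date,
    modify_date
INTO #TempTableName
FROM sys.objects;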


Feb 15 2013

Alternate SSIS Lookup Pattern – Merge Join

Published under SSIS

Just about everyone is familiar with the standard Lookup pattern inside an SSIS data flow; however, there are some alternate patterns available to help in situations where the volumes of data grow too large to work with by the traditional means. One issue that occurs when your lookup query pulls back a large amount of data is that processing can slow significantly, because all of that data is cached into SSIS memory.

One way to quickly resolve this problem is to use the Merge Join transformation rather than a Lookup. This has two major advantages over standard lookup processing and one very significant drawback. The first advantage is that the Merge Join is a partially blocking transformation, which allows buffer pages to move past it before all of the data is cached on one side, as happens with a Lookup. The second advantage is that the Merge Join does not need to have its data fully cached before the data flow executes, which allows preceding operations unrelated to the merge join to continue while the data caches. The drawback, of course, is that merge joins are significantly harder to develop.

The intention of this post is to provide a quick and simple walkthrough of the steps that need to be taken to make sure the merge join is effective. First of all, this approach is intended only for cases where the data being merged is large enough to be a significant factor; in most situations, the Lookup will far outperform the merge operation. Second, the data must be sorted when it comes into the Merge Join, as sketched in the example source queries below. If you are stopping your data flow with a Sort transformation to accomplish this, then you are putting in a fully blocking transformation which will significantly impact your performance.
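
As a sketch of what the two pre-sorted source queries might look like (the table and column names here are hypothetical; the important part is that both ORDER BY clauses match the columns that will drive the merge join):

-- Left input: the main data set, sorted on the join key.
SELECT CustomerKey, OrderDate, SalesAmount
FROM dbo.FactSales
ORDER BY CustomerKey;

-- Right input: the "lookup" reference set, also sorted on the join key.
SELECT CustomerKey, CustomerName
FROM dbo.DimCustomer
ORDER BY CustomerKey;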

On the topic of the data being sorted: in this example I am using two OLE DB data sources, and I was able to include ORDER BY clauses in each query to produce results pre-sorted on the columns which I will use in my merge join operation. However, the next step for configuring the source objects is to change their advanced properties. Even though both queries include ORDER BY statements, the data flow itself is not marked as sorted by default, which will generate this error when you attempt to configure the merge join:

Alternate SSIS Lookup Pattern Example 01

To correct this, we have to tell SSIS that our data has already been sorted. Right-click the OLE DB Source object and choose Show Advanced Editor, then select the Input and Output Properties tab. Selecting the OLE DB Source Output in the pick list will show you the common properties, as well as the IsSorted property, which is False by default. This needs to be changed to True.

Alternate SSIS Lookup Pattern Example 02

At this point we're halfway there for properly configuring the source. The next step is to tell SSIS which columns the data is sorted on, which will be used later as the basis for the merge operation. Expanding the OLE DB Source Output and then Output Columns will show you the columns which are included in the data flow. Selecting a column will display its properties, which include SortKeyPosition.

Alternate SSIS Lookup Pattern Example 03

For this property, 0 means unsorted. The columns the data is sorted on are indicated by setting the first sort column to 1, the next to 2, and so on, with no repeated numbers. Once you have completed this for both sources of pre-sorted data, you should be able to complete the merge join configuration. Dragging the arrow from one of the data sources will prompt you to select which side of the merge join the data will represent. You will not be prompted for the second data source, as it is assumed to be the opposite side.

Alternate SSIS Lookup Pattern Example 04

Opening the Merge Join transformation at this point should display the transformation editor window. If one of the steps was missed or performed improperly, it will display an appropriate error. You should have something similar to this; the Merge Join object displays an error because it has not been configured yet.

Alternate SSIS Lookup Pattern Example 05

If your column names and sort positions are identical, the editor will attempt to draw your join conditions for you at this point. The following screenshot shows the transformation editor immediately after opening, with no modifications made yet.

Alternate SSIS Lookup Pattern Example 06

For this example, the join conditions do not need to be modified. The only thing that needs to be done is to select the columns that I want output from the transformation. This is done by checking the boxes to the left of the column names, or by checking the box in the header to select all of the columns from that side of the transformation. Because the match columns will always be equal, they do not need to be selected from both sides. For this example, I am performing a simple lookup and adding the reference column from the right side to my query results.

Alternate SSIS Lookup Pattern Example 07

At this point you should have a fully functioning merge-join-based lookup solution in place. This can also be used as a reference for standard merge join operations. Please note that because the output of the Merge Join is not sorted, multiple successive merge joins cannot be used without introducing a fully blocking Sort transformation.

Alternate SSIS Lookup Pattern Example 08

Good luck!


Feb 15 2013

Welcome to the blog

Published under General

People have been telling me for a long time that I ought to be writing a blog, so here we go.
