MERGE in SQL Server 2008

MERGE!  You’ve arrived just in time to help me with my ETL operations.

Introduced in SQL Server 2008, MERGE lets one update one table by inserting, updating, or deleting fields as appropriate based on a join with another table.  I recognized quickly that the command was clearly what I needed in order to synchronize the dimension tables of our data warehouse between our development, staging, and production servers.  We recently converted our data warehouse to use foreign keys between our fact and dimension tables — yes, that’s right, there were no foreign keys in our data warehouse — and the old method of deleting and replacing the entire dimension table when updating a dimension on our staging or production server was simply not appropriate anymore.  I needed “upsert” capability, a way to insert and update only the things that changed between the dimension tables.

Enter the MERGE command.  Here it is in action, as I use it to update a dimension (proprietary information removed and/or changed):

merge dimHospital as target
 using development_db.dbo.dimHospital src
 on src.hospital_key = target.hospital_key

 when not matched by target then
 insert (
 hospital_key,
 full_name,
 addr,
 city,
 state_abbr,
 zip_cde
 )
 values (
 src.hospital_key,
 src.full_name,
 src.addr,
 src.city,
 src.state_abbr,
 src.zip_cde
 ) 

 when matched and (
 target.full_name <> src.full_name
 or target.addr <> src.addr
 or target.city <> src.city
 or target.state_abbr <> src.state_abbr
 or target.zip_cde <> src.zip_cde
 ) then
 update set
 full_name = src.full_name
 ,city = src.city
 ,addr = src.addr
 ,state_abbr = src.state_abbr
 ,zip_cde = src.zip_cde

 output 

 inserted.hospital_key,
 inserted.full_name,
 inserted.addr,
 inserted.city,
 inserted.state_abbr,
 inserted.zip_cde,

 deleted.hospital_key,
 deleted.full_name,
 deleted.addr,
 deleted.city,
 deleted.state_abbr,
 deleted.zip_cde

 $action, @merge_dt into dimHospital_Merge_History
 ;

Note the semicolon at the end there, that is actually a necessary part of the command.

So here’s what’s going on.  I’m telling SQL Server to modify my destination table (aliased as “target”) based on some source version of that table, comparing them based on some join condition (here using the primary key, hospital_key):

merge dimHospital as target
 using development_db.dbo.dimHospital src
 on src.hospital_key = target.hospital_key

If rows exist in the source table that do not exist in the destination table, we insert those rows into the destination table:

when not matched by target then
 insert (
 hospital_key,
 full_name,
 addr,
 city,
 state_abbr,
 zip_cde
 )
 values (
 src.hospital_key,
 src.full_name,
 src.addr,
 src.city,
 src.state_abbr,
 src.zip_cde
 )

Where both the source and destination tables rows exist, and there is at least one difference in the fields I check, I update the destination table values from the source table:

 when matched and (
	target.full_name <> src.full_name
	or target.addr <> src.addr
	or target.city <> src.city
	or target.state_abbr <> src.state_abbr
	or target.zip_cde <> src.zip_cde
	) then
  update set
   	 full_name = src.full_name
	,city = src.city
	,addr = src.addr
	,state_abbr = src.state_abbr
	,zip_cde = src.zip_cde

Just what I needed to do.  One can also delete rows in the destination table which do not match your source table, but that isn’t shown here because, this being a dimension table, I never delete rows.

But Wait, There’s More!

And now for the best part, the OUTPUT clause.

output 

 inserted.hospital_key,
 inserted.full_name,
 inserted.addr,
 inserted.city,
 inserted.state_abbr,
 inserted.zip_cde,

 deleted.hospital_key,
 deleted.full_name,
 deleted.addr,
 deleted.city,
 deleted.state_abbr,
 deleted.zip_cde

 $action, @merge_dt into dimHospital_Merge_History

As you can see, I am storing the values of the destination table prior to any change (the deleted table entries), and after any change (the inserted table entries), along with the merge action and the merge date into a custom “merge history” table.

That table was created at the top of my dimension update script via the following SQL:

if not exists(select *
 from information_schema.tables
 where table_name = 'dimHospital_Merge_History')
 begin
 create table dimHospital_Merge_History (

 inserted_hospital_key int null,
 inserted_full_name varchar(60) null,
 inserted_addr varchar(40) null,
 inserted_city varchar(30) null,
 inserted_state_cde char(2) null,
 inserted_zip_cde char(5) null,

 deleted_hospital_key int null,
 deleted_full_name varchar(60) null,
 deleted_addr varchar(40) null,
 deleted_city varchar(30) null,
 deleted_state_cde char(2) null,
 deleted_zip_cde char(5) null,

 merge_action varchar(20) not null, merge_dt datetime not null default getdate()
 )

AND, I made sure that it was also initially populated with the existing values of the destination table prior to the very first merge operation (by putting it inside the if not exists clause shown started above):

insert dimHospital_Merge_History (

 inserted_hospital_key,
 inserted_full_name,
 inserted_addr,
 inserted_city,
 inserted_state_cde,
 inserted_zip_cde,

 merge_action, merge_dt

 )

 select     

 hospital_key,
 full_name,
 addr,
 city,
 state_cde,
 zip_cde,
 'BASELINE', @merge_dt

 from dimHospital

Using this information I should be able to recreate the dimension table at any point in its history, and know exactly what changed along the way.  I haven’t done it yet, but it is kind of neat to think that it can be done. I do know that the question of what changed at what time in our dimension tables comes up often from our analyst, and the history obtained from the OUTPUT clause should  prove useful in the future.

Posted in T-SQL | Comments Off on MERGE in SQL Server 2008

What, it’s December?

Well, so much for getting those November posts in. How do people make the time to do this? Still working on the MERGE post, finding some more interesting things about that SQL command, worth writing about. Here’s hoping it happens in December. More likely in the New Year, though…

Posted in Personal | Comments Off on What, it’s December?

SSIS Variables as Expressions Gotcha

SSIS work lesson of the morning:  don’t forget to set the EvaluateAsExpression property to True on your variable when you have an expression to represent the variable value.

I had three variables representing similar SELECT statements to be used in data flow sources, so I copied and pasted the expressions from one variable into the others, and tweaked them as required.  However, on one of the variables I forgot to set that EvaluateAsExpression property as well, being engrossed in the minor modifications of the similar Expression properties, I guess.  About a half an hour wasted debugging that little miss.

Setting this property seems to be an unnecessary extra step if you’ve got an an expression in the Expression property, but, that’s SSIS for you.

Posted in SSIS | Comments Off on SSIS Variables as Expressions Gotcha

Why I Write Software

I’m going to start this blog out with a story I wrote describing one of my primary motivations to write software, which is to make life easier.

While I work primarily in the database arena now, I was a developer for 20+ years, working in C, C++, Visual Basic, C#, PHP, Javascript, Perl, AWK, FoxPro — basically whatever was available and appropriate at the time for the task.   This is the story of some software I crafted once, with an emphasis on why I did it.

Once upon a time, I worked for the family business. Now, when you work for a family business, particularly when you are involved in the management and finances of the business, you gain a real appreciation for your good employees. Through their hard work, your family benefits. And when your family business is a billing service company with embarrassingly antiquated tools and procedures, your good employees work too hard too often. You want to give something back. Not more salary or a gift card and a pat on the back, but something that matters, like more hours in the day, less demands on their time, a clear head, their sanity.

Well, in an effort to achieve this — and also admittedly at the insistence of our largest client, who quite literally pounded the table about this to us — we decided to buy and install a business intelligence system. This would let customers and our account executives answer the increasingly frequent queries about their billing results without having to burden the already overburdened IT department.

Well, it turned out that the system we bought (not one of the major vendors) was absolutely, positively horrible. I mean horrible. We spent months trying to implement it, finding bug after bug after bug in the thing along the way. The account executives had absolutely no confidence in it, and working with it, at the insistence of management, became yet another burden on their time.

Being primarily responsible for the horrible decision to buy this horrible product, which was making work even more horrible for our account executives, I felt, well, horrible.

So here’s what I did: I wrote a program. (Did you expect something else?) Actually it was more like a series of Perl, PHP, and SQL scripts, but the end result was a web page which let the account executives easily create simple queries which, lo and behold, would answer questions about the billing results without burdening the already overburdened IT department.

And here’s the kicker — it really worked. A click here, a click there, and violá, answers. The joy and relief specifically on one of our account executive’s face that this little effort produced is an image that still motivates me to produce software today. For me, that’s what it is all about, creating something that is easy to use and gets results, software that helps out, makes life easier, that gives back.

There it is, what motivates me to write software.  Being out of the family business now, I must admit to less of a motivation to improve operations through software.  However, I’ve found that everywhere I work there is at least one person working unnecessarily hard, struggling through some process that could be made easier via an appropriate program here or there.  I always find that I like attacking those problems best.  Software can make life easier, and that’s a big reason why I do what I do.

Posted in Personal | Comments Off on Why I Write Software

Welcome

Welcome to my blog, wherein you will find thoughts about my professional aspirations, activities, failures, and successes, and a few personal notes thrown in along the way.

Posted in Uncategorized | Comments Off on Welcome