The 2016 Ambitions

Well, it is day 2 of a new year, and I’ve already oversubscribed my time and expanded my ambitions.  But hey, it’s go big or go home, right?

So, the first priority on my 2016 list is to finish up the 2015 list.  I’ve got two certifications in the works.   I’m 2/3 of the way through the Microsoft SQL Server 2012 MCSA certification, with my 3rd test scheduled for March.   I am 6/10 of the way through the Coursera Data Science specialization, and signed up for course 7 today, Regression Models.  I’m hoping this will dovetail nicely with another course I signed up for recently, Mastering Data Analytics in Excel, since they are both talking about linear regression.  I really should not be doing that Excel class, but I have to say, I like the idea of using Excel to analyze data, and it looks like they are teaching some useful predictive analytics there.

In 2015, I successfully restarted the Pittsburgh Business Intelligence Meetup Group, and in 2016 I plan to keep that group moving.  Seeking presenters — any takers?

And then, there’s Big Data.  Based on a growing interest in this at work, and Ralph Kimball’s indication that there is a place for this in data warehousing, I hope to get some practical experience with Spark and Scala in 2016.

Here’s a picture of some of the books I currently have out of the library.  As you can see, I have some non-professional ambitions as well.  I just can’t give up my love of vegetable gardening and hiking.  Last year, Nancy and I successfully started veggies from seed, and hiked a bit of the Laurel Highlands trail.  We may do the whole thing this summer, which means I have to really up the gym attendance.  Been doing OK in 2015, but I have learned that a good morning workout fuels a productive day, so I hope to improve my attendance in 2016.

(Photo: a stack of library books)

Finally, the blog.  Here’s my annual promise to keep posting to the blog!  Stay tuned…

Posted in Personal | Comments Off on The 2016 Ambitions

Thanks for the Help with Deadlocks

This is a copy of a blog post I put on my SQLServerCentral blog site “Adventures with Data”.  Really have to figure out that syndication thing someday.

For a time there a few weeks ago (I’ve been meaning to post this for a while now), I saw more deadlocks than I had seen in my entire career up to that point. We had a few every day, and being unfamiliar with dealing with them, I had to do some research on diagnosing them. I found some great info out there that I wanted to summarize here. Many thanks to the SQL Server community for taking the time to put this stuff out there.

When I started Googling deadlocks, one of the first articles I latched onto was “The Difficulty with Deadlocks” by Jeremiah Peschka. The Brent Ozar Unlimited site is always a good place to find clear and concise articles, so I started there. This led me to the referenced article by Jonathan Kehayias, “Retrieving Deadlock Graphs with SQL Server 2008 Extended Events”, which contains some very useful code to extract the deadlock graph, in XML format, for all the deadlocks that had happened over the last few days.
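If you just want a quick peek without writing anything new, here’s a rough sketch of the kind of query involved — my own simplified take on the idea, not Jonathan’s code — and it assumes the deadlocks are recent enough to still be sitting in the default system_health session’s ring buffer:

-- Simplified sketch: pull any xml_deadlock_report events still held in the
-- system_health session's ring buffer target. The ring buffer rolls over,
-- so only relatively recent deadlocks will still be in there.
SELECT
    xed.value('@timestamp', 'datetime2') AS deadlock_time,
    xed.query('.')                       AS deadlock_event_xml
FROM (
    SELECT CAST(st.target_data AS xml) AS target_data
    FROM sys.dm_xe_session_targets AS st
    JOIN sys.dm_xe_sessions        AS s
        ON s.address = st.event_session_address
    WHERE s.name = 'system_health'
      AND st.target_name = 'ring_buffer'
) AS rb
CROSS APPLY rb.target_data.nodes(
    'RingBufferTarget/event[@name="xml_deadlock_report"]') AS q(xed);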

Now, having the XML representation of the deadlock graph is one thing; interpreting it is another. Bart Duncan’s series on deadlocks (referenced by Jonathan’s article) was very helpful in this regard, and things started to make sense. But what was this Sch-M lock we were seeing? It turns out Sch-M is a “schema modification lock”, and I have to agree with Michael J. Swart’s article: schema modification locks are indeed evil. They are essentially incompatible with every other lock mode, and as another useful article (with code) by Dmitri Korotkevitch indicates, “deadlock possibilities are endless” with them in the mix.
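As an aside, if you want to see whether any schema modification locks are granted or being waited on right now — this only shows the current moment, not what happened during a past deadlock — a rough query against sys.dm_tran_locks along these lines does the trick:

-- Rough sketch: current Sch-M lock requests, granted or waiting.
-- For OBJECT resources, resource_associated_entity_id is the object_id.
SELECT
    l.request_session_id,
    l.resource_type,
    l.resource_associated_entity_id,
    l.request_mode,
    l.request_status
FROM sys.dm_tran_locks AS l
WHERE l.request_mode = 'Sch-M';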

We were getting these schema modification locks while our foreign key constraints were being set to “not trusted” by our bulk insert code (SqlBulkCopy), and again while we were trying to rectify this by reapplying the constraints WITH CHECK. Coming full circle to the Brent Ozar Unlimited site, here’s a nice article on how easy it is for your constraints to become untrusted, and why you want to address it.
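As a quick illustration of what that looks like (the table and constraint names below are made up), here’s the general shape of finding untrusted foreign keys and re-trusting one of them — keeping in mind that the ALTER TABLE is exactly the kind of operation that got tangled up in our deadlocks, so pick a quiet time to run it:

-- Which foreign keys are currently not trusted?
SELECT OBJECT_NAME(parent_object_id) AS table_name,
       name                          AS constraint_name
FROM sys.foreign_keys
WHERE is_not_trusted = 1;

-- Re-validate existing rows and restore trust for one constraint
-- (hypothetical names; WITH CHECK is what makes it trusted again).
ALTER TABLE dbo.factEncounter
    WITH CHECK CHECK CONSTRAINT FK_factEncounter_dimHospital;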

So, after mostly backing out some code changes we had made recently, we got our deadlocks under control. For the record, the situation was a bit more complicated than I’ve indicated here, and included a connection auto-promoted to the serializable isolation level, some recently applied ON DELETE CASCADE constraints, and some manipulation of the indexes. All of it, though, we could diagnose once we were able to obtain and decipher (after a fashion — they still aren’t obvious!) the deadlock graphs through extended events.

Many thanks to you all for the time and effort you put in to provide the useful information in all your posts. Below is a summary of the links I found useful, in case you suddenly find yourself needing to get up to speed on deadlocks like I did. I also want to recommend Kalen Delaney’s excellent eBook “SQL Server Concurrency”, available for free from the Red Gate site, for learning more about locking, blocking, and deadlocks.

eBooks:
Kalen Delaney’s book on SQL Server Concurrency
http://www.red-gate.com/products/dba/sql-monitor/entrypage/sql-server-concurrency-ebook

Links:
http://www.brentozar.com/archive/2011/07/difficulty-deadlocks
(led to the below article)
http://www.sqlservercentral.com/articles/deadlock/65658/
(led to the below article)
http://blogs.msdn.com/b/bartd/archive/2006/09/09/deadlock-troubleshooting_2c00_-part-1.aspx
(saw Sch-M locks and found)
http://michaeljswart.com/2013/04/the-sch-m-lock-is-evil/
http://aboutsqlserver.com/2012/04/05/locking-in-microsoft-sql-server-part-13-schema-locks/

MSDN Lock Modes Descriptions
http://msdn.microsoft.com/en-us/library/ms175519(v=sql.105).aspx

Posted in SQL Server | Comments Off on Thanks for the Help with Deadlocks

Passed Again

Passed the 70-463 exam last week, making the Florida vacation that immediately followed that much nicer.  Again, I found that a good amount of detail on a broad variety of topics was needed.  I did like the training kit book: Implementing a Data Warehouse with Microsoft SQL Server 2012, but supplemental material from MSDN was needed as well.  And again, enjoyed expanding my knowledge of an area I work in daily, so more confirmation that I’m in the right field.

Exam 70-462 (Administering Microsoft SQL Server 2012) is next, due up in May.  I hear that the training kit is not a good reference, but I’ll start there anyway.  Hope to have that MCSA by the time summer is in full swing, which will make summer that much nicer too.


Posted in SQL Server | Comments Off on Passed Again

Passed!

Passed the 70-461 exam today, step one toward my goal of the MCSE: Business Intelligence.  Man, that thing was way harder than the practice tests I took.  I purchased the book “Training Kit (Exam 70-461): Querying Microsoft SQL Server 2012” to study for the exam, and it comes with a CD of practice exams by MeasureUp.  I had felt that the practice exams were a bit too easy to be truly indicative of how the real test would be, and I was correct in that belief.  The exam questions were more involved and the answers much pickier, so the actual test was a challenge.  I was quite glad to have passed it; I didn’t want to have to start over again.  Lesson learned: for the next test, I will be studying a wide range of topics in great detail.

The book, by the way, is excellent.  Thoroughly enjoyed reading it — a sign I’m in the right field — and I’m sure I will continue to refer to it in my career.  Well done, gentlemen.

On to the 70-463!  (I’m saving the 70-462 for later; I am more of a developer and architect than a DBA.)


Posted in SQL Server, Uncategorized | Comments Off on Passed!

MERGE and DELETE

Have you used

WHEN NOT MATCHED BY Source
THEN
  DELETE

much in your work with MERGE?

I’ve been using SQL Server’s MERGE statement since a little before my previous post on it, but I never used the clause above much at all.

Recently I had an opportunity to use it, and I found some unanticipated behavior.  And since our architect said “why is it doing that?” when I showed him what I found, I don’t feel too bad about posting this here.

The scenario:  my goal was to merge in three sets of data from three different hospitals into one fact table, including deleting records in the fact table that were not found in the source files, since each of the source files was a “complete refresh” of that hospital’s data.  The delete part seemed like a good application of the WHEN NOT MATCHED BY Source, right?

Indeed it was, but I discovered quickly that there were some specific things to be careful about when using this clause, as written above at least:

  • If your source is empty, you’ll delete all the data in your fact table.  A real bummer when it happens. <sad trombone>
  • If your source is one of multiple sources feeding the destination fact table, and you use the clause above as is, your delete will remove all the other sources’ data from the fact table.  Less of a bummer than deleting everything, but still worthy of a <sad trombone>.

I tripped over the first item when I initially tested my changes and had put the source files in the wrong directory.  In our system, there is an initial truncate of the staging table, a load to it from the source files — which inserted zero rows, since it couldn’t find the source files — and then the merge.  The merge, containing the clause

WHEN NOT MATCHED BY Source
THEN
  DELETE

happily deleted every record that existed in the fact table (the merge target) that it did not have in the empty staging table (the merge source) — that is, all of them.

Fortunately, this is easily remedied by capturing the size of your source and either avoiding the merge if it is empty, or augmenting the WHEN NOT MATCHED BY Source clause.  I chose the latter, since it fit into our standard merge stored procedures more cleanly:

WHEN NOT MATCHED BY Source AND @SourceRowCount > 0
THEN
  DELETE

where @SourceRowCount was populated previously by simply doing a COUNT(*) on the staging table.

I put that in and tested it, since I was sure I’d make that mistake with the sources again (it’s good to know yourself), and all was good.  Then I put the three source files — one for each hospital — into the correct location, ran it again and bam! I found I only had the last hospital’s data in the fact table when I was done.

Hmm.  Having used MERGE solely for upserts, I took a while to grok it, but it did make sense eventually — just like the empty merge source situation, it deletes all records that exist in the target but aren’t found in the source.

So my truncate fact table, load hospital A, load hospital B, load hospital C process ended up leaving only hospital C’s data in the fact table, since hospitals A and B’s records were not in hospital C’s file, of course.

How to remedy this?  I had to specify that it should only delete records corresponding to the hospital it was currently working on.  Unfortunately, there is no way to attach a WHERE clause to the DELETE operation specifically, but augmenting the WHEN NOT MATCHED BY Source clause achieves the same goal.  I knew which hospital I was merging in, so putting that into the clause:

WHEN NOT MATCHED BY Source 
   AND @SourceRowCount > 0 
   AND Target.hospital = @currentHospital
THEN
  DELETE

seemed the way to go.  You can see that this addition effectively restricts the DELETE to operating on only those records in the fact table corresponding to the hospital you specify by @currentHospital, similar to putting a WHERE clause on the DELETE.
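Putting it all together, here’s a stripped-down sketch of the shape the final merge took.  The table and column names are made up for illustration, and the real procedure has more columns and more going on, but the guards on the delete branch are the point:

-- Hypothetical names throughout.
DECLARE @currentHospital varchar(20) = 'HOSPITAL_A';
DECLARE @SourceRowCount  int;

-- Staging table loaded from the current hospital's source file.
SELECT @SourceRowCount = COUNT(*)
FROM dbo.stgHospitalActivity;

MERGE dbo.factHospitalActivity AS Target
USING dbo.stgHospitalActivity  AS Source
    ON  Source.hospital     = Target.hospital
    AND Source.activity_key = Target.activity_key

WHEN NOT MATCHED BY Target THEN
    INSERT (hospital, activity_key, amount)
    VALUES (Source.hospital, Source.activity_key, Source.amount)

WHEN MATCHED AND Target.amount <> Source.amount THEN
    UPDATE SET amount = Source.amount

-- Delete only when the staging table actually had rows, and only rows
-- belonging to the hospital currently being refreshed.
WHEN NOT MATCHED BY Source
    AND @SourceRowCount > 0
    AND Target.hospital = @currentHospital
THEN
    DELETE;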

And indeed, it did work.  I hope by putting this up here that I’ve saved you some head-scratching if you’re running into the same questions about using DELETE with MERGE.


Posted in SQL Server | Comments Off on MERGE and DELETE

Starting up again

What he said:  George Woods.

George and I were talking about blogging after finding Colleen Morrow’s excellent blog again recently.  We both agreed that blogging is a good thing to do, yet difficult to keep current.  I’ve got a number of goals for 2015 already, but I’m adding regular blogging to the list now.

Other goals for 2015:  MCSE in Business Intelligence, and the Data Science Specialization at Coursera.  I’ve got a good start on them already, as I’m scheduled to take the 70-461 exam on December 30th, and I have four of the nine courses for the specialization already completed.

So stay tuned, I plan to share the experience with all you regular readers (thanks, Mom and Dad!), and anyone else who wants to listen.  I’ll try and keep it interesting.

George, you’re no longer one up on me!

Posted in Personal, SQL Server | Comments Off on Starting up again

SQL Saturday #171 — The Schedule is Up

SQL Saturday #171 Pittsburgh preparations continue, and the schedule has been posted for your review. If you’re anywhere in the area on October 6th, please join us at La Roche College for some great free SQL training.

Personally, I’ve had to curtail my activities with the planning committee due to starting a class, but everyone else is going strong, and it is all shaping up well. At work I’ve been digging into Data Quality Services and Master Data Services lately, so I’m particularly looking forward to Rushabh Mehta’s “Intro to DQS” and Carlton Ramsey’s “Intro to MDS” sessions. I’m also trying to get up to speed on PowerPivot and PowerView, so Dan Clark’s “Building your First Tabular Model” will be of interest to me as well. There is a whole SQL 2012 track at the event, and we have two Administration tracks as well. Should be great — please join us if you can!

Pittsburgh, Pennsylvania. Photo: Herbert Spencer, used under Creative Commons License.

P.S. My last post about ComputeReach? Turns out that the girlfriend is going along with those computers to Ghana, to do installs and training. The opportunity came up suddenly, and it’s all happening very quickly. Had to laugh when she showed up at my house the other day with half a dozen band-aids crisscrossed on each arm after getting all the required inoculations at once! To say she’s excited about the trip would be an understatement…

Posted in Uncategorized | Comments Off on SQL Saturday #171 — The Schedule is Up

ComputeReach

Man it’s tough to get in all that you want to do, isn’t it? I’ve been trying to get my blogging activities rebooted, and do some volunteer work as well. I’m starting to combine the two by volunteering on the SQL Saturday Committee for SQL Saturday #171 Pittsburgh (October 6th, come join us!), using the blog as one vehicle to get the word out about the event.

The real purpose of this blog entry, though, is to take a moment to talk about another volunteer activity I participated in recently, one that I think most of us in the tech world can appreciate.

In late June I spent a Saturday working with ComputeReach, a Pittsburgh-based humanitarian organization whose mission is to repurpose used computers for those less fortunate. In addition to putting computers into the hands of those who might not normally have access to them, it properly recycles computers that it cannot use, keeping those computers from potentially ending up in landfills, where they can cause environmental issues at the very least.

On this particular Saturday a big group of people — including individuals from the Pittsburgh office of Google, the Western PA Linux User’s Group, and ITT Tech — came together to repurpose approximately 180 Dell Optiplex computers. They were updated with expanded RAM and fresh installations of Edubuntu Linux, boxed and packaged for shipping. It was a hot, hot day to be working in a warehouse, moving, imaging, and repackaging computers, but everyone was enthusiastic and did a great job. For a video of the volunteers in action, see this link. Roll call is at the end of the video, where everyone introduces themselves. I’m the first one up, caught a bit off guard (what, me first? :-)).

While ComputeReach primarily serves impoverished communities in the US, thanks to a generous donation from Google many of these computers will be going to The Street Academy in Ghana, which provides education to children living on the streets of Accra, the capital city.

If you are interested in learning more about ComputeReach, please contact Dave Sevick (b|t) at dave@computereach.org.

Posted in Personal | Comments Off on ComputeReach

Update

Well, the owl has flown the coop — there will be no nestcam.   I suspect the owl found a mate and is nesting somewhere else more appropriate.  I’m planning to build a bigger nest box over the summer, and hopefully next season I will have some exciting images to share!

Professional update:  getting involved in the local SQL Server user’s group, which is supposed to be getting off the ground again after some years in limbo.  Hope to present at one of the meetings over the summer.  Data mining proof-of-concept work has been interesting, but there is still much to learn.  Currently working with the NPI data to improve our product — it’s already produced some new revenue, so that’s good.  Unfortunately, implementing it involves my doing some VB.NET programming (the VB being the unfortunate part), which is not where I want to go, but if it improves the product and keeps us in the game, I don’t mind doing that for a while.  Do hope to get back on that data mining soon, though.

Posted in Personal | Comments Off on Update

MERGE in SQL Server 2008

MERGE!  You’ve arrived just in time to help me with my ETL operations.

Introduced in SQL Server 2008, MERGE lets you update one table by inserting, updating, or deleting rows as appropriate based on a join with another table.  I recognized quickly that the command was exactly what I needed in order to synchronize the dimension tables of our data warehouse between our development, staging, and production servers.  We recently converted our data warehouse to use foreign keys between our fact and dimension tables — yes, that’s right, there were no foreign keys in our data warehouse — and the old method of deleting and replacing the entire dimension table when updating a dimension on our staging or production server was simply not appropriate anymore.  I needed “upsert” capability: a way to insert and update only the things that changed between the dimension tables.

Enter the MERGE command.  Here it is in action, as I use it to update a dimension (proprietary information removed and/or changed):

merge dimHospital as target
 using development_db.dbo.dimHospital src
 on src.hospital_key = target.hospital_key

 when not matched by target then
 insert (
 hospital_key,
 full_name,
 addr,
 city,
 state_abbr,
 zip_cde
 )
 values (
 src.hospital_key,
 src.full_name,
 src.addr,
 src.city,
 src.state_abbr,
 src.zip_cde
 ) 

 when matched and (
 target.full_name <> src.full_name
 or target.addr <> src.addr
 or target.city <> src.city
 or target.state_abbr <> src.state_abbr
 or target.zip_cde <> src.zip_cde
 ) then
 update set
 full_name = src.full_name
 ,city = src.city
 ,addr = src.addr
 ,state_abbr = src.state_abbr
 ,zip_cde = src.zip_cde

 output 

 inserted.hospital_key,
 inserted.full_name,
 inserted.addr,
 inserted.city,
 inserted.state_abbr,
 inserted.zip_cde,

 deleted.hospital_key,
 deleted.full_name,
 deleted.addr,
 deleted.city,
 deleted.state_abbr,
 deleted.zip_cde

 $action, @merge_dt into dimHospital_Merge_History
 ;

Note the semicolon at the end there; it is actually a required part of the command.

So here’s what’s going on.  I’m telling SQL Server to modify my destination table (aliased as “target”) based on some source version of that table, comparing them based on some join condition (here using the primary key, hospital_key):

merge dimHospital as target
 using development_db.dbo.dimHospital src
 on src.hospital_key = target.hospital_key

If rows exist in the source table that do not exist in the destination table, we insert those rows into the destination table:

when not matched by target then
 insert (
 hospital_key,
 full_name,
 addr,
 city,
 state_abbr,
 zip_cde
 )
 values (
 src.hospital_key,
 src.full_name,
 src.addr,
 src.city,
 src.state_abbr,
 src.zip_cde
 )

Where a row exists in both the source and destination tables, and there is at least one difference in the fields I check, I update the destination table values from the source table:

 when matched and (
	target.full_name <> src.full_name
	or target.addr <> src.addr
	or target.city <> src.city
	or target.state_abbr <> src.state_abbr
	or target.zip_cde <> src.zip_cde
	) then
  update set
   	 full_name = src.full_name
	,city = src.city
	,addr = src.addr
	,state_abbr = src.state_abbr
	,zip_cde = src.zip_cde

Just what I needed to do.  You can also delete rows in the destination table that do not match your source table, but that isn’t shown here because, this being a dimension table, I never delete rows.
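For completeness, that clause would look roughly like this (a sketch only, since I don’t use it here):

 when not matched by source then
 delete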

But Wait, There’s More!

And now for the best part, the OUTPUT clause.

output 

 inserted.hospital_key,
 inserted.full_name,
 inserted.addr,
 inserted.city,
 inserted.state_abbr,
 inserted.zip_cde,

 deleted.hospital_key,
 deleted.full_name,
 deleted.addr,
 deleted.city,
 deleted.state_abbr,
 deleted.zip_cde

 $action, @merge_dt into dimHospital_Merge_History

As you can see, I am storing the values of the destination table prior to any change (the deleted table entries), and after any change (the inserted table entries), along with the merge action and the merge date into a custom “merge history” table.

That table was created at the top of my dimension update script via the following SQL:

if not exists(select *
 from information_schema.tables
 where table_name = 'dimHospital_Merge_History')
 begin
 create table dimHospital_Merge_History (

 inserted_hospital_key int null,
 inserted_full_name varchar(60) null,
 inserted_addr varchar(40) null,
 inserted_city varchar(30) null,
 inserted_state_cde char(2) null,
 inserted_zip_cde char(5) null,

 deleted_hospital_key int null,
 deleted_full_name varchar(60) null,
 deleted_addr varchar(40) null,
 deleted_city varchar(30) null,
 deleted_state_cde char(2) null,
 deleted_zip_cde char(5) null,

 merge_action varchar(20) not null,
 merge_dt datetime not null default getdate()
 )

AND, I made sure that it was also initially populated with the existing values of the destination table prior to the very first merge operation (by putting the following insert inside the if not exists block begun above):

insert dimHospital_Merge_History (

 inserted_hospital_key,
 inserted_full_name,
 inserted_addr,
 inserted_city,
 inserted_state_cde,
 inserted_zip_cde,

 merge_action, merge_dt

 )

 select     

 hospital_key,
 full_name,
 addr,
 city,
 state_abbr,
 zip_cde,
 'BASELINE', @merge_dt

 from dimHospital

 end

Using this information I should be able to recreate the dimension table at any point in its history, and know exactly what changed along the way.  I haven’t done it yet, but it is kind of neat to think that it can be done.  I do know that the question of what changed at what time in our dimension tables comes up often from our analyst, and the history obtained from the OUTPUT clause should prove useful in the future.
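For example, pulling the change history for a single hospital could be as simple as something like this (the key value is made up):

-- Change history for one hospital (42 is a made-up key value).
-- The BASELINE row has only the inserted_* columns populated.
SELECT merge_dt,
       merge_action,
       deleted_full_name, inserted_full_name,
       deleted_addr,      inserted_addr
FROM dimHospital_Merge_History
WHERE COALESCE(inserted_hospital_key, deleted_hospital_key) = 42
ORDER BY merge_dt;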

Posted in T-SQL | Comments Off on MERGE in SQL Server 2008