Trees of Pittsburgh

I recently found a bunch of publicly available data on the Western Pennsylvania Regional Data Center’s website, including this neat set of data on the City of Pittsburgh Trees.  I was able to cobble together a basic report in Power BI with the data, which I present below.  It’s not the best visualization, but it was a nice exercise.  Enjoy.

Posted in Uncategorized | Comments Off on Trees of Pittsburgh

The 2016 Ambitions

Well, it is day 2 of a new year, and I’ve already oversubscribed my time and expanded my ambitions.  But hey, it’s go big or go home, right?

So, the first priority on my 2016 list is to finish up the 2015 list.  I’ve got two certifications in the works.  I’m 2/3 of the way through the Microsoft SQL Server 2012 MCSA certification, with my third test scheduled for March.  I am 6/10 of the way through the Coursera Data Science specialization, and I signed up for course 7, Regression Models, today.  I’m hoping this will dovetail nicely with another course I signed up for recently, Mastering Data Analytics in Excel, since they both cover linear regression.  I really shouldn’t be taking that Excel class, but I have to say, I like the idea of using Excel to analyze data, and it looks like they teach some useful predictive analytics there.

In 2015, I successfully restarted the Pittsburgh Business Intelligence Meetup Group, and in 2016 I plan to keep that group moving.  Seeking presenters — any takers?

And then, there’s Big Data.  Based on a growing interest at work, and Ralph Kimball’s indication that there is a place for Big Data in data warehousing, I hope to get some practical experience with Spark and Scala in 2016.

Here’s a picture of some of the books I currently have out of the library.  As you can see, I have some non-professional ambitions as well.  I just can’t give up my love of vegetable gardening and hiking.  Last year, Nancy and I successfully started veggies from seed, and hiked a bit of the Laurel Highlands Trail.  We may do the whole thing this summer, which means I really have to up the gym attendance.  I did OK in 2015, but I’ve learned that a good morning workout fuels a productive day, so I hope to improve my attendance in 2016.

[Photo: the current stack of library books]

Finally, the blog.  Here’s my annual promise to keep posting to the blog!  Stay tuned…

Posted in Personal | Comments Off on The 2016 Ambitions

Thanks for the Help with Deadlocks

This is a copy of a blog post I put on my SQLServerCentral blog site “Adventures with Data”.  Really have to figure out that syndication thing someday.

For a stretch there a few weeks ago (I’ve been meaning to post this for a while now), I saw more deadlocks than I had seen in my entire career prior to those odd weeks. Every day we had a few of them, and being unfamiliar with dealing with them, I had to do some research on diagnosing them. I found some great info out there that I wanted to summarize here. Many thanks to the SQL Server community for taking the time to put this stuff out there.

When I started Googling deadlocks, one of the first articles I latched onto was “The Difficulty with Deadlocks” by Jeremiah Peschka. The Brent Ozar Unlimited site is always a good place to find clear and concise articles, so I started there. That article led me to Jonathan Kehayias’s “Retrieving Deadlock Graphs with SQL Server 2008 Extended Events”, which contains some very useful code for extracting the deadlock graph, in XML format, for every deadlock that had happened over the last few days.
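
If you just want a quick look without setting up a dedicated session, here is a minimal sketch (not Jonathan’s code) that pulls recent deadlock graphs from the built-in system_health Extended Events session’s ring buffer:

-- Pull deadlock graphs from the default system_health session.
-- The ring buffer only holds recent events, so this is best-effort.
SELECT XEventData.XEvent.query('(data/value/deadlock)[1]') AS DeadlockGraph
FROM (
    SELECT CAST(st.target_data AS xml) AS TargetData
    FROM sys.dm_xe_session_targets AS st
    JOIN sys.dm_xe_sessions AS s
        ON s.address = st.event_session_address
    WHERE s.name = 'system_health'
      AND st.target_name = 'ring_buffer'
) AS Data
CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]')
    AS XEventData(XEvent);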

Now, having the XML representation of the deadlock graph is one thing; interpreting it is another. Bart Duncan’s series on deadlocks (referenced by Jonathan’s article) was very helpful in this regard, and things started to make sense. But what was this Sch-M lock we were seeing? It turns out that Sch-M is a “schema modification lock”, and I have to agree with Michael J. Swart’s article: schema modification locks are indeed evil. They are essentially incompatible with every other lock mode, and as another useful article (and code sample) by Dmitri Korotkevitch indicates, “deadlock possibilities are endless” with them in the mix.
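
As an aside, you don’t have to wait for a deadlock graph to see these; a quick query like this sketch will show any schema modification locks being held or waited on right now:

-- Spot sessions holding or waiting on Sch-M locks at this moment.
SELECT request_session_id,
       resource_type,
       DB_NAME(resource_database_id) AS database_name,
       request_mode,
       request_status   -- GRANT, WAIT, or CONVERT
FROM sys.dm_tran_locks
WHERE request_mode = 'Sch-M';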

We were getting these schema modification locks while our foreign key constraints were being set to “not trusted” by our bulk insert code (SqlBulkCopy), and while we were trying to rectify that by reapplying the constraints WITH CHECK. Coming full circle to the Brent Ozar Unlimited site, here’s a nice article on how easily your constraints can become untrusted, and why you want to address it.
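
To make that concrete, here is a minimal sketch of both halves (the table and constraint names are hypothetical, not our real schema):

-- Find foreign keys that SQL Server no longer trusts
-- (e.g., after a bulk load that skipped constraint checking).
SELECT OBJECT_NAME(parent_object_id) AS table_name,
       name AS constraint_name
FROM sys.foreign_keys
WHERE is_not_trusted = 1;

-- Re-check existing data so the constraint is trusted again.
-- Note that this ALTER TABLE is exactly the kind of statement
-- that needs a schema modification (Sch-M) lock.
ALTER TABLE dbo.FactEncounter
    WITH CHECK CHECK CONSTRAINT FK_FactEncounter_DimHospital;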

So, after mostly backing out some code changes we had made recently, we got our deadlocks under control. For the record, the situation was a bit more complicated than I’ve indicated here, and included a connection auto-promoted to the serializable isolation level, some recently applied ON DELETE CASCADE constraints, and some manipulation of the indexes. All of it, though, we could diagnose once we were able to obtain and decipher (after a fashion; they still aren’t obvious!) the deadlock graphs through Extended Events.

Many thanks to you all for the time and effort you put in to provide the useful information in all your posts. Below is a summary of the links I found useful, in case you suddenly find yourself needing to get up to speed on deadlocks like I did. I also want to recommend Kalen Delaney’s excellent eBook “SQL Server Concurrency”, available for free from the Red Gate site, for learning more about locking, blocking, and deadlocks.

eBooks:
Kalen Delaney’s book on SQL Server Concurrency
http://www.red-gate.com/products/dba/sql-monitor/entrypage/sql-server-concurrency-ebook

Links:
http://www.brentozar.com/archive/2011/07/difficulty-deadlocks
(led to the below article)
http://www.sqlservercentral.com/articles/deadlock/65658/
(led to the below article)
http://blogs.msdn.com/b/bartd/archive/2006/09/09/deadlock-troubleshooting_2c00_-part-1.aspx
(saw Sch-M locks and found)
http://michaeljswart.com/2013/04/the-sch-m-lock-is-evil/
http://aboutsqlserver.com/2012/04/05/locking-in-microsoft-sql-server-part-13-schema-locks/

MSDN Lock Modes Descriptions
http://msdn.microsoft.com/en-us/library/ms175519(v=sql.105).aspx

Posted in SQL Server | Comments Off on Thanks for the Help with Deadlocks

Passed Again

Passed the 70-463 exam last week, making the Florida vacation that immediately followed that much nicer.  Again, I found that a good amount of detail on a broad variety of topics was needed.  I did like the training kit book: Implementing a Data Warehouse with Microsoft SQL Server 2012, but supplemental material from MSDN was needed as well.  And again, enjoyed expanding my knowledge of an area I work in daily, so more confirmation that I’m in the right field.

Exam 70-462 (Administering Microsoft SQL Server 2012) is next, due up in May.  I hear that the training kit is not a good reference, but I’ll start there anyway.  Hope to have that MCSA by the time summer is in full swing, which will make summer that much nicer too.

Posted in SQL Server | Comments Off on Passed Again

Passed!

Passed the 70-461 exam today, step one toward my goal of the MCSE: Business Intelligence.  Man, that thing was way harder than the practice tests I took.  I purchased the book “Training Kit (Exam 70-461): Querying Microsoft SQL Server 2012” to study for the exam, and it comes with a CD of practice exams by MeasureUp.  I had suspected that the practice exams were a bit too easy to be really indicative of the actual test, and I was correct in that belief.  The exam questions were more involved and the answers much pickier, so the actual test was a challenge.  I was quite glad to have passed it; I didn’t want to have to start over again.  Lesson learned: for the next test, I will be studying a wide range of topics in great detail.

The book, by the way, is excellent.  Thoroughly enjoyed reading it — a sign I’m in the right field — and I’m sure I will continue to refer to it in my career.  Well done, gentlemen.

On to the 70-463!  (I’m saving the 70-462 for later, as I’m more of a developer and architect than a DBA.)

Posted in SQL Server, Uncategorized | Comments Off on Passed!

MERGE and DELETE

Have you used

WHEN NOT MATCHED BY Source
THEN
  DELETE

much in your work with MERGE?

I’ve been using SQL Server’s MERGE statement since a little before my previous post on it, but I never used the clause above much at all.

Recently I had an opportunity to use it, and I found some unanticipated behavior.  And since our architect said “why is it doing that?” when I showed him what I found, I don’t feel too bad about posting this here.

The scenario: my goal was to merge in three sets of data from three different hospitals into one fact table, including deleting records in the fact table that were not found in the source files, since each source file was a “complete refresh” of that hospital’s data.  The delete part seemed like a good application of the WHEN NOT MATCHED BY Source clause, right?

Indeed it was, but I discovered quickly that there were some specific things to be careful about when using this clause, as written above at least:

  • If your source is empty, you’ll delete all the data in your fact table.  This is a real bummer. <sad trombone>
  • If your source is one of multiple sources feeding the destination fact table, and you use the clause above as is, the merge will delete all the other sources’ data from the fact table.  This is less of a bummer than deleting all your data, but still worthy of a <sad trombone>.

I tripped over the first item when I initially tested my changes and had put the source files in the wrong directory.  In our system, there is an initial truncate of the staging table, a load to it from the source files — which inserted zero rows, since it couldn’t find the source files — and then the merge.  The merge, containing the clause

WHEN NOT MATCHED BY Source
THEN
  DELETE

happily deleted every record that existed in the fact table (the merge target) that it did not have in the empty staging table (the merge source) — that is, all of them.

Fortunately, this is easily remedied by capturing the size of your source and either avoiding the merge if it is empty, or augmenting the WHEN NOT MATCHED BY Source clause.  I chose the latter, since it fit into our standard merge stored procedures more cleanly:

WHEN NOT MATCHED BY Source AND @SourceRowCount > 0
THEN
  DELETE

where @SourceRowCount was populated previously by simply doing a COUNT(*) on the staging table.

I put that in and tested it, since I was sure I’d make that mistake with the sources again (it’s good to know yourself), and all was good.  Then I put the three source files — one for each hospital — into the correct location, ran it again, and bam! Only the last hospital’s data remained in the fact table when it was done.

Hmm.  Since I was used to using MERGE solely for upserts, it took me a while to grok, but it did make sense eventually — just like the empty merge source situation, it will delete all records that exist in the target but aren’t found in the source.

So my truncate fact table, load hospital A, load hospital B, load hospital C process ended up leaving only hospital C’s data in the fact table, since hospitals A and B’s records were not in hospital C’s file, of course.

How to remedy this?  I had to specify that it should only delete records corresponding to the hospital it was currently working on.  Unfortunately, there is no way to append a WHERE clause to the DELETE action specifically, but augmenting the WHEN NOT MATCHED BY Source clause achieves the same goal.  I knew which hospital I was merging in, so putting that into the clause:

WHEN NOT MATCHED BY Source 
   AND @SourceRowCount > 0 
   AND Target.hospital = @currentHospital
THEN
  DELETE

seemed the way to go.  You can see that this addition effectively restricts the DELETE to only those records in the fact table corresponding to the hospital you specify via @currentHospital, similar to putting a WHERE clause on the DELETE.
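
To put the pieces together, here’s a minimal end-to-end sketch, with hypothetical table and column names standing in for our real schema:

-- Hypothetical staging and fact tables; a sketch, not our production code.
DECLARE @currentHospital varchar(20) = 'Hospital A';
DECLARE @SourceRowCount  int = (SELECT COUNT(*) FROM staging.Encounter);

MERGE fact.Encounter AS Target
USING staging.Encounter AS Source
    ON Target.encounter_id = Source.encounter_id
WHEN MATCHED THEN
    UPDATE SET Target.charge_amount = Source.charge_amount
WHEN NOT MATCHED BY Target THEN
    INSERT (encounter_id, hospital, charge_amount)
    VALUES (Source.encounter_id, Source.hospital, Source.charge_amount)
-- The guarded delete: do nothing when the source is empty, and only
-- touch rows belonging to the hospital currently being merged.
WHEN NOT MATCHED BY Source
    AND @SourceRowCount > 0
    AND Target.hospital = @currentHospital THEN
    DELETE;

With this shape, an empty file for a hospital leaves that hospital’s existing rows alone thanks to the @SourceRowCount guard, and the @currentHospital predicate keeps the other hospitals’ rows untouched on every run.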

And indeed, it did work.  I hope that by putting this up here I’ve saved you some head-scratching if you’re running into the same questions about using DELETE with MERGE.

Posted in SQL Server | Comments Off on MERGE and DELETE

Starting up again

What he said:  George Woods.

George and I were talking about blogging after finding Colleen Morrow’s excellent blog again recently.  We both agreed that blogging is a good thing to do, yet difficult to keep current.  I’ve got a number of goals for 2015 already, but I’m adding regular blogging to the list now.

Other goals for 2015:  MCSE in Business Intelligence, and the Data Science Specialization at Coursera.  I’ve got a good start on them already, as I’m scheduled to take the 70-461 exam on December 30th, and I have four of the nine courses for the specialization already completed.

So stay tuned: I plan to share the experience with all you regular readers (thanks, Mom and Dad!) and anyone else who wants to listen.  I’ll try to keep it interesting.

George, you’re no longer one up on me!

Posted in Personal, SQL Server | Comments Off on Starting up again

SQL Saturday #171 — The Schedule is Up

SQL Saturday #171 Pittsburgh preparations continue, and the schedule has been posted for your review. If you’re anywhere in the area on October 6th, please join us at La Roche College for some great free SQL training.

Personally, I’ve had to curtail my activities with the planning committee due to starting a class, but everyone else is going strong, and it is all shaping up well. At work I’ve been digging into Data Quality Services and Master Data Services lately, so I’m particularly looking forward to Rushabh Mehta’s “Intro to DQS” and Carlton Ramsey’s “Intro to MDS” sessions. I’m also trying to get up to speed on PowerPivot and Power View, so Dan Clark’s “Building your First Tabular Model” will be of interest as well. There is a whole SQL 2012 track at the event, plus two Administration tracks. Should be great — please join us if you can!

Pittsburgh, Pennsylvania. Photo: Herbert Spencer, used under Creative Commons license.

P.S. My last post about ComputeReach? Turns out that the girlfriend is going along with those computers to Ghana, to do installs and training. The opportunity came up suddenly, and it’s all happening very quickly. I had to laugh when she showed up at my house the other day with half a dozen band-aids crisscrossed on each arm after getting all the required inoculations at once! To say she’s excited about the trip would be an understatement…

Posted in Uncategorized | Comments Off on SQL Saturday #171 — The Schedule is Up

ComputeReach

Man, it’s tough to fit in all that you want to do, isn’t it? I’ve been trying to get my blogging activities rebooted, and to do some volunteer work as well. I’m starting to combine the two by volunteering on the planning committee for SQL Saturday #171 Pittsburgh (October 6th; come join us!), using the blog as one vehicle to get the word out about the event.

The real purpose of this blog entry, though, is to take a moment to talk about another volunteer activity I participated in recently, one that I think most of us in the tech world can appreciate.

In late June I spent a Saturday working with ComputeReach, a Pittsburgh-based humanitarian organization whose mission is to repurpose used computers for those less fortunate. In addition to putting computers into the hands of those who might not normally have access to them, it properly recycles the computers it cannot use, keeping them out of landfills, where they can cause environmental problems at the very least.

On this particular Saturday a big group of people — including individuals from the Pittsburgh office of Google, the Western PA Linux User’s Group, and ITT Tech — came together to repurpose approximately 180 Dell Optiplex computers. They were updated with expanded RAM and fresh installations of Edubuntu Linux, then boxed and packaged for shipping. It was a hot, hot day to be working in a warehouse, moving, imaging, and repackaging computers, but everyone was enthusiastic and did a great job. For a video of the volunteers in action, see this link. Roll call is at the end of the video, where everyone introduces themselves. I’m the first one up, caught a bit off guard (what, me first? :-)).

While ComputeReach primarily serves impoverished communities in the US, thanks to a generous donation from Google, many of these computers will be going to The Street Academy, which provides education to children living on the streets of Accra, Ghana’s capital city.

If you are interested in learning more about ComputeReach, please contact Dave Sevick (b|t) at dave@computereach.org.

Posted in Personal | Comments Off on ComputeReach

Update

Well, the owl has flown the coop; there will be no nestcam.  I suspect the owl found a mate and is nesting somewhere more appropriate.  I’m planning to build a bigger nest box over the summer, and hopefully next season I will have some exciting images to share!

Professional update: getting involved in the local SQL Server users group, which is supposed to be getting off the ground again after some years in limbo.  Hope to present at one of the meetings over the summer.  The data mining proof-of-concept work has been interesting, but there is still much to learn.  Currently working with the NPI data to improve our product — it’s already produced some new revenue, so that’s good.  Unfortunately, implementing it involves my doing some VB.NET programming (the VB being the unfortunate part), which is not where I want to go, but if it improves the product and keeps us in the game, I don’t mind doing that for a while.  I do hope to get back to that data mining soon, though.

Posted in Personal | Comments Off on Update