Monday, April 8, 2019

Oracle VS SQL Server Query Tuning

Note: I'm not a certified relational database administrator of any sort. Although I am begrudgingly a certified Cassandra administrator...

Introduction

The relational database technology in use by our clients tends to depend on what region they're in. In APAC and Europe they tend to use Microsoft SQL Server; in the USA it's typically Oracle.
In the performance team, we work on performance issues across all our clients. So we wind up optimising queries in both SQL Server and Oracle.
You could be forgiven for assuming that the challenges would be much the same. After all, it's all SQL, and SQL is a standard. In reality, they're very different beasts.
While not the topic of this post, the most immediately obvious difference is in the tooling.
With Oracle we tend to utilise Oracle's own tooling to tune and optimise (AWR reports, Oracle Enterprise Manager, etc).
With SQL Server, we tend to lean on third party and our own bespoke tooling because the tooling provided by Microsoft is fairly horrendous.
In this post I'd like to focus on the differences when it comes to the tuning of queries themselves, regardless of the tools available.

Query Tuning Prioritisation

When we're looking to optimise a poorly performing query, we like to prioritise our avenues of investigation so that we target first the simplest fixes with the least ongoing maintenance burden attached to them.
In some cases we can simply ensure statistics are representative of the data (and that a representative view is maintained going forward), and that'll enable the optimiser to select a more appropriate execution plan without us needing to touch the query or the schema.
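As a rough sketch of what that first avenue looks like on each platform (the schema and table names here are made up for illustration, not from a real system):

```sql
-- SQL Server: refresh statistics on a hypothetical table with a full scan,
-- so the optimiser sees row counts and value distributions that match reality
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Oracle: the rough equivalent via the DBMS_STATS package
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP', tabname => 'ORDERS');
```

Both databases can also keep statistics up to date automatically; the point is that representative statistics alone are often enough to fix the plan.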
Failing that, we may need to add an index or improve the query itself. If the query to be changed comes from product (as opposed to solution configuration), the change will need to be driven (sometimes with urgency) through development, which can make for a relatively heavyweight fix.
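For instance, an index fix for a query filtering on a couple of columns might be as simple as a covering index. The names below are hypothetical placeholders:

```sql
-- Hypothetical covering index: seek on the filter columns,
-- with the selected column INCLUDEd to avoid key lookups
CREATE NONCLUSTERED INDEX ix_orders_columna_columnb
    ON dbo.Orders (columna, columnb)
    INCLUDE (columnc);
```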
Then there are hints and the locking down of execution plans, to effectively force (or at least strongly suggest) that the optimiser do things the way you know to be better. This is generally avoided: it is very data dependent, may become less appropriate as the data changes, and isn't really suitable for productising across all customers. Additionally, we don't like to do this because it breaks the declarative model of SQL (more on this later). We typically don't wind up getting to this point with Oracle, because the optimiser is really good, so long as you are feeding in the right information.
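To give a flavour of what hinting looks like on each platform: Oracle embeds hints in a comment after the SELECT keyword, while SQL Server takes an OPTION clause at the end of the query. The hints shown here (INDEX, FORCE ORDER) are real features, but the tables and index names are illustrative only:

```sql
-- Oracle: strongly suggest that the optimiser use a particular index
SELECT /*+ INDEX(o ix_orders_columna) */ *
FROM orders o
WHERE o.columna = 'foo';

-- SQL Server: force joins to be performed in the order the query states them
SELECT *
FROM orders o
JOIN facilities f ON f.id = o.facility_id
WHERE o.columna = 'foo'
OPTION (FORCE ORDER);
```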
However, once again SQL Server is a different story...

SQL Server Challenges

With SQL Server we are continually running into issues that can't be resolved by having up-to-date statistics, an optimally written query, and the appropriate indexes. We often find that SQL Server opts for execution plans that defy all logic.
It seems to be that when a query reaches a certain level of complexity, the optimiser just gives up. You add one more join and go from a sane execution plan that takes seconds to one that does a tonne of index spooling and takes minutes.
One example of this is where you have a (complex) subquery and some WHERE conditions outside of it.
SELECT * FROM
( SELECT [columna] ,
    [columnb] ,
    [columnc] ,
    ...
  FROM ...
) SUBQUERY
WHERE SUBQUERY.columna = 'foo'
AND SUBQUERY.columnb IN ('foo','bar')
AND SUBQUERY.facilityCode IN ('1','2','3','4')
Running just the subquery on its own might return a completely unfiltered result set of a few hundred rows in seconds.
However, include the WHERE clauses and the query takes minutes, despite returning less data.
What seems logical to humans in this case would be to run the subquery first, THEN apply the predicates.
However, SQL Server refuses to see this.
To make matters worse, Microsoft has acknowledged that there is no way to hint SQL Server into doing this...
So what can be done?
A dirty hack of course...
SELECT * FROM
( SELECT TOP 1000000000
    [columna] ,
    [columnb] ,
    [columnc] ,
    ...
  FROM ...
  ORDER BY columna
) SUBQUERY
WHERE SUBQUERY.columna = 'foo'
AND SUBQUERY.columnb IN ('foo','bar')
AND SUBQUERY.facilityCode IN ('1','2','3','4')
What this does is force the database to evaluate the entire subquery before applying the predicates: to guarantee it returns the top 1,000,000,000 rows ordered by columna (a "safe" number, because we know we'll only ever have thousands of rows), it has to materialise the full subquery result first.
It's not quite as optimal as having the optimiser choose a sane execution plan itself, or as a hint would be, but it ain't bad.

The Declarative Paradigm

Aside from the obvious hackery in the workaround demonstrated above, the major concern with some of the optimisation work we wind up doing on SQL Server (and don't wind up needing to do on Oracle) is that it tends to break the declarative paradigm of SQL.
SQL is meant to be a declarative language, in that you tell it what to do, not how to do it.
You say, get me some milk. You shouldn't have to tell it to get the milk from the fridge, not from the shop.
Sure, you have to let it know that there's milk in the fridge for it to get... But in SQL Server's case, we've told it there's milk in the fridge, and we still find it off looking for a cow to milk...