Very often, significant performance benefits can be obtained by using some very basic knowledge of the application, its data and business rules. Sometimes even less than that: even if you are not familiar with the application logic at all, you can still use common sense to make some reasonable guesses that would get you a long way in improving query’s performance. Here is an example (based on an actual query that I had to tune today).
Suppose you have a query that does a bunch of joins, but most of the time it spends reading data from a single table (say, T). The table is big, it’s partitioned (say, by range), but alas, the partitioning key (say, X) is not in the query. There is a filtering predicate Y>y0 with a reasonable selectivity, but it’s of no use to you as it’s not indexed. So you’re stuck choosing between a terribly inefficient index scan (on some other predicates with poor selectivity) or doing a full tablescan which looks equally unattractive. So what can you do?
The honest answer is — probably nothing. But let’s change the situation a little bit. Namely, let’s change some names here. Let’s say that the table T is called ORDERS, not T. Let’s say that column X is in fact a date column called ORDER_PROCESSED, while column Y is another date column, called ORDER_CREATED. Technically, nothing changed — you still have a predicate that you cannot use to make your data access more efficient, at least not directly. But now that column names mean something, you can make a very reasonable assumption (which later you would need to verify) that ORDER_PROCESSED > ORDER_CREATED (you can’t process something before it’s created, duh!). So your table is partitioned on ORDER_PROCESSED column, and your filtering condition is ORDER_CREATED > :some_date, and if this assumption that we just made is true, it implies that you can add ORDER_PROCESSED > :some_date to the query.
This additional predicate can be very useful, especially if :some_date is a relatively recent date, in which case it would allow the optimizer to eliminate all but one or a few most recent table partitions from the query (and this is also very likely to be true because people typically have much more interest in recent, rather than old, data).
As always with redundant predicates, there is a risk that it can affect the cardinality estimate, because the optimizer does not know that the two predicates are highly correlated (unless there exists extended statistics for this column group). So you need to be careful, and if the optimizer starts trying funny things, you need to get some sense into it (e.g. by using “cardinality” or “opt_estimate” hints).
Note that ideally any business rule like ORDER_PROCESSED > ORDER_CREATED should be coded as a check constraint, but you cannot really count on that (there are too many databases where even primary keys and foreign keys are missing!). If a check constraint is present, then sometimes the optimizer might be able to make use of it to generate additional predicates, but unfortunately you cannot always count on that, either. I’ve ran a simple test on 11.2gR2 and the check constraint did not have any impact on the execution plan.
Of course the same principle applies to a broader class of cases — e.g. instead of partition pruning there could be another optimization (e.g. use of an index), and the business rule doesn’t have to be an inequality, it can be something more complex.
So to sum up:
- relationships between data are important for query performance (so e.g. if database columns are in a foreign language, it helps to know that language, at least to this extent!)
- corrolary: meaningful database names are also very important
- ideally, relationships between data need to be coded as check constraints, but you can’t always count on that (plus, Oracle is currently unable to enforce mutlitable constraints, although a large number of users are pushing for implementation of such constraints, or SQL assertions)
- you can’t always count on the optimizer to make full use of constraints in query optimization, so sometimes you might need to give the optimizer a hand
- redundant predicates can lead to incorrect cardinality estimates, so you need to be careful when using them, and use hints or other tools to address that potential issue.
One thought on “Business rules, common sense and query performance”