

Simple, insignificant error in this particular case, but when that error propagates in a complex query, it can add up and lead to the wrong choice of JOIN down the line (or up the plan). If it doesn’t know this (because our standard statistics don’t correlate FIRST_NAME / LAST_NAME with ACTOR_ID), then we get the average number of films for any actor. If the database knows we’re dealing with ACTOR_ID = 1, it can pick the statistics on the number of films for that actor. SELECT count(*) c FROM film_actor GROUP BY actor_id Here are some interesting numbers: SELECT count(*) FROM film_actor WHERE actor_id = 1 |* 4 | INDEX RANGE SCAN | PK_FILM_ACTOR | 27 |Ĥ - access("A"."ACTOR_ID"="FA"."ACTOR_ID")Īs you can see, the estimate for the number of of FILM_ACTOR rows is too high, and the estimate for the NESTED LOOP result is too low. |* 3 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 3 | |* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 1 | In particular, the cardinality estimates will be much more precise this way, as we can pick the estimate based on a concrete, constant predicate value, rather than, for example, the average number of films per actor as in this query (which returns the same result): SELECT first_name, last_name, film_id This has a few nice effects on more complex queries. A.ACTOR_ID = FA.ACTOR_ID (from the ON predicate).A.ACTOR_ID = 1 (from the WHERE predicate) and….The predicate ACTOR_ID = 1 is applied to both the ACTOR and FILM_ACTOR tables because of transitive closure. Specifically, the predicate section is really interesting. Predicate Information (identified by operation id): |* 4 | INDEX RANGE SCAN | PK_FILM_ACTOR | 19 | |* 3 | INDEX UNIQUE SCAN | PK_ACTOR | 1 | | 2 | TABLE ACCESS BY INDEX ROWID| ACTOR | 1 | Now, observe the execution plan if we run this query in Oracle:. The result being: FIRST_NAME LAST_NAME FILM_ID JOIN film_actor fa ON a.actor_id = fa.actor_id Let’s get all films for ACTOR_ID = 1: SELECT first_name, last_name, film_id It can be said that if A = B and B = C, then A = C.ĭuh, right? But this has some nice implications on SQL optimizers. It’s a really trivial concept that applies to a variety of math operations, i.e. Let’s start with something simple: transitive closure. We'll talk about 1-5 today, and 6-10 in Part 2. Impossible predicates and unneeded table accesses.These will be the ten optimization types:

In all of this article, I will be using queries against the Sakila database - as always. This post will evaluate 10 SQL optimizations on the five most popular RDBMS ( according to the db-engines ranking): Specifically, the second case is really cool, as these optimizations allow us to build complex libraries of views and table-valued functions, which we can reuse in several layers. In the first case, you could claim: “Well, then fix the stupid SQL already,” but then again, who never makes any mistakes, right? Allow for reusing complex views without actually executing the entire logic from the view.Most of these optimizations are applied to: These optimizations remove needless, optional work (as opposed to needless, mandatory work, which I’ve blogged about before) Where Do These Optimizations Apply? So, they’re not no-brainers in the sense whether they’re easy for the optimiser teams to implement, but they’re no-brainers in the sense whether they should be done. They’re usually no-brainers for a database to optimize because the optimization will always lead to a better execution plan, independently of whether there are any indexes, or how much data you have, or how skewed your data distribution is. We’ll look into much simpler optimizations that can be implemented purely based on metadata (i.e. optimizations that depend on a database’s cost model.

Today, we don’t want to talk about cost-based optimization, i.e. I’ve recently delivered a talk about that topic: It is the reason why it is really hard to implement a complex, hand-written algorithm in a 3GL (third-generation programming language) such as Java that outperforms a dynamically calculated database execution plan that has been generated from a modern optimizer. Cost-based optimization is the de-facto standard way to optimize SQL queries in most modern databases.
