Oracle optimizer doesn’t understand elementary math


If A < B and B < C then A < C

Right? We all know the transitive rule… all of us except the Oracle optimizer.
Here’s a seasonal example to illustrate how the Optimizer changes it’s methods based on seemingly trivial options.

Let’s say I bake 10 types of cookies and then I want to see how many distinct bundles of 3 types I can make.
Mathematically we’d represent that as C(10,3) = 120.

In SQL, we could solve for them like this:

CREATE TABLE cookies
AS
SELECT 'oatmeal' cookie FROM DUAL
UNION ALL
SELECT 'm&m' cookie FROM DUAL
UNION ALL
SELECT 'chocolate chip' cookie FROM DUAL
UNION ALL
SELECT 'molasses' cookie FROM DUAL
UNION ALL
SELECT 'gingerbread' cookie FROM DUAL
UNION ALL
SELECT 'macadamia nut' cookie FROM DUAL
UNION ALL
SELECT 'peanut butter' cookie FROM DUAL
UNION ALL
SELECT 'snickerdoodle' cookie FROM DUAL
UNION ALL
SELECT 'sugar cookie' cookie FROM DUAL
UNION ALL
SELECT 'shortbread' cookie FROM DUAL;

EXEC dbms_stats.gather_table_stats(ownname=>user,tabname=>'COOKIES');

Now run a couple of test queries to check our math.

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie;

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie AND a.cookie < c.cookie;

Note, both queries return the expected 120 rows.
However, the optimizer came up with 2 different plans for these functionally identical queries.

With implied transitivity

--------------------------------------------------
| Id  | Operation             | Name    | E-Rows |
--------------------------------------------------
|   0 | SELECT STATEMENT      |         |    203 |
|   1 |  MERGE JOIN           |         |    203 |
|   2 |   SORT JOIN           |         |     45 |
|   3 |    MERGE JOIN         |         |     45 |
|   4 |     SORT JOIN         |         |     10 |
|   5 |      TABLE ACCESS FULL| COOKIES |     10 |
|*  6 |     SORT JOIN         |         |     10 |
|   7 |      TABLE ACCESS FULL| COOKIES |     10 |
|*  8 |   SORT JOIN           |         |     10 |
|   9 |    TABLE ACCESS FULL  | COOKIES |     10 |
--------------------------------------------------

With explicit transitivity

-------------------------------------------------
| Id  | Operation            | Name    | E-Rows |
-------------------------------------------------
|   0 | SELECT STATEMENT     |         |     91 |
|   1 |  MERGE JOIN          |         |     91 |
|   2 |   MERGE JOIN         |         |     45 |
|   3 |    SORT JOIN         |         |     10 |
|   4 |     TABLE ACCESS FULL| COOKIES |     10 |
|*  5 |    SORT JOIN         |         |     10 |
|   6 |     TABLE ACCESS FULL| COOKIES |     10 |
|*  7 |   FILTER             |         |        |
|*  8 |    SORT JOIN         |         |     10 |
|   9 |     TABLE ACCESS FULL| COOKIES |     10 |
-------------------------------------------------

We can take it a step futher, let’s say I want all of the cookies to be the same type.
This is obviously going to be just 10 different combinations and no joins are needed; but I’ll write it this way to further illustrate the point.

With implied transitivity

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie = b.cookie AND b.cookie = c.cookie;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |     10 |
|*  1 |  HASH JOIN          |         |     10 |
|*  2 |   HASH JOIN         |         |     10 |
|   3 |    TABLE ACCESS FULL| COOKIES |     10 |
|   4 |    TABLE ACCESS FULL| COOKIES |     10 |
|   5 |   TABLE ACCESS FULL | COOKIES |     10 |
------------------------------------------------

With explicit transitivity

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie = b.cookie AND b.cookie = c.cookie AND a.cookie = c.cookie;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |
|*  1 |  HASH JOIN          |         |      1 |
|*  2 |   HASH JOIN         |         |     10 |
|   3 |    TABLE ACCESS FULL| COOKIES |     10 |
|   4 |    TABLE ACCESS FULL| COOKIES |     10 |
|   5 |   TABLE ACCESS FULL | COOKIES |     10 |
------------------------------------------------

Again note the change in estimated rows for an evaluation that we might assume the optimizer could figure out on its own.

The optimizer doesn’t understand transitivity of columns; but what if we add variables or constants?
Going back to the original query and take a closer look:

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie;

The plan’s predicates include:

Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access(INTERNAL_FUNCTION("A"."COOKIE")<INTERNAL_FUNCTION("B"."COOKIE"))
       filter(INTERNAL_FUNCTION("A"."COOKIE")<INTERNAL_FUNCTION("B"."COOKIE"))
   8 - access("B"."COOKIE"<"C"."COOKIE")
       filter("B"."COOKIE"<"C"."COOKIE")

But, if we add a variable, obviously this changes the results, but it’s also interesting to note how the SQL condition becomes interpreted as filter conditions within the optimizer’s plan predicates.

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie and a.cookie= :b1;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |
|   1 |  NESTED LOOPS       |         |      1 |
|   2 |   NESTED LOOPS      |         |      1 |
|*  3 |    TABLE ACCESS FULL| COOKIES |      1 |
|*  4 |    TABLE ACCESS FULL| COOKIES |      1 |
|*  5 |   TABLE ACCESS FULL | COOKIES |      1 |
------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("A"."COOKIE"=:B1)
   4 - filter("B"."COOKIE">:B1 AND "A"."COOKIE"<"B"."COOKIE")
   5 - filter("C"."COOKIE">:B1 AND "B"."COOKIE"<"C"."COOKIE")

The variable does get propagated through to the other conditions as we might expect.
Now, let’s try adding a literal value and we’ll see Oracle again is able to derive transitivity and propagates the constant literal through to all of the table predicates and radically alters the estimates and execution plan

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie and a.cookie= 'gingerbread';

--------------------------------------------------
| Id  | Operation             | Name    | E-Rows |
--------------------------------------------------
|   0 | SELECT STATEMENT      |         |     14 |
|   1 |  MERGE JOIN           |         |     14 |
|   2 |   MERGE JOIN CARTESIAN|         |      8 |
|*  3 |    TABLE ACCESS FULL  | COOKIES |      1 |
|   4 |    BUFFER SORT        |         |      8 |
|*  5 |     TABLE ACCESS FULL | COOKIES |      8 |
|*  6 |   FILTER              |         |        |
|*  7 |    SORT JOIN          |         |      8 |
|*  8 |     TABLE ACCESS FULL | COOKIES |      8 |
--------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("A"."COOKIE"='gingerbread')
   5 - filter("C"."COOKIE">'gingerbread')
   6 - filter("B"."COOKIE"<"C"."COOKIE")
   7 - access("A"."COOKIE"<"B"."COOKIE")
       filter("A"."COOKIE"<"B"."COOKIE")
   8 - filter("B"."COOKIE">'gingerbread')

But, even though it now seems to understands the transitivity, that understanding is still flawed, because if we put the implicit condition of a.cookie < c.cookie, which offers no functional change, we still get different plans with different joins and estimates.

SELECT *
  FROM cookies a, cookies b, cookies c
 WHERE a.cookie < b.cookie
   AND b.cookie < c.cookie
   AND a.cookie < c.cookie
   AND a.cookie = 'gingerbread';
   
-------------------------------------------------
| Id  | Operation            | Name    | E-Rows |
-------------------------------------------------
|   0 | SELECT STATEMENT     |         |      8 |
|   1 |  MERGE JOIN          |         |      8 |
|   2 |   NESTED LOOPS       |         |      4 |
|*  3 |    TABLE ACCESS FULL | COOKIES |      1 |
|*  4 |    TABLE ACCESS FULL | COOKIES |      4 |
|*  5 |   FILTER             |         |        |
|*  6 |    SORT JOIN         |         |      8 |
|*  7 |     TABLE ACCESS FULL| COOKIES |      8 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("A"."COOKIE"='gingerbread')
   4 - filter("A"."COOKIE"<"C"."COOKIE" AND "C"."COOKIE">'gingerbread')
   5 - filter("B"."COOKIE"<"C"."COOKIE")
   6 - access("A"."COOKIE"<"B"."COOKIE")
       filter("A"."COOKIE"<"B"."COOKIE")
   7 - filter("B"."COOKIE">'gingerbread')

So, what are the lessons here?

First, I really like oatmeal cookies.

Second, I’m reminded that I should bake more.

Third, if you’re stuck on a query where the optimizer seems like it’s ignoring some crucial information, it may be a hole in its logic where it doesn’t extend “obvious” rules throughout the plan. If so, it may be worth changing the where conditions to include, exclude or alter functionally identical conditions.

For example:
WHERE a=b and b=c
could be written
WHERE a=b and a=c
Or
WHERE a=b and a=c and b=c

All three conditions are functionally identical; but writing them in different ways could hide or expose information the optimizer can’t derive on its own (even if we think it should.)

Nifty math trick with hierarchical query


Using the algorithm discovered by David Bailey, Peter Borwein, and Simon Plouffe, you can easily generate values of pi with a simple recursive query…

SQL> SELECT d, to_char(bbp, rpad('0.',55, '9')) bbp,
  2    TO_CHAR(
  3       SUM(bbp) OVER (ORDER BY d),
  4       '9.' || RPAD('9', d - 1, '9')
  5    ) pi
  6    FROM (SELECT LEVEL d,
  7                   POWER(16, -(LEVEL - 1))
  8                 * (  4 / (8 * (LEVEL - 1) + 1)
  9                    - 2 / (8 * (LEVEL - 1) + 4)
 10                    - 1 / (8 * (LEVEL - 1) + 5)
 11                    - 1 / (8 * (LEVEL - 1) + 6))
 12                     bbp
 13            FROM DUAL
 14          CONNECT BY LEVEL <= 38);

  D BBP                                                          PI
--- ------------------------------------------------------------ ----------------------------------------
  1  3.13333333333333333333333333333333333333000000000000000      3.
  2  0.00808913308913308913308913308913308913308800000000000      3.1
  3  0.00016492392411510058568882098293862999745400000000000      3.14
  4  0.00000506722085385878489326765188834154351396000000000      3.142
  5  0.00000018789290093772001666738508843772001666720000000      3.1416
  6  0.00000000776775121517735681309382263783112139415700000      3.14159
  7  0.00000000034479329305086272635969401468053759158800000      3.141593
  8  0.00000000001609187715553700527429096273205488602519000      3.1415927
  9  0.00000000000077957029540010122791277882390414359723220      3.14159265
 10  0.00000000000003887115259909751224518898399125507993590      3.141592654
 11  0.00000000000000198322539359813099744403743678780728487      3.1415926536
 12  0.00000000000000010309712169788873230460615359913961954      3.14159265359
 13  0.00000000000000000544347406057178666434000298497848856      3.141592653590
 14  0.00000000000000000029121117943841783833432916474186319      3.1415926535898
 15  0.00000000000000000001575498009770082070311785298679383      3.14159265358979
 16  0.00000000000000000000086069263270039599262676755656370      3.141592653589793
 17  0.00000000000000000000004742046744556226855262717169064      3.1415926535897932
 18  0.00000000000000000000000263228669401317585860675420461      3.14159265358979324
 19  0.00000000000000000000000014709093902773314327264205417      3.141592653589793238
 20  0.00000000000000000000000000826833002827638605906177487      3.1415926535897932385
 21  0.00000000000000000000000000046727110163528570518908320      3.14159265358979323846
 22  0.00000000000000000000000000002653485901449924370022705      3.141592653589793238463
 23  0.00000000000000000000000000000151345479607531561281484      3.1415926535897932384626
 24  0.00000000000000000000000000000008666828560347770253967      3.14159265358979323846264
 25  0.00000000000000000000000000000000498130681533106648748      3.141592653589793238462643
 26  0.00000000000000000000000000000000028727019197415804591      3.1415926535897932384626434
 27  0.00000000000000000000000000000000001661838244489253463      3.14159265358979323846264338
 28  0.00000000000000000000000000000000000096413362728707559      3.141592653589793238462643383
 29  0.00000000000000000000000000000000000005608493642017818      3.1415926535897932384626433833
 30  0.00000000000000000000000000000000000000327065934758869      3.14159265358979323846264338328
 31  0.00000000000000000000000000000000000000019117544868650      3.141592653589793238462643383280
 32  0.00000000000000000000000000000000000000001119879586043      3.1415926535897932384626433832795
 33  0.00000000000000000000000000000000000000000065734562356      3.14159265358979323846264338327950
 34  0.00000000000000000000000000000000000000000003865856221      3.141592653589793238462643383279503
 35  0.00000000000000000000000000000000000000000000227760336      3.1415926535897932384626433832795029
 36  0.00000000000000000000000000000000000000000000013441452      3.14159265358979323846264338327950288
 37  0.00000000000000000000000000000000000000000000000794528      3.141592653589793238462643383279502884
 38  0.00000000000000000000000000000000000000000000000047036      3.1415926535897932384626433832795028842

38 rows selected.

SQL>
%d bloggers like this: