Deferrable RI

Here’s a lovely little example that just came up on the OTN database forum of how things break when features collide. It’s a bug (I haven’t looked for the number) that seems to be fixed in 12.1.0.1. All it takes is a deferrable foreign key and an outer join. I’ve changed the table and column names from the original, and limited the deferrability to just the foreign key:


create table parent(id_p date primary key);
create table child(id_c date not null references parent(id_p) deferrable);
 
alter session set constraints = deferred;
 
insert into child values(sysdate);
insert into child values(sysdate);
insert into child values(sysdate);
 
select	
	par.id_p, chi.id_c 
from
	child chi 
left join 
	parent par 
on 	par.id_p = chi.id_c
where	par.id_p is null 
and	chi.id_c is not null
;

select
	chi.id_c 
from
	child chi 
left join 
	parent par 
on	par.id_p = chi.id_c
where	par.id_p is null 
and	chi.id_c is not null
;

You’ll notice that the difference between the two queries is that the first one selects columns from both the parent and child tables; the second selects only from the child. Since the join is across a parent/child referential integrity constraint, and the primary key is a single column, and no columns from the parent appear in the select list, the optimizer is able to invoke “table elimination” in the second case – except that it shouldn’t because the side effect is to produce the wrong answer. Here are the two sets of results when running 11.2.0.4:

ID_P      ID_C
--------- ---------
          18-OCT-13
          18-OCT-13
          18-OCT-13

3 rows selected.

no rows selected

In 12.1.0.1 both queries produce the same (first) set of results.

In the second query 11g Oracle (incorrectly) removes the join to the parent table and replaces it with the predicate “id_c is null” (since the only effect of the join would normally be to eliminate any rows where id_c is null). This predicate is then combined with the existing “id_c is not null” predicate to produce “null is not null”, which is why we get no rows returned.
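To make the effect of the substitution concrete, here is a hand-written query that I would expect to be logically equivalent to what 11g ends up running for the second statement. It is my own sketch of the transformed query, not text taken from an optimizer trace:

```sql
-- Hand-written approximation of the second query after 11.2.0.4 has
-- eliminated the join: parent has gone, and the predicate on par.id_p
-- has been replaced by "chi.id_c is null".
select
	chi.id_c
from
	child chi
where	chi.id_c is null
and	chi.id_c is not null
;
```

The two predicates contradict each other, so a query of this shape can never return a row, whatever data is in the child table.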

The problem, of course, is that the removal and substitution is only valid if the constraint check is in a valid state, and at this point we have deferred the constraint and got some bad data into the child table – Oracle should not do table elimination. (As a side note, this is why you do not see table elimination occurring when the critical constraints are in the state: “enable novalidate” – the tables may contain invalid data which means the predicate substitution may change the result.)
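If you want to check which of your foreign keys could be exposed to this type of problem, the data dictionary reports the relevant attributes. A minimal sketch, assuming the tables are in your own schema and using the table name from the example above:

```sql
-- Report the attributes of the foreign key constraints on CHILD.
-- constraint_type 'R' identifies referential integrity constraints.
select
	constraint_name,
	status,			-- ENABLED or DISABLED
	deferrable,		-- DEFERRABLE or NOT DEFERRABLE
	deferred,		-- IMMEDIATE or DEFERRED, as declared
	validated		-- VALIDATED or NOT VALIDATED
from
	user_constraints
where
	table_name = 'CHILD'
and	constraint_type = 'R'
;
```

The DEFERRED column reports the initial mode declared for the constraint, so I would not expect it to reflect the effect of the "alter session set constraints = deferred" used in the test. The column to watch is DEFERRABLE, since any constraint that can be deferred may be temporarily violated inside a transaction.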

Here are the two sets of plans, first from 11.2.0.4:


------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |    82 |  1476 |     2   (0)| 00:00:01 |
|*  1 |  FILTER             |              |       |       |            |          |
|   2 |   NESTED LOOPS OUTER|              |    82 |  1476 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| CHILD        |    82 |   738 |     2   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0010228 |     1 |     9 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("PAR"."ID_P" IS NULL)
   4 - access("PAR"."ID_P"(+)="CHI"."ID_C")



----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |     9 |     0   (0)|          |
|*  1 |  FILTER            |       |       |       |            |          |
|   2 |   TABLE ACCESS FULL| CHILD |    82 |   738 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)

And now from 12.1.0.1:


------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |    82 |  1476 |     2   (0)| 00:00:01 |
|*  1 |  FILTER             |              |       |       |            |          |
|   2 |   NESTED LOOPS OUTER|              |    82 |  1476 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| CHILD        |    82 |   738 |     2   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0011569 |     1 |     9 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("PAR"."ID_P" IS NULL)
   4 - access("PAR"."ID_P"(+)="CHI"."ID_C")


-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |    82 |  1476 |     2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |              |    82 |  1476 |     2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| CHILD        |    82 |   738 |     2   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN| SYS_C0011569 |     1 |     9 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PAR"."ID_P"="CHI"."ID_C")

It’s interesting to note that 12c is able to convert the second query into an anti join (in other words it has changed an outer join to a (not exists) subquery, and then transformed it back into a different type of join).
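As an illustration of that intermediate step, a hand-written form of the second query using a not exists subquery might look like the following. It is a sketch of the logical equivalent, not the optimizer's actual internal rewrite:

```sql
-- Return the child rows that have no matching parent row.
-- The "is not null" predicate is carried over from the original query,
-- although the not null declaration on id_c makes it redundant.
select
	chi.id_c
from
	child chi
where	chi.id_c is not null
and	not exists (
		select	null
		from	parent par
		where	par.id_p = chi.id_c
	)
;
```

Written this way the query asks directly for orphaned child rows, which is what the NESTED LOOPS ANTI operation in the 12c plan delivers.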

One little aside – the first thought I had about the error was that it might be a side effect of the ANSI style join and something that the optimizer was messing up in the transformation to “traditional” style, so I have repeated the test using traditional Oracle syntax, and the problem persists.
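A version of the second query in traditional Oracle outer join syntax might look like this. It is a reconstruction for illustration rather than the exact text used in the test:

```sql
-- Second query using the traditional (+) outer join notation.
-- The (+) marks parent as the optional (deficient) side of the join.
select
	chi.id_c
from
	child chi,
	parent par
where	par.id_p(+) = chi.id_c
and	par.id_p is null
and	chi.id_c is not null
;
```

The join predicate here matches the form that appears in the predicate section of the plans above, where the ANSI join has already been translated to the (+) notation.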

Update 20/10/2013:

Corrected thanks to a Twitter comment: I had swapped the ‘id_c is null’ and ‘id_c is not null’ in my explanation of the error.