Quantcast
Channel: Bugs – Oracle Scratchpad
Viewing all articles
Browse latest Browse all 123

Case Study

$
0
0

A recent post on the Oracle SQL and PL/SQL forum posted the following query with a complaint that it produced the wrong results:

select  count(*) 
from    all_synonyms
left join 
        all_objects b 
on      (b.owner,object_name)=(select table_owner,table_name from dual)
;

This caused a little confusion to start with since the opening complaint was that the query sometimes produced the wrong results and sometimes produced Oracle error ORA-01799: a column may not be outer-joined to a subquery. The difference appears, of course, because the restriction applies to older versions of Oracle but was lifted by 12c.

The other point of confusion is that there’s no easy way to tell from the output that the result is wrong – especially when the count was in the thousands. This point was addressed by the OP supplying an image of the first rows (and a subset of the columns) for the query when “count(*)” was changed to just “*” (and switched from the “all_” views to the “dba_” views: the output for columns dba_objects.owner and dba_objects.object_name reported the same pair of values for every row – in the case of the OP this was “SYS” and “DUAL”.

Investigation

You could attack this problem from several directions – you could just edit the query to avoid the problem (the structure is a little curious); you could investigate the 10053 trace file to see what optimizer is doing at every step of the way, or you could try to simplify the model and see if the problem still appears. So I created table my_synonyms as a copy of the data in dba_synonyms, and my_objects as a copy of the data in dba_objects and created an index on my_objects(owner, object_name). So my query turned into:

select
        *
from    
        my_synonyms   syn
left join 
        my_objects    obj
on 
        (obj.owner, obj.object_name) = (
                select /*+ qb_name(dual_bug) */ syn.table_owner, syn.table_name from dual
        )
/

You’ll notice that I’ve given both tables a “meaningful” alias and used the aliases for every column, and I’ve also added a query block name (qb_name) to the subquery against dual – because I’d assumed that the subquery probably played a key role in messing the optimizer up (I was wrong) and I wanted to be able to track it easily.

The query produced the wrong results. In my case every row reported the object owner and name as “SYS” / “PRINT_TABLE” – selecting 11,615 rows. The basic execution plan, pulled from memory using dbms_xplan.display_cursor()) looked like this:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       | 46548 (100)|          |
|   1 |  MERGE JOIN OUTER                      |                 | 11615 |  6261K| 46548   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL                    | MY_SYNONYMS     | 11615 |   805K|    18   (6)| 00:00:01 |
|   3 |   BUFFER SORT                          |                 |     1 |   481 | 46530   (1)| 00:00:02 |
|   4 |    VIEW                                | VW_LAT_881C048D |     1 |   481 |     4   (0)| 00:00:01 |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| MY_OBJECTS      |     1 |   132 |     4   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | OBJ_I1          |     1 |       |     3   (0)| 00:00:01 |
|   7 |       FAST DUAL                        |                 |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJ"."OWNER"= AND "OBJ"."OBJECT_NAME"=)

The optimizer had transformed the query into a form using a lateral view – and there have been bugs with lateral views in the past, particularly relating to decorrelation – so maybe the problem was there and a few tests with various hints or fix_controls, or optimizer parameter settings to disable certain features might identify the source of the problem; but before doing that I thought I’d just get a small result set to check the results and execution plan in detail by adding a predicate “rownum <= 12” re-executing with rowsource execution stats enabled, and pulling the plan including the Query Block Names and Object Aliases, Outline Data, Column Projection Information and not forgetting the kitchen sink.

The query produced results that looked as if they were likely to be correct – I no longer had a repeating, incorrect, object owner and name on every row. I also had a plan that had changed to:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |      1 |        |     12 |00:00:00.01 |      24 |
|*  1 |  COUNT STOPKEY                         |                 |      1 |        |     12 |00:00:00.01 |      24 |
|   2 |   NESTED LOOPS OUTER                   |                 |      1 |     12 |     12 |00:00:00.01 |      24 |
|   3 |    TABLE ACCESS FULL                   | MY_SYNONYMS     |      1 |     12 |      7 |00:00:00.01 |       3 |
|   4 |    VIEW                                | VW_LAT_881C048D |      7 |      1 |     12 |00:00:00.01 |      21 |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| MY_OBJECTS      |      7 |      1 |     12 |00:00:00.01 |      21 |
|*  6 |      INDEX RANGE SCAN                  | OBJ_I1          |      7 |      1 |     12 |00:00:00.01 |      17 |
|   7 |       FAST DUAL                        |                 |      7 |      1 |      7 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------------------------

There’s a “count stopkey” operation to handle the “rownum <= 12” predicate, of course; but most significantly the “merge join outer” operation has changed to a “nested loop outer” operation.

Repeating the test but with “rownum <= 1200” I still got results that looked correct, and still got a “nested loop outer”. A third attempt, with “rownum <= “30000” switched back to wrong results and a “merge join outer” (with a “count stopkey”). So – initial conclusion – it’s the implementation of the merge join that has gone wrong.

Resolution

If the only problem is the choice of join mechanism we can patch the code (by hand or with an SQL_PATCH) to bypass the problem. Given the simple pattern that produces the problem I should be able to find two critical hints in the Outline Data, one which is a leading() hint that dictates the order (my_synonyms, vw_lat_xxxxxxxx) – the user will presumably have a different name generated for their lateral view – and a use_merge() or use_nl() hint that dictates the mechanism to use to join vw_lat_xxxxxxxx to my_synonyms.

In both plans the Outline Data showed me the following leading() hint (cosmetically adjusted):

leading(@sel$d9e17d64 syn@sel$1 vw_lat_6b6b5ecb@sel$6b6b5ecb)

This was then followed by a use_nl() hint in the correct case:

use_nl(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)

and by a use_merge_cartesian() hint in the erroneous case (so it’s possibly just the “cartesian” bit that’s going wrong.:

use_merge_cartesian(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)

So, as a final test, I edited the query to include the leading() and use_nl() hints, and ran it.

select  /*+
                leading(@sel$d9e17d64 syn@sel$1 vw_lat_6b6b5ecb@sel$6b6b5ecb)
                use_nl(@sel$d9e17d64 vw_lat_6b6b5ecb@sel$6b6b5ecb)
        */
        *
from    
        my_synonyms syn
left join 
        my_objects obj
on 
        (obj.owner, obj.object_name) = (
                select /*+ qb_name(dual_bug) */ syn.table_owner, syn.table_name from dual
        )
/

The problem of the repetitious “SYS”/”PRINT_TABLE” disappeared, and the total number of rows selected increased from 11,615 to 12,369 (mostly due to package and package bodies having the same owner and name and so doubling up a number of rows from the dba_synonyms table).

Summary

It’s almost always a good idea to simplify a messy problem.

Always check the execution plan, including Predicate Information, Outline Data and Query Block / Object Alias details.

Although I didn’t use the results in this case, executing with rowsource execution statistics enabled can be very helpful.

This case focused on a detail that looked as if a “newish” feature might be guilty; but the simple model suggested it’s an implementation detail of an old feature (possibly reached because the optimizer has applied a new feature). We can work around it very easily, and we can build a very simple test case to raise with Oracle in an SR.


Viewing all articles
Browse latest Browse all 123

Latest Images

Trending Articles



Latest Images