Shrink Space

March 13, 2014, 12:08 am

Here’s a lovely effect looking at v$lock (on 11.2.0.4)

select
        sid, type, id1, id2, lmode, request, ctime, block
from
        V$lock
where   sid in (
                select  sid
                from    V$session
                where   username = 'TEST_USER'
        )
order by
        sid, type desc
;

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
       145 TX     589833       7518          6          0        489          0
           TM      92275          0          2          6        489          1
           SK          7   25165955          6          0        489          0
           AE        100          0          4          0        582          0

       148 TX     524308       7383          6          0        490          0
           TM      92275          0          2          6        490          1
           SK          7   25173379          6          0        490          0
           AE        100          0          4          0        998          0

You’ll notice I’ve got two sessions holding a TM lock on the same table (object 92275) in mode 2 (sub-share) and waiting for an exclusive lock on the same table. Both sessions are recording the fact that they are blocking something else. You’ll have trust me when I say there are no other user sessions on the system at this point, and none of the background sessions is doing anything with that table.

The clue to what’s happening is the SK lock – it’s the “segment shrink” lock. I had two sessions start an “alter index I_n shrink space” (two different indexes on the same table) at the same time. The problem is that “shrink space” without the “compact” tries to drop the highwater mark on the index’ allocated space after completing the shrink phase – and there’s a defect in the internal code that tries to get the required exclusive lock on the underlying table: it doesn’t seem to allow for all the possible ways you can fail to get the lock. If you look at v$session_wait_history for either of these sessions, you’ll see something like the following:


  SID  SEQ#     EVENT# EVENT                            P1         P2         P3  WAIT_TIME WAIT_TIME_MICRO TIME_SINCE_LAST_WAIT_MICRO
----- ----- ---------- ------------------------ ---------- ---------- ---------- ---------- --------------- --------------------------
  145     1        798 Wait for shrink lock              0          0          0         10           99954                      70137
          2        235 enq: TM - contention     1414332422      92275          0        600         6002022                        101
          3        798 Wait for shrink lock              0          0          0         10          100723                      69335
          4        235 enq: TM - contention     1414332422      92275          0        600         6001589                         81
          5        798 Wait for shrink lock              0          0          0         10          100500                      69542
          6        235 enq: TM - contention     1414332422      92275          0        600         6002352                         86
          7        798 Wait for shrink lock              0          0          0         10          100618                      69145
          8        235 enq: TM - contention     1414332422      92275          0        600         6001545                        144
          9        798 Wait for shrink lock              0          0          0         10          100996                      69445
         10        235 enq: TM - contention     1414332422      92275          0        600         6002744                        310

The attempt to acquire the TM enqueue (lock) times out every three seconds – and I think the session then releases and re-acquires the SK lock before trying to re-acquire the TM lock – and it’s never going to get it.

The alert log, by the way, looked like this:


Wed Mar 12 12:53:27 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:30 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
Wed Mar 12 12:53:40 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:43 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.

I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will rollback its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error.

Don’t ask about the trace files – but they had both reached 400MB by the time I finished this note.

Strategy Note

If you are going to shrink objects, it’s probably best to do it in two steps: “shrink space compact“, followed by “shrink space”.
If you’re going to try to use multiple sessions to shrink several indexes as quickly as possible, make sure there’s no way that two sessions can try to “shrink space” on a single table concurrently.

Footnote:

This article was prompted by the recent appearance of bug 18388128 on MoS.

↧

RLS bug

March 20, 2014, 6:21 am

≫ Next: Easy – Oops.

≪ Previous: Shrink Space

RLS – row level security, aka VPD (virtual private database) or FGAC (fine grained access control) has a critical bug in 11g. The bug is unpublished, but gets mentioned in various other documents, so can be identified as: Bug: 7828323 ”SYS_CONTEXTS RETURNS WRONG VALUE WITH SHARED_CONTEXT_SENSITIVE”

The title tells you nearly everything you need to know – if you’ve declared a security policy as context_sensitive or shared_context_sensitive then a change to the context ought to result in the associated predicate function being called to generate a new security predicate the next time the policy becomes relevant. Thanks to bug 7828323 this doesn’t always happen – so queries can return the wrong set of results.

There are some patches for older versions (11.1.0.7 and 11.2.0.2 as far as I’ve checked), but if you don’t have, or can’t get, a patch the “workaround” is to change any relevant policies to dynamic; unfortunately the consequence of this is that the predicate function will then be called for every execution of any statement against any objects protected by that policy.

Depending on how your application has been written and how many queries are likely to invoke security policies this could easily increase your CPU usage by several percent (and if it’s a badly written application maybe a lot more).

Footnote:

It has occurred to me to wonder what happens if you have use a (normal) pl/sql function in a select list, and the function executes a statement against a table, and the table is protected by a context_sensitive security policy – and you decide to use the pl/sql result cache on the function. How long is an item supposed to stay in the result cache, and if it’s longer than a single execution of a statement will the result cache be invalidated if your context changes in a way that invalidates the current security predicate ? No time to check or test at present, though, but I’d be very cautious about putting RLS predicate functions into the result cache until I’ve played around with that idea for a bit.

↧

Easy – Oops.

April 2, 2014, 11:47 am

≫ Next: Cache anomaly

≪ Previous: RLS bug

A question came up on OTN today asking for suggestions on how to enforce uniqueness on a pair of columns only when the second column was not null. There’s an easy and obvious solution – but I decided to clone the OP’s example and check that I’d typed my definition up before posting it; and the result came as a bit of a surprise. Here’s a demo script (not using the OP’s table):


create table t1  
(  
	col1	int not null,
	col2	varchar2(1)
);  

create unique index t1_i1 on t1( 
--	case col2 when null then cast(null as int) else col1 end,
--	case when col2 is null then cast(null as int) else col1 end,
	case when col2 is not null then col1 end,
	col2
)
;

insert into t1 values(1,null);
insert into t1 values(1,null);
insert into t1 values(1,'x');
insert into t1 values(1,'y');
insert into t1 values(1,'y');

commit;

column ind1_is   format a5
column ind1_when format 9999

set null N/A

select
	case when col2 is null then cast (null as int) else col1 end	ind1_is,
	case col2 when null then cast (null as int)  else col1 end	ind1_when
from 
	t1
;

The strategy is simple, you create a unique function-based index with two columns; the first column of the index id defined to show the first column of the table if the second column of the table is not null, the second column of the index is simply the second column of the table. So if the second column of the table is null, both columns in the index are null and there is no entry in the index; but if the second column of the table is not null then the index copies both columns from the table and a uniqueness test applies.

Based on the requirement and definition you would expect the first 4 of my insert statements to succeed and the last one to fail. The index will then have two entries, corresponding to my 3rd and 4th insertions.

I’ve actually shown three ways to use the case statement to produce the first column of the index. The last version is the cleanest, but the first option is the one I first thought of – it’s virtually a literal translation the original requirement. The trouble is, with my first definition the index acquired an entry it should not have got, and the second insert raised a “duplicate key” error; the error didn’t appear when I switched the syntax of the case statement to the second version.

That’s why the closing query of the demo is there – when you run it the two values reported should be the same as each other for all four rows in the table – but they’re not. This is what I got on 11.2.0.4:


IND1_IS IND1_WHEN
------- ---------
N/A             1
N/A             1
      1         1
      1         1

I’m glad I did a quick test before suggesting my original answer.

Anyone who has other versions of Oracle available is welcome to repeat the test and report back which versions they finding working correctly (or not).

Update

It’s not a bug (see note 2 below from Jason Bucata), it’s expected behaviour.

↧

Cache anomaly

April 3, 2014, 5:27 am

≫ Next: Coincindences

≪ Previous: Easy – Oops.

Just a quick heads-up for anyone who likes to play around with the Keep and Recycle caches.

In 11g Oracle introduced the option for serial direct path reads for tablescans on tables that was sufficiently large – which meant more than the small_table_threshold – provided the table wasn’t already sufficient well cached. (The rules mean that the choice of mechanism can appear to be a little random in the production environment for tables that are near the threshold size – but if you try testing by doing “alter system flush buffer_cache” you find that you always get direct path reads in testing.)

I’ve just discovered a little oddity about this, though. I have a table of about 50MB which is comfortably over the threshold for direct path reads. But if I create a KEEP cache (db_keep_cache_size) that is a little larger than the table and then assign the table to the KEEP cache (alter table xxx storage(buffer_pool keep)) then 11.2.0.4 stops doing direct path reads, and caches the table.

Now this doesn’t seem unreasonable – if you’ve assigned an object to the KEEP cache you probably want it (or once wanted it) to be kept in cache as much as possible; so using the KEEP cache if it’s defined and specified makes sense. The reason I mention this as an oddity, though, is that it doesn’t reproduce in 11.1.0.7.

I think I saw a bug note about this combination a few months ago- I was looking for something else at the time and, almost inevitably, I can’t find it when I want it – but I don’t remember whether it was the 11.1 or 11.2 behaviour that was deemed to be correct.

Update

See comments 1 and 2 below. I’ve written about this previously, and the caching bechaviour is the correct behaviour. The patch is in 11.2.0.2 and backports are available for 11.1.0.7 and 11.2.0.1. The patch ensures that the table will be read into the cache if it is smaller than the db_keep_cache_size. (Although we might want to check – see Tanel’s notes – whether this is based on the high water mark recorded in the segment header or on the optimizer stats for the table; and I suppose it might be worth checking that the same feature applies to index fast full scans). From the MoS document:

With the bug fix applied, any object in the KEEP buffer pool, whose size is less than DB_KEEP_CACHE_SIZE, is considered as a small or medium sized object. This will cache the read blocks and avoid subsequent direct read for these objects.

↧

Coincindences

April 19, 2014, 1:22 am

≫ Next: Bigfiles

≪ Previous: Cache anomaly

I had another of those odd timing events today that make me think that Larry Ellison has access to a time machine. I found (yet another example of a) bug that had been reported on MoS just a few days before it appeared on an instance I was running. How is it possible that someone keeps doing things that I’m doing, but just a few days before I do them !

For no good reason I happened to browse through a load of background trace files on an 11.2.0.4 instance and found the following in an “m000″ file:

*** SERVICE NAME:(SYS$BACKGROUND) 2014-04-19 08:55:20.617
*** MODULE NAME:(MMON_SLAVE) 2014-04-19 08:55:20.617
*** ACTION NAME:(Auto-Purge Slave Action) 2014-04-19 08:55:20.617

*** KEWROCISTMTEXEC - encountered error: (ORA-06525: Length Mismatch for CHAR or RAW data
ORA-06512: at "SYS.DBMS_STATS", line 29022
ORA-06512: at line 1
)
  *** SQLSTR: total-len=93, dump-len=93,
      STR={begin dbms_stats.copy_table_stats('SYS', :bind1, :bind2, :bind3, flags=>1, force=>TRUE); end;}

Before trying to track down what had gone wrong I did a quick check on MoS, searching for “copy_table_stats” and “29022″ and found bug 17079301 – fixed in 12.2, and 12.1.0.2, with a patch for 12.1.0.1 (and some back-ports for 11.2.0.4). The description of the bug in the note was basically: “it happens”.

I may get around to looking more closely at what’s gone wrong but as an initial though I’m guessing that, even though the action name is “auto-purge slave action”, this may be something to do with adding a partition to some of the AWR history tables and rolling stats forward – so at some point I’ll probably start by checking for partitions with missing stats in the SYS schema.

The bug note, by the way, was published (last updated, on second thoughts) on 14th April 2014 – just 5 days before I first happened to spot the occurrence of the bug.

↧

Bigfiles

April 29, 2014, 6:12 am

≫ Next: Costing Bug

≪ Previous: Coincindences

It’s always the combinations that catch you out.

Bigfile tablespaces have their uses – especially in big systems

Materialized views have their users – especially in big systems

There’s absolutely no reason why the two technologies should interfere with each other … until you find a bug !

Running an example, stripped to the bare minimum, and doing a couple of things that I personally don’t like doing, on 11.2.0.4:

drop materialized view t1_mv;
drop table t1;

create table t1 (
	id	number constraint t1_pk primary key,
	n1	number,
	n2	number
)
tablespace bigfile_ts
;

create materialized view log on t1
with
	rowid		-- ugh!
;

create materialized view t1_mv
refresh fast on demand
with rowid		-- ugh!
enable query rewrite
as
select
	id, n1
from
	t1
where
	n2 is not null
;

-- Another ugh coming up !
insert into t1 values(1,null,null);
update t1 set n1 = 2, n2 = 1 where id = 1;
commit;

begin
        dbms_mview.refresh(
                list           => 'T1_MV',
                method         => 'F',
                atomic_refresh  => true
        );
end;
/

Things I don’t like:

Doing anything that is strongly dependent on rowid – I’d rather do my materialized view stuff by primary key … but, as Mick Jagger once told us: “You can’t always get what you want”.
Processes that insert an “empty” row and then update it – it’s very inefficient, generates excessive undo and redo, and often leads to row migration

Most significantly I don’t like operations that have worked for ages suddenly crashing when someone adds a new piece of code to the system.

If you run the fragment about, the refresh will fail with the follow string of errors:


ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (TEST_USER.I_SNAP$_T1_MV) violated
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2809
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 3025
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2994
ORA-06512: at line 2

When you try to find out what the unique constraint is, it’s based on a hidden column (M_ROW$$) of type rowid that has been added to the materialized view to emulate a primary key; and, for some reason, if your base table is in a bigfile tablespace and you insert a row and then update it before you’ve run a refresh then you can’t do a fast refresh again until you clean up the mess (e.g. with a complete refresh).

Note – if you complete all your inserts, then refresh, you can update a row as many times as you like without causing subsequent refresh problems – it looks like it’s just “insert it, update it, refresh” (or “insert it, delete it, refresh”) that has a problem because the only way to hit the issue is to get a correct rowid (from an update or delete) and an incorrect rowid (from an insert) for the same row in the log at the same time.

Workaround: if you move the base table to a smallfile tablespace you might just find you get what you need.

This shows up on MoS as bug 17603987, fixed in 12.2.

If you want to dig in a little bit more, you can start by comparing the contents of t1 and mlog$_t1 (the materialized view log table). Critically:

SQL> select rowid from t1;

ROWID
------------------
AAAXN2AAAAAAAJEAAA

1 row selected.

SQL> select m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AQAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

SQL> select rowidtochar(m_row$$) m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AAAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

If you compare the rowid from t1 with the m_row$$ column (which should hold the rowids of rows from t1) from mlog$_t1 you’ll notice that the insert and the update have been given different values for the source row. Somehow the logging code for the insert has generated the wrong rowid value; however, when you apply a rowidtochar() to the wrong value the return value is the right value – and the merge statement that transfers modified data from the source table into the materialized view uses the rowidtochar() function to join the materialized view log back to the source table, with the result that the same rowid can be inserted twice – except the attempt results in a “duplicate key” error.

↧

Costing Bug

May 2, 2014, 12:53 am

≫ Next: Subquery with OR

≪ Previous: Bigfiles

It’s amazing how you can find little bugs (or anomalies) as soon as you start to look closely at how things work in Oracle. I started to write an article for All Things Oracle last night about execution plans with subqueries, so wrote a little script to generate some sample data, set up the first sample query, checked the execution plan, and stopped because the final cost didn’t make sense. Before going on I should point out that this probably doesn’t matter and probably wouldn’t cause a change in the execution plan if the calculation were corrected – but it is just an interesting indication of the odd things that can happen when sections of modular code are combined in an open-ended way. Here’s the query (running on 11.2.0.4) with execution plan:


update t1 set 
	n1 = (
		select	max(mod100)
		from	t2
		where	t2.id = t1.id
	),
	n2 = (
		select	max(trunc100)
		from	t3
		where	t3.id = t1.id
	)
where
	id between 101 and 200
;

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |       |   101 |  1212 |   812  (25)| 00:00:05 |
|   1 |  UPDATE                       | T1    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN            | T1_I1 |   101 |  1212 |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   4 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN (MIN/MAX)| T2_I1 |     1 |     7 |     3   (0)| 00:00:01 |
|   6 |   SORT AGGREGATE              |       |     1 |     7 |            |          |
|   7 |    FIRST ROW                  |       |     1 |     7 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN (MIN/MAX)| T3_I1 |     1 |     7 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=101 AND "ID"<=200)
   5 - access("T2"."ID"=:B1)
   8 - access("T3"."ID"=:B1)

So the cost of running each of the subqueries is 3 – there are two of them, and we expect to run each of the 101 times: for a total cost of 606. So how do we get to 812 as the total cost of the query ?

Further testing:

the cost of the plan for updating the two columns with constants is just 4.
rebuild the indexes with different values for pctfree to see how the cost changes
vary the number of columns updated by subquery
check the 10053 trace – for issues or presentation vs. rounding, particularly

Ultimately I decided that for each column updated by subquery the optimizer added 1 to the cost of accessing the table for each row; or, to view it another way, the optimizer used “sum(subquery costs + 1) * number of rows to be updated” so (4 + 4) * 101 + a little bit for the driving table access = 812. This doesn’t seem entirely reasonable – given that a cost is essentially equivalent to assuming that a single block visit is a disk read when we know that when we update multiple columns of the same row we need only read the block into memory at most once. As I said at the start, though this anomaly in costing probably doesn’t matter – there are no further steps to be taken after the update so there’s nothing the optimizer might do differently if the cost of the update had been calculated as 612 rather then 812.

Footnote:

If you want to play about with this query, here’s the code to create the tables – with one proviso, the plan above happens to be one I produced after rebuilding the indexes on t2 and t3 with pctfree 99


create table t1
as
with generator as (
	select  --+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum				id,	
	mod(rownum-1,100)		mod100,
	trunc((rownum - 1)/100)		trunc100,
	rownum				n1,
	rownum				n2,
	lpad(rownum,6,'0')		vc1,
	rpad('x',100)			padding
from
	generator
where
	rownum <= 10000
;

create table t2 as select * from t1;
create table t3 as select * from t1;

create index t1_i1 on t1(id);
create index t2_i1 on t2(id,mod100);
create index t3_i1 on t3(id,trunc100);

begin
	dbms_stats.gather_table_stats(user,'t1');
	dbms_stats.gather_table_stats(user,'t2');
	dbms_stats.gather_table_stats(user,'t3');
end;
/

↧

Subquery with OR

May 15, 2014, 10:23 am

≫ Next: Ignoring Hints

≪ Previous: Costing Bug

Prompted by a pingback on this post, followed in very short order by a related question (with a most gratifying result) on Oracle-L, I decided to write up a note about another little optimizer enhancement that appeared in 12c. Here’s a query that differs slightly from the query in the original article:


select
	id, modded, mod_15
from
	t1
where
	t1.mod_15 = 1                     -- originally t1.mod_15 > 0
and	(   t1.modded is null             -- originally t1.modded = 0
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

As a general principle, the “OR EXISTS” stops the optimizer from unnesting the subquery, so my original article suggested a workaround that required you to rewrite the query with a UNION ALL, using the lnnvl() function (where possible) as the easy way to eliminate accidental duplication. Take a look at the plans for my new query, though – first in 11.2.0.4, then in 12.1.0.1:


Execution Plan for 11.2.0.4
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    34 |   374 |    50   (0)| 00:00:01 |
|*  1 |  FILTER            |       |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| T1    |   667 |  7337 |    50   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN| T2_PK |     1 |     3 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T1"."MODDED" IS NULL OR  EXISTS (SELECT 0 FROM "T2" "T2"
              WHERE "T2"."ID"=:B1))
   2 - filter("T1"."MOD_15"=1)
   3 - access("T2"."ID"=:B1)

Execution Plan for 12.1.0.1
------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |    27 |   378 |    50   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI NA|       |    27 |   378 |    50   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T1    |   667 |  7337 |    50   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN  | T2_PK |     1 |     3 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."MOD_15"=1)
   3 - access("T2"."ID"="T1"."MODDED")

As expected, 11.2.0.4 has had to use a filter subquery approach – but 12.1.0.1 has found a different path. For this special “is null” case the optimizer has unnested the subquery and used a “null aware (NA) semi-join”. In this very small example there is no change in the reported cost, and the mechanics of the execution plan will be quite similar at run time – but in real systems there are bound to be cases where the new strategy is more efficient.

Unfortunately …

Bug 18650065 (fixed in 12.2) rears it’s ugly head: WRONG RESULTS ON QUERY WITH SUBQUERY USING OR EXISTS.
I can demonstrate this with the following code:


update t1 set modded = null
where id <= 30;
commit;

select
	id, modded, mod_15
from
	t1
where
	t1.id = 1                     -- previously mod_15 = 1
and	(   t1.modded is null
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

alter table t1 add constraint t1_pk primary key(id);

select
	id, modded, mod_15
from
	t1
where
	t1.id = 1                     -- previously mod_15 = 1
and	(   t1.modded is null
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

And here’s the output from the above script:


30 rows updated.

Commit complete.

        ID     MODDED     MOD_15
---------- ---------- ----------
         1                     1

1 row selected.

Table altered.

no rows selected

I’ve modified a few rows so that the “null-aware” bit of the new transformation matters, but I’ve now got a data set and transformation where I get the wrong results because I’ve defined a primary key (unique would have done) on a critical column in the query. If you check the execution plan you’ll find that the optimizer has switched from a null aware semi-join to a simple nested loop join.

There is a workaround for this problem – disable the relevant feature:

alter session set "_optimizer_null_accepting_semijoin"=false;

For Reference:

Here’s the SQL to generate the data for the above demonstration:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 5000
)
select
	rownum			id,
	mod(rownum,999)		modded,
	mod(rownum,15)		mod_15,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

update t1 set modded = null where modded = 26;

create index t1_i1 on t1(id);
create index t1_i2 on t1(modded);

create table t2
as
select
	2 * rownum		id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	all_Objects
where
	rownum <= 20
;	

alter table t2 add constraint t2_pk primary key(id);

↧

Ignoring Hints

May 19, 2014, 10:21 am

≫ Next: Securefiles

≪ Previous: Subquery with OR

Does Oracle ignore hints – not if you use them correctly, and sometimes it doesn’t ignore them even when you use them incorrectly!

Here’s an example that I’ve run on 11.2.0.4 and 12.1.0.1


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	rownum			n1,
	rpad('x',100)		padding
from
	generator	v1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_i1 on t1(id);
alter index t1_i1 unusable;

select n1 from t1 where id = 15;
select /*+ index(t1 (id)) */ n1 from t1 where id = 15;

Any guesses about the output from the last 4 statements ?

Index created.

Index altered.

        N1
----------
        15

1 row selected.

select /*+ index(t1 (id)) */ n1 from t1 where id = 15
*
ERROR at line 1:
ORA-01502: index 'TEST_USER.T1_I1' or partition of such index is in unusable state

That’s a pretty convincing display of Oracle not ignoring hints.

Update:

Technically, of course, I haven’t demonstrated that Oracle is not ignoring the hint (i.e. that it’s obeying the hint – if you want to avoid the double negative) until I demonstrate that in the absence of the hint the error would not occur – but that task is left as an exercise to the reader.

↧

Securefiles

May 29, 2014, 10:15 am

≫ Next: Delete Costs

≪ Previous: Ignoring Hints

A few weeks ago someone emailed me about a problem they had importing securefiles – it was very slow. Such things are never easy to address by email, of course, but there were three features to consider: (a) it was securfiles, (b) it was impdp, and (c) it was across a database link. If you read my blog regularly you’ll have seen me comment a few times that the easiest way to break Oracle is to mix a few features – so

securefiles and impdp (I know why LOBs generally appear to be slow to import, was it “LOBs” or specifically securefile LOBs)
securefiles and database links (db links are always slower than local actions – easy to do a comparative test)
impdp and database links (probably not, they’re supposed to work very well together in general – could do a local/remote comparison)
impdp with securefiles across a database link (easy enough to factor out the database link)

It was just a brief email, and I didn’t have an answer offhand, so I pointed out that there were a few bugs on MoS about impdp and LOBs and left it at that. A couple of days ago I got a follow-up email telling me that the problem was Bug 13609098 : IMPORTING SMALL SECUREFILE LOBS USING DATA PUMP IS SLOW.

There are two reasons for writing this note – the first, of course, is just to publicise the bug because I’ve seen three of four complaints over the Internet about slow imports with LOBs and maybe a couple of those were actually “small securefile LOBs”; and then it’s possible that there are other people who haven’t even realised that their imports could be running faster.

The second reason, though, is to highlight a viewpoint that leaves me approaching Oracle features with extreme caution: this looks like the sort of bug that many people should have noticed, but the first reference is Jan 2012, and the earliest patch seems to be dated Oct 2013 – 22 months later! There could be various reasons for the long gap – but the one that always comes to my mind first in cases like this is: “are there so few people using ‘feature X’ that this bug stayed near the bottom of the todo list for a long time ?” – followed by the slightly less alarmist “maybe there are quite a lot of people, but very few have noticed” and “but the specific combination is, perhaps, just a little unlikely”. If there really are very few people using the feature then I’m not going to be keen to advise a client to take it on without doing an extremely careful set of tests – at scale – of everything they’re likely to do with the feature. I don’t want something to break after go-live and find that it take weeks to identify the root cause and months to fix.

In this particular case I’ll believe that the combination of Securefile LOBs (“large” objects) that were actually small and in large numbers is significant. I’m prepared to assume that the customer base using Securefiles is a reasonable size but the subset who hit this combination is a small fraction of the whole; and that means I won’t be quite so paranoid about suggesting Securefiles as an option to a client – though I’d still insist on modelling any special cases that their requirements might highlight.

Footnote

The bug is fixed in 12.2 with several backports to 11.2.0.x for different platforms.

↧

Delete Costs

June 19, 2014, 10:18 am

≫ Next: Group By Bug

≪ Previous: Securefiles

One of the quirky little anomalies of the optimizer is that it’s not allowed to select rows from a table after doing an index fast full scan (index_ffs) even if it is obviously the most efficient (or, perhaps, least inefficient) strategy. For example:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,100)		n1,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

create index t1_i1 on t1(id, n1);
alter table t1 modify id not null;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

explain plan for
select /*+ index_ffs(t1) */ max(padding) from t1 where n1 = 0;

select * from table(dbms_xplan.display(null,null,'outline -note'));

In this case we can see that there are going to be 1,000 rows where n1 = 0 spread evenly across the whole table so a full tablescan is likely to be the most efficient strategy for the query, but we can tell the optimizer to do an index fast full scan with the hint that I’ve shown, and if the hint is legal (which means there has to be at least one column in it declared as not null) the optimizer should obey it. So here’s the plan my hinted query produced:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   104 |   207   (4)| 00:00:02 |
|   1 |  SORT AGGREGATE    |      |     1 |   104 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 |   101K|   207   (4)| 00:00:02 |
---------------------------------------------------------------------------

We’d have to examine the 10053 trace file to be certain, but it seems the optimizer won’t consider doing an index fast full scan followed by a trip to the table for a select statement (in passing, Oracle would have obeyed the skip scan – index_ss() – hint). It’s a little surprising then that the optimizer will obey the hint for a delete:


explain plan for
delete /*+ index_ffs(t1) cluster_by_rowid(t1) */ from t1 where n1 = 0;

select * from table(dbms_xplan.display(null,null,'outline -note'));

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |       |  1000 |  8000 |    38  (11)| 00:00:01 |
|   1 |  DELETE               | T1    |       |       |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_I1 |  1000 |  8000 |    38  (11)| 00:00:01 |
-------------------------------------------------------------------------------

You might note three things from this plan. First, the optimizer can consider a fast full scan followed by a table visit (so why can’t we do that for a select); secondly that the cost of the delete statement is only 38 whereas the cost of the full tablescan in the earlier query was much larger at 207 – surprisingly Oracle had to be hinted to consider this fast full scan path, despite the fact that the cost was cheaper than the cost of the tablescan path it would have taken if I hadn’t included the hint; finally you might note the cluster_by_rowid() hint in the SQL – there’s no matching “Sort cluster by rowid” operation in the plan, even though this plan came from 11.2.0.4 where the mechanism and hint are available.

The most interesting of the three points is this: there is a bug recorded for the second one (17908541: CBO DOES NOT CONSIDER INDEX_FFS) reported as fixed in 12.2 – I wonder if this means that an index fast full scan followed by table access by rowid will also be considered for select statements in 12.2.

Of course, there is a trap – and something to be tested when the version (or patch) becomes available. Why is the cost of the delete so low (only 38, the cost of the index fast full scan) when the number of rows to be deleted is 1,000 and they’re spread evenly through the table ? It’s because the cost of a delete is actually calculated as the cost of the query: “select the rowids of the rows I want to delete but don’t worry about the cost of going to the rows to delete them (or the cost of updating the indexes that will have to be maintained, but that’s a bit irrelevant to the choice anyway)”.

So when Oracle does do a delete following an index fast full scan in 12.2, will it be doing it because it’s the right thing to do, or because it’s the wrong thing ?

To be continued … (after the next release/patch).

↧

Group By Bug

September 4, 2014, 9:11 am

≫ Next: Plan depth

≪ Previous: Delete Costs

This just in from OTN Database Forum – a surprising little bug with “group by elimination” exclusive to 12c.


alter session set nls_date_format='dd-Mon-yyyy hh24:mi:ss';

select
       /* optimizer_features_enable('12.1.0.1')*/
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from (
  select
        ts, max(fieldb) fieldb
  from (
  select trunc(sysdate) - 1/24 ts, 1 fieldb from dual
  union all
  select trunc(sysdate) - 2/24 ts, 2 fieldb from dual
  union all
  select trunc(sysdate) - 3/24 ts, 3 fieldb from dual
  union all
  select trunc(sysdate) - 4/24 ts, 4 fieldb from dual
  union all
  select trunc(sysdate) - 5/24 ts, 5 fieldb from dual
  )
  group by ts
)
group by trunc (ts,'DD')
/

You might expect to get one row as the answer – but this is the result I got, with the execution plan pulled from memory:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00          1
03-Sep-2014 00:00:00          5
03-Sep-2014 00:00:00          4
03-Sep-2014 00:00:00          2
03-Sep-2014 00:00:00          3

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    11 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    11  (10)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    10   (0)| 00:00:01 |
|   3 |    UNION-ALL     |      |       |       |            |          |
|   4 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

You’ll notice that I’ve got an “optimizer_features_enable()” comment in the code: if I change it into a hint I get the following (correct) result and plan:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00         15

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    12 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    12  (17)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    11  (10)| 00:00:01 |
|   3 |    HASH GROUP BY |      |     5 |    60 |    11  (10)| 00:00:01 |
|   4 |     VIEW         |      |     5 |    60 |    10   (0)| 00:00:01 |
|   5 |      UNION-ALL   |      |       |       |            |          |
|   6 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Somehow 12.1.0.2 has managed to get confused by the combination of “group by ts” and “group by trunc(ts,’DD’)” and has performed “group-by elimination” when it shouldn’t have. If you use the ‘outline’ option for dbms_xplan.display_cursor() you’ll find that the bad result reports the hint elim_groupby(@sel$1), which leads to an alternative solution to hinting the optimizer_features level. Start the code like this:


select
       /*+ qb_name(main) no_elim_groupby(@main) */
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from  ...

The (no_)elim_groupby is a hint that appeared in v$sql_hints only in the 12.1.0.2.

↧

Plan depth

October 19, 2014, 10:20 am

≫ Next: first_rows(10)

≪ Previous: Group By Bug

A recent posting on OTN reminded me that I haven’t been poking Oracle 12c very hard to see which defects in reporting execution plans have been fixed. The last time I wrote something about the problem was about 20 months ago referencing 11.2.0.3; but there are still oddities and irritations that make the nice easy “first child first” algorithm fail because the depth calculated by Oracle doesn’t match the level that you would get from a connect-by query on the underlying plan table. Here’s a simple fail in 12c:


create table t1
as
select
	rownum 			id,
	lpad(rownum,200)	padding
from	all_objects
where	rownum <= 2500
;

create table t2
as
select	* from t1
;

-- call dbms_stats to gather stats

explain plan for
select
	case mod(id,2)
		when 1 then (select max(t1.id) from t1 where t1.id <= t2.id)
		when 0 then (select max(t1.id) from t1 where t1.id >= t2.id)
	end id
from	t2
;

select * from table(dbms_xplan.display);

It ought to be fairly clear that the two inline scalar subqueries against t1 should be presented at the same level in the execution hierarchy; but here’s the execution plan you get from Oracle:

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  2500 | 10000 | 28039   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE      |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL   | T2   |  2500 | 10000 |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"<=:B1)
   4 - filter("T1"."ID">=:B1)

As you can see, the immediate (default?) visual impression you get from the plan is that one of the subqueries is subordinate to the other. On the other hand if you check the id and parent_id columns from the plan_table you’ll find that lines 1 and 3 are both direct descendents of line 0 – so they ought to have the same depth. The plan below is what you get if you run the 8i query from utlxpls.sql against the plan_table.


SQL> select id, parent_id from plan_table;

        ID  PARENT_ID
---------- ----------
         0
         1          0
         2          1
         3          0
         4          3
         5          0

--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     2K|    9K|  28039 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  TABLE ACCESS FULL        |T2        |     2K|    9K|     11 |       |       |
--------------------------------------------------------------------------------

So next time you see a plan and the indentation doesn’t quite seem to make sense, perhaps a quick query to select the id and parent_id will let you check whether you’ve found an example where the depth calculation produces a misleading result.

Update 20th Oct 2014

A question via twitter – does the error also show up with dbms_xplan.display_cursor(), SQL tuning sets, AWR, etc. or is it just a defect of explain plan. Since the depth is (probably) a derived value for display purposes that Oracle doesn’t use internally for executing the plan I would be inclined to assume that the defect is universal, but I’ve only checked it through explain plan/display, and through execution/display_cursor().

↧

first_rows(10)

October 31, 2014, 10:31 am

≫ Next: 12c Downgrade

≪ Previous: Plan depth

No, not the 10th posting about first_rows() this week – whatever it may seem like – just an example that happens to use the “calculate costs for fetching the first 10 rows” optimizer strategy and does it badly. I think it’s a bug, but it’s certainly a defect that is a poster case for the inherent risk of using anything other than all_rows optimisation. Here’s some code to build a couple of sample tables:


begin
	dbms_stats.set_system_stats('MBRC',16);
	dbms_stats.set_system_stats('MREADTIM',12);
	dbms_stats.set_system_stats('SREADTIM',5);
	dbms_stats.set_system_stats('CPUSPEED',1000);
end;
/

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(1,1000))	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t1_n1 on t1(id, n1);

create table t2
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(10001,20001))	x1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t2_i1 on t2(x1);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

end;
/


create or replace view  v1
as
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 101 and 300
union all
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 501 and 700
;

The key feature of this demonstration is the UNION ALL view and what the optimizer does with it when we have first_rows_N optimisation – this is a simplified model of a production problem I was shown a couple of years ago, so nothing special, nothing invented. Here’s a query that behaves badly:


select
	/*+ gather_plan_statistics */
	v1.small_vc,
	v1.n1
from
	v1,
	t2
where
	t2.id = v1.id
and	t2.x1 = 15000
;

I’m going to execute this query in three different ways – as is, using all_rows optimisation; as is, using first_rows_10 optimisation, then using all_rows optimisation but with the necessary hints to make it follow the first_rows_10 execution path. Here are the resulting plans from an instance of 12.1.0.2 (the same thing happens in 11.2.0.4):


first_rows_10 plan
----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |     1 |    35 |   107   (0)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |     1 |    35 |   107   (0)| 00:00:01 |
|   2 |   VIEW                               | V1    |    24 |   600 |     4   (0)| 00:00:01 |
|   3 |    UNION-ALL                         |       |       |       |            |          |
|*  4 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

all_rows plan
------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                          |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   VIEW                                 | V1    |     1 |    25 |     8   (0)| 00:00:01 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

first_rows_10 plan hinted under all_rows optimisation
---------------------------------------------------------------------------------------------- 
| Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
---------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT             |       |   200 |  8600 |       |  6124   (3)| 00:00:01 | 
|*  1 |  HASH JOIN                   |       |   200 |  8600 |    17M|  6124   (3)| 00:00:01 |
|   2 |   VIEW                       | V1    |   402K|    12M|       |  5464   (3)| 00:00:01 | 
|   3 |    UNION-ALL                 |       |       |       |       |            |          | 
|*  4 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2731   (3)| 00:00:01 | 
|*  5 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2733   (3)| 00:00:01 | 
|   6 |   TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1000 |       |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |   100 |       |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

I’m not sure why the first_rows_10 plan uses “table access by rowid batched”, but I’d guess it’s because the optimiser calculates that sorting the index rowids before visiting the table may have a small benefit on the speed of getting the first 10 rows – eventually I’ll get around to checking the 10053 trace file. The important thing, though, is the big mistake in the strategy, not the little difference in table access.

In the first_rows_10 plan the optimizer has decided building an in-memory hash table from the UNION ALL of the rows fetched from the two copies of the t1 table will be fast and efficient; but it’s made that decision based on the assumption that it will only get 10 rows from each copy of the table – and at run-time it HAS to get all the relevant t1 rows to build the hash table before it can get any t2 rows. We can get some idea of the scale of this error when we look at the hinted plan under all_rows optimisation – it’s a lot of redundant data and a very expensive hash table build.

In contrast the all_rows plan does an efficient indexed access into the t2 table then, for each row, does a join predicate pushdown into the union all view using an indexed access path. If we only wanted to fetch 10 rows we could stop after doing a minimum amount of work. To demonstrate the error more clearly I’ve re-run the experiment for the first two plans from SQL*PLus, setting the arraysize to 11, the pagesize to 5, and stopping after the first 10 rows. Here are the plans showing the rowsource execution stats:


first_rows_10 plan
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   107 (100)|     12 |00:00:00.43 |   35150 |       |       |          |
|*  1 |  HASH JOIN                           |       |      1 |      1 |   107   (0)|     12 |00:00:00.43 |   35150 |    24M|  3582K|   23M (0)|
|   2 |   VIEW                               | V1    |      1 |     24 |     4   (0)|    400K|00:00:06.48 |   35118 |       |       |          |
|   3 |    UNION-ALL                         |       |      1 |        |            |    400K|00:00:04.20 |   35118 |       |       |          |
|*  4 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.71 |   17559 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.63 |   17559 |       |       |          |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |       |       |          |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

all_rows plan
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |      1 |        |   904 (100)|     12 |00:00:00.01 |     213 |
|   1 |  NESTED LOOPS                          |       |      1 |     43 |   904   (1)|     12 |00:00:00.01 |     213 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |
|   4 |   VIEW                                 | V1    |     28 |      1 |     8   (0)|     12 |00:00:00.01 |     181 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |     28 |        |            |     12 |00:00:00.01 |     181 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    212K|     4   (0)|      8 |00:00:00.01 |      93 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      8 |00:00:00.01 |      85 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    213K|     4   (0)|      4 |00:00:00.01 |      88 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      4 |00:00:00.01 |      84 |
-----------------------------------------------------------------------------------------------------------------------

If I had set the optimizer_mode to first_rows_10 because I really only wanted to fetch (about) 10 rows then I’ve managed to pay a huge overhead in buffer visits, memory and CPU for the privilege – the all_rows plan was much more efficient.

Remember – we often see cases where the first_rows(n) plan will do more work to get the whole data set in order to be able to get the first few rows more quickly (the simplest example is when the optimizer uses a particular index to get the first few rows of a result set in order without sorting rather than doing a (faster) full tablescan with sort. This case, though, is different: the optimizer is choosing to build a hash table as if it only has to put 10 rows into that hash table when it actually HAS to build the whole has table before it can take any further steps – we don’t get 10 rows quicker and the rest more slowly; we just get 10 very slow rows.

Footnote:

It’s possible that this is an example of bug 9633142: (FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW) but that’s reported as fixed in 12c, with a couple of patches for 11.2.0.2/3. However, setting “_fix_control”=’4887636:off’, does bypass the problem. (The fix control, introduced in 11.1.0.6 has description: “remove restriction from first K row optimization”)

↧

12c Downgrade

July 20, 2015, 5:12 am

≫ Next: Invalidation

≪ Previous: first_rows(10)

No, not really – but sometimes the optimizer gets better and gives you worse performance as a side effect when you upgrade. Here’s an example where 11.2.0.4 recognised (with a few hints) the case for a nested loop semi-join and 12c went a bit further and recognised the opportunity for doing a cunning “semi_to_inner” transformation … which just happened to do more work than the 11g plan.

Here’s a data set to get things going, I’ve got “parent” and “child” tables, but in this particular demonstration I won’t be invoking referential integrity:


create table chi
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level &lt;= 1e4
)
select
        rownum - 1                              id,
        trunc((rownum-1)/10)                    n1,
        trunc(dbms_random.value(0,1000))        n2,
        rpad('x',1000)                          padding
from
        generator
;

create table par
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level &lt;= 1e4
)
select
        rownum - 1      id,
        rpad('x',1000)  padding
from
        generator
where
        rownum &lt;= 1e3
;

alter table par modify id not null;
alter table par add constraint par_pk primary key(id)
-- deferrable
;

-- Now gather stats on the tables.

The code uses my standard framework that could generate a few million rows even though it’s only generating 1,000 in par and 10,000 in chi. The presence of the commented “deferrable” for the primary key constraint is for a secondary demonstration.

You’ll notice that the 1,000 values that appear in chi.n1 and chi.n2 are matched by the 1,000 rows that appear in the primary key of par – in some other experiment I’ve got two foreign keys from chi to par. Take note that the values in n1 are very well clustered because of the call to trunc() while the values in n2 are evenly scattered because of the call to dbms_random() – the data patterns are very different although the data content is very similar (the randomised data will still produce, on average, 10 rows per value).

So here’s the test code:


set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

prompt  =============================
prompt  Strictly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

prompt  =============================
prompt  Randomly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

set serveroutput on
alter session set statistics_level = typical;

In both cases I’ve hinted the query into running with a nested loop semi-join from chi to par. Since there are 10,000 rows in chi with no filter predicates, you might expect to see the probe into the par table starting 10,000 times returning (thanks to our perfect data match) one row for each start. Here are the run-time plans with rowsource execution stats from 11.2.0.4

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.14 |    1450 |   1041 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.14 |    1450 |   1041 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.12 |    1450 |   1041 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.07 |    1434 |   1037 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1048 |     0   (0)|   1000 |00:00:00.01 |      16 |      4 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.12 |    5544 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.12 |    5544 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.10 |    5544 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.02 |    1434 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   4033 |   1048 |     0   (0)|   4033 |00:00:00.02 |    4110 |
-----------------------------------------------------------------------------------------------------

Notice how we do 1,000 starts of operation 4 when the data is well ordered, and 4,033 starts when the data is randomly ordered. For a semi-join nested loop the run-time engine uses the same caching mechanism as it does for scalar subqueries – a fact you can corroborate by removing the current hints and putting the /*+ no_unnest */ hint into the subquery so that you get a filter subquery plan, in which you will note exactly the same number of starts of the filter subquery.

As an extra benefit you’ll notice that the index probes for the well-ordered data have managed to take advantage of buffer pinning (statistic “buffer is pinned count”) – keeping the root block and most recent leaf block of the par_pk index pinned almost continually through the query; while the randomised data access unfortunately required Oracle to unpin and repin the index leaf blocks (even though there were only 2 in the index) as the scan of chi progessed.

Time to upgrade to 12.1.0.2 and see what happens:

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |    1448 |   1456 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |    1448 |   1456 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.20 |    1448 |   1456 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.06 |      16 |     27 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |   11588 |   1429 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |   11588 |   1429 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.19 |   11588 |   1429 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.07 |   10156 |      0 |
--------------------------------------------------------------------------------------------------------------

Take a close look at operation 2 – it’s no longer a NESTED LOOP SEMI, the optimizer has got so smart (recognising the nature of the primary key on par) that it’s done a “semi_to_inner” transformation. But a side effect of the transformation is that the scalar subquery caching mechanism no longer applies so we probe the par table 10,000 times. When the driving data is well-ordered this hasn’t made much difference to the buffer gets (and related latch activity), but when the data is randomised the extra probes ramp the buffer gets up even further.

The timings (A-time) on all these experiments are not particularly trustworthy – the differences between cached reads and direct path reads introduced more variation than the difference in Starts and Buffers, and the total CPU load is pretty small anyway – and I suspect that this difference won’t make much difference to most people most of the time. No doubt, though, there will be a few cases where a small change like this could have a noticeable effect on some important queries.

Footnote

There is a hint /*+ no_semi_to_inner(@queryblock object_alias) */ that I thought might persuade the optimizer to stick with the semi-join, but it didn’t have any effect. Since the “semi to inner” transformation (and the associated hints) are available in 11.2.0.4 I was a little puzzled that (a) I didn’t see the same transformation in the 11g test, and (b) that I couldn’t hint the transformation. This makes me wonder if there’s a defect in 11g that might be fixed in a future patch.

It’s also nice to think that the scalar subquery caching optimisation used in semi-joins might also become available to standard joins (in cases such as “join to parent”, perhaps).

↧

Invalidation

July 22, 2015, 4:45 am

≫ Next: Subquery Factoring (10)

≪ Previous: 12c Downgrade

Someone who attended my sessions at the Bucharest Oracle Summit earlier on this year sent me an example of a quirky little bug, possibly related to the newer “fine-grained” invalidation mechanisms, possibly related to ANSI syntax SQL, that’s very easy to reproduce. (That’s always nice for Oracle support – a perfect test case.)

All it takes is two tables and a packaged procedure that queries those tables. The package is coded to do something that should not be allowed in production code; but “should not” and “is not” are very different things. For anyone who wants to play with the example, here’s the script to create the necessary objects:


drop package pkg_test;
drop table t2 purge;
drop table t1 purge;

create table t1 (id1 number, val1 varchar2(10));
create table t2 (id2 number, val2 varchar2(10));

insert into t1 values(1,rpad('x',10,'x'));
insert into t2 values(1,rpad('x',10,'x'));

execute dbms_stats.gather_table_stats(user,'t1')
execute dbms_stats.gather_table_stats(user,'t2')

create or replace package pkg_test is
   procedure pr_call;
end pkg_test;
/

create or replace package body pkg_test as

   procedure pr_call is

      cursor cur_ids is
         select *                   -- Naughty !
           from t1
           join t2
             on t2.id2 = t1.id1
         ;

      rec_id cur_ids%rowtype := null;

   begin
      open cur_ids;
      fetch cur_ids into rec_id;
      close cur_ids;
      dbms_output.put_line(rec_id.val1 || '-' || rec_id.val2);

   exception
      when others then
         if cur_ids%isopen then
            close cur_ids;
         end if;
         raise;

   end pr_call;

end pkg_test;
/

Having created the procedure I’m now going to call it – and then add a column to table t1. What’s that going to do to a packaged procedure with a “select *”?

Pause for thought …

Here’s some SQL to run the test.


set serveroutput on

prompt   *** Make a first call to the procedure: no error ***
execute  pkg_test.pr_call

prompt   *** add a column to one of the tables
alter table t1 add col_test varchar2(20);

prompt   *** Make two more calls to the procedure: ouch! ***
execute  pkg_test.pr_call
execute  pkg_test.pr_call

prompt  *** Recompile before a third call ***
execute  dbms_ddl.alter_compile('package body', user, 'pkg_test')
execute  pkg_test.pr_call

Unless I’ve managed to cut-n-paste the wrong bits of code, you would have got the following error for the 2nd and 3rd calls to the package:


BEGIN
*
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected - got -
ORA-06512: at "TEST_USER.PKG_TEST", line 25
ORA-06512: at line 2

The package body should (I believe) have invalidated and recompiled itself for the second execution, and even if it failed on the first attempt surely it should have invalidated itself on the ORA-932 and recompiled itself and succeeded on the third execution. (If you remove the exception clause you’ll find that the error is intially raised at the fetch, by the way).

If we change the “select *” to explicitly name the columns we want, viz:“select t1.id1, t1.val1, t2.id2, t2.val2″ we don’t get the ORA-00932 errors (just as we would probably expect). What we might not expect is that the errors also disappear if we leave the “select *” in place but change the query from ANSI syntax to traditional Oracle syntax.

Footnote:

Obviously you shouldn’t use the lazy “*” notation in any production code – it can cause several different problems (including the dangers of “whoops, I didn’t mean to make that one invisible”) – but if you do you may find that you end up with packaged procedures that crash for no apparent reason until you recompile them. Perhaps ORA-00932 is the only possible error message, but maybe it’s possible to cause other errors to appear. Even worse, though I haven’t tried to force it yet, you may find that you can construct cases where the package reports no error but modifies the wrong data.

I’ve tested this code on versions 11.2.0.4 and 12.1.0.2 and see the same results on both.

↧

Subquery Factoring (10)

July 27, 2015, 5:26 am

≫ Next: IN/EXISTS bugs

≪ Previous: Invalidation

What prompted me to write my previous note about subquerying was an upgrade to 12c, and a check that a few critical queries would not do something nasty on the upgrade. As ever it’s always interesting how many little oddities you can discover while looking closely as some little detail of how the optimizer works. Here’s an oddity that came up in the course of my ~~playing around~~ investigation in 12.1.0.2 – first some sample data:


create table t1
nologging
as
select * from all_objects;

create index t1_i1 on t1(owner) compress nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          =&gt; user,
                tabname          =&gt;'T1',
                method_opt       =&gt; 'for all columns size 1 for columns owner size 254'
        );
end;
/

The all_objects view is convenient as a tool for modelling what I wanted to do since it has a column with a small number of distinct values and an extreme skew across those values. Here’s a slightly weird query that shows an odd costing effect:


with v1 as (
        select /*+ inline */ owner from t1 where owner &gt; 'A'
)
select count(*) from v1 where owner = 'SYS'
union all
select count(*) from v1 where owner = 'SYSTEM'
;

Since the query uses the factored subquery twice and there’s a predicate on the subquery definition, I expect to see materialization – and that’s what happens (even though I’ve engineered the query so that materialization is more expensive than executing inline). Here are the two plans from 12.1.0.2 (the same pattern appears in 11.2.0.4, though the costs are a little less across the board):


=======================
Unhinted (materializes)
=======================

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     2 |   132 |    25  (20)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661B_876C2CB |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN    | T1_I1                      | 85084 |   498K|    21  (15)| 00:00:01 |
|   4 |   UNION-ALL                |                            |       |       |            |          |
|   5 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  6 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
|   8 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  9 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

=============
Forced inline
=============

--------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     2 |    12 |    22  (14)| 00:00:01 |
|   1 |  UNION-ALL             |       |       |       |            |          |
|   2 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  3 |    INDEX FAST FULL SCAN| T1_I1 | 38784 |   227K|    21  (15)| 00:00:01 |
|   4 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  5 |    INDEX RANGE SCAN    | T1_I1 |   551 |  3306 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

I’m not surprised that the optimizer materialized the subquery – as I pointed out in my previous article, the choice seems to be rule-based (heuristic) rather than cost-based. What surprises me is that the cost for the default plan is not self-consistent – the optimizer seems to have lost the cost of generating the temporary table. The cost of the materialized query plan looks as if it ought to be 21 + 13 + 13 = 47. Even if the optimizer were smart enough to assume that the temporary table would be in the cache for the second scan (and therefore virtually free to access) we ought to see a cost of 21 + 13 = 34. As it is we have a cost of 25, which is 13 + 13 (or, if you check the 10053 trace file, 12.65 + 12.65, rounded).

Since the choice to materialize doesn’t seem to be cost-based (at present) this doesn’t really matter – but it’s always nice to see, and be able to understand, self-consistent figures in an execution plan.

Footnote

It is worth pointing out as a side note that materialization can actually be more expensive than running in-line, even for very simple examples. Subquery factoring seems to have become more robust and consistent over recent releases in terms of consistency of execution plans when the subqueries are put back inline, but you still need to think a little bit before rewriting a query for cosmetic (i.e. totally valid “readability”) reasons just to check whether the resulting query is going to produce an unexpected, and unexpectedly expensive, materialization.

↧

IN/EXISTS bugs

September 2, 2015, 12:11 am

≫ Next: Result Cache

≪ Previous: Subquery Factoring (10)

Here’s a simple data set – I’m only interested in three of the columns in the work that follows, but it’s a data set that I use for a number of different models:


execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1000))	n_1000,
	trunc(dbms_random.value(0,750))		n_750,
	trunc(dbms_random.value(0,600))		n_600,
	trunc(dbms_random.value(0,400))		n_400,
	trunc(dbms_random.value(0,90))		n_90,
	trunc(dbms_random.value(0,72))		n_72,
	trunc(dbms_random.value(0,40))		n_40,
	trunc(dbms_random.value(0,3))		n_3
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create table t2 nologging
as
select * from t1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

The columns I want to consider are n_3, n_400, and n_1000. As their names suggest the columns have 3, 400, and 1000 distinct values respectively and since I’ve used the dbms_random.value() function to generate the data the distinct values are fairly evenly spread across the million rows of the table.

Consider, then, the following two queries:


select
        *
from
        t1
where
        exists (
                select  null
                from    t2
                where   n_1000 = 0
                and     t2.n_400 = t1.n_400
                and     t2.n_3 = t1.n_3
        )
;


select
        *
from
        t1
where
        (t1.n_400, t1.n_3) in (
                select  t2.n_400, t2.n_3
                from    t2
                where   t2.n_1000 = 0
        )
;

The first point to check is that these two queries are logically equivalent.

Once you’re happy with that idea we can work out, informally, how many rows we should expect the queries ought to return: there are 1,200 combinations for (n_400, n_3) so each combination should return roughly 833 rows; if we pick 1,000 rows from the 1 million available we can expect to see 679 of those combinations (that’s Alberto Dell’Era’s “selection without replacement” formula that Oracle uses for adjusting num_distinct to allow for filter predicates). So we might reasonably suggest that the final number of rows as 833 * 679 = 565,607. It turns out that that’s a pretty good estimate – when I ran the query the result was actually 567,018 rows.

So what does Oracle produce for the two execution plans – here are the result from 12c (EXISTS first, then IN):


===================
Multi-column EXISTS
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_400"="T1"."N_400" AND "T2"."N_3"="T1"."N_3")
   2 - filter("N_1000"=0)

===================
Equivalent IN query
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N_400"="T2"."N_400" AND "T1"."N_3"="T2"."N_3")
   2 - filter("T2"."N_1000"=0)

The first thing to note is that the shape of the plans is identical, and the predicate sections are identical – but the final cardinalities are different. Clearly at least one of the cardinalities has to be wrong by a significant amount (7.5% or 10.4%, depending which way round you want to look at it). If you run the test on 11.2.0.4 you find that both plans give the same estimated row count – and it’s the 920,000 rows; so arguably 12c has “fixed” the IN subquery calculation, bringing it closer to a reasonable prediction, but it hasn’t fixed the EXISTS subquery calculation. That 833K prediction, by the way, is what you would expect to see with this data with a basic join – and a semi-join shouldn’t be able to produce more data than a join.

But both predictions are way off the (informal) expectation, so how have they appeared ?

Working backwards it’s easy to spot that: 833K = 833 * 1,000: Oracle is behaving as if every single row identified in the subquery will produce a separate combination of (n_400, n_3). If we reverse engineer 920K we get: 920K / 833 = 1104 – it would appear that the optimizer thinks the 1,000 rows produced by the subquery will produce 1,104 distinct combinations of (n_400, n_3) so we how did the impossible 1,104 appear in the arithmetic.

If you apply the “selection without replacement” formula to picking 1,000 rows with 400 distinct values from 1,000,000 rows the expected number of distinct values (with rounding) will be 368; if you apply the formula for picking 1,000 rows with 3 distinct values from 1,000,000 rows the expected number will be 3. And 3 * 368 = 1,104. (Remember that in my original estimate I applied the formula after multiplying out the combination of distinct values). The optimizer is using its standard methods, but using internediate results in an unsuitable fashion.

It’s impossible to say what the impact of this particular code path – and the change on the upgrade – might be. The optimizer has over-estimated by 47% in one case and 62% in the other but (a) there may be something about my data that exaggerated an effect that few people will see in the wild and (b) in many cases getting in the right ballpark is enough to get a reasonable plan, and a factor of 2 is the right ballpark.

Of course, a few people will be unlucky with a few queries on the upgrade where the estimate changes – after all a single row difference in the estimate can cause the optimizer to flip between a hash join and a nested loop – but at least you’ve got a little extra information that might help when you see a bad estimate on an important semi-join.

So is there a workaround ? Given that I’ve got 12c, the obvious thing to try is to create a column group at both ends of the semi-join and see what happens. It shouldn’t really make any difference because column groups are targeted at the problems of correlated column – but we might as well try it:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns (n_400,n_3) size 1')
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for columns (n_400,n_3) size 1')

Unfortunately when I did this the final cardinality estimate for both queries dropped to just 833 (the absence of a K on the end isn’t a typo!).

Manually unnesting got me closer:


select
        *
from
        (
        select  distinct n_3, n_400
        from    t2
        where   n_1000 = 0
        )       sq,
        t1
where
        sq.n_400 = t1.n_400
and     sq.n_3 = t1.n_3
;

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   649K|    33M|  1260  (11)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   649K|    33M|  1260  (11)| 00:00:01 |
|   2 |   VIEW               |      |   779 | 20254 |   612   (8)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |   779 |  8569 |   612   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SQ"."N_400"="T1"."N_400" AND "SQ"."N_3"="T1"."N_3")
   4 - filter("N_1000"=0)

The cardinality of 649K is (allowing for rounding) 833 * 779; so we need to know where the 779 came from. It’s the optimizer standard arithmetic for “distinct” – multiply the N individual selectivities together then divide by the sqrt(2) “N-1” times. So we apply the “selection without replacement formula twice”:

adjusted selectivity of n_400 = 367.21
adjusted selectivity of n_3 = 3
367.21 * 3 / sqrt(2) = 779

If you create column group statistics for (n_400, n_3) this doesn’t change the optimizer’s estimate for the number of distinct combinations after selection – maybe that’s another enhancement in the pipeline – but, at least in this case, the manual unnesting has got us a little closer to the right estimates without any statistical intervention.

Footnote:

Just for the sake of completeness, here are the plans (with yet more cardinality predictions) that you get if you block the unnesting:


select
	*
from
	t1
where
	exists (
		select
			/*+ no_unnest */
			null
		from	t2
		where	n_1000 = 0
		and	t2.n_400 = t1.n_400
		and	t2.n_3 = t1.n_3
	)
;



---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1179 | 33012 |   766K (12)| 00:00:30 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    26M|   632  (11)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T2" "T2" WHERE
              "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   3 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)



=====================================
Unnesting blocked and subquery pushed
=====================================
select
	*
from
	t1
where
	exists (
		select
			/*+ no_unnest push_subq */
			null
		from	t2
		where	n_1000 = 0
		and	t2.n_400 = t1.n_400
		and	t2.n_3 = t1.n_3
	)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 50000 |  1367K|  1271  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | T1   | 50000 |  1367K|   632  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST */ 0 FROM "T2"
              "T2" WHERE "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   2 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)

The 1179 comes from the magic of sqrt(2): 1179 = 1,000,000 / (400 * 3 / sqrt(2)).

The 50,000 is just the basic “I dunno, let’s call it 5%”.

Reference script: aggregate_selectivity_c.sql

↧

Result Cache

September 22, 2015, 2:11 am

≫ Next: 12c Downgrade

≪ Previous: IN/EXISTS bugs

Yesterday I thought I’d spend half an hour before breakfast creating a little demonstration of a feature; some time about midnight I felt it was time to stop because I’d spent enough time chasing around a couple of bugs that produced wrong results in a variety of ways. Today’s short post is just little warning: be VERY careful what you do with the PL/SQL result cache – if you use the results of database queries in the cache you may end up with inconsistent results in your application. Here’s one very simple example of what can go wrong, starting with a little script:


alter system flush shared_pool;

truncate table gtt1;
drop table gtt1;
create global temporary table gtt1 (n1 number)
on commit preserve rows
;

insert into gtt1 values(1);
commit;

execute dbms_stats.gather_table_stats(user,'gtt1');


create or replace function f_cache
return number
result_cache
-- relies_on (gtt1)
is
        m_ret number;
begin
        select  max(n1)
        into    f_cache.m_ret
        from    gtt1
        ;

        return f_cache.m_ret;
end;
/

execute dbms_output.put_line(f_cache)

Here’s the output from a session that’s just connected and run this script (the table already existed from an earlier run):

SQL> @temp

System altered.


Table truncated.


Table dropped.


Table created.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Function created.

1

PL/SQL procedure successfully completed.

SQL>

Let’s call this session A, and I’m going to connect with another session which I’ll call session B. The following extracts are cut-and-pastes as I alternate between sessions:

Session B:

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

SQL> insert into gtt1 values(0);

1 row created.

SQL> execute dbms_output.put_line(f_cache);
0

PL/SQL procedure successfully completed.

Session B has just seen the data inserted into a global temporary table by session A; but after inserting a row into the table it now sees what it perceives to be the correct answer.

Session A:

SQL> truncate table gtt1;

Table truncated.

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

Despite truncating the table, session A still sees the data which has been eliminated unrecoverably.

Session B (where I hadn’t yet committed):


SQL> commit;

Commit complete.

SQL>

Session A (where I’ve done nothing new):

SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL>

The row has finally “disappeared” because session B committed.

Session B (where I haven’t done anything since committing):


SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL>

Session B no longer sees the data because it’s now seeing what session A has just seen.

Warning.

This is just one of several ways I managed to get surprising inconsistencies when using the (cross-session) PL/SQL Result Cache. Oracle (in 12c) is supposed to know what the cache relies on and react accordingly, but it doesn’t manage to do it properly (even if you include the relies_on clause) – if you query the database in your PL/SQL you may find strange things happen.

The most important point to note in this example is that a session that wasn’t necessarily doing anything wrong got the wrong results because of the actions of another session that Oracle should have protected it from.

The good thing about this is example that it’s documented (sort of) – as it says in the manual: “to be result cached … does not reference … temporary tables ..”. So that should warn people off copying my example; on the other hand the problem I’m seeing arises because Oracle seems to be trying to use the result cache when the manuals suggests it shouldn’t be.

Conclusion

Do not mix the pl/sql result cache with database queries. The cache is public but (unlike the buffer cache) it is not guaranteed to give you read-consistency.

If you think this was an unrealistic example and you don’t need to worry about it – I’ll post a couple more examples in the next couple of weeks. They’ll be slightly special cases again, but I find the Oracle world is full of special cases.

Update

This behaviour is now visible on MoS as “Bug 21907155 : PL/SQL RC FN SHOWS ONE SESSION THE CONTENTS OF ANOTHER SESSION’S GTT”

↧

12c Downgrade

July 20, 2015, 5:12 am

≫ Next: Invalidation

≪ Previous: Result Cache

Here’s a data set to get things going, I’ve got “parent” and “child” tables, but in this particular demonstration I won’t be invoking referential integrity:


create table chi
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1                              id,
        trunc((rownum-1)/10)                    n1,
        trunc(dbms_random.value(0,1000))        n2,
        rpad('x',1000)                          padding
from
        generator
;

create table par
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1      id,
        rpad('x',1000)  padding
from
        generator
where
        rownum <= 1e3
;

alter table par modify id not null;
alter table par add constraint par_pk primary key(id)
-- deferrable
;

-- Now gather stats on the tables.

So here’s the test code:


set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

prompt  =============================
prompt  Strictly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

prompt  =============================
prompt  Randomly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

set serveroutput on
alter session set statistics_level = typical;

In both cases I’ve hinted the query quite heavily, using internally generated query block names, into running with a nested loop semi-join from chi to par. Since there are 10,000 rows in chi with no filter predicates, you might expect to see the probe into the par table starting 10,000 times returning (thanks to our perfect data match) one row for each start. Here are the run-time plans with rowsource execution stats from 11.2.0.4

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.14 |    1450 |   1041 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.14 |    1450 |   1041 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.12 |    1450 |   1041 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.07 |    1434 |   1037 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1048 |     0   (0)|   1000 |00:00:00.01 |      16 |      4 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.12 |    5544 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.12 |    5544 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.10 |    5544 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.02 |    1434 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   4033 |   1048 |     0   (0)|   4033 |00:00:00.02 |    4110 |
-----------------------------------------------------------------------------------------------------

Time to upgrade to 12.1.0.2 and see what happens:

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |    1448 |   1456 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |    1448 |   1456 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.20 |    1448 |   1456 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.06 |      16 |     27 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |   11588 |   1429 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |   11588 |   1429 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.19 |   11588 |   1429 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.07 |   10156 |      0 |
--------------------------------------------------------------------------------------------------------------

The timings (A-time) on these experiments are not particularly trustworthy – the differences between cached reads and direct path reads introduced more variation than the difference in Starts and Buffers, and the total CPU load is pretty small anyway – and I suspect that this difference won’t make much difference to most people most of the time. No doubt, though, there will be a few cases where a small change like this could have a noticeable effect on some important queries.

Footnote

It’s also nice to think that the scalar subquery caching optimisation used in semi-joins might eventually become available to standard joins (in cases such as “join to parent”, perhaps).

↧