Channel: Bugs – Oracle Scratchpad

Sorted Hash Clusters RIP

Sorted Hash Clusters have been around for several years, but I’ve not yet seen them being used, or even investigated in detail. This is a bit of a shame, really, because they seem to be engineered to address a couple of interesting performance patterns.

The basic concept is that rows sharing the same key value are stored together (clustered) by applying a hashing function to that key to generate a block address; but on top of that, if you query the data by “hashkey”, the results are returned in sorted order of a pre-defined “sortkey” without any need for sorting. (On top of everything else, the manuals describing what happens and how it works are wrong.)

Yesterday I had reason to take a closer look at them, and decided that perhaps the reason no one talks about them is that they simply aren’t safe. Here’s a trivial demonstration, which I’ve run on 10.2.0.5, 11.2.0.3, and 12.1.0.1:


execute dbms_random.seed(0)

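-- hash_value is the hash key that picks the bucket;
-- sort_value (flagged SORT) is the sort key within that bucket.
create cluster sorted_hash_cluster (
	hash_value	number(6,0),
	sort_value	varchar2(2)	sort
)
size 300
hashkeys 100
;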

create table sorted_hash_table (
	hash_value	number(6,0),
	sort_value	varchar2(2),
	v1		varchar2(10),
	padding		varchar2(30)
)
cluster sorted_hash_cluster (
	hash_value, sort_value
)
;


begin
	for i in 1..5000 loop
		insert into sorted_hash_table values(
			trunc(dbms_random.value(0,99)),
			dbms_random.string('U',2),
			lpad(i,10),
			rpad('x',30,'x')
		);
		commit;
	end loop;
end;
/

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'sorted_hash_table'
	);
end;
/

select count(*) from sorted_hash_table where hash_value = 92;
select count(*) from sorted_hash_table where hash_value = 92 and sort_value is null;
select count(*) from sorted_hash_table where hash_value = 92 and sort_value is not null;

select * from sorted_hash_table where hash_value = 92 and sort_value >= 'YR';
select * from sorted_hash_table where hash_value = 92 and sort_value > 'YR';

I think the last two queries are exactly the type for which the feature was invented. Just check the results, which come from a cut-n-paste after setting echo on:


SQL> select count(*) from sorted_hash_table where hash_value = 92;

  COUNT(*)
----------
        60

1 row selected.

SQL> select count(*) from sorted_hash_table where hash_value = 92 and sort_value is null;

  COUNT(*)
----------
        60

1 row selected.

SQL> select count(*) from sorted_hash_table where hash_value = 92 and sort_value is not null;

  COUNT(*)
----------
        60

1 row selected.

SQL> select * from sorted_hash_table where hash_value = 92 and sort_value >= 'YR';

HASH_VALUE SO V1         PADDING
---------- -- ---------- ------------------------------
        92 YR       4773 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        92 ZF        250 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        92 ZJ       2046 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        92 ZT         65 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

4 rows selected.

SQL> 
SQL> select * from sorted_hash_table where hash_value = 92 and sort_value > 'YR';

no rows selected


So: Null is not null, and ‘ZF’ is not greater than ‘YR’; it’s only greater than or equal to ‘YR’!
I’d be interested to see the test cases that the developer used for this feature that allowed it to ship at all.
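
A possible cross-check, sketched here under assumptions rather than taken from the test above, is to copy the rows into an ordinary heap table and repeat the suspect queries against the copy. The table name sorted_hash_copy is hypothetical, and the sketch assumes that the scan behind the "create table as select" reads the stored values faithfully. Since every row was inserted with a two-character sort_value, the copy would be expected to report no nulls for hash_value 92, and the three rows ‘ZF’, ‘ZJ’ and ‘ZT’ as greater than ‘YR’.

```sql
-- Hypothetical heap copy of the clustered table (name is an assumption)
create table sorted_hash_copy as
select * from sorted_hash_table;

-- Repeat the suspect predicates against the heap copy
select count(*) from sorted_hash_copy where hash_value = 92 and sort_value is null;
select count(*) from sorted_hash_copy where hash_value = 92 and sort_value is not null;

select * from sorted_hash_copy where hash_value = 92 and sort_value > 'YR';

-- Tidy up once the comparison is done
drop table sorted_hash_copy purge;
```

If the heap copy gives different answers from the clustered table for the same predicates, that would point at the sorted hash cluster access path rather than at the data.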


