Disjunction (logical OR) is known to cause various performance problems, from extreme parse times (e.g. here) to sub-optimal plans. A common solution to such problems is getting rid of ORs via "OR expansion" (i.e. a rewrite using UNION ALL), although it doesn't work in 100% of cases. In this post, I will consider an example of an OR problem that can be solved differently.
Here is the setup:
create table t(
  id    number,
  x     number,
  date1 date not null,
  date2 date not null,
  date3 date not null,
  date4 date not null,
  date5 date not null
);

insert into t
select level,
       dbms_random.value,
       sysdate - 1000*dbms_random.value,
       sysdate - 1000*dbms_random.value,
       sysdate - 1000*dbms_random.value,
       sysdate - 1000*dbms_random.value,
       sysdate - 1000*dbms_random.value
from dual
connect by level <= 1e6;

commit;

exec dbms_stats.gather_table_stats(user, 'T');
So we have a table with a bunch of date columns, and we want to be able to efficiently run queries like this one:
select count(*)
from t
where date1 > sysdate - 1
   or date2 > sysdate - 1
   or date3 > sysdate - 1
   or date4 > sysdate - 1
   or date5 > sysdate - 1;
What are the options here? One obvious solution is a full table scan, but since we are only selecting a small percentage of the rows, it would be very inefficient:
  COUNT(*)
----------
      4712

Execution Plan
----------------------------------------------------------
Plan hash value: 1842905362

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    40 | 15623  (69)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |    40 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  4737 |   185K| 15623  (69)| 00:00:05 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DATE3">SYSDATE@!-1 OR "DATE1">SYSDATE@!-1 OR
              "DATE4">SYSDATE@!-1 OR "DATE5">SYSDATE@!-1 OR "DATE2">SYSDATE@!-1)

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
      20004  consistent gets
      19997  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
That’s a lot of work to retrieve 0.5% of the rows of a 1M-row table (over 4 I/Os per row counted… yeuch…).
What if we create a concatenated index on all date columns? Let’s try that.
create index i$t$dates1 on t(date1, date2, date3, date4, date5);

exec dbms_stats.gather_table_stats(user, 'T');

Execution Plan
----------------------------------------------------------
Plan hash value: 338327390

------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    40 | 13990  (74)| 00:00:04 |
|   1 |  SORT AGGREGATE       |            |     1 |    40 |            |          |
|*  2 |   INDEX FAST FULL SCAN| I$T$DATES1 |  4723 |   184K| 13990  (74)| 00:00:04 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DATE3">SYSDATE@!-1 OR "DATE1">SYSDATE@!-1 OR
              "DATE4">SYSDATE@!-1 OR "DATE5">SYSDATE@!-1 OR "DATE2">SYSDATE@!-1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      14787  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
We got an INDEX FAST FULL SCAN instead of a TABLE ACCESS FULL, which is an improvement, but not really a big one. In terms of logical I/O we only gained 25% or so.
What about less obvious options? What if we rewrite the query as:
select count(*)
from t
where greatest(date1, date2, date3, date4, date5) > sysdate - 1;
Note that I declared the date columns as NOT NULL to ensure that this rewrite won’t break the logical equivalence, but the same goal could also be achieved with NVL if the columns are in fact nullable.
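For illustration, here is a minimal sketch of the NVL variant, assuming the columns were nullable; the fallback literal DATE '0001-01-01' and the index name i$t$dates2_nvl are illustrative placeholders, not something from the original example:

-- A sketch only: replace each NULL with a low "impossible" date so that
-- GREATEST never returns NULL and the predicate stays logically equivalent
-- to the original OR condition (a NULL column never satisfies either form).
select count(*)
from t
where greatest(nvl(date1, date '0001-01-01'),
               nvl(date2, date '0001-01-01'),
               nvl(date3, date '0001-01-01'),
               nvl(date4, date '0001-01-01'),
               nvl(date5, date '0001-01-01')) > sysdate - 1;

-- The matching function-based index has to use exactly the same expression:
create index i$t$dates2_nvl on t(greatest(nvl(date1, date '0001-01-01'),
                                          nvl(date2, date '0001-01-01'),
                                          nvl(date3, date '0001-01-01'),
                                          nvl(date4, date '0001-01-01'),
                                          nvl(date5, date '0001-01-01')));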
Have we gained anything, except for making the query slightly more compact? Actually, we did, because now we can create a function-based index to help us:
create index i$t$dates2 on t(greatest(date1, date2, date3, date4, date5));

exec dbms_stats.gather_table_stats(user, 'T');
Now execution stats improve considerably:
Execution Plan
----------------------------------------------------------
Plan hash value: 2843130627

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     8 |    38  (14)| 00:00:01 |
|   1 |  SORT AGGREGATE   |            |     1 |     8 |            |          |
|*  2 |   INDEX RANGE SCAN| I$T$DATES2 |  5439 | 43512 |    38  (14)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(GREATEST("DATE1","DATE2","DATE3","DATE4","DATE5")>SYSDATE@!-1)

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
         31  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Now that is real improvement: logical I/O is down by almost 3 orders of magnitude!
Note that the index itself is also smaller than the concatenated one, so we have the additional benefit of saving some space.
Funny, Nikolay, for me your query returned 4812 rather than 4712 …
Taking this opportunity, I would like to thank you for your wonderful blog.
Sergey
Russia, Vologda.
Hi Sergey, thanks for visiting my blog!
It’s not really surprising that you got a slightly different number of rows here, because there is a sysdate in the query (and if you re-run the query now, you’ll see yet another number…).
I know it’s somewhat confusing, but it would have taken me much longer to come up with a more deterministic example, so I decided that it didn’t really matter.
Best regards,
Nikolay
Hi Nikolay!
Sometimes it is enough to use a “union all” solution, because even in your example it works quite well:
we can create 5 indexes, one index per column:
disj.sql (embedded gist, content not shown)
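(The embedded gist is not reproduced here, so below is a rough sketch of the idea described in the comment: one single-column index per date column plus an OR-expansion rewrite via UNION ALL. The index names and the exact shape of the rewrite are assumptions, not the contents of disj.sql.)

-- A sketch only: one index per date column ...
create index i$t$date1 on t(date1);
create index i$t$date2 on t(date2);
create index i$t$date3 on t(date3);
create index i$t$date4 on t(date4);
create index i$t$date5 on t(date5);

-- ... and the OR rewritten as UNION ALL, each branch excluding rows already
-- counted by the previous branches so that nothing is counted twice
-- (the columns are NOT NULL here, so the NOT (...) predicates are safe).
select count(*)
from (
  select id from t where date1 > sysdate - 1
  union all
  select id from t where date2 > sysdate - 1
     and not (date1 > sysdate - 1)
  union all
  select id from t where date3 > sysdate - 1
     and not (date1 > sysdate - 1 or date2 > sysdate - 1)
  union all
  select id from t where date4 > sysdate - 1
     and not (date1 > sysdate - 1 or date2 > sysdate - 1 or date3 > sysdate - 1)
  union all
  select id from t where date5 > sysdate - 1
     and not (date1 > sysdate - 1 or date2 > sysdate - 1 or date3 > sysdate - 1 or date4 > sysdate - 1)
);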
In that case we don’t have an FBI, we can still use these indexes for other queries, and the result is not so bad:
https://gist.github.com/xtender/b7635b0621ded18274c1
Regards,
Sayan Malakshinov
Sorry, right link for spool:
disj_spool.sql (embedded gist, content not shown)
Sayan,
thanks for your comment.
1) I did mention that OR expansion is often a solution to OR problems, but I don’t think that it’s a very good one in this case
2) just imagine how much uglier and more difficult to support the code would become with 4 UNION ALLs in place. Please bear in mind that the code in my example is just a simplified representation of the actual problem; the real-life prototype was several screens long. Now imagine adding the UNION ALL branches on top of that: how would you feel if you had to support code like that?
3) I suspect that updating 5 indexes with DML would be more expensive than updating 1 FBI
4) I suspect that the 5 indexes would occupy more space than the FBI. I think even more than the table itself.
So to sum up, while this might be a working solution, it’s far from being practical, let alone optimal.
Nikolay,
1. I noticed your remark about “OR expansion”; that is exactly why I commented :)
2. Strictly speaking, Oracle can do it by itself (OR expansion / CONCATENATION), though not as well as a manual “union all”. So sometimes we don’t even have to rewrite the code, as we do with the FBI (see the sketch after this list).
3-4. Usually other indexes already exist and are needed for other queries, so often we only have to decide whether or not to add the new FBI. Also note that when you create an FBI, you create an additional hidden column in the table, and an update of any one of these 5 columns causes the FBI value to be recalculated. So we also have to analyze how frequently these columns are updated.
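(As a hedged illustration of point 2: Oracle’s built-in OR expansion can be requested with the USE_CONCAT hint. This sketch is not from the original comment, and whether the transformation is applied, and how well it performs, depends on the version, the available indexes and the statistics.)

-- A sketch only: ask the optimizer to expand the OR into a CONCATENATION plan.
select /*+ use_concat */ count(*)
from t
where date1 > sysdate - 1
   or date2 > sysdate - 1
   or date3 > sysdate - 1
   or date4 > sysdate - 1
   or date5 > sysdate - 1;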
btw, even if I had chosen the FBI, for convenience and transparency I’d create a virtual column with a logical name like “VIRT_LATEST_DATE” and an index on this column.
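(A minimal sketch of that suggestion, assuming Oracle 11g or later virtual column syntax; the column and index names are illustrative.)

-- A sketch only: expose the expression as a named virtual column ...
alter table t add (
  virt_latest_date date generated always as
    (greatest(date1, date2, date3, date4, date5)) virtual
);

-- ... index the virtual column ...
create index i$t$virt_latest_date on t(virt_latest_date);

-- ... and filter on the named column instead of repeating the expression.
select count(*) from t where virt_latest_date > sysdate - 1;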
Hi
One nice use of this technique is to create a virtual column (partition_date) defined as greatest(date1, date2, date3, date4, date5) and interval-partition the table by this virtual column. Check constraints of the form
date1 <= partition_date
enable the optimizer to utilize partition pruning even if partition_date is not mentioned in the query.
If the dateN columns can be null, then the current version of the optimizer seems to require mentioning the 'date1 <= partition_date' condition in the where clause if date1 is used for filtering rows. Also, nvl is required in the greatest function for nullable columns.
This is useful for tables which have many date columns that do not differ much from each other, all of which can be and are used in where clauses. Typically queries refer to 'new' data, so this technique prunes partitions efficiently.
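(A sketch of what that might look like; the table name, constraint names and the anchor partition boundary are illustrative, and whether interval partitioning accepts a virtual column as the key may depend on the Oracle version, so treat this as an illustration rather than a verified recipe.)

-- A sketch only: partition_date is a virtual column, the table is
-- interval-partitioned by it, and the check constraints give the optimizer
-- what it needs to prune partitions when a query filters on one of the
-- underlying date columns.
create table t_part (
  id    number,
  date1 date not null,
  date2 date not null,
  date3 date not null,
  date4 date not null,
  date5 date not null,
  partition_date date generated always as
    (greatest(date1, date2, date3, date4, date5)) virtual,
  constraint t_part_chk1 check (date1 <= partition_date),
  constraint t_part_chk2 check (date2 <= partition_date),
  constraint t_part_chk3 check (date3 <= partition_date),
  constraint t_part_chk4 check (date4 <= partition_date),
  constraint t_part_chk5 check (date5 <= partition_date)
)
partition by range (partition_date)
interval (numtoyminterval(1, 'MONTH'))
(partition p0 values less than (date '2015-01-01'));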
Hi Lasse,
thanks for your comment. A hidden virtual column is implicitly created when the FBI is created, but I agree that explicitly creating it could offer some advantages.
Using this virtual column as a partitioning key is not always an option (partitioning can be unavailable due to licensing restrictions, or the table may need a different partitioning key for some reason, e.g. reference partitioning to align partitions with a logically connected table). Besides, even if partitioning can be used in principle, it’s not something that can be done easily to an existing table with data (although various techniques like ALTER TABLE EXCHANGE PARTITION could simplify that task). Those are the reasons why I didn’t focus on partitioning here (well, that, and the desire not to make the post too long). Nevertheless, I agree that where appropriate, this could be a very good solution.
The most important thing, however, is to realize that getting rid of OR can be done by means other than OR expansion with UNION ALL, e.g. using logical equivalence and the rules of arithmetic as described in the example. Once you do that, there are many other techniques that you can combine this trick with to make it even more efficient.
Regarding NVLs: yes, if the columns are nullable then the greatest function would need to contain NVLs. I did mention that in the post.
Best regards,
Nikolay