索引，妙不可言

王硕 · 发表于 2015-4-6 23:03:00

第五章：索引天地妙不可言

1.btree索引

1.1 高度较低

1.2存储列值和 rowid

1.3结构有序

2.位图索引

3.函数索引

索引，由Root（根块），Branch（茎块），Leaf（叶子块）组成。

其中，Leaf（叶子块)主要存储了 key column value （索引列具体值）

以及能具体定位到数据所在位置的rowid。

[此贴子已经被作者于2015-04-06 23:09:39编辑过]

王硕 · 发表于 2015-4-6 23:03:19

5.2.1.5妙用三特征之高低较低

drop table t1 purge；

drop table t2 purge;

drop table t3 purge;

drop table t4 purge;

drop table t5 purge;

drop table t6 purge;

drop table t7 purge;

create table t1 as select rownum as id,rownum+1 as id2 from dual connect by level<=5;

create table t2 as select rownum as id,rownum+1 as id2 from dual connect by level<=50;

create table t3 as select rownum as id,rownum+1 as id2 from dual connect by level<=500;

create table t4 as select rownum as id,rownum+1 as id2 from dual connect by level<=5000;

create table t5 as select rownum as id,rownum+1 as id2 from dual connect by level<=50000;

create table t6 as select rownum as id,rownum+1 as id2 from dual connect by level<=500000;

create table t7 as select rownum as id,rownum+1 as id2 from dual connect by level<=5000000;

create index idx_id_t1 on t1(id);

create index idx_id_t2 on t2(id);

create index idx_id_t3 on t3(id);

create index idx_id_t4 on t4(id);

create index idx_id_t5 on t5(id);

create index idx_id_t6 on t6(id);

create index idx_id_t7 on t7(id);

观察比较各个索引的大小：

select segment_name,bytes/1024

from user_segments

where segment_name in ('IDX_ID_T1','IDX_ID_T2','IDX_ID_T3','IDX_ID_T4','IDX_ID_T5','IDX_ID_T6','IDX_ID_T7');

观察比较各个索引的高度：

select index_name,blevel,leaf_blocks,num_rows,distinct_keys,clustering_factor

from user_ind_statistics where table_name in ('T1','T2','T3','T4','T5','T6','T7');

由于一个区含有8个块，一个块8KB，即便是一个空表，大小也是64KB。

如果删除了索引，再执行查看查询性能。

5.2.1.5.1分区索引设计误区

建立分区表part_tab ,插入数据，并分别在col2上建局部索引，在col3上建立全局索引：

drop table part_tab purge;

create table part_tab (id int,col2 int,col3 int)

partition by range(id)

(

partition p1 values less than (10000),

partition p2 values less than (20000),

partition p3 values less than (30000),

partition p4 values less than (40000),

partition p5 values less than (50000),

partition p6 values less than (60000),

partition p7 values less than (70000),

partition p8 values less than (80000),

partition p9 values less than (90000),

partition p10 values less than (100000),

partition p11 values less than (maxvalue)

)

;

insert into part_tab select rownum,rownum+1,rownum+2 from dual connect by rownum<=110000;

commit;

create index idx_par_tab_col2 on part_tab(col2) local;

create index idx_par_tab_col3 on part_tab(col3);

查看分区表各个分区及其类型：

col segment_name format a20

select segment_name,partition_name,segment_type

from user_segments

where segment_name='PART_TAB';

select segment_name,partition_name,segment_type

from user_segments

where segment_name='IDX_PAR_TAB_COL2';

select segment_name,partition_name,segment_type

from user_segments

where segment_name='IDX_PAR_TAB_COL3';

建立一个普通表norm_tab，与part_tab类似：

drop table norm_tab purge;

create table norm_tab（id int,col2 int,col3 int);

insert into norm_tab select rownum,rownum+1,rownum+2 from dual connect by rownum<=110000;

commit;

create index idx_nor_tab_col2 on norm_tab(col2);

create index idx_nor_tab_col3 on norm_tab(col3);

查询性能分析：

set autotrace traceonly

set linesize 1000

set timing on

select * from part_tab where col2=8;

select * from norm_tab where col3=8;

针对分区表查询逻辑读为普通表的6倍，cost也为6倍。

分区表索引IO个数要乘以分区个数，分区查询，导致查询变慢。

分别观察分区表和普通表的索引高度：

select index_name,

blevel,

leaf_blocks,

num_rows,

distinct_keys,

clustering_factor

from user_ind_partitions

where index_name='IDX_PAR_TAB_COL2';

select index_name,

blevel,

leaf_blocks,

num_rows,

distinct_keys,

clustering_factor

from user_ind_statistics

where index_name='IDX_PAR_TAB_COL2';

分区索引扫描落在某一个分区，性能大幅提升：

select * from part_tab where col2=8 and id=7;

这里pstart 与pstop 只遍历一个分区。

[此贴子已经被作者于2015-06-28 01:06:17编辑过]

王硕 · 发表于 2015-4-6 23:03:49

5.2.1.6 巧用三特征之存储列值

1.count(*)优化

drop table t purge;

create table t as select * from dba_objects;

create index idx1_object_id on t(object_id);

select count(*) from t;

count(*)在索引列有空值时无法用到索引：

set autotrace on

set linesize 1000

set liming on

select count(*) from t;

因为，索引不能存储空记录。如果索引列有空记录，那依据索引来统计表的记录数，不会用到索引。

明确索引列为空(只查询非空列），即可让count（*）用到索引：

set autotrace on

set linesize 10

set liming on

select count(*) from t where object_id is not null;

查看T表的列是否为空：

desc t；

由于T表是create建立起来，所以当前列的属性全部允许为空。

当把列的属性改为非空就可以直接用到索引。主键也必须走索引。

更改object_id的属性，修改为不允许为空：

alter table t modify object_id not null;

之后发现count（*）用到索引。

由于，索引含有列值，如果一个表只有一个列，则走索引会更慢。

2.SUM/AVG优化

drop table t purge;

create table t as select * from dba_objects;

create index idx1_object_id on t(object_id);

SUM/AVG用不到索引（因为列允许为空）

set autotrace on

set linesize 1000

set liming on

select sum(object_id) from t;

SUM、AVG、COUNT综合写法试验：

select sum(object_id),avg(object_id),count(*) from t where object_id is not null;

三个聚合语句在一起，只需一次扫描索引块，就同时解决三个问题，同单个问题所用时间一样。

3.MAX/MIN优化

select max(object_id) from t;

max/min 无论列是否为空都可以用索引（也可能不用）。

更由于max索引之走索引块最右边最后一个块，所以逻辑读极少（无论表的大小）。

min和max同时写的优化（空值导致用不到索引）：

set autotrace on

set linesize 1000

set min(object_id),max(object_id) from t;

增加is not null,同时写min、max的优化（无法用index full scan（min/max））：

select min（object_id),max(object_id) from t where object_id is not null;

虽然走了索引，扫描方式却是 index fast full scan 而非index full scan(min/max)

4.索引回表读（table access by index rowid)

drop table t purge;

create table t as select * from dba_objects;

create index idx1_object_id on t(object_id);

set autotrace traceonly

set linesize 1000

set liming on

select * from t where object_id<=5；

* 使得扫描索引定位索引块后，重新定位整个t表

如果返回列与索引列不同：

select object_id,object_name from t where object_id<=5;

建立组合索引消除回表：

create index idx_un_objid_objname on t(object_id,object_name);

聚合因子，建有序和无序的表各一张：

drop table t_colocated purge;

create table t_colocated (id number,col2 varchar2(100));

begin

for i in 1..100000

loop

insert into t_colocated(id,col2)

values(i,rpad(dbms_random.random,95,'*'));

end loop;

end;

/

alter table t_colocated add constraint pk_t_colocated primary key(id);

drop table t_disorganized purge;

create table t_disorganized

as

select id,col2

from t_colocated

order by col2;

alter table t_disorganized add constraint pk_t_disorg primary key(id);

t_colocated 中 1到100000顺序插入,聚合因子比较低。

t_disorganized随机插入，聚合因子比较高

分析聚合因子：

set linesize 1000

select index_name,

blevel,

leaf_blocks,

num_rows,

distinct_keys,

clustering_factor

from user_ind_statistics

where table_name in('T_COLOCATED','T_DISORGANIZED');

其中CLUSTERING_FACTOR：表明有多少临近的索引条目指到不同的数据块。

[此贴子已经被作者于2015-06-28 01:30:13编辑过]

王硕 · 发表于 2015-4-6 23:04:02

5.2.1.7活用索引有序

1.order by 排序优化

未排序前性能开销小：

set autotrace traceonly

set linesize 1000

drop table t purge;

create table t as select * from dba_objects;

set autotrace traceonly

select * from t where object_id>2;

排序后开销变大：

select * from object_id2 order by object_id;

5.2.1.8不可不说的主外键设计

第一：主键本身是一种索引，

第二：可以保证表中主键所在列的唯一性,

第三：可以有效地限制外键依赖的表的记录的完整性。

1.外键上的索引与性能

t_p设主键，t_c设外键。

drop table t_p cascade constraints purge;

drop table t_c cascade constraints purge;

create table t_p(id number,name varchar2(30));

alter table t_p add constraint t_p_id_pk primary key (id);

create table t_c (id number,fid number,name varchar2(30));

alter table t_c add constraint fk_t_c foreign key (fid) references t_p (id);

insert into t_p select rownum, table_name from all_tables;

insert into t_c select rownum, mod(rownum,1000)+1,object_name

from all_objects;

外键未建索引前的表连接性能分析：

set autotrace traceonly

set linesize 1000

select a.id,a.name,b.name from t_p a,t_c b where a.id=b.id and a.id=880;

外键建索引后的表连接性能分析：

create index ind_t_c_fid on t_c (fid);

select a.id,a.name,b.name from t_p a,t_c b where a.id=b.id and a.id=880;

逻辑读变低。

2.外键索引与锁的避免

外键有索引，没有死锁的存在：

会话1：select sid from v$mystat where rownum=1;

delete t_c where id=2;

会话2：select sid from v$mystat where rownum=1;

delete t_p where id=2000;

因为外键所在的列建有索引，避免了被锁。

3.主外键约束简单证明：

外键所在表的外键列取值必须在主表中的主键列有记录。

王硕 · 发表于 2015-4-6 23:04:14

5.2.1.9组合索引高效设计要领

1.组织索引能在适当的场合避免回表，动作为：table access by index rowid

2.例如

select count(*) from t where a=1;

select count(*) from t where b=2;

select count(*) from t where a=1 and b=2;

如果a=1 and b=2 的返回比前面单独a=1 或者b=2 的返回值少很多，在a ，b两个列建立组合索引就会高效很多，但不宜超过3个字段组合。

3.

（1）等值查询情况效率一样

drop table t purge;

create table t as select * from dba_objects;

create index idx1_object_id on t(object_id,object_type);

create index idx2_object_id on t(object_type,object_id);

set autotrace traceonly

set linesize 1000

select /*+index(t,idx1_object_id)*/ * from t where object_id=20 and object_type='TABLE';

select /*+index(t,idx2_object_id)*/ * from t where object_id=20 and object_type='TABLE';

consistent gets ,cost 代价相同，所以在等值查询下，组合索引的列无论哪一列在前，性能都一样。

（2）范围查询情况差异明显

set autotrace traceonly

set linesize 1000

select /*+index(t,idx1_object_id)*/ * from t where object_id>=20 and object_id<2000 and object_type='TABLE';

select /*+index(t,idx2_object_id)*/ * from t where object_id>=20 and object_id<2000 and object_type='TABLE';

逻辑读第一个为60，第二个为55,所以在组合索引中，当一列是范围查询，一列是等值查询的情况下，等值在前，范围在后更高效

王硕 · 发表于 2015-4-6 23:04:27

5.2.1.10索引的危害

1.索引越多插入明显慢得多

drop table t_no_idx purge;

drop table t_1_idx purge;

drop table t_2_idx purge;

drop table t_3_idx purge;

drop table t_n_idx purge;

create table t_no_idx as select * from dba_objects;

insert into t_no_idx select * from t_no_idx;

commit;

select count(*) from t_no_idx;

create table t_1_idx as select * from t_no_idx;

create index idx_1_1 on t_1_idx(object_id);

create table t_2_idx as select * from t_no_idx;

create index idx_2_1 on t_2_idx(object_id);

create index idx_2_2 on t_2_idx(object_name);

create table t_3_idx as select * from t_no_idx;

create index idx_3_1 on t_3_idx(object_id);

create index idx_3_2 on t_3_idx(object_name);

create index idx_3_3 on t_3_idx(object_type);

每张表记录都是1781312条。

set timing on

insert into t_no_idx select * from t_no_idx where rownum<=100000;

insert into t_1_idx select * from t_1_idx where rownum<=100000;

commit;

insert into t_2_idx select * from t_2_idx where rownum<=100000;

insert into t_3_idx select * from t_3_idx where rownum<=100000;

结果没有索引，插入最快，索引越多，插入越慢。更新记录就更新了索引，就要维护索引那种有序排列的结构，使得开销变大。

2.无序插入索引影响更大

set timing on

insert into t_no_idx select * from t_no_idx where rownum<=100000 order by dbms_random.random ;

commit;

insert into t_1_idx select * from t_1_idx where rownum<=100000 order by dbms_random.random ;

insert into t_2_idx select * from t_1_idx where rownum<=100000 order by dbms_random.random ;

insert into t_3_idx select * from t_3_idx where rownum<=100000 order by dbms_random.random ;

order by dbms_random.random 表示无序插入，会重组原来的索引，重新定义，所以变慢。

优化：如果需要插入的数据过大，可使表的索引失效，等插入完毕再生效索引。

3.delete对性能影响

对于DELETE语句，在海量数据库定位删除少数记录是，这个条件列很必要。但是过多的列有索引会影响明显，因为其他列的索引也会被更新，在经常删除大量数据的时候，危害加剧。

delete删除索引列后，索引块中的相关需删除记录只是被打上一个删除标志而已，其实并没有真正删除。

4.建索引动作引发排序及锁

在建立索引是需要把当前索引列的列植取出来，排序后依次插入快中形成索引块，加上锁为了避免此时列植被更新，导致顺序又变化了，影响建索引的工作。

要避免在库繁忙时建立索引。

王硕 · 发表于 2015-4-6 23:04:39

5.2.1.11合理控制索引数量

对需要跟踪的索引进行监控：

alter index 索引名 monitoring usage;

通过观察v$object_usage进行跟踪：(未监控索引时，v$object_usage不存在任何记录)

select * from b$object_usage;

drop table t purge;

create table t as select * from dba_objects;

create index idx_t_id on t(object_id);

create index idx_t_name on t (object_name);

select * from v$object_name

alter index idx_t_id monitoring usage;

alter index idx_t_name monitoring usage;

set linesize 166

col INDEX_NAME for a10

col TABLE_NAME for a10

col MONITORING for a10

col USED for a10

col START_MONITORING for a25

col END_MONITORING for a25

select * from v$object_usage;

USED =NO 表示从监控以来没被使用过

查询必使用object_id 列的索引：

select object_id from t where object_id =19;

select * from v$object_usage;

解除监控：

alter index idx_t_id nomonitoring usage;

王硕 · 发表于 2015-4-6 23:04:50

5.2.2.1统计条数

drop table t purge;

create table t as select * from dba_object；

insert into t select * from t;

update t set object_id=rownum;

commit;

set autotrace on

set linesize 1000

select count(*) from t;

在object_id 列建立索引，并设置该属性为非空，测试count（*）的执行计划：

create index idx_t_obj on t(object_id);

alter table T modify object_id not null;

set autotrace on

select count(*) from t;

对t表的STATUS列建一个位图索引：

create bitmap index idx_bitm_t_status on t(status);

select count(*) from t;

第一次查询无索引，全表扫描，代价10733，逻辑读48871；

第二次在object_id建立索引，通过索引快速全部扫描，代价1874，逻辑读8330；

第三次在status加位图索引，代价88，逻辑读95。

结论：count（*）的性能，在非空有btree索引下，一般用到该索引的性能远高于全表扫描。不过性能更高的却是有位图的索引。而已位图索引不需要考虑空值问题。

王硕 · 发表于 2015-4-6 23:05:02

5.2.2.2即席查询（多维度报表查询）

drop table t purge;

create table t

(name_id,

gender not null,

location not null,

age_group not null,

date

)

as

select rownum,

decode(ceil(dbms_random.values(0.2)),

1,'M',

2,'F')gender,

ceil(dbms_random.values(1,50)) location,

decode(ceil(dbms_random.values(0,3)),

1,'child',

2,'young',

3,'middle_age',

4,'old'),

rpad('*',20,'*')

from dual

在没有建立索引下：

set linesize 1000

set autotrace traceonly

select *

from t

where gender='M'

and location in (1,10,30)

and age_group='child';

cost为124

即席查询中，oracle不选择组合索引:

create idex idx_union on t(gender,location,age_group);

select *

from t

where gende='M'

and location in (1,10,30)

and age_group='child';

依然是全表扫描，cost为124

强制即席查询使用组合索引，情况更糟：

select /*+index(t,idx_union)*/ *

from t

where gender='M'

and location in (1,10,30)

and age_group='child';

代价为59960

在gender，location，age_group三个字段分别建立位图索引：

create bitmap index gender_idx on t(gender);

create birmap index location_idx on t(location);

create bitmap index age_group_idx on t(age_group);

select *

from t

where gender='M'

and location in (1,10,30)

and age_group='41 and over';

代价11。

王硕 · 发表于 2015-4-6 23:05:14

5.2.2.3位图索引不适合更新操作

建立第一个链接：

sqlplus ljb/ljb

select sid from v$mystat where rownum=1;

insert into t(name_id,gender,location,age_group,date) values(100001,'M',45,'child',rpad('*',20,'*'));

建立第二个链接：（不能插入）

sqlplus ljb/ljb

select sid from v$mystat where rownum=1;

insert into t(name_id,gender,location,age_group,date) values(100002,'M',46,'young',rpad('*',20,'*'));

建立第三个链接：（又能成功插入）

sqlplus ljb/ljb

select sid from v$mystat where rownum=1;

insert into t(name_id,gender,location,age_group,date) values(100003,'F',47,'Middle_age',rpad('*',20,'*'));

建立第四个链接：（不能插入）

sqlplus ljb/ljb

select sid from v$mystat where rownum=1;

insert into t(name_id,gender,location,age_group,date) values(100003,'F',48,'old',rpad('*',20,'*'));

第一次插入gender=’M'的记录，未提交前，再有新的session插入gender='M'的记录就会卡住，而插入gender='F'的记录就可以。不支持高并发。

删除location，age_group位图索引：

rollback；（4个会话）

drop index location_idx;

drop index age_group_idx;

测试锁：

第一个会话：

delete from t where gender='M' and location=25;

第二个会话：（插入带M的记录被阻挡，一下语句3个语句都会被阻止）

insert into t (name_id,gender,location,age_group,date) values(100001,'M',78,'young','TTT');

update t ser gender='M' where location=25;

delete from t where gender='M';

(以下不进行阻止，update 只要不更新位图索引所在的列即可）：

insert into t (name_id,gender,location,age_group,date) values

(100001,'M',78,'young','TTT');

delete from t where gender='F';

update t set location=100 where rowid not in (select rowid from t where gender='F' and location=25);

帐号		自动登录	找回密码
密码			注册