PostgreSQL: GiST Indexes Explained

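GiST stands for Generalized Search Tree. It is a balanced, tree-structured access method that acts as a base template in which arbitrary indexing schemes can be implemented. B-trees, R-trees, and many other indexing schemes can be implemented in GiST.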

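That official definition is dense, and for now there is no need to implement other indexing schemes on top of GiST, so this post simply shows how to use a GiST index.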

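Pros and cons compared with a B-tree index:

Pros:

GiST indexes suit multidimensional and set-like data types and, with the right operator classes, work on ordinary scalar types just as B-tree indexes do. A multicolumn GiST index can be used when the query conditions involve any subset of the indexed columns, whereas a multicolumn B-tree index is only efficient when the conditions constrain the leading (first) column.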
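As a rough illustration of the multidimensional case, the sketch below indexes a point column with GiST, using PostgreSQL's built-in operator class for point. The table places, its columns, and the index name are hypothetical and are not part of the test setup in this post.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE places (id bigint, loc point);

INSERT INTO places (id, loc)
SELECT i, point(random() * 100, random() * 100)
FROM generate_series(1, 100000) AS i;

-- GiST has a built-in operator class for point, so no extension is needed here.
CREATE INDEX idx_gist_places_loc ON places USING gist (loc);

-- Containment search: points that fall inside a bounding box.
EXPLAIN SELECT * FROM places WHERE loc <@ box '((10,10),(20,20))';

-- Nearest-neighbour search: the 5 points closest to (50,50).
EXPLAIN SELECT * FROM places ORDER BY loc <-> point '(50,50)' LIMIT 5;
```

Neither query can be served by a B-tree index, because B-tree only understands a linear ordering of values.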

Cons:

A GiST index takes longer to build and occupies more disk space.

Test table

test=# create table tbl_index(a bigint,b timestamp without time zone,c varchar(12)); 
CREATE TABLE 
test=# insert into tbl_index (a,b,c)  select generate_series(1,3000000),clock_timestamp()::timestamp(0) without time zone,'got u'; 
INSERT 0 3000000
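Because clock_timestamp() is truncated to whole seconds in the insert above, column b probably holds only a small number of distinct values, each shared by many rows. A quick way to check, and to refresh planner statistics before running the examples, might look like this:

```sql
-- How many rows share each second-resolution timestamp in b?
SELECT b, count(*) AS row_count
FROM tbl_index
GROUP BY b
ORDER BY b;

-- Refresh planner statistics after the bulk load.
ANALYZE tbl_index;
```

The distribution matters when reading the plans that follow: an equality condition on b matches a large share of the table, while an equality condition on a matches a single row.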

test=# \timing
Timing is on.

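GiST itself is built into PostgreSQL, but indexing scalar types such as bigint and timestamp with it requires the operator classes provided by the btree_gist extension. Because I built and installed all the contrib extensions when compiling from source, I only need to create the extension in the database.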

test=# create extension btree_gist; 
CREATE EXTENSION 
Time: 774.131 ms
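If it is unclear whether the extension files are present on the server, the pg_available_extensions view can be checked first. This is a small sketch, not a required step:

```sql
-- Is btree_gist available on this server, and is it already installed in this database?
-- installed_version is NULL when the extension has not been created yet.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'btree_gist';

-- Idempotent variant of the CREATE EXTENSION command.
CREATE EXTENSION IF NOT EXISTS btree_gist;
```

If the first query returns no row, the contrib package has not been installed on the server and CREATE EXTENSION will fail.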

Create the index

test=# create index idx_gist_tbl_index_a_b on tbl_index using gist(a,b); 
CREATE INDEX 
Time: 168595.321 ms

Example 1. Query on column a

test=# explain analyze select * from tbl_index where a=3000000; 
                                                        QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------- 
 Gather  (cost=1000.00..21395.10 rows=1 width=22) (actual time=310.514..310.517 rows=1 loops=1) 
   Workers Planned: 2 
   Workers Launched: 2 
   ->  Parallel Seq Scan on tbl_index  (cost=0.00..20395.00 rows=0 width=22) (actual time=289.432..289.433 rows=0 loops=3) 
         Filter: (a = 3000000) 
         Rows Removed by Filter: 1000000 
 Planning time: 0.119 ms 
 Execution time: 310.631 ms 
(8 rows) 
 
Time: 311.505 ms

test=# explain analyze select * from tbl_index where a='3000000'; 
                                                            QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------- 
 Index Scan using idx_gist_tbl_index_a_b on tbl_index  (cost=0.29..8.30 rows=1 width=22) (actual time=0.104..0.105 rows=1 loops=1) 
   Index Cond: (a = '3000000'::bigint) 
 Planning time: 0.109 ms 
 Execution time: 0.297 ms 
(4 rows) 
 
Time: 1.124 ms

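The two statements differ only in the literal: the first passes a bare numeric literal, the second a quoted one. The quoted version gets an index scan and is far faster than the unquoted version, which falls back to a sequential scan. This is not because the index stores a as char: the plan shows the quoted literal being resolved to bigint ('3000000'::bigint). The likely cause is type resolution. A bare 3000000 is typed as integer, so the comparison uses the cross-type bigint = integer operator, and the btree_gist operator class for bigint only covers bigint-to-bigint operators, so the planner cannot match the condition to the index. A quoted literal starts out untyped and is resolved to the column's type. With a btree_gist index, make the literal's type match the column, either by quoting it or by casting it explicitly.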
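One way to investigate the difference between the two statements is to look at how PostgreSQL types each literal and which operators the GiST operator class supports. The sketch below assumes the btree_gist operator class for bigint is named gist_int8_ops; the catalog query only reads metadata.

```sql
-- A bare numeric literal of this size is typed as integer, not bigint.
SELECT pg_typeof(3000000) AS bare_literal,
       pg_typeof(3000000::bigint) AS cast_literal;

-- List the operators supported by the btree_gist operator class for bigint
-- (opclass name gist_int8_ops is an assumption; adjust if it differs).
SELECT amopopr::regoperator AS supported_operator
FROM pg_amop
WHERE amopfamily IN (SELECT opcfamily
                     FROM pg_opclass
                     WHERE opcname = 'gist_int8_ops');

-- An explicit cast keeps the literal unquoted while matching the column type.
EXPLAIN SELECT * FROM tbl_index WHERE a = 3000000::bigint;
```

If the operator list contains only bigint-to-bigint operators, a comparison between bigint and integer cannot use the index, and the explicit cast should produce the same index scan as the quoted literal.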

Example 2. Query on column b

test=# explain analyze select * from tbl_index where b='2016-06-29 14:54:00'; 
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_index  (cost=3373.54..10281.04 rows=171000 width=22) (actual time=37.200..53.564 rows=172824 loops=1)
   Recheck Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone)
   Heap Blocks: exact=276
   ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..3330.79 rows=171000 width=0) (actual time=37.139..37.139 rows=172824 loops=1)
         Index Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone) 
 Planning time: 0.343 ms 
 Execution time: 60.843 ms 
(7 rows) 
 
Time: 62.359 ms

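This query does not include the leading index column, yet it still uses the index. Under the same condition a B-tree index on (a,b) is at a disadvantage: it would have to read the entire index, so the planner commonly falls back to a sequential scan.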

Example 3. Query on a AND b

test=# explain analyze select * from tbl_index where a='3000000' and b='2016-06-29 14:54:00'; 
                                                            QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------- 
 Index Scan using idx_gist_tbl_index_a_b on tbl_index  (cost=0.29..8.31 rows=1 width=22) (actual time=0.114..0.115 rows=1 loops=1) 
   Index Cond: ((a = '3000000'::bigint) AND (b = '2016-06-29 14:54:00'::timestamp without time zone)) 
 Planning time: 0.376 ms 
 Execution time: 0.258 ms 
(4 rows) 
 
Time: 1.747 ms

Example 4. Query on a OR b

test=# explain analyze select * from tbl_index where a='3000000' or b='2016-06-29 14:54:00'; 
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_index  (cost=3420.58..10755.60 rows=171001 width=22) (actual time=31.142..49.728 rows=172824 loops=1)
   Recheck Cond: ((a = '3000000'::bigint) OR (b = '2016-06-29 14:54:00'::timestamp without time zone))
   Heap Blocks: exact=276
   ->  BitmapOr  (cost=3420.58..3420.58 rows=171001 width=0) (actual time=31.083..31.083 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..4.29 rows=1 width=0) (actual time=0.100..0.100 rows=1 loops=1)
               Index Cond: (a = '3000000'::bigint)
         ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..3330.79 rows=171000 width=0) (actual time=30.981..30.981 rows=172824 loops=1)
               Index Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone) 
 Planning time: 0.143 ms 
 Execution time: 57.193 ms 
(10 rows) 
 
Time: 58.067 ms

Queries that combine the columns with AND or OR also use the index, but they show no performance gain over a B-tree index.
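The B-tree timings for these queries are not shown here. One possible way to obtain them on the same table is to hide the GiST index inside a transaction and roll back afterwards. The sketch assumes the B-tree index idx_btree_tbl_index_a_b from the comparison section already exists.

```sql
BEGIN;

-- Hide the GiST index from the planner for this transaction only.
-- Note: DROP INDEX locks the table against other sessions until ROLLBACK.
DROP INDEX idx_gist_tbl_index_a_b;

EXPLAIN ANALYZE
SELECT * FROM tbl_index
WHERE a = '3000000' AND b = '2016-06-29 14:54:00';

EXPLAIN ANALYZE
SELECT * FROM tbl_index
WHERE a = '3000000' OR b = '2016-06-29 14:54:00';

-- Restore the GiST index without rebuilding it.
ROLLBACK;
```

Because DDL is transactional in PostgreSQL, the rollback brings the GiST index back without rebuilding it. The lock makes this suitable for a test database only.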

Comparing build time and size of the GiST and B-tree indexes

B-tree index build time:

test=# create index idx_btree_tbl_index_a_b on tbl_index using btree(a,b); 
CREATE INDEX 
Time: 5217.976 ms

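The GiST index build took 168595.321 ms, as seen above, about 32 times as long as the B-tree build.

Size comparison: the GiST index is more than 3 times the size of the B-tree index.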

test=# select relname,pg_size_pretty(pg_relation_size(oid)) from pg_class where relname like 'idx_%_tbl_index_a_b'; 
         relname         | pg_size_pretty  
-------------------------+---------------- 
 idx_gist_tbl_index_a_b  | 281 MB 
 idx_btree_tbl_index_a_b | 89 MB 
(2 rows) 
 
Time: 4.068 ms
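The two ratios can also be computed directly in SQL instead of by hand. This is a small sketch that uses the index names from this post and the build timings reported earlier:

```sql
-- Size ratio computed from the on-disk sizes rather than the rounded MB figures.
SELECT round(pg_relation_size('idx_gist_tbl_index_a_b')::numeric
             / pg_relation_size('idx_btree_tbl_index_a_b'), 2) AS size_ratio;

-- Build-time ratio from the two CREATE INDEX timings (in ms).
SELECT round(168595.321 / 5217.976, 1) AS build_time_ratio;
```

The cast to numeric avoids integer division, which would truncate the size ratio to 3.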

Original article by ItWorker. If you repost it, please credit the source: https://blog.ytso.com/4869.html
