今天由于业务需要,需要在生产库上创建一个 db_link, 每天取源端库一张表一天数据,表大小为2G左右,考虑到这个数据每天只取一次,而且只取一天数据,于是准备在库上创建 db_link, 但在测试过程中发现性能问题。 在创建完 db_link 后,在库上执行 select * from view_xxx limit 1
时,发现十分钟都没有执行完, 而且数据库负载很高,达到 20, 同时 IO 等侍也很高,达到 20, 下面在测试环境下模拟下这种情形。
环境准备
主机: 8核 8G 源库 mpc_db_statistic 源库表 mpc_log_register(表大小: 193 M) 测试库 skytf 视图 view_mpc_log_register
准备在数据库 skytf 创建视图 view_mpc_log_register 取源库 mpc_db_statistic 的数据 mpc_log_register。
mpc_log_register 表结构
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Table "mpc_db_statistic.mpc_log_register" Column | Type | Modifiers ---------------+--------------------------------+---------------------------------------------------------------------- record_id | integer | not null default nextval('mpc_log_register_record_id_seq'::regclass) client_ip | character varying(64 ) | not null client_type | integer | default 0 reg_time | timestamp (0 ) without time zone | reg_account | character varying(32 ) | reg_userid | integer | default 0 update_time | timestamp (0 ) without time zone | default now() imsi | character varying | imei | character varying | manufacturers | character varying | model | character varying | appid | integer | scr_width | integer | scr_height | integer | sms_center | character varying | mem_size | integer | access_ip | character varying(32 ) | Indexes : "mpc_log_register_pkey" PRIMARY KEY, btree (record_id) "log_register_client_id_index" btree (client_ip)
创建视图
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
CREATE OR REPLACE VIEW view_mpc_log_register AS SELECT * FROM dblink('dbname=mpc_db_statistic host=localhost port=1921 user=mpc_db_statistic password=mpc_db_statistic' , 'select * from mpc_log_register' ) as t1 ( record_id integer , client_ip character varying (64 ) , client_type integer , reg_time timestamp (0 ) without time zone , reg_account character varying (32 ) , reg_userid integer , update_time timestamp (0 ) without time zone , imsi character varying , imei character varying , manufacturers character varying , model character varying , appid integer , scr_width integer , scr_height integer , sms_center character varying , mem_size integer , access_ip character varying (32 ) );
配置 pg_hba.conf
1
host mpc_db_statistic mpc_db_statistic 127.0 .0 .1 /32 md5
备注:这行需要加到 pg_hba.conf ;另外 # IPv4 local connections:
部分第一行,如果不加这行,那么在查询视图时可能会报以下ERROR:
1 2 3
ERROR: password is required DETAIL: Non-superuser cannot connect if the server does not request a password. HINT: Target servers authentication method must be changed .
查询视图测试
1 2 3 4 5 6 7
skytf=> select record_id,client_ip from view_mpc_log_register limit 1; NOTICE: identifier "dbname=mpc_db_statistic host=localhost port=1921 user=mpc_db_statistic password=mpc_db_statistic" will be truncated to "dbname=mpc_db_statistic host=localhost port=1921 user=mpc_db_st" record_id | client_ip -----------+--------------------- 235215 | 117.136.25.45:30119 (1 row) Time: 16697.482 ms
查询当前进程
1 2 3 4 5 6
postgres=# select datname,procpid,current_query from pg_stat_activity where current_query !='<IDLE>' ; datname | procpid | current_query ------------------+---------+-------------------------------------------------------------------------------------------- skytf | 29265 | select record_id,client_ip from view_mpc_log_register where record_id=1 limit 1 ; postgres | 29853 | select datname,procpid,current_query from pg_stat_activity where current_query !='<IDLE>' ; mpc_db_statistic | 29946 | select * from mpc_log_register
备注:一开始查询,可以看到有两个进程,一个是 skytf 库查询视图进程,另一个是 mpc_db_statistic库的全表扫描进程, 奇怪了,进程 29265 明明是根据索引字段 record_id 来查询的,为什么到了目标库就是全表扫描 select * from mpc_log_register
?
继续查询当前进程
1 2 3 4 5 6
postgres=# select datname,procpid,current_query from pg_stat_activity where current_query !='<IDLE>' ; datname | procpid | current_query ----------+---------+-------------------------------------------------------------------------------------------- skytf | 29265 | select record_id,client_ip from view_mpc_log_register where record_id=1 limit 1 ; postgres | 29853 | select datname,procpid,current_query from pg_stat_activity where current_query !='<IDLE>' ; (2 rows)
备注:过一会儿,发现目标库进程 29946 执行完毕,此时还有进程 29265 在跑。
查看内存使用
SQL执行过程中内存使用情况
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
[root@ ~] total used free shared buffers cached Mem: 16053 656 15396 0 8 501 -/+ buffers/cache: 146 15906 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 792 15260 0 8 501 -/+ buffers/cache: 282 15771 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 849 15203 0 8 501 -/+ buffers/cache: 339 15714 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1012 15041 0 8 501 -/+ buffers/cache: 501 15551 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1126 14926 0 8 501 -/+ buffers/cache: 616 15436 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1166 14886 0 8 501 -/+ buffers/cache: 656 15396 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1209 14843 0 8 501 -/+ buffers/cache: 699 15354 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1318 14734 0 8 514 -/+ buffers/cache: 795 15257 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1495 14558 0 8 546 -/+ buffers/cache: 940 15113 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1585 14467 0 8 563 -/+ buffers/cache: 1014 15039 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1672 14380 0 8 578 -/+ buffers/cache: 1085 14968 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1762 14290 0 8 595 -/+ buffers/cache: 1158 14895 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1858 14194 0 8 614 -/+ buffers/cache: 1235 14817 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 1952 14100 0 8 632 -/+ buffers/cache: 1311 14741 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 2044 14008 0 8 650 -/+ buffers/cache: 1385 14668 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 2140 13912 0 8 669 -/+ buffers/cache: 1462 14590 Swap: 16386 31 16355 [root@ ~] total used free shared buffers cached Mem: 16053 658 15394 0 8 501 -/+ buffers/cache: 148 15904 Swap: 16386 31 16355
备注:发现内存 used 值开始由 656M 一直增加到 2140 M, 当 SQL执行完后,又再次回到 658 M, 推测SQL通过 db_link 查询远端数据时,会将远端表先取到本地库内存里,这个步骤非常耗时,耗IO,同时本地库负载也比异常高。
结果出来了
1 2 3 4 5 6 7
skytf=> select record_id,client_ip from view_mpc_log_register limit 1; NOTICE: identifier "dbname=mpc_db_statistic host=localhost port=1921 user=mpc_db_statistic password=mpc_db_statistic" will be truncated to "dbname=mpc_db_statistic host=localhost port=1921 user=mpc_db_st" record_id | client_ip -----------+--------------------- 235215 | 117.136.25.45:30119 (1 row) Time: 16697.482 ms
备注:这张表不太大,只有193M,都花了 16 秒左右。
执行计划
1 2 3 4 5 6 7
skytf => explain select record_id,client_ip from view_mpc_log_register where record_id =1 limit 1; QUERY PLAN ----------------------------------------------------------------------- Limit (cost =0.00..2.50 rows =1 width =150) -> Function Scan on dblink t1 (cost =0.00..12.50 rows =5 width =150) Filter: (record_id = 1) (3 rows)
备注: 走的是函数索引。
大表测试
由于前面源表大小才 193M,效果不明显,后来拿了张2G大小的表做实验,通过视图取一条数据,后来数据库负载直接升到40左右,连数据库主机都 SSH 不上了。汗, PostgreSQL 的 db_link 得慎用。
总结
PostgreSQL 的 db_link 要慎用,没有特殊的需求不要使用,如果业务需要可以通过其它方式代替;
PostgreSQL 的 db_link 在使用过程中会先把对端表数据全表扫描到本地库,尽管查询视图时用到了索引字段,但依然不会走索引扫描,而是走 “Function Scan” ,从而带来严重的性能问题 (负载高,IO高)。
原创文章,作者:254126420,如若转载,请注明出处:https://blog.ytso.com/236434.html