PostgreSQL:浅谈 Fsync 参数

这两天有朋友在做性能测试,咨询 PG 的 fsync 参数对性能的影响。这个参数比较特殊,通常情况都需要打开,今天有空研究下这个参数的影响。

关于 Fsync 参数

如果 fsync 参数设置成 on ,PostgreSQL 服务通过调用系统fsync()或其它方式确保更新已经物理写到磁盘,这样就保证了数据库集群将在操作系统或者硬件崩溃的情况下恢复到一个一致的状态。虽然关闭这个参数有一定提升,通常情况下需要打开这个参数,除非您能经受掉电或硬件故障带来的数据丢失,否则不要关闭这参数,下面做两方面测试:1) 关闭 sync 参数观察对 SELECT/UPDATE 压力测试的影响;2)关闭参数,冷关闭主机测试数据库是否能正常启动。

场景一: Fsync = on

测试模型:一张 1000 万的表

1
2
3
4
5
drop table if exists test_sync;
create table test_sync(id int8,name text,creat_time timestamp(0) without time zone default clock_timestamp());
insert into test_sync(id,name) select n,n||'_test' from generate_series(1,10000000) n;
alter table test_sync add primary key(id);
vacuum analyze test_sync;

设置 fsync 参数

1
2
3
4
5
6
7
8
9
pg95@pg95:~/script/load_test> psql
psql (9.5alpha2)
Type "help" for help.

postgres=# show fsync;
fsync
-------
on
(1 row)

主键查询

1
2
3
4
[pg95@pg95 load_test]$ cat select_1.sql
setrandom v_id 1 10000000

select name from test_sync where id=:v_id;

pgbench

1
pgbench -c 64  -T 120  -n -N -M prepared -d francs -U francs -f select_1.sql > select_1.out  2>&1  &

pgbench 结果:

1
2
3
4
5
6
7
8
9
10
transaction type:  Custom query
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 1
duration: 120 s
number of transactions actually processed: 2031338
latency average: 3.781 ms
tps = 16909.975656 (including connections establishing)
tps = 16930.166433 (excluding connections establishing)

主键更新

1
2
3
4
[pg95@pg95 load_test]$ cat update_1.sql
setrandom v_id 1 10000000

update test_sync set name='off' where id=:v_id;

pgbench

1
pgbench -c 64  -T 120  -n -N -M prepared -d francs -U francs -f update_1.sql > update_1.out  2>&1  &

pgbench 结果:

1
2
3
4
5
6
7
8
9
10
transaction type:  Custom query
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 1
duration: 120 s
number of transactions actually processed: 713242
latency average: 10.768 ms
tps = 5939.386731 (including connections establishing)
tps = 5946.560775 (excluding connections establishing)

场景二: Fsync = off

设置 fsync 参数

1
2
3
4
5
6
7
8
9
pg95@pg95:~/script/load_test> psql
psql (9.5alpha2)
Type "help" for help.

postgres=# show fsync;
fsync
-------
off
(1 row)

为了测试准确重新刷以下脚本

1
2
3
4
5
drop table if exists test_sync;
create table test_sync(id int8,name text,creat_time timestamp(0) without time zone default clock_timestamp());
insert into test_sync(id,name) select n,n||'_test' from generate_series(1,10000000) n;
alter table test_sync add primary key(id);
vacuum analyze test_sync;

主键查询

1
2
3
4
[pg95@pg95 load_test]$ cat select_1.sql
setrandom v_id 1 10000000

select name from test_sync where id=:v_id;

pgbench

1
pgbench -c 64  -T 120  -n -N -M prepared -d francs -U francs -f select_1.sql > select_1.out  2>&1  &

pgbench 结果:

1
2
3
4
5
6
7
8
9
10
transaction type:  Custom query
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 1
duration: 120 s
number of transactions actually processed: 2006681
latency average: 3.827 ms
tps = 16707.303006 (including connections establishing)
tps = 16727.661063 (excluding connections establishing)

主键更新

1
2
3
4
[pg95@pg95  load_test]$ cat update_1.sql
setrandom v_id 1 10000000

update test_sync set name='off' where id=:v_id;

pgbench

1
pgbench -c 64  -T 120  -n -N -M prepared -d francs -U francs -f update_1.sql > update_1.out  2>&1  &

pgbench 结果:

1
2
3
4
5
6
7
8
9
10
transaction type:  Custom query
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 1
duration: 120 s
number of transactions actually processed: 1505343
latency average: 5.102 ms
tps = 12530.405426 (including connections establishing)
tps = 12545.757516 (excluding connections establishing)

Fsync = off 时模拟数据库异常

接下来测试当 fsync=off 的情况下数据库异常宕掉的情况,做测试前做好数据库备份。

  1. 跑 update 脚本,给数据库施加压力

    1
    2
    pg95@pg95 :~/script/load_test> pgbench -c 64  -T 300  -n -N -M prepared -d francs -U francs -f update_1.sql > update_1.out  2>&1  &
    [1] 52424
  2. 异常关闭数据库,强制关闭数据库主机,模拟掉电。

  3. 再次启动数据库,发现数据库无法启动,数据库日志如下:

数据库日志

1
2
3
4
5
6
7
2015-10-28  16:51:59.942 CST,,,10141,,56308caf.279d,1,,2015-10-28  16:51:59 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2015-10-28 16:51:59.944 CST,,,10143,,56308caf.279f,1,,2015-10-28 16:51:59 CST,,0,LOG,00000,"database system was interrupted; last known up at 2015-10-29 00:37:41 CST",,,,,,,,,""
2015-10-28 16:51:59.944 CST,,,10143,,56308caf.279f,2,,2015-10-28 16:51:59 CST,,0,LOG,00000,"invalid primary checkpoint record",,,,,,,,,""
2015-10-28 16:51:59.944 CST,,,10143,,56308caf.279f,3,,2015-10-28 16:51:59 CST,,0,LOG,00000,"invalid secondary checkpoint record",,,,,,,,,""
2015-10-28 16:51:59.944 CST,,,10143,,56308caf.279f,4,,2015-10-28 16:51:59 CST,,0,PANIC,XX000,"could not locate a valid checkpoint record",,,,,,,,,""
2015-10-28 16:51:59.944 CST,,,10141,,56308caf.279d,2,,2015-10-28 16:51:59 CST,,0,LOG,00000,"startup process (PID 10143) was terminated by signal 6: Aborted",,,,,,,,,""
2015-10-28 16:51:59.944 CST,,,10141,,56308caf.279d,3,,2015-10-28 16:51:59 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""

备注:数据库果然无法启动,这时你可能通过 pg_resetxlog 救活数据库,当然会丢失一部分事务;或者如果有备份,从备份中恢复数据库。

总结

把上面的测试结果汇总,如下图,可见,关闭 fsync 对 SELECT 无影响, 而 UPDATE 性能有较大提升,这个场景提升了 111%;当然关闭 fsync 参数的代价是巨大的,当数据库主机遭受操作系统故障或硬件故障时,数据库很有可能无法启动,并丢失数据,建议生产库不要关闭这参数。

fsync SELECT 场景TPS UPDATE 场景TPS
on 16930 5946
off 16727 12545

参考

Fsync (boolean)

原创文章,作者:1402239773,如若转载,请注明出处:https://blog.ytso.com/240103.html

(0)
上一篇 2022年2月15日
下一篇 2022年2月15日

相关推荐

发表回复

登录后才能评论