在 PostgreSQL-XC 体系中，数据分布有两种形式，即 Replication 或者 Distribution，这里简单描述下 PostgreSQL-XC 这两种数据分布，下面是实验过程。

Replication Or Distribution

1.1 Replication
表的每一行存在所有数据节点( datanode )中，即每个数据节点都有完整的表数据。

1.2 Distribution
表的每一行仅存在一个数据节点( datanode )中，即每个数据节点仅保留表的部分数据。

Replication 表测试

2.1创建 replication 表并插入数据

francs=> create table test_replication (id int4 primary key,name varchar(32)) distribute by replication;  
NOTICE:CREATE TABLE / PRIMARY KEY will create implicit index "test_replication_pkey"  for table "test_replication"  
CREATE TABLE   
francs=> insert into test_replication select generate_series(1,10000),'replication';  
INSERT 0  10000    
francs=>  select count(*)  from test_replication ;  
count  
-------  
10000  
(1 row)

2.2 到数据节点一验证数据

[pgxc@redhatB gtm_standby]$ psql -p 15431 francs francs  
psql (PGXC 1.0beta2, based on PG 9.1.3)  
Type  "help"  for help.  
  
francs=>  select count(*)  from test_replication ;  
WARNING: Do  not have a GTM snapshot available  
WARNING: Do  not have a GTM snapshot available  
count  
-------  
10000  
(1 row)

2.3到数据节点二验证数据

[pgxc@redhatB pg_root]$ psql -p 15432 francs francs  
psql (PGXC 1.0beta2, based on PG 9.1.3)  
Type  "help"  for help.  
  
francs=>  select count(*)  from test_replication ;  
WARNING: Do  not have a GTM snapshot available  
WARNING: Do  not have a GTM snapshot available  
count  
-------  
10000  
(1 row)

备注：可见 replication 表数据在每个数据节点都有完整数据( 如果在创建表时仅指定数据节点的情况除外)。

Distribute 表测试

Distribute 表数据分片方式有多种，包括 ROUND ROBIN， HASH ，MODULO，接下来以 hash ,rounnd robin 分片方式举例。

3.1 创建 hash 分区表并插入数据

francs=> create table test_hash (id int4 primary key ,name varchar(32)) distribute by hash(id);NOTICE:CREATE TABLE / PRIMARY KEY will create implicit index "test_hash_pkey"  for table "test_hash"  
CREATE TABLE  
francs=> insert into test_hash select generate_series(1,10000),'hash';  
INSERT 0  10000  
francs=>  select count(*)  from test_hash;  
count  
-------  
10000  
(1 row)

3.2 到数据节点一验证数据

[pgxc@redhatB gtm_standby]$ psql -p 15431 francs francs  
psql (PGXC 1.0beta2, based on PG 9.1.3)  
Type  "help"  for help.  
  
francs=>  select count(*)  from test_hash;  
WARNING: Do  not have a GTM snapshot available  
WARNING: Do  not have a GTM snapshot available  
count  
-------  
 5039  
(1 row)

3.2 到数据节点二验证数据

[pgxc@redhatB pg_root]$ psql -p 15432 francs francs  
psql (PGXC 1.0beta2, based on PG 9.1.3)  
Type  "help"  for help.  
  
francs=>  select count(*)  from test_hash;  
WARNING: Do  not have a GTM snapshot available  
WARNING: Do  not have a GTM snapshot available  
count  
-------  
 4961  
(1 row)

备注：从上面看出 distributed 表数据节点表只存部分数据，当然创建表的时候也可以指定数据节点。

执行计划比较

比较以下场景下复制表和分片表的执行计划。

查询单条记录场景

francs=> explain verbose select  *  from test_hash where id=1;  
QUERY PLAN   
----------------------------------------------------------------------------  
Data  Node  Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)  
 Output: test_hash.id, test_hash.name  
 Node/s: db_1 Remote query: SELECT id, name FROM test_hash WHERE (id =  1)  
(4 rows)  
  
  
francs=> explain verbose select  *  from test_replication where id=1;  
QUERY PLAN   
----------------------------------------------------------------------------  
Data  Node  Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)  
 Output: test_replication.id, test_replication.name  
 Node/s: db_1 Remote query: SELECT id, name FROM test_replication WHERE (id =  1)  
(4 rows)  
  
francs=> explain select  *  from test_hash where name='A';  
QUERY PLAN   
----------------------------------------------------------------------------  
Data  Node  Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)  
 Node/s: db_1, db_2(2 rows)

备注：只查询单条记录， replication 只扫描一个数据节点。而 distribute 表如果根据分区键查询，

扫描一个节点，如果根据非分区键查询，则需要扫描多个节点。

count(*) 场景

francs=> explain select count(*)  from test_hash;  
QUERY PLAN   
---------------------------------------------------------------------------------------------  
Aggregate (cost=2.50..2.51 rows=1 width=0)  
 -> Materialize (cost=0.00..0.00 rows=0 width=0)  
 -> Data  Node  Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=0)  
 Node/s: db_1, db_2  
(4 rows)

备注：distributd 表 count 语句扫描所有数据节点。

francs=> explain select count(*)  from test_replication;  
QUERY PLAN   
----------------------------------------------------------------------------  
Data  Node  Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)  
 Node/s: db_1  
(2 rows)

备注：replication 表 count 语句只扫描一个数据节点；

round robin分片方式

francs=> create table test_round (id int4,name varchar(32)) distribute by round robin;CREATE TABLE  
  
  
francs=> insert into test_round select generate_series(1,10000),'a';  
INSERT 0  10000  
  
  
francs=> explain select  *  from test_round where id=1;  
QUERY PLAN   
----------------------------------------------------------------------------  
Data  Node  Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)  
Node/s: db_1, db_2(2 rows)

备注： Round Robin 方式会将表数据分散到各个数据节点，但查询数据时由于不知道分片规则，故需要遍历所有数据节点。

总结

5.1 replication表

replication 表查询时只需要读任一个数据节点；
replication 表更改数据时，需要同时对所有数据节点进行，代价较大；
replication 适用于读比较繁忙的静态数据表。

5.2 distribute 表

单独查询或者写一条记录时，如果根据分区键查询，只需要扫描一个数据节点 ( Round Robin 分片方式除外)；
单独查询或者写一条记录时，如果根据非分区键，需要扫描所有数据节点。
如果查询需要扫描多个数据节点，性能会有所降低；

手册解释

REPLICATION Each row of the table will be replicated into all the datanode of the Postgres-XC database cluster.

HASH ( column_name ) Each row of the table will be placed based on the hash value of the specified column. Following type
is allowed as distribution column: INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR, CHAR, NAME, TEXT,
BPCHAR, BYTEA, VARCHAR, FLOAT4, FLOAT8, NUMERIC, CASH, ABSTIME, RELTIME, DATE, TIME, TIMESTAMP, TIMESTAMPTZ,
INTERVAL, and TIMETZ.

参考

http://postgres-xc.sourceforge.net/docs/1_0/sql-createtable.html

原创文章，作者：506227337，如若转载，请注明出处：https://blog.ytso.com/237865.html

PostgreSQL-XC : Data Replication Or Distribution ?

Replication Or Distribution

Replication 表测试

Distribute 表测试

执行计划比较

查询单条记录场景

count(*) 场景

总结

手册解释

参考

发表回复

PostgreSQL-XC : Data Replication Or Distribution ?

Replication Or Distribution

Replication 表测试

Distribute 表测试

执行计划比较

查询单条记录场景

count(*) 场景

总结

手册解释

参考

相关推荐

发表回复