1.2 Distribution 表的每一行仅存在一个数据节点( datanode )中,即每个数据节点仅保留表的部分数据。
Replication 表测试
2.1创建 replication 表并插入数据
1 2 3 4 5 6 7 8 9 10
francs=> create table test_replication (id int4 primarykey,name varchar(32)) distribute by replication; NOTICE:CREATE TABLE / PRIMARYKEY will create implicit index"test_replication_pkey"for table "test_replication" CREATE TABLE francs=> insertinto test_replication select generate_series(1,10000),'replication'; INSERT010000 francs=> selectcount(*) from test_replication ; count ------- 10000 (1 row)
2.2 到数据节点一验证数据
1 2 3 4 5 6 7 8 9 10 11
[pgxc@redhatB gtm_standby]$ psql -p 15431 francs francs psql (PGXC 1.0beta2, based on PG 9.1.3) Type "help" for help. francs=> select count(*) from test_replication ; WARNING: Do not have a GTM snapshot available WARNING: Do not have a GTM snapshot available count ------- 10000 (1 row)
2.3到数据节点二验证数据
1 2 3 4 5 6 7 8 9 10 11
[pgxc@redhatB pg_root]$ psql -p 15432 francs francs psql (PGXC 1.0beta2, based on PG 9.1.3) Type "help" for help. francs=> select count(*) from test_replication ; WARNING: Do not have a GTM snapshot available WARNING: Do not have a GTM snapshot available count ------- 10000 (1 row)
Distribute 表数据分片方式有多种,包括 ROUND ROBIN, HASH ,MODULO,接下来以 hash ,rounnd robin 分片方式举例。
3.1 创建 hash 分区表并插入数据
1 2 3 4 5 6 7 8 9
francs=> create table test_hash (id int4 primarykey ,name varchar(32)) distribute by hash(id);NOTICE:CREATE TABLE / PRIMARYKEY will create implicit index"test_hash_pkey"for table "test_hash" CREATE TABLE francs=> insertinto test_hash select generate_series(1,10000),'hash'; INSERT010000 francs=> selectcount(*) from test_hash; count ------- 10000 (1 row)
3.2 到数据节点一验证数据
1 2 3 4 5 6 7 8 9 10 11
[pgxc@redhatB gtm_standby]$ psql -p 15431 francs francs psql (PGXC 1.0beta2, based on PG 9.1.3) Type "help" for help. francs=> select count(*) from test_hash; WARNING: Do not have a GTM snapshot available WARNING: Do not have a GTM snapshot available count ------- 5039 (1 row)
3.2 到数据节点二验证数据
1 2 3 4 5 6 7 8 9 10 11
[pgxc@redhatB pg_root]$ psql -p 15432 francs francs psql (PGXC 1.0beta2, based on PG 9.1.3) Type "help" for help. francs=> select count(*) from test_hash; WARNING: Do not have a GTM snapshot available WARNING: Do not have a GTM snapshot available count ------- 4961 (1 row)
francs=> explain verbose select * from test_hash where id=1; QUERY PLAN ---------------------------------------------------------------------------- Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Output: test_hash.id, test_hash.name Node/s: db_1 Remote query: SELECT id, name FROM test_hash WHERE (id = 1) (4 rows) francs=> explain verbose select * from test_replication where id=1; QUERY PLAN ---------------------------------------------------------------------------- Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Output: test_replication.id, test_replication.name Node/s: db_1 Remote query: SELECT id, name FROM test_replication WHERE (id = 1) (4 rows) francs=> explain select * from test_hash where name='A'; QUERY PLAN ---------------------------------------------------------------------------- Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Node/s: db_1, db_2(2 rows)
francs=> explain select count(*) from test_hash; QUERY PLAN --------------------------------------------------------------------------------------------- Aggregate (cost=2.50..2.51 rows=1 width=0) -> Materialize (cost=0.00..0.00 rows=0 width=0) -> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=0) Node/s: db_1, db_2 (4 rows)
备注:distributd 表 count 语句扫描所有数据节点。
1 2 3 4 5 6
francs=> explain select count(*) from test_replication; QUERY PLAN ---------------------------------------------------------------------------- Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Node/s: db_1 (2 rows)
备注:replication 表 count 语句只扫描一个数据节点;
round robin分片方式
1 2 3 4 5 6 7 8 9 10 11 12
francs=> create table test_round (id int4,name varchar(32)) distribute byround robin;CREATE TABLE francs=> insertinto test_round select generate_series(1,10000),'a'; INSERT010000 francs=> explainselect * from test_round where id=1; QUERY PLAN ---------------------------------------------------------------------------- Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Node/s: db_1, db_2(2 rows)
备注: Round Robin 方式会将表数据分散到各个数据节点,但查询数据时由于不知道分片规则,故需要遍历所有数据节点。
总结
5.1 replication表
replication 表查询时只需要读任一个数据节点;
replication 表更改数据时,需要同时对所有数据节点进行,代价较大;
replication 适用于读比较繁忙的静态数据表。
5.2 distribute 表
单独查询或者写一条记录时,如果根据分区键查询,只需要扫描一个数据节点 ( Round Robin 分片方式除外);
单独查询或者写一条记录时,如果根据非分区键,需要扫描所有数据节点。
如果查询需要扫描多个数据节点,性能会有所降低;
手册解释
REPLICATION Each row of the table will be replicated into all the datanode of the Postgres-XC database cluster.
HASH ( column_name ) Each row of the table will be placed based on the hash value of the specified column. Following type is allowed as distribution column: INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR, CHAR, NAME, TEXT, BPCHAR, BYTEA, VARCHAR, FLOAT4, FLOAT8, NUMERIC, CASH, ABSTIME, RELTIME, DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL, and TIMETZ.