Greenplum 5.7: CREATE TABLE and INSERT INTO in Detail

OS: CentOS 7.4
GP: gpdb-5.7.0

Three machines:
node1 is the master host
node2 and node3 are segment hosts
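
The topology can be confirmed from the master by querying the gp_segment_configuration catalog (a minimal sketch; the exact hostnames and ports depend on the installation):

-- List every instance: content = -1 is the master, content >= 0 are segments
select dbid, content, role, port, hostname, status
from gp_segment_configuration
order by content, role;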

Log in to the node1 master with psql

$ psql -d peiybdb 
peiybdb=# select current_database(); 
 current_database  
------------------ 
 peiybdb 
(1 row) 
 
create table tmp_t0( 
c1 varchar(100), 
c2 varchar(100), 
c3 varchar(100) 
); 
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'c1' as the Greenplum Database data distribution key for this table. 
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. 
CREATE TABLE 
 
peiybdb=# \d+ tmp_t0
                        Table "public.tmp_t0" 
 Column |          Type          | Modifiers | Storage  | Description  
--------+------------------------+-----------+----------+------------- 
 c1     | character varying(100) |           | extended |  
 c2     | character varying(100) |           | extended |  
 c3     | character varying(100) |           | extended |  
Has OIDs: no 
Distributed by: (c1) 
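
The NOTICE above appears because the CREATE TABLE had no DISTRIBUTED BY clause, so Greenplum picked the first eligible column (c1) as the distribution key. The clause can also be written explicitly; a sketch is shown below (tmp_t0_hash and tmp_t0_rand are illustrative names, not part of the original session):

-- Hash-distribute on an explicit key
create table tmp_t0_hash (
    c1 varchar(100),
    c2 varchar(100),
    c3 varchar(100)
) distributed by (c1);

-- Round-robin distribution when no good key exists
create table tmp_t0_rand (
    c1 varchar(100),
    c2 varchar(100),
    c3 varchar(100)
) distributed randomly;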

Check the data directory sizes on node1, node2, and node3

[gpadmin@node1 gpseg-1]$ pwd 
/u01/greenplum-data/gpseg-1 
[gpadmin@node1 gpseg-1]$ du -sh 
109M    . 
 
[gpadmin@node2 gpseg0]$ pwd 
/u01/greenplum-data/gpseg0 
[gpadmin@node2 gpseg0]$ du -sh 
109M    . 
 
[gpadmin@node3 gpseg1]$ pwd 
/u01/greenplum-data/gpseg1 
[gpadmin@node3 gpseg1]$ du -sh 
109M    .
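
gpseg-1 on node1 is the master instance's data directory, while gpseg0 and gpseg1 belong to the primary segments. In GPDB 5 the instance-to-directory mapping can also be read from the catalog by joining gp_segment_configuration with pg_filespace_entry (a sketch; fselocation should match the paths shown above):

-- Map each instance (master has content = -1) to its host and data directory
select c.content, c.role, c.hostname, f.fselocation
from gp_segment_configuration c
join pg_filespace_entry f on f.fsedbid = c.dbid
order by c.content;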

Insert data

peiybdb=# select gp_opt_version(); 
                gp_opt_version                  
----------------------------------------------- 
 GPOPT version: 2.55.20, Xerces version: 3.1.2 
(1 row) 
 
 
set optimizer=on; 
set optimizer_enumerate_plans=on; 
set optimizer_minidump=always; 
set optimizer_enable_constant_expression_evaluation=off; 
set client_min_messages='debug5'; 
 
 
insert into tmp_t0 
(c1,c2,c3) 
select generate_series(1, 1000000) as c1, 
       md5(random()::text) as c2 , 
       md5(random()::text) as c3 
; 

Query pg_stat_activity on the node1 master

peiybdb=# \x
Expanded display is on. 
peiybdb=# select * from pg_stat_activity; 
-[ RECORD 1 ]----+------------------------------------------- 
datid            | 16384 
datname          | peiybdb 
procpid          | 10904 
sess_id          | 20 
usesysid         | 10 
usename          | gpadmin 
current_query    | insert into tmp_t0 
                 | (c1,c2,c3) 
                 | select generate_series(1, 1000000) as c1,  
                 |        md5(random()::text) as c2 ,  
                 |        md5(random()::text) as c3  
                 |  
waiting          | f 
query_start      | 2018-05-02 07:20:33.325637+00 
backend_start    | 2018-05-02 07:13:17.573451+00 
client_addr      | 192.168.56.1 
client_port      | 63217 
application_name |  
xact_start       | 2018-05-02 07:20:33.3212+00 
waiting_reason   |  
rsgid            | 0 
rsgname          |  
rsgqueueduration |  
-[ RECORD 2 ]----+------------------------------------------- 
datid            | 16384 
datname          | peiybdb 
procpid          | 10906 
sess_id          | 21 
usesysid         | 10 
usename          | gpadmin 
current_query    | <IDLE> 
waiting          | f 
query_start      | 2018-05-02 07:17:17.675179+00 
backend_start    | 2018-05-02 07:13:17.580506+00 
client_addr      | 192.168.56.1 
client_port      | 63218 
application_name |  
xact_start       |  
waiting_reason   |  
rsgid            | 0 
rsgname          |  
rsgqueueduration |  
-[ RECORD 3 ]----+------------------------------------------- 
datid            | 16384 
datname          | peiybdb 
procpid          | 11095 
sess_id          | 22 
usesysid         | 10 
usename          | gpadmin 
current_query    | select * from pg_stat_activity; 
waiting          | f 
query_start      | 2018-05-02 07:28:07.30111+00 
backend_start    | 2018-05-02 07:27:54.501407+00 
client_addr      |  
client_port      | -1 
application_name | psql 
xact_start       | 2018-05-02 07:28:07.30111+00 
waiting_reason   |  
rsgid            | 0 
rsgname          |  
rsgqueueduration | 

Check the data directory sizes on node1, node2, and node3 again

[gpadmin@node1 gpseg-1]$ du -sh 
109M    . 
 
[gpadmin@node2 gpseg0]$ du -sh 
224M    . 
 
[gpadmin@node3 gpseg1]$ du -sh 
224M    .

Observe that the data directory on the master node (node1) did not change in size, while the data directories on the segment nodes (node2 and node3) grew considerably. This is because the master stores only the system catalog and object definitions, while the segments store the actual table data.
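
The same conclusion can be drawn from the master without logging in to each host; standard size functions report the table's size summed over all segments (a quick check, not part of the original session):

-- Cluster-wide size of tmp_t0: heap only, and including TOAST/indexes
select pg_size_pretty(pg_relation_size('tmp_t0'))       as heap_size,
       pg_size_pretty(pg_total_relation_size('tmp_t0')) as total_size;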

View the definition of tmp_t0

peiybdb=# \d
              List of relations 
 Schema |  Name  | Type  |  Owner  | Storage  
--------+--------+-------+---------+--------- 
 public | tmp_t0 | table | gpadmin | heap 
(1 row) 
 
peiybdb=# \d+ tmp_t0
                        Table "public.tmp_t0" 
 Column |          Type          | Modifiers | Storage  | Description  
--------+------------------------+-----------+----------+------------- 
 c1     | character varying(100) |           | extended |  
 c2     | character varying(100) |           | extended |  
 c3     | character varying(100) |           | extended |  
Has OIDs: no 
Distributed by: (c1)

Distributed by: (c1) is the distribution column of tmp_t0. The distribution column must be chosen carefully so that rows are spread fairly evenly across the segment nodes; queries can then process the data in parallel on every segment.
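
Whether the chosen key actually spreads the rows evenly can be verified with the hidden gp_segment_id column, and a skewed key can be changed after the fact with ALTER TABLE; a sketch follows (choosing c2 as the new key is only an illustration):

-- Rows per segment; large differences would indicate data skew
select gp_segment_id, count(*)
from tmp_t0
group by gp_segment_id
order by gp_segment_id;

-- Change the distribution key; existing rows are redistributed to match
alter table tmp_t0 set distributed by (c2);

-- Or fall back to round-robin distribution
-- alter table tmp_t0 set distributed randomly;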
