GreenPlum 问题一例 ERROR: Interconnect timeout: Unable to complete setup of all connections within time limit.

虚拟机 GreenPlum 数据仓库搭建好以后,可以执行建库,建表,数据插入语句,就是不能执行”select”操作报以下ERROR,这个问题折腾了好几天,后来在德哥的帮忙下,终于解决了,原来自己在一个配置上疏忽了。

问题现象

不能执行查询SQL,如下:
1.1创建测试表,并插入数据

1
2
3
4
5
6
7
8
9
10
11
12
13
warehouse=# create table test_1 (id  integer , name varchar(32)) distributed by (id);
CREATE TABLE
warehouse=# /d test_1
Table "public.test_1"
Column | Type | Modifiers
--------+-----------------------+-----------
id | integer |
name | character varying(32) |
Distributed by: (id)
warehouse=# insert into test_1 values (1,'francs');
INSERT 0 1
warehouse=# insert into test_1 values (2,'fpZhou');
INSERT 0 1

1.2查询报错

1
2
3
warehouse=# select * From test_1;
ERROR: Interconnect timeout: Unable to complete setup of all connections within time limit.
DETAIL: Completed 0 of 2 incoming and 0 of 0 outgoing connections. gp_interconnect_setup_timeout = 20 seconds.

处理过程

2.1检查Master节点,未发现明显异常

2.2检查Segment节点,发现以下ERROR

1
2011-04-14 21:14:05.535064 CST,"warehouse","warehouse",p20619,th-1208420672,"192.168.1.50","41829",2011-04-14 21:08:26 CST,1279,con6,cmd16,seg0,slice1,,x1279,sx1,"LOG","58M01",Interconnect could not connect to seg-1 127.0.0.1:61572 pid=9845; will retry. Connection refused (connect errno 111)",,,,,,"select * From test_1

从上面的信息来看“seg-1 127.0.0.1” , seg-1 指的是 master 节点,怎么解析成 “127.0.0.1”了, 其实这里我早应该发现,这是关键信息。

2.3尝试调整参数 gp_interconnect_setup_timeout 将参数 gp_interconnect_setup_timeout 调大到 120s 后,依然无效。

2.4 GP整体状态检查,未发现异常

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
[greenplum@gpmaster opt]$ gpstate -s
20110414:20:12:43:gpstate:gpmaster:greenplum-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
20110414:20:13:01:gpstate:gpmaster:greenplum-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
20110414:20:13:20:gpstate:gpmaster:greenplum-[INFO]:-Spawning parallel processes batch [1], please wait...
....
20110414:20:13:40:gpstate:gpmaster:greenplum-[INFO]:-Waiting for parallel processes batch [1], please wait...
...............
20110414:20:14:03:gpstate:gpmaster:greenplum-[INFO]:-Master Configuration & Status
20110414:20:14:03:gpstate:gpmaster:greenplum-[INFO]:------------------------------
20110414:20:14:04:gpstate:gpmaster:greenplum-[INFO]:-Master host = gpmaster
20110414:20:14:05:gpstate:gpmaster:greenplum-[INFO]:-Master postgres process ID = 20992
20110414:20:14:06:gpstate:gpmaster:greenplum-[INFO]:-Master data directory = /opt/gp_data/gp-1
20110414:20:14:06:gpstate:gpmaster:greenplum-[INFO]:-Database name = template1
20110414:20:14:07:gpstate:gpmaster:greenplum-[INFO]:-Master port = 5432
20110414:20:14:07:gpstate:gpmaster:greenplum-[INFO]:-Master current role = dispatch
20110414:20:14:08:gpstate:gpmaster:greenplum-[INFO]:-Greenplum array configuration type = Standard
20110414:20:14:09:gpstate:gpmaster:greenplum-[INFO]:-Greenplum initsystem version = 2|3.3.6.1 build 1
20110414:20:14:10:gpstate:gpmaster:greenplum-[INFO]:-Greenplum current version = PostgreSQL 8.2.13 (Greenplum Database 3.3.6.1 build 1) on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) compiled on Apr 2 2010 16:39:49
20110414:20:14:10:gpstate:gpmaster:greenplum-[INFO]:-Postgres version = 8.2.13
20110414:20:14:11:gpstate:gpmaster:greenplum-[INFO]:-Greenplum fault mode = readonly
20110414:20:14:11:gpstate:gpmaster:greenplum-[INFO]:-Greenplum mirroring status = on
20110414:20:14:12:gpstate:gpmaster:greenplum-[INFO]:-Greenplum master standby = No master standby configured
20110414:20:14:12:gpstate:gpmaster:greenplum-[INFO]:-Greenplum standby master state =
20110414:20:14:13:gpstate:gpmaster:greenplum-[INFO]:-Segment instance status
20110414:20:14:13:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:14:gpstate:gpmaster:greenplum-[INFO]:-Parallel process exit status
20110414:20:14:14:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:15:gpstate:gpmaster:greenplum-[INFO]:-Total processes marked as completed = 4
20110414:20:14:15:gpstate:gpmaster:greenplum-[INFO]:-Total processes marked as killed = 0
20110414:20:14:15:gpstate:gpmaster:greenplum-[INFO]:-Total processes marked as failed = 0
20110414:20:14:16:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:17:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:17:gpstate:gpmaster:greenplum-[INFO]:-Segment Instance Status Report
20110414:20:14:17:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:18:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:18:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode1
20110414:20:14:19:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 50001
20110414:20:14:19:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/data/gp0 Type = Primary
20110414:20:14:20:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:20:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:20:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:21:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 18851
20110414:20:14:21:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:22:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 18851
20110414:20:14:22:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Active
20110414:20:14:23:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:23:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode1
20110414:20:14:24:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 60001
20110414:20:14:24:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/mdata/gp1 Type = Mirror
20110414:20:14:24:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:25:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:25:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:26:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 18865
20110414:20:14:26:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:26:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 18865
20110414:20:14:27:gpstate:gpmaster:greenplum-[INFO]:-Mirror status = Passive
20110414:20:14:27:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Valid
20110414:20:14:28:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:29:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode2
20110414:20:14:29:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 50001
20110414:20:14:29:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/data/gp1 Type = Primary
20110414:20:14:30:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:30:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:31:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:31:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 17141
20110414:20:14:32:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:32:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 17141
20110414:20:14:32:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Active
20110414:20:14:33:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:33:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode2
20110414:20:14:34:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 60001
20110414:20:14:34:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/mdata/gp0 Type = Mirror
20110414:20:14:35:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:35:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:36:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:36:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 17153
20110414:20:14:36:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:37:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 17153
20110414:20:14:38:gpstate:gpmaster:greenplum-[INFO]:-Mirror status = Passive
20110414:20:14:38:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Valid

2.5网络检查,发现网络不好

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
[greenplum@gpmaster ~]$ gpchecknet -f all_host_file -d test/
/opt/greenplum-db/./bin/gpchecknet -f all_host_file -d test/
-------------------
-- NETPERF TEST
-------------------
====================
== RESULT
====================
Netperf bisection bandwidth test
gpnode1 -> gpnode2 = 6.090000
gpnode2 -> gpmaster = 7.730000
gpnode2 -> gpnode1 = 6.960000
gpmaster -> gpnode2 = 6.530000
Summary:
sum = 27.31 MB/sec
min = 6.09 MB/sec
max = 7.73 MB/sec
avg = 6.83 MB/sec
median = 6.96 MB/sec
[Warning] connection between gpnode1 and gpnode2 is no good
[Warning] connection between gpmaster and gpnode2 is no good

2.6虚拟机用的是 HOST-ONLY 网络连接,尝试将网络连接改成桥接,依然无效
2.7后来实在搞不定,在德哥的帮助下,终于确定了以下问题
2.8ping gpmaster 主机发现问题

1
2
3
4
5
6
7
[root@gpmaster ~]# ping gpmaster
PING gpmaster (127.0.0.1) 56(84) bytes of data.
64 bytes from gpmaster (127.0.0.1): icmp_seq=1 ttl=64 time=2.55 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=2 ttl=64 time=0.073 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=4 ttl=64 time=0.049 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=5 ttl=64 time=0.039 ms

ping gpmaster 发现,gpmaster 解析为 127.0.0.1, 应该是 192.168.1.50 才对
2.9 检查 /etc/hosts

2.9.1 原来的 /etc/hosts内容

1
2
3
4
5
# that require network functionality will fail.
127.0.0.1 gpmaster localhost.localdomain localhost
192.168.1.50 gpmaster
192.168.1.51 gpnode1
192.168.1.52 gpnode2

发现,gpmaster 解析成 127.0.0.1 了,先读的第一行。

2.9.2修改后的 /etc/hosts

1
2
3
4
5
6
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.1.50 gpmaster
192.168.1.51 gpnode1
192.168.1.52 gpnode2

2.9.3再次 ping gpmaster

1
2
3
4
5
6
7
[root@gpmaster ~]# ping gpmaster
PING gpmaster (192.168.1.50) 56(84) bytes of data.
64 bytes from gpmaster (192.168.1.50): icmp_seq=1 ttl=64 time=2.59 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=2 ttl=64 time=0.066 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=3 ttl=64 time=0.038 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=4 ttl=64 time=0.051 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=5 ttl=64 time=0.044 ms

这时,gpmaster 的IP解析为 “192.168.1.50”。

2.9.4退出 session 后,重新测试

1
2
3
4
5
6
warehouse=# select * From test_1;
id | name
----+--------
1 | francs
2 | fpZhou
(2 rows)

这次终于OK了,数据可以查询了。

总结

这个问题折腾了好几天,结果问题发生在 /etc/hosts上,还是在高手帮忙的情况下解决的,应该好好反思下,OS层面的知识比较薄弱,对基本知识应该加强。

/etc/hosts 解析是从上到下解析的,当解析文件 /etc/hosts 时,第一次匹配成功则返回。

原创文章,作者:506227337,如若转载,请注明出处:https://blog.ytso.com/236389.html

(0)
上一篇 2022年1月24日
下一篇 2022年1月24日

相关推荐

发表回复

登录后才能评论