sql语句:
select s_name,s_address
from
supplier,nation
where s_suppkey in (select ps_suppkey from
(selectps_suppkey,case when ps_availqty > 0.5 * sum(l_quantity) over (partition by l_partkey, l_suppkey)
then 1 end vw_col
from
partsupp ,lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date ‘1994-01-01’
and l_shipdate < date ‘1994-01-01’ + interval ‘1’ year
and
ps_partkey in (
select
p_partkey
from
part
where
p_name like ‘midnight%’
)
) ttt
where
ttt.vw_col is not null
)
and s_nationkey = n_nationkey
and n_name = ‘JORDAN’
order by
s_name;
当gcluster_hash_redistribute_join_optimize=0;时,sql计划如下
+—-+———————+————-+———————-+—————————————————————–+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+—-+———————+————-+———————-+—————————————————————–+
| 04 | [RESULT] | Step | <03> | |
| | | ORDER | | ORDER BY s_name ASC |
| 03 | [GATHER] | INNER JOIN | | (s_nationkey = n_nationkey) |
| | | SCAN | supplier[s_suppkey] | s_suppkey IN ([SubQuery1]) |
| | | SCAN | nation[REP] | (n_name{S} = ‘JORDAN’) |
| 02 | [BROADCAST] | SubQuery2 | ttt | |
| | | Step | <01> | |
| | | WHERE | | (vw_col IS NOT NULL) |
| | | AGG | | |
| 01 | [REDIST(l_partkey)] | INNER JOIN | | (l_partkey = ps_partkey) AND (l_suppkey = ps_suppkey) |
| | | Step | <00> | |
| | | SCAN | lineitem[l_orderkey] | (l_shipdate{S} >= cast(‘1994-01-01’ as date)) |
| | | | | (l_shipdate{S} < date_add(cast(‘1994-01-01’ as date), INTERVA.. |
| 00 | [BROADCAST] | SCAN | partsupp[ps_partkey] | ps_partkey IN ([SubQuery3]) |
+—-+———————+————-+———————-+—————————————————————–+
当gcluster_hash_redistribute_join_optimize=1;时,sql计划如下
+—-+———————+————–+———————-+—————————————————————-+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+—-+———————+————–+———————-+—————————————————————-+
| 03 | [RESULT] | Step | <02> | |
| | | ORDER | | ORDER BY s_name ASC |
| 02 | [GATHER] | INNER JOIN | | (s_nationkey = n_nationkey) |
| | | SCAN | supplier[s_suppkey] | s_suppkey IN ([SubQuery1]) |
| | | SCAN | nation[REP] | (n_name{S} = ‘JORDAN’) |
| 01 | [BROADCAST] | SubQuery2 | ttt | |
| | | INNER JOIN | | (l_partkey = ps_partkey) AND (l_suppkey = ps_suppkey) |
| | | SCAN | partsupp[ps_partkey] | ps_partkey IN ([SubQuery3]) |
| | | Step | <00> | |
| | | WHERE | | (vw_col IS NOT NULL) |
| | | AGG | | |
| 00 | [REDIST(l_partkey)] | SCAN | lineitem[l_orderkey] | (l_shipdate{S} >= cast(‘1994-01-01’ as date)) |
| | | | | (l_shipdate{S} < date_add(cast(‘1994-01-01’ as date), INTERV.. |
+—-+———————+————–+———————-+—————————————————————-+
测试中发现,参数=1时,用时36s,参数=0时,用时13s,
查看trace发现=1时,重分布耗时长
2024-10-21 18:32:45.760 [M: 29G, 0B,D: 0B] [DC:112391, 0] ResultSender: send 227541418 rows.
2024-10-21 18:32:45.760 [M: 29G, 0B,D: 0B] [DC:112391, 0] output result done.
2024-10-21 18:32:45.792 [M: 29G, 0B,D: 0B] [DC:112391, 0] SUMMARY
2024-10-21 18:32:45.792 [M: 29G, 0B,D: 0B] [DC:112391, 0] elapsed time: 00:00:15.890
当遇到重分布耗时长的sql时,可以修改该参数,以获取最佳性能
原创文章,作者:kirin,如若转载,请注明出处:https://blog.ytso.com/tech/bigdata/318037.html