GaussDB(DWS)运维 -- values子句做MERGE数据源导致SQL执行不下推的改写方案

现网做实时接入的时候，有的时候会使用MERGE INTO语句实现类似UPSERT的功能。这种场景下MERGE INTO语句的USING部分的数据位VALUES子句，为了后续的SQL语句中描述方便，需要对VALUES子句的输出命名别名。USING子句的书写方式可能导致MERGE INTO语句的执行不下推，本文就针对因此导致的不下推的场景，对USING子句的SQL语句进行改写，一遍整个SQL语句可以下推

预置条件

CREATE TABLE t1(name text, id INT) DISTRIBUTE BY HASH(id);

原始语句

MERGE INTO t1 USING (
    SELECT *
    FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)
) tmp ON (t1.id = tmp.id)
WHEN MATCHED THEN
    UPDATE SET t1.name = tmp.name
WHEN NOT MATCHED THEN
    INSERT (name, id) VALUES(tmp.name, tmp.id);

SQL语句不下推，导致执行低效

postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (
postgres(#     SELECT *
postgres(#     FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)
postgres(# ) tmp ON (t1.id = tmp.id)
postgres-# WHEN MATCHED THEN
postgres-#     UPDATE SET t1.name = tmp.name
postgres-# WHEN NOT MATCHED THEN
postgres-#     INSERT (name, id) VALUES(tmp.name, tmp.id);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
id |                       operation                       | E-rows | E-distinct | E-width | E-costs
----+-------------------------------------------------------+--------+------------+---------+---------
1 | ->  Merge on public.t1                                |      2 |            |      54 | 0.08
2 |    ->  Nested Loop Left Join (3, 4)                   |      2 |            |      54 | 0.08
3 |       ->  Values Scan on "*VALUES*"                   |      2 |            |      36 | 0.03
4 |       ->  Data Node Scan on t1 "_REMOTE_TABLE_QUERY_" |      2 |            |      18 | 0.00
SQL Diagnostic Information
------------------------------------------------------------
SQL is not plan-shipping
reason: Type of Record in non-real table can not be shipped
Predicate Information (identified by plan id)
-------------------------------------------------
1 --Merge on public.t1
Node expr: : $10
2 --Nested Loop Left Join (3, 4)
Join Filter: (t1.id = "*VALUES*".column2)
Targetlist Information (identified by plan id)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
1 --Merge on public.t1
Node/s: All datanodes
Remote query: UPDATE ONLY public.t1 SET name = $7, id = $8 WHERE t1.ctid = $5 AND t1.xc_node_id = $6
Node/s: All datanodes
Remote query: INSERT INTO public.t1 (name, id) VALUES ($9, $10)
2 --Nested Loop Left Join (3, 4)
Output: "*VALUES*".column1, "*VALUES*".column2, t1.name, t1.id, t1.ctid, t1.xc_node_id, "*VALUES*".column1, t1.id, "*VALUES*".column1, "*VALUES*".column2
3 --Values Scan on "*VALUES*"
Output: "*VALUES*".column1, "*VALUES*".column2
4 --Data Node Scan on t1 "_REMOTE_TABLE_QUERY_"
Output: t1.name, t1.id, t1.ctid, t1.xc_node_id
Node/s: All datanodes
Remote query: SELECT name, id, ctid, xc_node_id FROM ONLY public.t1 WHERE true
====== Query Summary =====
--------------------------
Parser runtime: 0.079 ms
Planner runtime: 1.392 ms
Unique SQL Id: 1657855173
(40 rows)

改写方案

MERGE INTO t1 USING (
WITH val(name, id) AS(
VALUES ('json', 1), ('sam', 2)
)
SELECT * FROM val
) tmp ON (t1.id = tmp.id)
WHEN MATCHED THEN
UPDATE SET t1.name = tmp.name
WHEN NOT MATCHED THEN
INSERT (name, id) VALUES(tmp.name, tmp.id);

改写后下推

postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (
postgres(#     WITH val(name, id) AS(
postgres(#         VALUES ('json', 1), ('sam', 2)
postgres(#     )
postgres(#     SELECT * FROM val
postgres(# ) tmp ON (t1.id = tmp.id)
postgres-# WHEN MATCHED THEN
postgres-#     UPDATE SET t1.name = tmp.name
postgres-# WHEN NOT MATCHED THEN
postgres-#     INSERT (name, id) VALUES(tmp.name, tmp.id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
id |                  operation                   | E-rows | E-distinct | E-memory | E-width | E-costs
----+----------------------------------------------+--------+------------+----------+---------+---------
1 | ->  Streaming (type: GATHER)                 |      1 |            |          |      54 | 1.56
2 |    ->  Merge on public.t1                    |      2 |            |          |      54 | 1.15
3 |       ->  Streaming(type: REDISTRIBUTE)      |      2 |            | 2MB      |      54 | 1.15
4 |          ->  Nested Loop Left Join (5, 7)    |      2 |            | 1MB      |      54 | 1.11
5 |             ->  Subquery Scan on tmp         |      2 |            | 1MB      |      36 | 0.08
6 |                ->  Values Scan on "*VALUES*" |     24 |            | 1MB      |      36 | 0.03
7 |             ->  Seq Scan on public.t1        |      2 |            | 1MB      |      18 | 1.01
Predicate Information (identified by plan id)
---------------------------------------------
4 --Nested Loop Left Join (5, 7)
Join Filter: (t1.id = tmp.id)
5 --Subquery Scan on tmp
Filter: (Hash By tmp.id)
Targetlist Information (identified by plan id)
----------------------------------------------------------------------------------------------------------------------------------------------------
1 --Streaming (type: GATHER)
Node/s: All datanodes
3 --Streaming(type: REDISTRIBUTE)
Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END)
Distribute Key: (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END)
Spawn on: All datanodes
Consumer Nodes: All datanodes
4 --Nested Loop Left Join (5, 7)
Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END
5 --Subquery Scan on tmp
Output: tmp.name, tmp.id
6 --Values Scan on "*VALUES*"
Output: "*VALUES*".column1, "*VALUES*".column2
7 --Seq Scan on public.t1
Output: t1.name, t1.id, t1.ctid, t1.xc_node_id
Distribute Key: t1.id
====== Query Summary =====
-------------------------------
System available mem: 3112960KB
Query Max mem: 3112960KB
Query estimated mem: 6336KB
Parser runtime: 0.107 ms
Planner runtime: 1.185 ms
Unique SQL Id: 780461632
(44 rows)

原创文章，作者：奋斗，如若转载，请注明出处：https://blog.ytso.com/tech/bigdata/317269.html

GaussDB(DWS)运维 — values子句做MERGE数据源导致SQL执行不下推的改写方案

发表回复

GaussDB(DWS)运维 — values子句做MERGE数据源导致SQL执行不下推的改写方案

相关推荐

发表回复