PostgreSQL: UNION 和 UNION ALL 操作的不同

今天读到老外的一篇关于 UNION 和 UNION ALL 操作对比的文章觉得很受用,原文 http://www.cybertec.at/common-mistakes-union-vs-union-all/ , 有必要实验下。

UNION 操作一般用来合并多个结果集,尽管如此,仍然有细节需要注意,接着看以下测试。

PostgreSQL 9.3 中测试

1.1 测试

1
2
3
4
5
6
7
8
9
[pg93@redhatB ~]$ psql francs francs
psql (9.3beta1)
Type "help" for help.

francs=> select 1 union select 1;
?column?
----------
1
(1 row)

备注:这里预期是 2 条记录,结果却为 1 条。

1.2 执行计划

1
2
3
4
5
6
7
8
9
10
11
12
francs=> explain analyze select 1 union select 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Unique (cost=0.05..0.06 rows=2 width=0) (actual time=0.062..0.070 rows=1 loops=1)
-> Sort (cost=0.05..0.06 rows=2 width=0) (actual time=0.058..0.061 rows=2 loops=1)
Sort Key: (1)
Sort Method: quicksort Memory: 17kB
-> Append (cost=0.00..0.04 rows=2 width=0) (actual time=0.006..0.017 rows=2 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.005 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
Total runtime: 0.121 ms
(8 rows)

备注:根据以上,知道先进行排序,然后有个 Unique 操作去掉重复的行。

1.3 pg 手册中的解释

[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
Using the operators UNION, INTERSECT, and EXCEPT, the output of more than one SELECT statement can be combined to form a single result set. The UNION operator returns all rows that are in one or both of the result sets. The INTERSECT operator returns all rows that are strictly in both result sets. The EXCEPT operator returns the rows that are in the first result set but not in the second. In all three cases, duplicate rows are eliminated unless ALL is specified. The noise word DISTINCT can be added to explicitly specify eliminating duplicate rows. Notice that DISTINCT is the default behavior here, even though ALL is the default for SELECT itself. (See UNION Clause, INTERSECT Clause, and EXCEPT Clause below.)

UNION Clause
The result of UNION does not contain any duplicate rows unless the ALL option is specified. ALL prevents elimination of duplicates. (Therefore, UNION ALL is usually significantly quicker than UNION; use ALL when you can.) DISTINCT can be written to explicitly specify the default behavior of eliminating duplicate rows.

备注: 上面解释得很清楚, UNION 默认使用 DISTINCT 模式,会删除重复的行。

1.4 union distinct

1
2
3
4
5
francs=> select 1 union distinct select 1;
?column?
----------
1
(1 row)

1.5 union all

1
2
3
4
5
6
francs=> select 1 union all select 1;
?column?
----------
1
1
(2 rows)

备注:”union distinct” 会删除重复的行, “union all” 是预期的结果,不会删除重复的行。

Oracle 10g 中测试

2.1 测试

1
2
3
4
5
15:38:03 SKYTF@skytf> select 1 from dual union select 1 from dual;

1
----------
1

备注: union 操作和 PG 的测试结果一样。

2.2 执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
15:38:13 SKYTF@skytf> set autotrace on;
15:38:30 SKYTF@skytf> select 1 from dual union select 1 from dual;

1
----------
1

执行计划
----------------------------------------------------------
Plan hash value: 4080566713

-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 6 (67)| 00:00:01 |
| 1 | SORT UNIQUE | | 2 | 6 (67)| 00:00:01 |
| 2 | UNION-ALL | | | | |
| 3 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
| 4 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------


统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
401 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

备注:从 PLAN 中看出,oracle 的 union 和 pg 的 union 操作内部处理过程是类似的。

2.3 union all

1
2
3
4
5
6
15:44:32 SKYTF@skytf> select 1 from dual union all select 1 from dual;

1
----------
1
1

备注: union all 操作和 PG 的测试结果一样。

2.4 oracle 手册中的解释

UNION ALL Example
The UNION operator returns only distinct rows that appear in either result, while the UNION ALL operator returns all rows. The UNION ALL operator does not eliminate duplicate selected rows:

总结

  • PG 和 ORACLE 对 UNION 的处理是一样的,即 union 操作会删除重复的行,而 union all 操作不会删除重复的行。
  • 由于 union 操作会对多个结果集进行排序然后删除重复的行,因此效率会比 union all 低很多。

参考

原创文章,作者:Maggie-Hunter,如若转载,请注明出处:https://blog.ytso.com/238004.html

(0)
上一篇 2022年1月30日 07:14
下一篇 2022年1月30日

相关推荐

发表回复

登录后才能评论