本例以hive为例,只考虑到实现,与性能无关
首先创建表
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
CREATE TABLE orders (orderid INT,createtime STRING,modifiedtime STRING,status STRING) row format delimited fields terminated by '/t'CREATE TABLE ods_orders_inc (orderid INT,createtime STRING,modifiedtime STRING,status STRING) PARTITIONED BY (day STRING)row format delimited fields terminated by '/t'CREATE TABLE dw_orders_his (orderid INT,createtime STRING,modifiedtime STRING,status STRING,dw_start_date STRING,dw_end_date STRING) row format delimited fields terminated by '/t' ; |
首先全量更新,我们先到2016-08-20为止的数据。
初始化,先把2016-08-20的数据初始化进去
|
1
2
3
4
|
INSERT overwrite TABLE ods_orders_inc PARTITION (day = '2016-08-20')SELECT orderid,createtime,modifiedtime,statusFROM ordersWHERE createtime < '2016-08-21' and modifiedtime <'2016-08-21'; |
刷到dw中
|
1
2
3
4
5
6
|
INSERT overwrite TABLE dw_orders_hisSELECT orderid,createtime,modifiedtime,status,createtime AS dw_start_date,'9999-12-31' AS dw_end_dateFROM ods_orders_incWHERE day = '2016-08-20'; |
如下结果
|
1
2
3
4
5
|
select * from dw_orders_his;OK1 2016-08-20 2016-08-20 创建 2016-08-20 9999-12-312 2016-08-20 2016-08-20 创建 2016-08-20 9999-12-313 2016-08-20 2016-08-20 创建 2016-08-20 9999-12-31 |
原创文章,作者:carmelaweatherly,如若转载,请注明出处:https://blog.ytso.com/tech/opensource/194326.html