0
回答
hive 一timestamp列时间向后飘移1小时,请问原因

1 hive 版本
[hadoop@gxdh01 ~]$ hive --version
Hive 1.1.0-cdh5.7.5
Subversion file:///data/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hive-1.1.0-cdh5.7.5 -r Unknown
Compiled by jenkins on Wed Nov 2 11:53:22 PDT 2016
From source with checksum 0e57e48866fa2d9aa5d36cca7bffae5e

2 问题描述
 当我做一个inner join的内关联查询,同一列向后时间漂移1小时
(1) 单表查询是没有问题的
hive> SELECT COMMUNITY_ID, BUILDENDDATE
    >   FROM DW_ODS.ODS_COMMUNITY A
    >  WHERE A.DAY_ID = '20170324'
    >    AND COMMUNITY_ID = 5086560
    > ;
Query ID = hadoop_20170328085656_331bf931-3ad9-4162-9a1a-89e46586ccef
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1490177280986_5130, Tracking URL = http://gxdh01:8088/proxy/application_1490177280986_5130/
Kill Command = /opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/lib/hadoop/bin/hadoop job  -kill job_1490177280986_5130
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
2017-03-28 08:56:16,231 Stage-1 map = 0%,  reduce = 0%
2017-03-28 08:56:22,409 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 6.4 sec
2017-03-28 08:56:24,460 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 10.07 sec
MapReduce Total cumulative CPU time: 10 seconds 70 msec
Ended Job = job_1490177280986_5130
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4   Cumulative CPU: 10.07 sec   HDFS Read: 494905939 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 70 msec
OK
5086560 1971-01-01 00:00:00
Time taken: 14.011 seconds, Fetched: 1 row(s)
hive>

you can see BUILDENDDATE's value is 1971-01-01 00:00:00

(2) 关联查询出现问题

hive> SELECT T1.COMMUNITY_ID, T1.BUILDENDDATE
    >   FROM (SELECT *
    >           FROM DW_ODS.ODS_COMMUNITY A
    >          WHERE A.DAY_ID = '20170324'
    >            AND COMMUNITY_ID = 5086560) T1
    >  INNER JOIN (SELECT ADDRESS_ID
    >                FROM DW_ODS.ODS_STREET_NUMBER SN
    >               WHERE DAY_ID = '20170324') T2
    >     ON T1.COMMUNITY_ID = T2.ADDRESS_ID;
Query ID = hadoop_20170328085858_57223eaf-6b40-407d-8ec9-4212f1db0668
Total jobs = 1
Stage-1 is selected by condition resolver.
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 12
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1490177280986_5131, Tracking URL = http://gxdh01:8088/proxy/application_1490177280986_5131/
Kill Command = /opt/cloudera/parcels/CDH-5.7.5-1.cdh5.7.5.p0.3/lib/hadoop/bin/hadoop job  -kill job_1490177280986_5131
Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 12
2017-03-28 08:58:16,658 Stage-1 map = 0%,  reduce = 0%
2017-03-28 08:58:22,817 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 6.99 sec
2017-03-28 08:58:23,843 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 18.22 sec
2017-03-28 08:58:24,867 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 27.43 sec
2017-03-28 08:58:29,994 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 52.4 sec
2017-03-28 08:58:31,018 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 58.12 sec
MapReduce Total cumulative CPU time: 58 seconds 120 msec
Ended Job = job_1490177280986_5131
MapReduce Jobs Launched:
Stage-Stage-1: Map: 8  Reduce: 12   Cumulative CPU: 58.12 sec   HDFS Read: 743128560 HDFS Write: 84 SUCCESS
Total MapReduce CPU Time Spent: 58 seconds 120 msec
OK
5086560 1971-01-01 01:00:00
5086560 1971-01-01 01:00:00
5086560 1971-01-01 01:00:00
Time taken: 19.316 seconds, Fetched: 3 row(s)
hive>

可以看到 BUILDENDDATE这列值增加了一个小时,原值是 1971-01-01 00:00:00,现在在关联后查询是1971-01-01 01:00:00

请问有遇到过的么?为什么?

 

<无标签>
举报
hrb_qiuyb
发帖于1年前 0回/230阅
顶部