hadoop - Hive: How to get non-grouped columns for each group of grouped results?
I have a table similar to the following.
|name | grp | dt
------------------------------
|foo  | a   | 2016-01-01
|bar  | a   | 2016-01-02
|hai  | b   | 2016-01-01
|bai  | b   | 2016-01-02
|baz  | c   | 2016-01-01
For each group, I want to find the name whose dt is the most recent. In other words: take max(dt), group by grp, and associate the name whose dt is the max of its group with the output:
|name | grp | dt
------------------------------
|bar  | a   | 2016-01-02
|bai  | b   | 2016-01-02
|baz  | c   | 2016-01-01
In Oracle, the following query works and is clean (taken from here):
select o.name, o.grp, o.dt
from tab o
left join tab b
  on o.grp = b.grp and o.dt < b.dt
where b.dt is null
However, in Hive this fails with: [Error 10017]: Line 4:43 Both left and right aliases encountered in JOIN 'service_effective_from'
From a question quoting the documentation, I learned that I cannot use an inequality operator in a join condition:
Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions, as it is very difficult to express such conditions as a map/reduce job.
What is a clean solution for obtaining this in Hive, given that I cannot use an inequality operator in a join condition?
The following works (also taken from here), but I don't find it clean:
select o.name, o.grp, o.dt
from tab o
join (
  select grp, max(dt) as dt
  from tab
  group by grp
) b
  on o.grp = b.grp and o.dt = b.dt
As an aside, this takes 164 seconds in my environment on a comparable test table with 4 rows.
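One possible direction, sketched here on the assumption that the Hive version in use supports windowing functions (they were added in Hive 0.11): number the rows of each grp by dt in descending order and keep only the first one. This avoids the self-join, so no inequality join condition is needed. The aliases t and rn are illustrative names and not part of the queries above.

```sql
-- Sketch: most recent row per grp without a self-join.
-- Assumes the table tab(name, grp, dt) from the question;
-- t and rn are illustrative aliases.
select t.name, t.grp, t.dt
from (
  select name, grp, dt,
         -- rn = 1 marks the row with the latest dt inside each grp
         row_number() over (partition by grp order by dt desc) as rn
  from tab
) t
where t.rn = 1;
```

Note that row_number() returns exactly one row per group even when two rows share the same max dt, whereas the join-based queries above return every tied row. Using rank() in place of row_number() would keep the ties.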