hadoop - Hive: How to get non-grouped columns for each group of grouped results?


I have a table similar to the following.

|name  |   grp   |  dt
------------------------------
|foo   |    a    |  2016-01-01
|bar   |    a    |  2016-01-02
|hai   |    b    |  2016-01-01
|bai   |    b    |  2016-01-02
|baz   |    c    |  2016-01-01

For each group, I want to find the name whose dt is the most recent. In other words, take max(dt), group by grp, and associate the name whose dt is the max of that group. Expected output:

|name  |   grp   |  dt
------------------------------
|bar   |    a    |  2016-01-02
|bai   |    b    |  2016-01-02
|baz   |    c    |  2016-01-01

In Oracle, the following query works and is clean (taken from here):

select o.name, o.grp, o.dt
from tab o
    left join tab b
        on o.grp = b.grp and o.dt < b.dt
where b.dt is null

However, in Hive this fails with [Error 10017]: Line 4:43 Both left and right aliases encountered in JOIN 'service_effective_from'. From this question quoting the documentation, I learn that I cannot use an inequality operator in a join condition:

Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.

What is a clean solution for obtaining this in Hive, given that I cannot use an inequality operator in the join condition?

The following works and is taken from here, but I don't find it clean:

select o.name, o.grp, o.dt
from tab o
    join (
        select grp, max(dt) as dt
        from tab
        group by grp
    ) b
        on o.grp = b.grp and o.dt = b.dt

As an aside, it takes 164 seconds on my environment for a comparable test table with 4 rows.
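
For comparison, a window-function formulation is often suggested for this kind of "latest row per group" problem. The following is only a sketch, assuming a Hive version with windowing support (0.11 or later) and the table tab shown above. The alias rn and the subquery name ranked are made up for illustration, and the query has not been timed on the environment described here.

```sql
-- Sketch: number the rows within each grp by dt, newest first,
-- then keep only the first row of every group.
-- Assumes Hive 0.11+ (windowing functions); rn and ranked are illustrative names.
select name, grp, dt
from (
    select name, grp, dt,
           row_number() over (partition by grp order by dt desc) as rn
    from tab
) ranked
where rn = 1;
```

Note that row_number() returns exactly one row per group even when two rows share the same max(dt), whereas the join on max(dt) returns all tied rows. Replacing row_number() with rank() would keep the ties.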

