Hive学习之semi join

Hive不支持where子句中的子查询,SQL常用的exist in子句在Hive中是不支持的。比如以下语句在Hive中执行会报错:

select a.key, a.value
from a
where a.key in (select b.key from b);

semi join可以完美解决这一问题,可以改写为:

select a.key, a.value
from a left semi join b on (a.key = b.key);

注意,semi join的写法并不等同于以下写法。当b表中key存在重复记录时,以下语句会查出多条记录。

select a.key, a.value
from a left outer join b on (a.key = b.key)
where b.key is not null;

a left semi join b on (a.key = b.key):当a表中key值在b表中存在时保留该记录,否则就不保留。