Hive for processing and HBase for low-latency reads

This post shows how to create a table that can be accessed from both Hive and HBase. This is useful when the same data needs both quick (low-latency) lookups and efficient batch processing: Hive handles the processing and HBase serves the lookups, with both working on the same underlying table.

Steps:

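  1. create a staging table in Hive (comma-delimited, to match the CSV file loaded in step 2; Hive's default field delimiter is Ctrl-A, which would load the CSV columns as NULL)
    create table integration (a int, b string)
    row format delimited fields terminated by ',';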
    
  2. load data into the Hive table
    load data local inpath '/path/to/data/data.csv'
    overwrite into table integration;
    
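  3. create the Hive-HBase table (the column family name cf is an assumption, substitute your own)
    create table hive_table(key int, value string) stored by
    'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with
    SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value")
    TBLPROPERTIES ("hbase.table.name" = "hbase_table");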
    
  4. insert data into the Hive-HBase table
    insert overwrite table hive_table select * from integration;
    
  5. verify the data from Hive
    select * from hive_table;
    
  6. verify the data from HBase (run in the HBase shell)
    scan 'hbase_table'
    

Note: a Hive table can contain duplicate keys, but HBase row keys are unique. When rows with the same key are loaded into the Hive-HBase table, HBase writes them to the same row, updating it with a new timestamp, and by default only the latest version is shown.
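
A quick way to see whether this affects your data is to look for repeated keys in the staging table before the insert, and to compare row counts afterwards. The queries below are a minimal sketch that reuses only the table and column names from the steps above (integration, a, hive_table), and assumes hbase_table was empty before step 4.

```sql
-- keys that appear more than once in the staging table;
-- each of these ends up as a single row in HBase
select a, count(*) as occurrences
from integration
group by a
having count(*) > 1;

-- row counts before and after the insert:
-- the second is expected to be smaller when keys repeat
select count(*) from integration;
select count(*) from hive_table;
```

If the first query returns no rows and no key is NULL, the two counts should match.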
