Hadoop - Create external table from multiple directories in HDFS
I have an external table in Hive that reads data from files at an HDFS location (/user/hive/warehouse/tablex).
Now, let's assume the data is pre-partitioned, so the same files are split across several directories that follow a specific naming convention, <dir_name>_<incnumber>, e.g.
/user/hive/warehouse/split/
    ./dir_1/files...
    ./dir_2/files...
    ./dir_n/files...
How can I create an external table that keeps track of all the files in the split folder?
Do I need to create an external table partitioned on each sub-folder (dir_x)?
And beyond that, would I need some kind of Hive or shell script that can create/add a partition for each sub-directory?
You have to create an external table partitioned by dir_x in order to access files spread across multiple folders.
CREATE EXTERNAL TABLE sample_table (col1 STRING, col2 STRING, col3 STRING, col4 STRING)
PARTITIONED BY (dir STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/split';
Then add the partitions just as you would for a regular partitioned table:
ALTER TABLE sample_table ADD PARTITION (dir='dir_1') LOCATION '/user/hive/warehouse/split/dir_1';
ALTER TABLE sample_table ADD PARTITION (dir='dir_2') LOCATION '/user/hive/warehouse/split/dir_2';
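Once the partitions are registered you can verify them and query the files of a single sub-directory by filtering on the partition column. This is just a usage sketch against the sample_table definition above; col1 and col2 are the placeholder columns from the CREATE statement:

SHOW PARTITIONS sample_table;

-- Only the files under /user/hive/warehouse/split/dir_1 are scanned here,
-- because the predicate on the partition column prunes the other directories.
SELECT col1, col2
FROM sample_table
WHERE dir = 'dir_1';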
This approach will work, but there is one issue with it: if at some point in the future you decide to add a new folder (e.g. dir_100) to the warehouse path, you will have to drop and recreate sample_table and re-add all the partitions to sample_table again using ALTER TABLE statements. I haven't worked with Hive for about 10 months now, so I'm not sure whether there is a better approach. If that is not an issue for you, you can use this approach.
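As for the Hive/shell script part of the question, a small script that lists the sub-directories and registers each one as a partition can simply be re-run whenever new folders appear. This is only a minimal sketch, assuming the hdfs and hive command-line tools are on the PATH, the table and base path are the ones used above, and every sub-directory should become a dir='<name>' partition:

#!/usr/bin/env bash
# Base directory that holds the dir_1, dir_2, ... sub-directories.
BASE='/user/hive/warehouse/split'

# List only the directories directly under BASE (ls lines starting with 'd').
for path in $(hdfs dfs -ls "$BASE" | awk '/^d/ {print $NF}'); do
    dir=$(basename "$path")   # e.g. dir_1, dir_2, ...
    # ADD IF NOT EXISTS makes the script safe to re-run after new
    # folders (e.g. dir_100) are added; existing partitions are skipped.
    hive -e "ALTER TABLE sample_table ADD IF NOT EXISTS PARTITION (dir='${dir}') LOCATION '${path}';"
done

Starting one hive process per directory is slow; in practice you could collect all the ALTER TABLE statements into a single file and run it once with hive -f, but the loop above keeps the idea clear.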