hive - Hadoop - Create external table from multiple directories in HDFS


I have an external table that reads data files from an HDFS location (/user/hive/warehouse/tablex); I created the external table in Hive.

Now, let's assume the data is pre-partitioned and the previous files are split across several directories with a specific naming convention <dir_name>_<incnumber>, e.g.

/user/hive/warehouse/split/
    ./dir_1/files...
    ./dir_2/files...
    ./dir_n/files...

How can I create an external table that keeps track of the files in the split folder?

Do I need to create an external table partitioned on each sub-folder (dir_x)?

And if so, do I need some kind of Hive or shell script that can create/add a partition for each sub-directory?

You have to create an external table partitioned on dir_x to access the files in the multiple folders.

create external table sample_table(
    col1 string,
    col2 string,
    col3 string,
    col4 string)
partitioned by (dir string)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/split';
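
To confirm the table was created with dir as a partition column (rather than a data column), the table definition can be inspected; a minimal check via the hive CLI, assuming it is on the PATH:

# dir should appear under "# Partition Information", not among the data columns
hive -e "DESCRIBE FORMATTED sample_table;"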

Then add the partitions like for a regular partitioned table:

alter table sample_table add partition(dir='dir_1') location '/user/hive/warehouse/split/dir_1';
alter table sample_table add partition(dir='dir_2') location '/user/hive/warehouse/split/dir_2';
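
If there are many sub-directories, the alter table statements don't have to be typed by hand. The following is only a rough sketch, assuming the hdfs and hive command-line tools are available and the sub-directories sit directly under /user/hive/warehouse/split with the dir_<n> naming from the question:

#!/bin/bash
# Sketch: register one partition per sub-directory of the split folder.
# Assumes sample_table already exists and sub-directories follow the dir_<n> convention.
BASE=/user/hive/warehouse/split

hdfs dfs -ls "$BASE" | grep '^d' | awk '{print $NF}' | while read -r path; do
    dir=$(basename "$path")
    hive -e "ALTER TABLE sample_table ADD IF NOT EXISTS PARTITION (dir='$dir') LOCATION '$path';"
done

Because of ADD IF NOT EXISTS, partitions that were already registered are simply skipped if the script is run more than once.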

This approach works, but there is an issue with it. If at some time in the future you decide to add a new folder (e.g. dir_100) to the Hive warehouse path, you will have to drop and recreate sample_table and re-add the partitions to sample_table again using alter table statements. I haven't worked with Hive for about 10 months now, so I'm not sure if there is a better approach. If that is not an issue for you, you can use this approach.
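
Whichever way the partitions end up registered, a quick sanity check (again just a sketch via the hive CLI) is to list them and run a query that prunes to a single sub-directory:

# List the partitions Hive knows about, then read only the dir_1 sub-directory
hive -e "SHOW PARTITIONS sample_table;"
hive -e "SELECT col1, col2 FROM sample_table WHERE dir = 'dir_1' LIMIT 10;"

The dir partition column behaves like a normal column in queries, but filtering on it reads only the matching sub-directory instead of scanning the whole split folder.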

