hive - Hadoop - Create external table from multiple directories in HDFS


I have an external table that reads data from an HDFS location (/user/hive/warehouse/tablex) containing files, and I created that external table in Hive.

Now, let's assume the data is pre-partitioned and the previous files are split into several directories with a specific naming convention <dir_name>_<incnumber>, e.g.

/user/hive/warehouse/split/
  ./dir_1/files...
  ./dir_2/files...
  ./dir_n/files...

How can I create an external table that keeps track of the files in the split folder?

Do I need to create an external table partitioned on each sub-folder (dir_x)?

On top of that, would I need some kind of Hive or shell script that can create/add a partition for each sub-directory?

You have to create an external table partitioned by dir_x to access files in multiple folders.

create external table sample_table(
    col1 string,
    col2 string,
    col3 string,
    col4 string)
partitioned by (dir string)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/split';

Then add the partitions as you would for a regular partitioned table:

alter table sample_table add partition(dir='dir_1') location '/user/hive/warehouse/split/dir_1';
alter table sample_table add partition(dir='dir_2') location '/user/hive/warehouse/split/dir_2';
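Once the partitions are registered, queries can be restricted to a single sub-directory through the dir partition column. A hypothetical example query against the columns defined above:

select col1, col2
from sample_table
where dir = 'dir_1';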

This approach works, but there is one issue with it. If at some point in the future you decide to add a new folder (e.g. dir_100) under the warehouse path, you will have to drop and recreate sample_table and re-add all the partitions to sample_table again using alter table statements. I haven't worked with Hive for 10 months now, so I'm not sure if there is a better approach. If this is not an issue for you, you can use this approach.
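As for the shell script the question asks about, something along the lines of the following sketch could generate one alter table statement per sub-directory instead of typing them by hand. It assumes the hive and hdfs commands are on the PATH, that the sample_table DDL above has already been run, and that the sub-directories match the dir_ naming convention; adjust the base path and pattern as needed.

#!/usr/bin/env bash
# List the sub-directories under the split folder and register each one
# as a partition of sample_table (skips partitions that already exist).
base=/user/hive/warehouse/split
for path in $(hdfs dfs -ls "$base" | awk '{print $NF}' | grep 'dir_'); do
  name=$(basename "$path")
  hive -e "alter table sample_table add if not exists partition(dir='${name}') location '${base}/${name}';"
done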

