java - HBase: how to choose pre-split strategies and how they affect your rowkeys

I am trying to pre-split an HBase table. One of the HBaseAdmin Java APIs creates an HBase table as a function of start key, end key, and number of regions. Here is the Java API I use from HBaseAdmin: void createTable(HTableDescriptor desc, byte[] startKey, byte[] endKey, int numRegions)

Is there any recommendation on choosing startKey and endKey based on the dataset?

My approach: let's say we have 100 records in the dataset. I want the data divided into approximately 10 regions, so that each has approximately 10 records. To find the startKey I run scan '/mytable', {LIMIT => 10} and pick the last rowkey as my startKey, then run scan '/mytable', {LIMIT => 90} and pick the last rowkey as my endKey.

Does this approach to finding the startKey and endKey look OK, or is there a better practice?
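The sampling idea above can be generalized: instead of picking only a start and an end key, take every boundary from a sorted sample of real rowkeys. The following is a minimal sketch under stated assumptions, not an established HBase recipe. The class and method names (SampleSplitPoints, splitPoints) are hypothetical, the keys are assumed to be distinct ASCII strings (so Java string order matches HBase's byte order), and the sample is assumed to be representative of the full dataset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: derives split keys from a sample of existing rowkeys.
final class SampleSplitPoints {

    // Returns numRegions - 1 split keys taken at evenly spaced positions
    // of the sorted sample, so each region receives roughly the same
    // number of sampled keys.
    static List<String> splitPoints(List<String> sampleKeys, int numRegions) {
        if (numRegions < 2 || sampleKeys.size() < numRegions) {
            throw new IllegalArgumentException(
                "need at least 2 regions and at least numRegions sample keys");
        }
        List<String> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted); // assumes ASCII keys, see lead-in
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            int index = (int) ((long) i * sorted.size() / numRegions);
            splits.add(sorted.get(index));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up sample keys in the composite form used in the question.
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sample.add(String.format("%05d|201501|805|1999", i));
        }
        System.out.println(splitPoints(sample, 10));
    }
}
```

Each returned string would then be converted to bytes and passed as one row of the byte[][] argument of createTable(desc, splitKeys). Nine split keys give ten regions.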

EDIT: I tried the following approaches to pre-split the empty table. All 3 didn't work the way I used them. I think I will need to salt the key to get an equal distribution.

PS: I am displaying only some of the region info.

1) byte[][] splits = new RegionSplitter.HexStringSplit().split(10); hbaseAdmin.createTable(tableDescriptor, splits);

This gives regions with boundaries like:

{ "startkey":"-infinity", "endkey":"11111111", "numberofrows":3628951 },
{ "startkey":"11111111", "endkey":"22222222" },
{ "startkey":"22222222", "endkey":"33333333" },
{ "startkey":"33333333", "endkey":"44444444" },
{ "startkey":"88888888", "endkey":"99999999" },
{ "startkey":"99999999", "endkey":"aaaaaaaa" },
{ "startkey":"aaaaaaaa", "endkey":"bbbbbbbb" },
{ "startkey":"eeeeeeee", "endkey":"infinity" }

This is useless because my rowkeys are of a composite form like 'deptId|month|roleId|regionId' and don't fit into the above boundaries.

2) byte[][] splits = new RegionSplitter.UniformSplit().split(10); hbaseAdmin.createTable(tableDescriptor, splits);

This has the same issue:

{ "startkey":"-infinity", "endkey":"\\x19\\x99\\x99\\x99\\x99\\x99\\x99\\x99" }
{ "startkey":"\\x19\\x99\\x99\\x99\\x99\\x99\\x99\\     "endkey":"33333332" }
{ "startkey":"33333332", "endkey":"l\\xcc\\xcc\\xcc\\xcc\\xcc\\xcc\\xcb" }
{ "startkey":"\\xe6ffffffa", "endkey":"infinity" }

3) I tried supplying the start key and end key, and got the following useless regions.

hbaseAdmin.createTable(tableDescriptor, Bytes.toBytes("04120|200808|805|1999"), Bytes.toBytes("01253|201501|805|1999"), 10);

{ "startkey":"-infinity", "endkey":"04120|200808|805|1999" }
{ "startkey":"04120|200808|805|1999", "endkey":"000ptp\\xdc200w\\xd07\\x9c805|1999" }
{ "startkey":"000ptp\\xdc200w\\xd07\\x9c805|1999", "endkey":"000ptq<200wp6\\xbc805|1999" }
{ "startkey":"001\\x11\\x15\\x13\\x1c201\\x15\\x902\\x5c805|1999", "endkey":"01253|201501|805|1999" }
{ "startkey":"01253|201501|805|1999", "endkey":"infinity" }

First question: from my experience with HBase, I am not aware of any hard rule for choosing the number of regions, the start key, and the end key.

But the underlying principle is:

With your rowkey design, data should be distributed across the regions and not hotspotted (36.1. Hotspotting).

However, even if you define a fixed number of regions, such as the 10 you mentioned, there may not be 10 after a heavy data load. Once a region reaches a certain size limit, it splits again, so the number of regions grows.

For your way of creating the table with HBaseAdmin, the documentation says: creates a new table with the specified number of regions. The start key specified will become the end key of the first region of the table, and the end key specified will become the start key of the last region of the table (the first region has a null start key and the last region has a null end key).
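The odd-looking boundaries in your third attempt are what you would expect if the intermediate keys are computed by treating the two keys as large numbers and dividing the range evenly. The sketch below illustrates that idea only. It is a simplified stand-in, not HBase's own code, and the class and method names (KeyInterpolation, interpolate) are hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration of evenly dividing the byte range between two keys.
final class KeyInterpolation {

    // Reads a key as an unsigned big-endian integer, right-padded with
    // zero bytes to the given length.
    static BigInteger toUnsigned(byte[] key, int length) {
        return new BigInteger(1, Arrays.copyOf(key, length));
    }

    // Converts a non-negative value back to exactly 'length' bytes.
    static byte[] toFixedLength(BigInteger value, int length) {
        byte[] raw = value.toByteArray(); // may have a leading sign byte or be shorter
        byte[] out = new byte[length];
        int copy = Math.min(raw.length, length);
        System.arraycopy(raw, raw.length - copy, out, length - copy, copy);
        return out;
    }

    // Returns 'points' keys evenly spaced strictly between start and end.
    static byte[][] interpolate(byte[] start, byte[] end, int points) {
        int length = Math.max(start.length, end.length);
        BigInteger lo = toUnsigned(start, length);
        BigInteger span = toUnsigned(end, length).subtract(lo);
        byte[][] result = new byte[points][];
        for (int i = 1; i <= points; i++) {
            BigInteger offset = span.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(points + 1));
            result[i - 1] = toFixedLength(lo.add(offset), length);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three points between the single-byte keys "a" and "e".
        for (byte[] key : interpolate(new byte[] {'a'}, new byte[] {'e'}, 3)) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

Because the arithmetic works on raw bytes, the results for composite keys such as 'deptId|month|roleId|regionId' generally contain non-printable bytes and do not line up with the '|' separated fields.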

Moreover, I prefer creating the table through a script with presplits, say 0-10, and designing the rowkey so that it is salted and lands on one of the region servers, which avoids hotspotting.
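As a rough sketch of what salting could look like for your composite keys: prefix each rowkey with a bucket number derived from a hash of the key, and presplit the table on the bucket prefixes. The names here (SaltedKeys, BUCKETS, salt, splitKeys) and the choice of String.hashCode as the hash are assumptions for illustration. Any stable hash would do.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical salting helper: a one-digit bucket prefix plus matching split keys.
final class SaltedKeys {

    // Assumed bucket count; with 10 buckets the prefix is a single digit 0-9.
    static final int BUCKETS = 10;

    // Prepends a deterministic bucket prefix, e.g. "<bucket>|<original key>".
    static String salt(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), BUCKETS);
        return bucket + "|" + rowKey;
    }

    // Split keys "1" .. "9": every key whose prefix is "0" sorts before "1",
    // every key whose prefix is "1" sorts before "2", and so on,
    // which yields one region per bucket.
    static byte[][] splitKeys() {
        byte[][] splits = new byte[BUCKETS - 1][];
        for (int i = 1; i < BUCKETS; i++) {
            splits[i - 1] = String.valueOf(i).getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(salt("04120|200808|805|1999"));
        System.out.println(salt("01253|201501|805|1999"));
        System.out.println(splitKeys().length + " split keys");
    }
}
```

The array from splitKeys() has the shape expected by createTable(tableDescriptor, splits). The trade-off is on the read side: a get must recompute the salt from the original key, and a range scan over the original key order has to be issued once per bucket.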

EDIT: If you want to implement your own region split, you can provide your own implementation of org.apache.hadoop.hbase.util.RegionSplitter.SplitAlgorithm and override

public byte[][] split(int numberOfSplits)

Second question: my understanding is that you want to find the start rowkey and end rowkey of the data inserted in a specific table. Below are the ways.

If you want to find the start and end rowkeys, scan the '.META.' table ('hbase:meta' in HBase 0.96 and later) to see the start rowkey and end rowkey of each region.
You can access the UI at http://hbasemaster:60010 to see how the rowkeys are spread across each region. Each region's start and end rowkeys are shown there.
To see how your keys are organized after pre-splitting the table and inserting data into HBase, use FirstKeyOnlyFilter.

For example: scan 'yourtablename', FILTER => 'FirstKeyOnlyFilter()' displays all 100 of your rowkeys.

If you have huge data (not the 100 rows you mentioned) and want to take a dump of all the rowkeys, you can run the following from outside the shell.

echo "scan 'yourtablename', FILTER => 'FirstKeyOnlyFilter()'" | hbase shell > rowkeys.txt
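Once you have the list of rowkeys, a quick way to judge a candidate set of split keys before creating the table is to count how many rowkeys would fall into each region. This is a minimal sketch with hypothetical names (RegionHistogram, countPerRegion). It assumes ASCII keys, split keys sorted in ascending order, and that the rowkeys have already been extracted from the dump into a list, since parsing the shell output is not shown.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: counts rowkeys per region for a given set of split keys.
final class RegionHistogram {

    // splitKeys must be sorted ascending. Region 0 covers everything below
    // splitKeys[0]; region i covers [splitKeys[i-1], splitKeys[i]);
    // the last region covers everything from the last split key upward.
    static int[] countPerRegion(List<String> rowKeys, String[] splitKeys) {
        int[] counts = new int[splitKeys.length + 1];
        for (String key : rowKeys) {
            int pos = Arrays.binarySearch(splitKeys, key);
            // Exact match: the key is the start key of region pos + 1.
            // Otherwise binarySearch returns -(insertionPoint) - 1, and the
            // insertion point is the index of the region the key belongs to.
            int region = pos >= 0 ? pos + 1 : -pos - 1;
            counts[region]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two of the hex-style boundaries from the question, and two composite keys.
        String[] splits = {"11111111", "22222222"};
        List<String> keys = Arrays.asList(
            "04120|200808|805|1999", "01253|201501|805|1999");
        System.out.println(Arrays.toString(countPerRegion(keys, splits)));
    }
}
```

With the hex-style boundaries from the question, both composite keys start with '0' and therefore land in the first region, which is the skew you observed.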
