How are the number of iterations and the number of partitions related in Apache Spark Word2Vec?
According to the mllib.feature.Word2Vec Spark 1.3.1 documentation [1]:
def setNumIterations(numIterations: Int): Word2Vec.this.type
Sets number of iterations (default: 1), which should be smaller than or equal to number of partitions.
def setNumPartitions(numPartitions: Int): Word2Vec.this.type
Sets number of partitions (default: 1). Use a small number for accuracy.
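For reference, here is a minimal sketch of how the two setters are combined when building a model. It assumes spark-core and spark-mllib 1.3 or later on the classpath and a local master; `Word2VecPartitionsDemo`, `buildWord2Vec`, and the toy corpus are hypothetical. The `require` guard is our own and only mirrors the documented recommendation; Spark is not assumed to enforce it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}

object Word2VecPartitionsDemo {
  // Hypothetical helper: configures Word2Vec and, mirroring the Spark 1.3.1
  // documentation, rejects numIterations > numPartitions. This check is ours;
  // Spark itself is not assumed to enforce it.
  def buildWord2Vec(numPartitions: Int, numIterations: Int): Word2Vec = {
    require(numPartitions >= 1, "numPartitions must be positive")
    require(numIterations >= 1, "numIterations must be positive")
    require(numIterations <= numPartitions,
      s"numIterations ($numIterations) should be <= numPartitions ($numPartitions)")
    new Word2Vec()
      .setVectorSize(10)               // small vectors for a toy corpus
      .setMinCount(1)                  // keep every word of the tiny corpus
      .setNumPartitions(numPartitions) // how many partitions train in parallel
      .setNumIterations(numIterations) // how many train-then-merge rounds
      .setSeed(42L)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("w2v-demo").setMaster("local[2]"))
    try {
      // Tiny illustrative corpus; each sentence is a Seq[String].
      val sentences = sc.parallelize(Seq(
        "spark trains word vectors",
        "word vectors capture word similarity",
        "spark partitions the training data"
      ).map(_.split(" ").toSeq))
      val model: Word2VecModel =
        buildWord2Vec(numPartitions = 2, numIterations = 2).fit(sentences)
      model.findSynonyms("spark", 3).foreach { case (w, sim) =>
        println(f"$w%-12s $sim%.3f")
      }
    } finally sc.stop()
  }
}
```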
But the pull request [2] says:
To make our implementation more scalable, we train each partition separately and merge the model of each partition after each iteration. To make the model more accurate, multiple iterations may be needed.
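The scheme described in [2] (train every partition independently from the same global model, then merge after each iteration) can be illustrated with a toy simulation. This is a sketch under loudly stated assumptions: it fits a 1-D linear model `y ≈ w * x` with plain SGD instead of Word2Vec, and it merges partition results by simple averaging, which may not be exactly how Spark combines the per-partition vectors. `PartitionedSgdSketch` and its methods are hypothetical names.

```scala
object PartitionedSgdSketch {
  type Point = (Double, Double) // (x, y) for a 1-D linear model y ≈ w * x

  // One local pass of SGD over a single partition, starting from the global weight.
  def trainPartition(start: Double, data: Seq[Point], lr: Double): Double =
    data.foldLeft(start) { case (w, (x, y)) =>
      val grad = (w * x - y) * x // derivative of 0.5 * (w*x - y)^2 with respect to w
      w - lr * grad
    }

  // Each iteration: every partition trains independently from the same global
  // weight, then the partition results are merged by averaging (an assumption).
  def fit(partitions: Seq[Seq[Point]], numIterations: Int, lr: Double): Double =
    (1 to numIterations).foldLeft(0.0) { (global, _) =>
      val locals = partitions.map(p => trainPartition(global, p, lr))
      locals.sum / locals.size
    }

  def main(args: Array[String]): Unit = {
    // Noise-free data on the line y = 2x, so the ideal weight is 2.0.
    val data: Seq[Point] = (1 to 40).map { i => val x = i / 40.0; (x, 2.0 * x) }
    for (numPartitions <- Seq(1, 4); numIterations <- Seq(1, 5, 20)) {
      // Round-robin split of the data into numPartitions partitions.
      val parts = data.zipWithIndex
        .groupBy(_._2 % numPartitions)
        .values.map(_.map(_._1)).toSeq
      val w = fit(parts, numIterations, lr = 0.1)
      println(f"partitions=$numPartitions%d iterations=$numIterations%2d  w=$w%.4f")
    }
  }
}
```

In this toy, each partition sees only a quarter of the data per pass when there are 4 partitions, so at the same iteration count the 4-partition runs should lag behind the single-partition runs, and extra iterations are what close the gap. That mirrors the tension the rules below describe, without claiming Spark's Word2Vec behaves identically.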
Questions:
How do the parameters numIterations and numPartitions affect the internal working of the algorithm?
Is there a trade-off between the number of partitions and the number of iterations, considering the following rules?
more accuracy -> more iterations, according to [2]
more iterations -> more partitions, according to [1]
more partitions -> less accuracy