How are the number of iterations and the number of partitions related in Apache Spark Word2Vec?


According to the mllib.feature.Word2Vec documentation for Spark 1.3.1 [1]:

def setNumIterations(numIterations: Int): Word2Vec.this.type

Sets number of iterations (default: 1), which should be smaller than or equal to number of partitions.

def setNumPartitions(numPartitions: Int): Word2Vec.this.type

Sets number of partitions (default: 1). Use a small number for accuracy.

But in the pull request [2]:

To make our implementation more scalable, we train each partition separately and merge the model of each partition after each iteration. To make the model more accurate, multiple iterations may be needed.
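
To make the quoted description concrete, here is a toy sketch of the "train each partition separately, merge after each iteration" loop. It is not Spark's Word2Vec code: the model is a single Double, the names (PartitionedTrainingSketch, trainPartition, merge, train) are hypothetical, and merging by averaging is an assumption made purely for illustration. I have not checked how Spark actually merges the partition models.

```scala
// Toy sketch only: NOT Spark's Word2Vec implementation.
// The "model" is one Double, and averaging in merge() is an assumed rule.
object PartitionedTrainingSketch {

  // One SGD pass over a single partition, starting from the shared model.
  // Each step nudges the model toward the current data point.
  def trainPartition(start: Double, data: Seq[Double], learningRate: Double): Double =
    data.foldLeft(start)((model, x) => model - learningRate * (model - x))

  // Assumed merge rule: average the per-partition models.
  def merge(models: Seq[Double]): Double = models.sum / models.size

  // Split the data into at most numPartitions chunks. In every iteration,
  // each chunk is trained separately from the current global model,
  // and the resulting models are merged into the next global model.
  def train(data: Seq[Double], numPartitions: Int, numIterations: Int,
            learningRate: Double = 0.1): Double = {
    require(data.nonEmpty && numPartitions >= 1 && numIterations >= 1)
    val chunkSize = math.ceil(data.size.toDouble / numPartitions).toInt
    val partitions = data.grouped(chunkSize).toSeq
    (1 to numIterations).foldLeft(0.0) { (global, _) =>
      merge(partitions.map(p => trainPartition(global, p, learningRate)))
    }
  }
}
```

In this toy, each partition sees only a fraction of the data per iteration, so with more partitions the merged model moves less per iteration, and extra iterations are needed to get as close to the target as a single-partition run. Whether the same holds for the real Word2Vec implementation is exactly what the questions below ask.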

Questions:

  • How do the parameters numIterations and numPartitions affect the internal working of the algorithm?

  • Is there a trade-off between setting the number of partitions and the number of iterations, considering the following rules?

    • more accuracy -> more iterations, according to [2]

    • more iterations -> more partitions, according to [1]

    • more partitions -> less accuracy

