Memory error with Hierarchical Dirichlet Process (HDP) in gensim
I am running a Hierarchical Dirichlet Process (HDP) model using gensim in Python, but because my corpus is very large it throws the following error:
model = gensim.models.HdpModel(corpus, id2word=corpus.id2word, chunksize=50000)
File "/usr/cluster/contrib/Enthought/Canopy_64/User/lib/python2.7/site-packages/gensim/models/", line 210, in __init__
File "/usr/cluster/contrib/Enthought/Canopy_64/User/lib/python2.7/site-packages/gensim/models/", line 245, in update
File "/usr/cluster/contrib/Enthought/Canopy_64/User/lib/python2.7/site-packages/gensim/models/", line 313, in update_chunk
self.update_lambda(ss, word_list, opt_o)
File "/usr/cluster/contrib/Enthought/Canopy_64/User/lib/python2.7/site-packages/gensim/models/", line 415, in update_lambda
rhot * self.m_D * sstats.m_var_beta_ss / sstats.m_chunksize
I loaded my corpus with the following statement:
corpus = gensim.corpora.MalletCorpus('chunk5000K_records.mallet')
The data I load the corpus from has 5 million records. The same code works when I load only 50K records, so I added the chunksize option to HdpModel, but it still fails with an error.
Please let me know how I can solve this issue. I am running this on a high-performance computing cluster with a large amount of memory and disk capacity, so I think there should be a way to resolve it.
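One rough way to reason about where the memory goes is sketched below. It rests on an assumption that is not confirmed by the traceback: that the arrays touched in update_lambda (such as sstats.m_var_beta_ss) are dense float64 matrices with one row per top-level topic and one column per distinct word in the current chunk. If that holds, a larger chunksize would tend to raise peak memory per chunk rather than lower it. The helper names, the value T = 150, and the vocabulary size are illustrative placeholders, not measurements from this corpus.

```python
from itertools import islice


def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Bytes needed for a dense n_rows x n_cols array (float64 by default)."""
    return n_rows * n_cols * bytes_per_value


def max_unique_ids_per_chunk(bow_corpus, chunksize):
    """Largest number of distinct token ids seen in any chunk of documents.

    bow_corpus is any iterable of bag-of-words documents, i.e. lists of
    (token_id, count) pairs. Documents are consumed one chunk at a time,
    so the whole corpus never has to sit in memory.
    """
    docs = iter(bow_corpus)
    largest = 0
    while True:
        chunk = list(islice(docs, chunksize))
        if not chunk:
            return largest
        ids = {token_id for doc in chunk for token_id, _ in doc}
        largest = max(largest, len(ids))


# Hypothetical numbers for illustration only; substitute your own.
T = 150               # assumed top-level topic truncation
vocab_size = 2000000  # hypothetical vocabulary size
gigabytes = dense_matrix_bytes(T, vocab_size) / 1e9
print("%.1f GB for one dense T x vocabulary matrix" % gigabytes)
```

Calling max_unique_ids_per_chunk(corpus, 50000) on the loaded corpus and passing the result to dense_matrix_bytes gives an estimate of one per-chunk array under the same assumption. Several such arrays may exist at once while the update expression is evaluated, so the true peak could be a multiple of that figure.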
Topic gensim lda topic-model nlp python
Category Data Science