
 Description:
A Matlab program which implements the beam sampler for an infinite hidden Markov model (iHMM) with multinomial output. It is easy to extend to other output distributions. A collapsed Gibbs sampler is also included for comparison.
 Changes to previous version:
Since 0.4: Removed the need for Stirling numbers. Fixed a bug in the backwards sampling stage.
 BibTeX Entry: Download
 Corresponding Paper BibTeX Entry: Download
 Supported Operating Systems: Agnostic
 Data Formats: Matlab
 Tags: Hmm, Nonparametric Bayes
 Archive: download here
Other available revisions

Version / Changelog / Date
0.5: Since 0.4: Removed dependency on lightspeed (now using the statistics toolbox). Updated to newer Matlab versions. (July 21, 2010, 23:41:24)
0.4: Since 0.3: Removed the need for Stirling numbers. Fixed a bug in the backwards sampling stage. (August 24, 2009, 11:27:05)
0.3: Initial Announcement on mloss.org. (July 20, 2009, 11:44:40)
Comments

 Jurgen Van Gael (on November 12, 2009, 10:34:08)
The iHMM software depends on Tom Minka's lightspeed toolbox. The latest version of lightspeed has some issues, I will fix this this weekend.
Cheers, Jurgen

 Jurgen Van Gael (on July 21, 2010, 23:42:29)
My weekend was a bit longer than expected. Hope things work out now ;)

 Cao Thao (on July 22, 2010, 04:27:40)
Dear J. Van Gael,
Can iHmmSampleBeam be run when outModel is 'ar1'? When I test this with generated data, [Y, STrue] = HmmGenerateData(1,T,pi,A,E,'ar1'); ... [S stats] = iHmmSampleBeam(Y, hypers, 500, 1, 1, ceil(rand(1,T) * 10));
I get the error:
There is error: ??? Attempted to access sample.Phi(1,0.0114177); index must be a positive integer or logical.
Error in ==> iHmmSampleBeam at 125 dyn_prog(k,1) = sample.Phi(k, Y(1)) * dyn_prog(k,1);
I think it may require the data to be integers. Could you help me solve this problem? Thank you very much, Cao Thao

 Jurgen Van Gael (on July 22, 2010, 09:43:16)
Hi Cao,
No, iHmmSampleBeam runs on discrete data and iHmmNormalSampleBeam runs on normally distributed data. It's not too complicated to adapt iHmmNormalSampleBeam to include the AR(1) dependency, though. It is essentially just a matter of changing the line that evaluates the likelihood in the dynamic-programming section.
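For concreteness, the kind of change Jurgen describes might look like this. This is only a sketch, not the shipped code; the per-state AR(1) parameter names A, Mu, and Sigma are assumptions:

```matlab
% Inside the forward-filtering (dynamic-programming) loop of a
% normal-output beam sampler, the emission term that currently uses a
% state-specific Gaussian on Y(t) would instead condition on Y(t-1).
% Hypothetical per-state AR(1) parameters: coefficient A(k),
% offset Mu(k), noise standard deviation Sigma(k).
for k = 1:K
    lik = normpdf(Y(t), A(k) * Y(t-1) + Mu(k), Sigma(k));
    dyn_prog(k,t) = lik * dyn_prog(k,t);
end
```

The t = 1 case needs its own emission term (e.g. the stationary distribution of the AR(1) process), since Y(0) does not exist.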
Cheers, Jurgen

 kamel aitmohand (on September 20, 2010, 13:14:26)
Is there a simple way to extend the code so as to have: a mixture of Gaussians instead of a single Gaussian per state, and learning from multiple sequences of data instead of one single sequence?

 Jurgen Van Gael (on September 22, 2010, 15:27:33)
Hi Kamel,
Yes, this should be fairly easy. It would just require introducing an extra mixture-model parameter sampling step at the end of the beam iteration loop. Hope this helps ...

 kamel aitmohand (on September 28, 2010, 11:44:09)
Hi Jurgen, thank you for replying. So I have to call the "SampleNormalMeans" function as many times as there are Gaussians in the mixture. Is that correct?
What about making the Gaussian means multidimensional (for multidimensional data)? Is it possible, and how? Last question: is the algorithm efficient for high-dimensional data (~200-300 dimensions)?

 Jurgen Van Gael (on September 28, 2010, 16:43:21)
Hi Kamel,
No, you'll have to replace SampleNormalMeans with your own function for resampling the parameters of a mixture model.
You could rewrite SampleNormalMeans for multidimensional data; that should not be too complicated and would retain the same structure as the current code. As for high dimensions, I think the biggest issue is that you'll have to store a 300-dimensional covariance matrix, and if you have a lot of states that might require a fair amount of RAM.
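A rough sketch of the extra per-state resampling step discussed above. Everything here is an assumption about how one might structure it (the variable names S, w, mu, sigma and the component count M are hypothetical), not code from the package:

```matlab
% Hypothetical replacement for SampleNormalMeans: for each instantiated
% state k, resample the parameters of an M-component Gaussian mixture
% from the observations currently assigned to that state.
for k = 1:K
    Yk = Y(S == k);                 % observations assigned to state k
    % 1) Resample component assignments z for Yk given the current
    %    mixture weights w(k,:), means mu(k,:), and stds sigma(k,:).
    % 2) Resample w(k,:) from a Dirichlet posterior over the counts of z.
    % 3) Resample each mu(k,m) and sigma(k,m) from its conjugate
    %    posterior given the points of Yk assigned to component m.
end
```

This keeps the beam sampler itself unchanged; only the emission-parameter step grows from one Gaussian per state to one mixture per state.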

 Sumithra Surendralal (on February 11, 2014, 17:32:32)
Hi Jurgen,
Could you give me a reference for the calculation of the joint log-likelihood in 'iHMMJointLogLikelihood.m'? I'm trying to understand it.
Thanks, Sumithra

 David Pfau (on October 3, 2014, 22:57:56)
I'm not sure that I understand the resampling of the pi stick lengths at line 104 in iHmmSampleBeam.m
If I understand correctly, you have the stick for each pi that corresponds to the "unseen" state, and you need to break that in two once you've instantiated a new state. We can call this pi{k}{u}, corresponding to beta{u}, the remainder of the beta stick that also corresponds to all unseen states.
Having already broken beta{u} into beta{u} and beta{u+1}, you need to break pi{k}{u} as well for all states k. The fraction of pi{k}{u} that goes to the first stick is distributed as Beta(alpha * beta{u}, alpha * beta{u+1}), right? Then what's with the sum over all the other stick lengths going on in line 104?
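One plausible reading of that line (not verified against the code, and using David's notation): the sum may simply be how the code computes the remaining stick mass. By the Dirichlet-process aggregation property, breaking the leftover mass pi{k}{u} gives

```latex
\pi_k\{u\}_{\mathrm{new}} = b \cdot \pi_k\{u\}, \qquad
b \sim \mathrm{Beta}\!\Big(\alpha\,\beta\{u\},\;
\alpha\big(1 - \textstyle\sum_{j \le u} \beta\{j\}\big)\Big),
```

and since the beta sticks sum to one, the second parameter satisfies
1 - sum_{j <= u} beta{j} = beta{u+1}. Under that reading, the sum over the other stick lengths is equivalent to the Beta(alpha*beta{u}, alpha*beta{u+1}) break described above.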

 Federica cloname (on September 24, 2016, 00:27:27)
Dear Jurgen,
I'm trying to implement iHMM with your code. My dataset is of about 100 strings of length varying from 1 to 130.
How can I modify the functions to make them run on all the dataset at the same time? (i.e. getting an estimate for the number of hidden states and transition and emission probabilities that would take in consideration all the strings)
Thank you, Federica
Couldn't run the files. Missing implementations for the functions randgamma and dirichlet_sample. What files am I missing?