org.knowceans.sandbox.gauss
Class SimpleIgmmGibbsSampler

java.lang.Object
  extended by org.knowceans.sandbox.gauss.SimpleIgmmGibbsSampler

public class SimpleIgmmGibbsSampler
extends java.lang.Object

SimpleIgmmGibbsSampler implements the infinite Gaussian mixture model as a Gibbs sampler. The algorithm runs with a priori hyperparameters on the Gaussian and Dirichlet Process distributions.

It is a preliminary to the completely non-parametric IGMM proposed by Rasmussen (NIPS-12) and has the following model:

 Mixture model:
 x       = sum_j^k pi_j * N(mu_j, 1/s_j)
 Data points:
 x | c_j ~ N(mu_j, 1/s_j)
 Component means:
 mu_j    ~ N(lambda, 1/r)
 lambda  mean expectation
 r       mean precision
 Component precisions:
 s_j     ~ Gamma(beta, 1/w)
 beta    precision shape
 w       precision scale
 Component weights:
 pi_j    ~ Dirichlet(alpha/k) and via infinite limit:
         ~ Stick(.)
 alpha   Dirichlet concentration
 
Note that Rasmussen defines Gamma(a,b) as having mean b, not a*b, which is reflected in the method sampleGammaDist().

Author:
heinrich

Field Summary
private  double alpha
          crp prior
private  double beta
          mean of s_j
private  int burnIn
          burn-in period
private  int[] cc
          state (component) for each data point
 int debugLevel
          debug level for output (5=info, 1=error)
private  int growstep
          array grow step
private  int iterations
          max iterations
private  int k
          number of components
private  double lambda
          mean of mu_j
private  double[] muj
          component means
private  double muunrep
          mean of unrepresented components
private  double muy
          mean of the data
private  int n
          data size
private  double[] nn
          occupation numbers for each component (double for usage of double-valued methods)
private  double r
          precision of mu_j
private  boolean randomScan
          random scan or systematic scan
private  double sigmasqy
          variance of the data
private  double[] sj
          inverse component variances (precisions)
private  double sunrep
          precision of unrepresented components
private  int thinInterval
          sampling lag
private  double w
          precision of s_j
private  double[] ysum
          component data sum
private  double[] yy
          vector of univariate data points
 
Constructor Summary
SimpleIgmmGibbsSampler(double[] data)
          Initialise the Gibbs sampler with data.
 
Method Summary
private  void addComponent()
          handle size of componentwise structures.
 void configure(int iterations, int burnIn, int thinInterval)
          set sampling conditions
private  void debug(int level, java.lang.String string)
          print debug information
(package private)  double[] getMean()
          get the mean of the components
(package private)  double[] getStdDev(double[] mean)
          get the standard deviation of the components
(package private)  double[] getWeights()
          get the mixture weights of the components
private  void gibbs()
          Main method: Select initial state ?
(package private)  void initialState()
          Initialisation: starts with one class and assigns data-dependent piors (which Rasmussen justifies in his paper).
static void main(java.lang.String[] args)
          Driver with example data.
private  void removeComponent(int j)
          removes one component from the model
(package private)  int sampleC(int i)
          sample component association to data point i with likelihood.
(package private)  int sampleCrpC(int i)
          sample component association to data point i using Chinese restaurant process including likelihood term.
 double sampleGammaDist(double a, double b)
          Gamma distribution with mean as a parameter b (normally mean = a*b)
(package private)  double sampleMu(int j)
          sample the means.
(package private)  double sampleMuUnrep()
          sample from prior on s for unrepresented classes
(package private)  double sampleNormalDist(double mu, double sigmaSquared)
          Normal distribution with variance as parameter instead of standard deviation.
(package private)  double sampleS(int j)
          sample precision for component j.
(package private)  double sampleSUnrep()
          sample from prior on mu for unrepresented classes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

yy

private double[] yy
vector of univariate data points


thinInterval

private int thinInterval
sampling lag


burnIn

private int burnIn
burn-in period


iterations

private int iterations
max iterations


growstep

private int growstep
array grow step


muy

private double muy
mean of the data


sigmasqy

private double sigmasqy
variance of the data


alpha

private double alpha
crp prior


lambda

private double lambda
mean of mu_j


beta

private double beta
mean of s_j


r

private double r
precision of mu_j


w

private double w
precision of s_j


muj

private double[] muj
component means


muunrep

private double muunrep
mean of unrepresented components


sj

private double[] sj
inverse component variances (precisions)


sunrep

private double sunrep
precision of unrepresented components


k

private int k
number of components


nn

private double[] nn
occupation numbers for each component (double for usage of double-valued methods)


cc

private int[] cc
state (component) for each data point


n

private int n
data size


ysum

private double[] ysum
component data sum


randomScan

private boolean randomScan
random scan or systematic scan


debugLevel

public int debugLevel
debug level for output (5=info, 1=error)

Constructor Detail

SimpleIgmmGibbsSampler

public SimpleIgmmGibbsSampler(double[] data)
Initialise the Gibbs sampler with data.

Parameters:
data -
Method Detail

configure

public void configure(int iterations,
                      int burnIn,
                      int thinInterval)
set sampling conditions

Parameters:
iterations -
burnIn -
thinInterval -

initialState

void initialState()
Initialisation: starts with one class and assigns data-dependent piors (which Rasmussen justifies in his paper).


addComponent

private void addComponent()
handle size of componentwise structures.

Note: We use arrays for components more readable syntax and possibly speed loss during all cast operations when accessing a Vector. Therefore all loops over components should explicitly use k, not, e.g., mu.length. The problem with this approach is that it is hard to remove unoccupied classes


removeComponent

private void removeComponent(int j)
removes one component from the model

Parameters:
j -

getMean

double[] getMean()
get the mean of the components

Returns:

getStdDev

double[] getStdDev(double[] mean)
get the standard deviation of the components

Parameters:
mean -
Returns:

getWeights

double[] getWeights()
get the mixture weights of the components

Returns:

sampleC

int sampleC(int i)
sample component association to data point i with likelihood. Eq. (13) and (1)

Parameters:
i -
Returns:

sampleCrpC

int sampleCrpC(int i)
sample component association to data point i using Chinese restaurant process including likelihood term. Eq. (17)

Parameters:
i -
Returns:

sampleGammaDist

public double sampleGammaDist(double a,
                              double b)
Gamma distribution with mean as a parameter b (normally mean = a*b)

Parameters:
a -
b -
Returns:

sampleNormalDist

double sampleNormalDist(double mu,
                        double sigmaSquared)
Normal distribution with variance as parameter instead of standard deviation.

Parameters:
a -
b -
Returns:

sampleMu

double sampleMu(int j)
sample the means.

TODO: possibility to update means whenever component associations are changed --> no need to calc mu[j] here

Parameters:
j -
Returns:

sampleMuUnrep

double sampleMuUnrep()
sample from prior on s for unrepresented classes

Returns:

sampleSUnrep

double sampleSUnrep()
sample from prior on mu for unrepresented classes

Returns:

sampleS

double sampleS(int j)
sample precision for component j. Eq. (8)

Parameters:
j -
Returns:

gibbs

private void gibbs()
Main method: Select initial state ? Repeat a large number of times: 1. Select an element 2. Update conditional on other elements. If appropriate, output summary for each run.

Parameters:
k -
probs -
mean -
sigma -

debug

private void debug(int level,
                   java.lang.String string)
print debug information

Parameters:
level - debug level (5=info to 1=error)
string -

main

public static void main(java.lang.String[] args)
Driver with example data.

Parameters:
args -