org.knowceans.dirichlet.atm
Class AtmGibbsSampler

java.lang.Object
  extended by org.knowceans.dirichlet.lda.LdaGibbsSampler
      extended by org.knowceans.dirichlet.atm.AtmGibbsSampler
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
AtmGibbsQuerySampler

public class AtmGibbsSampler
extends LdaGibbsSampler

Gibbs sampler for estimating the best assignments of topics for words and authors of documents in a corpus. The algorithm is introduced in Steyvers, M.; Smyth, P.; Rosen-Zvi, M. & Griffiths, T. "Probabilistic Author-Topic models for information discovery". Proc. ACM SIGKDD, 2004.

Author:
heinrich
See Also:
Serialized Form

Field Summary
protected  AtmMarkovState atmstate
          State variables of the LDA Gibbs sampler.
private static long serialVersionUID
           
 
Fields inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler
backupIteration, conf, dispcol, numstats, phisum, rand, state, thetasum
 
Constructor Summary
protected AtmGibbsSampler()
          For subclasses that know what they are doing.
  AtmGibbsSampler(AmqCorpus corpus, AtmMarkovState state, ExtLdaConfiguration conf, java.util.Random rand)
          Initialise the sampler with an existing state.
  AtmGibbsSampler(AmqCorpus corpus, ExtLdaConfiguration conf)
          Initialise the sampler with a corpus and a configuration.
  AtmGibbsSampler(AmqCorpus corpus, ExtLdaConfiguration conf, java.util.Random rand)
          Initialise the sampler with a corpus, a configuration and a random number generator.
protected AtmGibbsSampler(AtmMarkovState state, ExtLdaConfiguration conf, java.util.Random rand)
          Initialise the sampler with an existing state.
  AtmGibbsSampler(int[][] documents, int V, int[][] authors, int A, double alpha, double beta, int K, int iterations)
          Initialise the Gibbs sampler with data and standard values.
 
Method Summary
 AtmMarkovState getState()
          Get the current state of the Markov chain.
protected  void gibbs()
          Main method: Select initial state.
 int[][] gibbsAtm(int[][] w, int V, int[][] ad, int A, int[][] z, int K, int[][] x, double alpha, double beta, int iter)
          Native implementation of the Gibbs sampling procedure.
 void gibbsAtmHeap(AtmMarkovState s, ExtLdaConfiguration c)
          Native Gibbs sampling on the JVM heap.
 int[][] gibbsAtmHeap(int[][] w, int[][] ad, int[][] z, int[][] x, int[][] nw, int[] nwsum, int[][] nd, int[] ndsum, double alpha, double beta, int iter)
          Native Gibbs sampling on the JVM heap.
protected  void initialState()
          Initialisation: Random assignments with equal probabilities
static void main(java.lang.String[] args)
           
 void run()
          Run the sampler after initialisation.
protected  int[] sampleAtmFullConditional(AtmMarkovState s, int m, int n)
          Sample an actor-topic pair (x_i, z_i) from the full conditional distribution: p(x_i = q, z_i = j | z_-i, x_-i, w, a_d, r_d) = (cwt_mj + beta)/(cwtsum_j + W * beta) * (cat_qjr + alpha)/(catsum_qr + K * alpha)
protected  void sampleCorpus(AtmMarkovState s)
          Sample once through the corpus and update the corresponding state.
 void saveState(java.lang.String file)
          Saves the current state of the Markov chain and the parameters to a file.
 
Methods inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler
getPhi, getTheta, gibbs, gibbsHeap, gibbsHeap, load, output, sampleCorpus, sampleLdaFullConditional, save, updateParams, updatePhi, updateTheta, writeParameters
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values

atmstate

protected AtmMarkovState atmstate
State variables of the LDA Gibbs sampler. The contract is that this reference points to the same object as the inherited field LdaGibbsSampler.state, which is declared as LdaMarkovState.
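
The aliasing contract can be illustrated with a minimal sketch. All class names below (BaseState, ExtendedState, BaseSampler, ExtendedSampler) are hypothetical stand-ins and not the library's classes; the sketch only shows a subclass keeping a more specifically typed reference to the same object that the superclass field points to.

```java
// Hypothetical stand-ins for a general and an extended state class.
class BaseState {
    int iteration;
}

class ExtendedState extends BaseState {
    int[][] x; // extra per-word assignments held only by the extended state
}

// Stand-in for the superclass sampler: works on the general state type.
class BaseSampler {
    protected BaseState state;

    void step() {
        state.iteration++;
    }
}

// Stand-in for the subclass sampler: keeps a typed alias of the same object.
class ExtendedSampler extends BaseSampler {
    protected ExtendedState extState;

    ExtendedSampler(ExtendedState s) {
        // The contract: both references point to one and the same state object.
        this.extState = s;
        this.state = s;
    }

    boolean aliasesSameObject() {
        return state == extState;
    }
}

class AliasContractDemo {
    public static void main(String[] args) {
        ExtendedSampler sampler = new ExtendedSampler(new ExtendedState());
        sampler.step(); // superclass code updates the shared object
        System.out.println(sampler.aliasesSameObject() + " " + sampler.extState.iteration);
    }
}
```

If the two references ever diverge, inherited code and subclass code would silently operate on different states, which is why the assignment is done once in the constructor in this sketch.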

Constructor Detail

AtmGibbsSampler

protected AtmGibbsSampler()
For subclasses that know what they are doing.


AtmGibbsSampler

public AtmGibbsSampler(int[][] documents,
                       int V,
                       int[][] authors,
                       int A,
                       double alpha,
                       double beta,
                       int K,
                       int iterations)
Initialise the Gibbs sampler with data and standard values. (For backwards compatibility).

Parameters:
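documents - words of each document, as term indices
V - vocabulary size
authors - authors of each document
A - author count
alpha - Dirichlet hyperparameter of the author-topic distributions
beta - Dirichlet hyperparameter of the topic-word distributions
K - topic count
iterations - number of sampling iterations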
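
As an illustration of the input this constructor expects, the sketch below builds a toy corpus and checks that it is consistent. The layout (documents[m][n] is a term index in [0, V), authors[m] lists author indices in [0, A) for document m) is an assumption inferred from the parameter names, and the class ToyAtmInput with its helper method is hypothetical.

```java
// Hypothetical helper, not part of the library: builds and validates toy input.
class ToyAtmInput {
    // Assumed layout: DOCUMENTS[m][n] is the term index of word n in document m.
    static final int[][] DOCUMENTS = {
        {0, 1, 2, 1},
        {3, 4, 3},
        {0, 4, 2, 2, 1}
    };
    // Assumed layout: AUTHORS[m] lists the author indices of document m.
    static final int[][] AUTHORS = {
        {0},
        {1},
        {0, 1}
    };

    // Returns true if no row is empty and every index lies in [0, bound).
    static boolean indicesInRange(int[][] rows, int bound) {
        for (int[] row : rows) {
            if (row.length == 0) return false;
            for (int v : row) {
                if (v < 0 || v >= bound) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int V = 5, A = 2, K = 2, iterations = 100;
        double alpha = 0.5, beta = 0.1; // illustrative values only
        boolean ok = DOCUMENTS.length == AUTHORS.length
                && indicesInRange(DOCUMENTS, V)
                && indicesInRange(AUTHORS, A);
        System.out.println("input consistent: " + ok);
        // With the library on the classpath, the documented constructor would be called as:
        // new AtmGibbsSampler(DOCUMENTS, V, AUTHORS, A, alpha, beta, K, iterations);
    }
}
```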

AtmGibbsSampler

public AtmGibbsSampler(AmqCorpus corpus,
                       ExtLdaConfiguration conf)
Initialise the sampler with a corpus and a configuration.

Parameters:
corpus -
conf -

AtmGibbsSampler

public AtmGibbsSampler(AmqCorpus corpus,
                       ExtLdaConfiguration conf,
                       java.util.Random rand)
Initialise the sampler with a corpus, a configuration and a random number generator.

Parameters:
corpus -
conf -
rand -

AtmGibbsSampler

protected AtmGibbsSampler(AtmMarkovState state,
                          ExtLdaConfiguration conf,
                          java.util.Random rand)
Initialise the sampler with an existing state.

Parameters:
state -
conf -
rand -

AtmGibbsSampler

public AtmGibbsSampler(AmqCorpus corpus,
                       AtmMarkovState state,
                       ExtLdaConfiguration conf,
                       java.util.Random rand)
Initialise the sampler with an existing state.

Parameters:
corpus -
state -
conf -
rand -
Method Detail

initialState

protected void initialState()
Initialisation: Random assignments with equal probabilities

Overrides:
initialState in class LdaGibbsSampler
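
A minimal sketch of such an initialisation, not the library's code: every word gets a topic drawn uniformly from the K topics and an author drawn uniformly from the authors of its document. Restricting the author draw to the document's own authors is an assumption, as are the class and method names.

```java
import java.util.Random;

// Hypothetical sketch of a uniform random initial state.
class UniformInitSketch {
    // Draws a topic for every word uniformly from {0, ..., K-1}.
    static int[][] initTopics(int[][] w, int K, Random rand) {
        int[][] z = new int[w.length][];
        for (int m = 0; m < w.length; m++) {
            z[m] = new int[w[m].length];
            for (int n = 0; n < w[m].length; n++) {
                z[m][n] = rand.nextInt(K);
            }
        }
        return z;
    }

    // Draws an author for every word uniformly from the authors of its document
    // (assumption: ad[m] lists the author indices of document m).
    static int[][] initAuthors(int[][] w, int[][] ad, Random rand) {
        int[][] x = new int[w.length][];
        for (int m = 0; m < w.length; m++) {
            x[m] = new int[w[m].length];
            for (int n = 0; n < w[m].length; n++) {
                x[m][n] = ad[m][rand.nextInt(ad[m].length)];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        int[][] w = {{0, 1, 2}, {1, 1}};
        int[][] ad = {{0}, {1, 2}};
        Random rand = new Random(42);
        int[][] z = initTopics(w, 3, rand);
        int[][] x = initAuthors(w, ad, rand);
        System.out.println("topics drawn: " + (z[0].length + z[1].length)
                + ", authors drawn: " + (x[0].length + x[1].length));
    }
}
```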

run

public void run()
Run the sampler after initialisation.

Overrides:
run in class LdaGibbsSampler

gibbs

protected void gibbs()
Main method: Select an initial state, then repeat a large number of times: 1. Select an element. 2. Update it conditional on the other elements. If appropriate, output a summary for each run.

Overrides:
gibbs in class LdaGibbsSampler

gibbsAtm

public int[][] gibbsAtm(int[][] w,
                        int V,
                        int[][] ad,
                        int A,
                        int[][] z,
                        int K,
                        int[][] x,
                        double alpha,
                        double beta,
                        int iter)
Native implementation of the Gibbs sampling procedure. In the same class to allow subclass access.

Parameters:
w - words
V - vocabulary size
ad - author-document associations
A - author count
z - topic-word associations
K - topic count
x - author-word associations
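alpha - Dirichlet hyperparameter of the author-topic distributions
beta - Dirichlet hyperparameter of the topic-word distributions
iter - number of sampling iterations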
Returns:
the new assignments z

gibbsAtmHeap

public int[][] gibbsAtmHeap(int[][] w,
                            int[][] ad,
                            int[][] z,
                            int[][] x,
                            int[][] nw,
                            int[] nwsum,
                            int[][] nd,
                            int[] ndsum,
                            double alpha,
                            double beta,
                            int iter)
Native Gibbs sampling on the JVM heap.

Parameters:
w - [in] words
ad - [in] document authors
z - [in/out] topic associations
x - [in/out] author-word associations
nw - [in/out] topic-word counts
nwsum - [in/out] summed topic-word counts (total words per topic)
nd - [in/out] author-topic counts
ndsum - [in] author words (total words per author)
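alpha - Dirichlet hyperparameter of the author-topic distributions
beta - Dirichlet hyperparameter of the topic-word distributions
iter - number of sampling iterations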
Returns:

gibbsAtmHeap

public void gibbsAtmHeap(AtmMarkovState s,
                         ExtLdaConfiguration c)
Native Gibbs sampling on the JVM heap.

Parameters:
s - [in/out] state
c - [in] configuration

saveState

public void saveState(java.lang.String file)
Saves the current state of the Markov chain and the parameters to a file.

Overrides:
saveState in class LdaGibbsSampler
Parameters:
file -

sampleCorpus

protected void sampleCorpus(AtmMarkovState s)
Sample once through the corpus and update the corresponding state. The parameter is used to choose the state to be sampled from: query vs. corpus; choice of chain for multichain sampling.

Parameters:
s -
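
One sweep of this kind can be sketched as follows: for each word, remove its current assignment from the counts, draw a new pair, and add the new assignment back. This is an illustration under assumptions, not the library's implementation; the PairSampler callback, the class name and the count layout (nw[term][topic], nd[author][topic]) are hypothetical.

```java
// Hypothetical sketch of a single pass over the corpus.
class SweepSketch {
    // Hypothetical callback: returns {z, x} for word n of document m.
    interface PairSampler {
        int[] sample(int m, int n);
    }

    static void sweep(int[][] w, int[][] z, int[][] x,
                      int[][] nw, int[] nwsum, int[][] nd, int[] ndsum,
                      PairSampler sampler) {
        for (int m = 0; m < w.length; m++) {
            for (int n = 0; n < w[m].length; n++) {
                int t = w[m][n];
                int k = z[m][n];
                int a = x[m][n];
                // Remove the current assignment so the counts exclude this word.
                nw[t][k]--;
                nwsum[k]--;
                nd[a][k]--;
                ndsum[a]--;
                // Draw a new topic and author for this word.
                int[] zx = sampler.sample(m, n);
                k = zx[0];
                a = zx[1];
                z[m][n] = k;
                x[m][n] = a;
                // Add the new assignment back to the counts.
                nw[t][k]++;
                nwsum[k]++;
                nd[a][k]++;
                ndsum[a]++;
            }
        }
    }

    public static void main(String[] args) {
        int[][] w = {{0, 1}, {1}};
        int[][] z = {{0, 1}, {1}};
        int[][] x = {{0, 0}, {1}};
        int[][] nw = {{1, 0}, {0, 2}};
        int[] nwsum = {1, 2};
        int[][] nd = {{1, 1}, {0, 1}};
        int[] ndsum = {2, 1};
        // Trivial sampler for demonstration: always topic 1, author 0.
        sweep(w, z, x, nw, nwsum, nd, ndsum, (m, n) -> new int[] {1, 0});
        System.out.println(java.util.Arrays.toString(nwsum));
    }
}
```

Passing the sampler as a callback keeps the bookkeeping separate from the choice of conditional distribution, which is a design choice of this sketch only.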

sampleAtmFullConditional

protected int[] sampleAtmFullConditional(AtmMarkovState s,
                                         int m,
                                         int n)
Sample an actor-topic pair (x_i, z_i) from the full conditional distribution: p(x_i = q, z_i = j | z_-i, x_-i, w, a_d, r_d) = (cwt_mj + beta)/(cwtsum_j + W * beta) * (cat_qjr + alpha)/(catsum_qr + K * alpha)

Parameters:
s - state
m - document
n - word
Returns:
int[] in the order {z, x}
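
The formula above can be turned into a small sampling routine. The sketch below is a simplified illustration, not the library's code: it drops the r index, assumes the counts already exclude the word being sampled, and maps cwt, cwtsum, cat, catsum and W to hypothetical arrays nw[term][topic], nwsum[topic], nd[author][topic], ndsum[author] and the vocabulary size V. The pair is drawn by inverting the cumulative sum of the unnormalised weights.

```java
import java.util.Random;

// Hypothetical sketch of drawing a (topic, author) pair for one word.
class FullConditionalSketch {
    // t: term index of the word; docAuthors: candidate authors of its document.
    // Returns {z, x}.
    static int[] samplePair(int t, int[] docAuthors, int[][] nw, int[] nwsum,
                            int[][] nd, int[] ndsum,
                            double alpha, double beta, Random rand) {
        int V = nw.length;
        int K = nwsum.length;
        double[] p = new double[docAuthors.length * K];
        double total = 0.0;
        for (int i = 0; i < docAuthors.length; i++) {
            int q = docAuthors[i];
            for (int j = 0; j < K; j++) {
                double wordTerm = (nw[t][j] + beta) / (nwsum[j] + V * beta);
                double authorTerm = (nd[q][j] + alpha) / (ndsum[q] + K * alpha);
                total += wordTerm * authorTerm;
                p[i * K + j] = total; // cumulative, unnormalised weight
            }
        }
        // Scale a uniform draw to the total mass and find its cell.
        double u = rand.nextDouble() * total;
        int idx = 0;
        while (idx < p.length - 1 && u >= p[idx]) {
            idx++;
        }
        return new int[] {idx % K, docAuthors[idx / K]};
    }

    public static void main(String[] args) {
        int[][] nw = {{1, 0}, {0, 2}};
        int[] nwsum = {1, 2};
        int[][] nd = {{1, 1}, {0, 1}};
        int[] ndsum = {2, 1};
        int[] zx = samplePair(1, new int[] {0, 1}, nw, nwsum, nd, ndsum,
                0.5, 0.1, new Random(7));
        System.out.println("z=" + zx[0] + " x=" + zx[1]);
    }
}
```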

main

public static void main(java.lang.String[] args)

getState

public AtmMarkovState getState()
Get the current state of the Markov chain.

Overrides:
getState in class LdaGibbsSampler
Returns: