org.knowceans.dirichlet.lda
Class LdaGibbsQuerySampler

java.lang.Object
  extended by org.knowceans.dirichlet.lda.LdaGibbsSampler
      extended by org.knowceans.dirichlet.lda.LdaGibbsQuerySampler
All Implemented Interfaces:
java.io.Serializable

public class LdaGibbsQuerySampler
extends LdaGibbsSampler

LdaGibbsQuerySampler allows sampling from a known Markov state, i.e., the model of a corpus, which can be used to predict the topics of query documents.

Author:
gregor
See Also:
Serialized Form

Field Summary
private static long serialVersionUID
           
(package private)  LdaMarkovState stateq
          stateq contains the query documents.
(package private)  LdaMarkovState stateSave
          stateSave contains the saved Markov state (after initially loading the state).
private  double[][] thetasumq
           
 
Fields inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler
backupIteration, conf, dispcol, numstats, phisum, rand, state, thetasum
 
Constructor Summary
LdaGibbsQuerySampler(LdaMarkovState state, ExtLdaConfiguration conf, boolean restorable)
          Initialise the Gibbs sampler with a known Markov state (for querying).
LdaGibbsQuerySampler(LdaMarkovState state, ExtLdaConfiguration conf, java.util.Random rand, boolean restorable)
          Initialise the Gibbs sampler with a known Markov state (for querying).
 
Method Summary
 double[][] getPredictiveTheta()
          Get the document--topic associations of the query documents.
 double[][] getSavedPhi()
          Get the backed up phi (without influence of the queries).
protected  void gibbs()
          Main method: select an initial state, then repeat the sampling step a large number of times.
private  void initialState(boolean restorable)
          Initialisation: initialise the sampler from a known state of the Markov chain for querying the model.
 double[] query(int[] document)
          Initialise the sampler with a one-document query.
 double[][] query(int[][] query)
          Initialise the Gibbs sampler with the query documents.
 void restore()
          For restorable state operation, restore (reinitialise) the state to that of the Markov chain at object creation time.
protected  void updateTheta()
          Add to the statistics the values of theta for the current state.
 
Methods inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler
getPhi, getState, getTheta, gibbs, gibbsHeap, gibbsHeap, initialState, load, main, output, run, sampleCorpus, sampleLdaFullConditional, save, saveState, updateParams, updatePhi, writeParameters
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values

stateq

LdaMarkovState stateq
stateq contains the query documents. Its nw and nwsum fields are shared with the corpus state.


stateSave

LdaMarkovState stateSave
stateSave contains the saved Markov state (after initially loading the state). It is a complete copy of the


thetasumq

private double[][] thetasumq
Constructor Detail

LdaGibbsQuerySampler

public LdaGibbsQuerySampler(LdaMarkovState state,
                            ExtLdaConfiguration conf,
                            boolean restorable)
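Initialise the Gibbs sampler with a known Markov state (for querying).

Parameters:
state - the known Markov state (the model of the corpus)
conf - the sampler configuration
restorable - whether the initial Markov state can be restored using restore (see there).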

LdaGibbsQuerySampler

public LdaGibbsQuerySampler(LdaMarkovState state,
                            ExtLdaConfiguration conf,
                            java.util.Random rand,
                            boolean restorable)
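Initialise the Gibbs sampler with a known Markov state (for querying).

Parameters:
state - the known Markov state (the model of the corpus)
conf - the sampler configuration
rand - the random number generator to use
restorable - whether the initial Markov state can be restored using restore (see there).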
Method Detail

initialState

private void initialState(boolean restorable)
Initialisation: initialise the sampler from a known state of the Markov chain for querying the model.

Parameters:
restorable - whether the initial Markov state can be restored using restore (see there).

query

public double[] query(int[] document)
Initialise the sampler with a one-document query.

Parameters:
document - word vector of the query document
Returns:
document--topic associations for the query

query

public double[][] query(int[][] query)
Initialise the Gibbs sampler with the query documents.

Parameters:
query - word vectors
Returns:
document--topic associations (same as getPredictiveTheta())
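
The following is a minimal, self-contained sketch of how querying against a known model can work in principle. It is not the library's implementation: the class name QueryGibbsSketch, the count layout nw[w][k], the hyperparameters alpha and beta, and the use of the final sample instead of averaged statistics are all assumptions made for illustration. Note that the query tokens are added to the shared nw and nwsum counts, which is the effect that makes the corpus state "dirty".

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of Gibbs sampling for one query document; not the library code. */
class QueryGibbsSketch {

    /**
     * Estimates topic proportions for one query document.
     * nw[w][k] = corpus count of word w in topic k (assumed layout),
     * nwsum[k] = total number of words assigned to topic k.
     */
    static double[] query(int[] doc, int[][] nw, int[] nwsum,
                          double alpha, double beta, int iterations, Random rand) {
        int V = nw.length;
        int K = nwsum.length;
        int[] z = new int[doc.length]; // topic assignment of each query token
        int[] nd = new int[K];         // topic counts of the query document

        // Initial state: random topics; query tokens enter the shared counts.
        for (int n = 0; n < doc.length; n++) {
            z[n] = rand.nextInt(K);
            nd[z[n]]++;
            nw[doc[n]][z[n]]++;
            nwsum[z[n]]++;
        }

        double[] p = new double[K];
        for (int it = 0; it < iterations; it++) {
            for (int n = 0; n < doc.length; n++) {
                int w = doc[n];
                // Remove the current token from all counts.
                nd[z[n]]--;
                nw[w][z[n]]--;
                nwsum[z[n]]--;
                // Unnormalised full conditional p(z = k | all other assignments).
                double total = 0.0;
                for (int k = 0; k < K; k++) {
                    p[k] = (nw[w][k] + beta) / (nwsum[k] + V * beta) * (nd[k] + alpha);
                    total += p[k];
                }
                // Draw a topic by walking the cumulative distribution.
                double u = rand.nextDouble() * total;
                int topic = 0;
                double acc = p[0];
                while (u >= acc && topic < K - 1) {
                    topic++;
                    acc += p[topic];
                }
                // Add the token back with its new topic.
                z[n] = topic;
                nd[topic]++;
                nw[w][topic]++;
                nwsum[topic]++;
            }
        }

        // Document-topic proportions from the final sample.
        double[] theta = new double[K];
        for (int k = 0; k < K; k++) {
            theta[k] = (nd[k] + alpha) / (doc.length + K * alpha);
        }
        return theta;
    }

    public static void main(String[] args) {
        int[][] nw = {{50, 0}, {40, 1}, {0, 60}, {2, 45}}; // toy counts, V = 4, K = 2
        int[] nwsum = {92, 106};
        double[] theta = query(new int[] {0, 1, 0, 1, 0}, nw, nwsum,
                               0.1, 0.01, 200, new Random(42));
        System.out.println(Arrays.toString(theta));
    }
}
```
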

restore

public void restore()
For restorable state operation, restore (reinitialise) the state to that of the Markov chain at object creation time.

Because the association counts are influenced by the queries, the original state of the Markov chain becomes "dirty". To make it recoverable, the state is backed up when the restorable argument of the constructors is enabled.
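
The backup-and-restore idea can be sketched as follows. This is an illustration only: the class RestorableCountsSketch, its methods, and the choice to throw an exception when no backup exists are hypothetical and do not describe the actual behaviour of restore().

```java
import java.util.Arrays;

/** Hypothetical sketch of restorable word-topic counts; not the library code. */
class RestorableCountsSketch {
    private final int[][] nw;      // working counts, modified by queries
    private final int[][] nwSaved; // backup taken at creation time, or null

    RestorableCountsSketch(int[][] nw, boolean restorable) {
        this.nw = nw;
        this.nwSaved = restorable ? deepCopy(nw) : null;
    }

    /** Copies every row so later changes to the source do not affect the copy. */
    static int[][] deepCopy(int[][] src) {
        int[][] dst = new int[src.length][];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Arrays.copyOf(src[i], src[i].length);
        }
        return dst;
    }

    /** A query token assigned to a topic changes the shared counts. */
    void addQueryToken(int word, int topic) {
        nw[word][topic]++;
    }

    int count(int word, int topic) {
        return nw[word][topic];
    }

    /** Resets the working counts to the values saved at creation time. */
    void restore() {
        if (nwSaved == null) {
            // Assumption: the sketch refuses to restore without a backup.
            throw new IllegalStateException("created with restorable = false");
        }
        for (int i = 0; i < nw.length; i++) {
            System.arraycopy(nwSaved[i], 0, nw[i], 0, nw[i].length);
        }
    }
}
```

The backup costs one extra copy of the counts, which is why making it optional through a flag is a reasonable design.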


updateTheta

protected void updateTheta()
Add to the statistics the values of theta for the current state.

Overrides:
updateTheta in class LdaGibbsSampler
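
A minimal sketch of what accumulating theta statistics could look like, assuming the usual smoothed estimate (nd[m][k] + alpha) / (ndsum[m] + K * alpha) and an average over the number of collected samples. The field names thetasum and numstats mirror fields listed on this page; the count arrays nd and ndsum, the parameter alpha, and the class ThetaStatsSketch are assumptions, not the library's code.

```java
/** Hypothetical sketch of theta statistics accumulation; not the library code. */
class ThetaStatsSketch {
    private final double[][] thetasum; // running sum of theta estimates (M x K)
    private final int numTopics;
    private int numstats = 0;          // number of samples accumulated so far

    ThetaStatsSketch(int numDocs, int numTopics) {
        this.thetasum = new double[numDocs][numTopics];
        this.numTopics = numTopics;
    }

    /**
     * Adds the theta estimate of the current state to the statistics.
     * nd[m][k] = tokens of document m assigned to topic k, ndsum[m] = document length.
     */
    void update(int[][] nd, int[] ndsum, double alpha) {
        for (int m = 0; m < thetasum.length; m++) {
            for (int k = 0; k < numTopics; k++) {
                thetasum[m][k] += (nd[m][k] + alpha) / (ndsum[m] + numTopics * alpha);
            }
        }
        numstats++;
    }

    /** Returns the accumulated estimates averaged over all collected samples. */
    double[][] average() {
        double[][] theta = new double[thetasum.length][numTopics];
        for (int m = 0; m < thetasum.length; m++) {
            for (int k = 0; k < numTopics; k++) {
                theta[m][k] = thetasum[m][k] / numstats;
            }
        }
        return theta;
    }
}
```
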

getPredictiveTheta

public double[][] getPredictiveTheta()
Get the document--topic associations of the query documents.

Returns:
document--topic associations of the query documents

getSavedPhi

public double[][] getSavedPhi()
Get the backed up phi (without influence of the queries).

Returns:
phi multinomial mixture of topic words (K x V)
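
For orientation, a K x V phi matrix is commonly derived from word-topic counts as sketched below. The count layout nw[w][k], the smoothing parameter beta, and the class PhiSketch are assumptions for illustration; the page does not state how getSavedPhi() computes its result.

```java
/** Hypothetical sketch of computing phi from counts; not the library code. */
class PhiSketch {

    /**
     * Computes phi[k][w] = (nw[w][k] + beta) / (nwsum[k] + V * beta).
     * nw[w][k] = count of word w in topic k, nwsum[k] = total words in topic k.
     */
    static double[][] phi(int[][] nw, int[] nwsum, double beta) {
        int V = nw.length;
        int K = nwsum.length;
        double[][] phi = new double[K][V];
        for (int k = 0; k < K; k++) {
            for (int w = 0; w < V; w++) {
                phi[k][w] = (nw[w][k] + beta) / (nwsum[k] + V * beta);
            }
        }
        return phi;
    }
}
```

With consistent counts, each row of the result is a distribution over the vocabulary and sums to one.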

gibbs

protected void gibbs()
Description copied from class: LdaGibbsSampler
Main method: Select an initial state, then repeat a large number of times: 1. Select an element. 2. Update it conditional on the other elements. If appropriate, output a summary for each run.

Overrides:
gibbs in class LdaGibbsSampler