org.knowceans.sandbox.hlda
Class NestedCrpNode

java.lang.Object
  extended by org.knowceans.sandbox.hlda.NestedCrpNode

public class NestedCrpNode
extends java.lang.Object

NestedCrpNode models a nested Chinese restaurant process (NCRP). Each level of the NCRP tree can be considered a multinomial over the assignment of z[level]. On the first level (level 0), the root is taken with probability one. On the second level, the CRP sampling on level 0 has formed N branches, and the topic assignment to one of these branches follows their occupation numbers.
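The role of the occupation numbers can be illustrated with the standard CRP seating rule: an existing child with n_k customers is chosen with probability n_k / (n + gamma), and a new child is opened with probability gamma / (n + gamma). The following is a minimal sketch under that textbook assumption. The class CrpSketch and its methods are hypothetical and are not taken from NestedCrpNode, whose actual sampleCrp() implementation is not shown on this page.

```java
import java.util.Random;

/** Hypothetical helper, not part of NestedCrpNode: standard CRP seating rule. */
final class CrpSketch {

    /**
     * Returns an array of length counts.length + 1. Entry k < counts.length is
     * the probability of joining existing child k; the last entry is the
     * probability of opening a new child.
     */
    static double[] seatingProbabilities(int[] counts, double gamma) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double norm = total + gamma;
        double[] p = new double[counts.length + 1];
        for (int k = 0; k < counts.length; k++) {
            p[k] = counts[k] / norm;
        }
        p[counts.length] = gamma / norm;
        return p;
    }

    /** Draws an index from a normalised probability vector by inverting the CDF. */
    static int sampleIndex(double[] p, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int k = 0; k < p.length; k++) {
            cumulative += p[k];
            if (u < cumulative) {
                return k;
            }
        }
        return p.length - 1; // guard against floating-point rounding
    }
}
```

With occupation numbers {3, 1} and gamma = 1.0, the sketch assigns 3/5 and 1/5 to the two existing children and 1/5 to a new child.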

Author:
heinrich

Field Summary
private  java.util.Vector<NestedCrpNode> children
          children = list of occupied tables in the nested CRP
 int depth
          depth L of the nested CRP
private  double gamma
          concentration parameter of the CRP
 int id
           
 int level
          level l in the nesting structure (root = 0)
static int maxid
           
 double newChildLikelihood
          probability for creation of a new child
(package private)  int[] nw
          nw[i] = number of times term i is assigned to this node (the node is "the topic indexed by c_m,ell" from the HLDA paper).
(package private)  int nwsum
          number of terms assigned to this node ("tree topic").
 int occupation
          occupation count (customers at the table).
 NestedCrpNode parent
          parent restaurant (root = null)
private  double[] phiBeta
          topic-word assignments for this node (phi with Griffiths; beta with Blei)
private  double[] phiBetaSum
          sum of phiBeta (used for averaging)
 double prob
          sampling probability for this node for the sampleHierarchical() routine.
 HldaGibbsSampler sampler
          backref to sampler (to avoid duplicate variables)
 
Constructor Summary
  NestedCrpNode(HldaGibbsSampler samp, int depth, double gamma)
          initialise a nested CRP root node = initialise a nested CRP.
protected NestedCrpNode(NestedCrpNode parent)
          initialise a CRP node deeper than root
 
Method Summary
private  void constructProb(int doc)
           
(package private)  double[] getPhiBeta()
          get the topic-word assignments for this node.
private  NestedCrpNode sampleCrp()
          returns the child node sampled from the CRP.
 NestedCrpNode[] sampleHierarchical()
          given the probability assignments in the nodes and the L-vector newprob for new CRP paths, sample a path hierarchically.
 NestedCrpNode[] samplePath()
          sample once from the nested CRP and update the occupation numbers
 NestedCrpNode[] samplePath(int doc)
          calculates the word conditionals, which, in conjunction with the CRP prior, yield the path probabilities.
private  NestedCrpNode sampleProb(boolean crp)
          Sample according to probability assignments in the prob field.
private  void updateCounts(NestedCrpNode[] path, int doc)
          updates the counts for the current sample.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

maxid

public static int maxid

sampler

public HldaGibbsSampler sampler
backref to sampler (to avoid duplicate variables)


level

public int level
level l in the nesting structure (root = 0)


id

public int id

depth

public int depth
depth L of the nested CRP


parent

public NestedCrpNode parent
parent restaurant (root = null)


occupation

public int occupation
occupation count (customers at the table). Initially set to one (creation counts as the first occupation).


prob

public double prob
sampling probability for this node for the sampleHierarchical() routine.


newChildLikelihood

public double newChildLikelihood
probability for creation of a new child


phiBeta

private double[] phiBeta
topic-word assignments for this node (phi with Griffiths; beta with Blei)


phiBetaSum

private double[] phiBetaSum
sum of phiBeta (used for averaging)
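One common way to use such a running sum is to accumulate the per-sample phi vectors and divide by the number of accumulated samples. The sketch below assumes exactly that; the class PhiAveragingSketch, its methods, and the sample counter are hypothetical, since this page does not show how NestedCrpNode tracks the number of samples.

```java
/** Hypothetical helper, not part of NestedCrpNode: averaging phi over samples. */
final class PhiAveragingSketch {
    private final double[] phiSum; // plays the role of phiBetaSum
    private int numSamples;        // assumed counter, not documented in the source

    PhiAveragingSketch(int vocabularySize) {
        phiSum = new double[vocabularySize];
    }

    /** Adds one sampled phi vector to the running sum. */
    void accumulate(double[] phi) {
        if (phi.length != phiSum.length) {
            throw new IllegalArgumentException("vocabulary size mismatch");
        }
        for (int t = 0; t < phi.length; t++) {
            phiSum[t] += phi[t];
        }
        numSamples++;
    }

    /** Returns the element-wise mean of all accumulated vectors. */
    double[] average() {
        if (numSamples == 0) {
            throw new IllegalStateException("no samples accumulated");
        }
        double[] avg = new double[phiSum.length];
        for (int t = 0; t < avg.length; t++) {
            avg[t] = phiSum[t] / numSamples;
        }
        return avg;
    }
}
```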


nw

int[] nw
nw[i] = number of times term i is assigned to this node (the node is "the topic indexed by c_m,ell" from the HLDA paper). TODO: put in sampler class and index from there with the c.


nwsum

int nwsum
number of terms assigned to this node ("tree topic").
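Counts such as nw and nwsum are typically turned into a topic-word distribution with a symmetric Dirichlet smoothing term: phi[t] = (nw[t] + eta) / (nwsum + V * eta), where V is the vocabulary size. The sketch below assumes this standard estimator. The class TopicWordSketch and the hyperparameter name eta are hypothetical; this page does not state which smoothing parameter the sampler uses.

```java
/** Hypothetical helper, not part of NestedCrpNode: smoothed topic-word estimate. */
final class TopicWordSketch {

    /**
     * Estimates a word distribution from per-term counts.
     *
     * @param nw    count of each term assigned to the node
     * @param nwsum total number of terms assigned to the node (sum of nw)
     * @param eta   assumed symmetric Dirichlet hyperparameter (not from the source)
     */
    static double[] estimatePhi(int[] nw, int nwsum, double eta) {
        int vocabularySize = nw.length;
        double denom = nwsum + vocabularySize * eta;
        double[] phi = new double[vocabularySize];
        for (int t = 0; t < vocabularySize; t++) {
            phi[t] = (nw[t] + eta) / denom;
        }
        return phi;
    }
}
```

For nw = {2, 0, 2}, nwsum = 4 and eta = 1.0, the estimate is {3/7, 1/7, 3/7}, which sums to one.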


children

private java.util.Vector<NestedCrpNode> children
children = list of occupied tables in the nested CRP


gamma

private double gamma
concentration parameter of the CRP

Constructor Detail

NestedCrpNode

public NestedCrpNode(HldaGibbsSampler samp,
                     int depth,
                     double gamma)
initialise a nested CRP root node = initialise a nested CRP.

Parameters:
depth - L of the tree structure
gamma - concentration parameter of the underlying CRP (expected occupation number per child node) XXX: strongly unbalanced through hierarchy, but this means that branching occurs first at the top and later at the bottom when more data arrives!

NestedCrpNode

protected NestedCrpNode(NestedCrpNode parent)
initialise a CRP node deeper than root

Parameters:
parent -
Method Detail

getPhiBeta

double[] getPhiBeta()
get the topic-word assignments for this node.

Returns:

samplePath

public NestedCrpNode[] samplePath(int doc)
calculates the word conditionals, which, in conjunction with the CRP prior, yield the path probabilities. The word conditional factors are stored in the hierarchy, so the sampling process can access them directly. Use only on the root node.
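The combination of a prior and a word conditional into path probabilities can be sketched as follows: for each candidate path, the log prior and the log likelihood are added, and the results are normalised with the log-sum-exp trick to avoid underflow. This is an illustration under that assumption only. The class PathPosteriorSketch and its method are hypothetical, and the page does not say whether NestedCrpNode works in log space.

```java
/** Hypothetical helper, not part of NestedCrpNode: normalised path probabilities. */
final class PathPosteriorSketch {

    /**
     * Combines per-path log prior and log likelihood terms and returns
     * normalised probabilities, using the log-sum-exp trick for stability.
     */
    static double[] pathProbabilities(double[] logPrior, double[] logLikelihood) {
        if (logPrior.length != logLikelihood.length || logPrior.length == 0) {
            throw new IllegalArgumentException("need equally sized, non-empty inputs");
        }
        int n = logPrior.length;
        double[] logPost = new double[n];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            logPost[i] = logPrior[i] + logLikelihood[i];
            max = Math.max(max, logPost[i]);
        }
        double sum = 0.0;
        double[] p = new double[n];
        for (int i = 0; i < n; i++) {
            p[i] = Math.exp(logPost[i] - max); // shift so the largest term is exp(0) = 1
            sum += p[i];
        }
        for (int i = 0; i < n; i++) {
            p[i] /= sum;
        }
        return p;
    }
}
```

With two paths of equal prior 0.5 and likelihoods 0.2 and 0.6, the normalised path probabilities are 0.25 and 0.75.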

Parameters:
doc -

updateCounts

private void updateCounts(NestedCrpNode[] path,
                          int doc)
updates the counts for the current sample.

Parameters:
path -
doc -

constructProb

private void constructProb(int doc)
Parameters:
doc -

samplePath

public NestedCrpNode[] samplePath()
sample once from the nested CRP and update the occupation numbers

Returns:
node objects

sampleHierarchical

public NestedCrpNode[] sampleHierarchical()
given the probability assignments in the nodes and the L-vector newprob for new CRP paths, sample a path hierarchically. This works for the word conditional in the HLDA model because it can be factored over levels.
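Level-by-level sampling of a path can be sketched as a descent from the root: at each node, one of the existing children or a new child is drawn in proportion to unnormalised weights, and the choice becomes the next path element. The sketch below is an assumption-laden illustration, not the class's implementation. The types HierarchicalSamplingSketch and Node are hypothetical, the per-node newChildWeight stands in for the newprob vector, and the weights given to freshly created nodes are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch, not part of NestedCrpNode: level-by-level path sampling. */
final class HierarchicalSamplingSketch {

    /** Stand-in for a tree node; fields only loosely mirror NestedCrpNode. */
    static final class Node {
        final int level;
        final double weight;         // unnormalised weight of choosing this node
        final double newChildWeight; // unnormalised weight of opening a new child
        final List<Node> children = new ArrayList<>();

        Node(int level, double weight, double newChildWeight) {
            this.level = level;
            this.weight = weight;
            this.newChildWeight = newChildWeight;
        }
    }

    /** Samples a root-to-leaf path of the given depth, creating nodes as needed. */
    static Node[] samplePath(Node root, int depth, Random rng) {
        Node[] path = new Node[depth];
        path[0] = root;
        for (int l = 1; l < depth; l++) {
            Node current = path[l - 1];
            double total = current.newChildWeight;
            for (Node c : current.children) {
                total += c.weight;
            }
            double u = rng.nextDouble() * total;
            Node chosen = null;
            for (Node c : current.children) {
                u -= c.weight;
                if (u < 0) {
                    chosen = c;
                    break;
                }
            }
            if (chosen == null && current.newChildWeight <= 0.0
                    && !current.children.isEmpty()) {
                // rounding guard: no mass on a new child, so fall back to the last child
                chosen = current.children.get(current.children.size() - 1);
            }
            if (chosen == null) {
                // open a new branch; the weights 1.0 are placeholders for this sketch
                chosen = new Node(l, 1.0, 1.0);
                current.children.add(chosen);
            }
            path[l] = chosen;
        }
        return path;
    }
}
```

Calling samplePath on an empty root with depth 3 creates one new node per level below the root and returns the three nodes in root-to-leaf order.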

Returns:

sampleProb

private NestedCrpNode sampleProb(boolean crp)
Sample according to probability assignments in the prob field.

Parameters:
crp - true if the prior term should be used.
Returns:

sampleCrp

private NestedCrpNode sampleCrp()
returns the child node sampled from the CRP.

Returns: