org.knowceans.util
Class FastMultinomial

java.lang.Object
  extended by org.knowceans.util.FastMultinomial

public abstract class FastMultinomial
extends java.lang.Object

FastMultinomial implements a fast multinomial sampler that exploits heavy-tail distribution of the probability masses. It assumes that the weights are based on a product term and uses a series of bounds on the normalisation constant of the sampling distribution that allows to calculate the terms iteratively. The algorithm was described for a fast LDA Gibbs sampler in Porteous, Newman and Welling (KDD 2008) and is here used more generically.

The core data structure is the weights array abc that contains a multinomial factor distribution in each of its rows. Operation requires to keep an index of sorted elements of abc, for which the dominant element should be chosen with a call to indexsort(abc[dominantindex]). Further, a norm of all of the rows of abc should be calculated using sumcube(abc[i]), not including the root operation. Finally, samples can be taken from the distribution, updating the state of the sampler in all three parts, abc, the norm and the sort index. There are two sampling methods, one that returns the sample as an index of the sorting index array (useful to maintain the sampling order using reorder()) and one that returns the actual sample index in abc.

The static functions are proof of concept that does not save much computing time because they don't pre-compute the weights beforehand. Actual speed is gained by subclassing and calculating the weights in getWeights.

Author:
gregor

Constructor Summary
FastMultinomial(int I, int K, java.util.Random rand)
          set up a fast multinomial using a subclass implementation for the
 
Method Summary
static double cube(double x)
          cube of the argument
static double cuberoot(double x)
          cube root of argument
abstract  void getWeights(int k, double[] weights)
          Get the weight for dimension k for each factor.
static void indexsort(double[] x, int[][] idx)
          sort idx according to reverse order of elements of x, leaving x untouched.
 double initnorm(double[] abci)
          Calculate exponentiated norm sum of x for the weights arr.
static void main(java.lang.String[] args)
           
 double reducenorm(double[] abcl2kExpnorm, double[] abcl1)
          This function is supposed to do two things: remove the current element abc from the exponentiated norm and calculate the norm of the remaining elements by the root.
static void reorder(double[] x, int[] idx, int kinc, int kdec)
          reorder the index of a sorted array after element kinc had been incremented and kdec decremented (referring to the indices in idx).
static int sample(double[][] abc, double[] abcnorm, int[] idx, java.util.Random rand)
          Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.
static int sample0(double[][] weights, java.util.Random rand)
          Standard sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.
static int sample00(double[][] weights, java.util.Random rand)
          Standard sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.
static int sample2(double[][] abc, double[] abcnorm, int[] idx, java.util.Random rand)
           
 int sampleIdx()
          Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.
static int sampleIdx(double[][] weights, double[] abcnorm, int[] idx, java.util.Random rand)
          Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.
static int sampleIdx2(double[][] weights, double[] abcnorm, int[] idx, java.util.Random rand)
          Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.
static double sumcube(double[] x)
          calculate sum of cubes of argument
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FastMultinomial

public FastMultinomial(int I,
                       int K,
                       java.util.Random rand)
set up a fast multinomial using a subclass implementation for the

Parameters:
I - number of factor distributions
K - number of topics
pownorm - cube norms of factors [I]
idx - descending order of weight factors
rand - random sampler
Method Detail

main

public static void main(java.lang.String[] args)

sampleIdx

public int sampleIdx()
Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K. Further, the cube norm of each of the factors is given in pownorm (without the inverse exponential operation), as well as the indexes of the values of abc ordered decending (by one of the factors).

Parameters:
I - number of factor distributions
K - number of topics
pownorm - cube norms of factors [I]
idx - descending order of weight factors
rand - random sampler
Returns:
a sample as index of idx, use sample() to get the original index of abc

reducenorm

public double reducenorm(double[] abcl2kExpnorm,
                         double[] abcl1)
This function is supposed to do two things: remove the current element abc from the exponentiated norm and calculate the norm of the remaining elements by the root. This function is implemented using the cube root and should be subclassed for other norms.

Parameters:
abcl2kExpnorm - [in/out] exponentiated norms from l to K - 1 in each row
abcl1 - [in] weights to be reduced = abc[][l-1]
Returns:

initnorm

public double initnorm(double[] abci)
Calculate exponentiated norm sum of x for the weights arr. This implementation uses cube norm. Typically, this function is called for each factor once before fast sampling can start. It requires that weights for all k are calculated initially, whereas for the fast sampler this is not necessary.

Parameters:
abci - factor
Returns:
norm (exp)

getWeights

public abstract void getWeights(int k,
                                double[] weights)
Get the weight for dimension k for each factor. This is subclassed and contains the actual weights calculations whose number should be reduced by the sorted sampling.

Parameters:
k -
weights - get the weights

sample

public static int sample(double[][] abc,
                         double[] abcnorm,
                         int[] idx,
                         java.util.Random rand)
Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K. Further, the cube norm of each of the factors is given in pownorm (without the inverse exponential operation), as well as the indexes of the values of abc ordered decending (by one of the factors).

Parameters:
abc - weight factors of multinomial masses [I x K]
pownorm - cube norms of factors [I]
idx - descending order of weight factors
rand - random sampler
Returns:
a sample as index of abc

sample2

public static int sample2(double[][] abc,
                          double[] abcnorm,
                          int[] idx,
                          java.util.Random rand)

sample0

public static int sample0(double[][] weights,
                          java.util.Random rand)
Standard sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.

Parameters:
weights - weight factors of multinomial masses [I x K]
rand - random sampler
Returns:
a sample as index of weights

sample00

public static int sample00(double[][] weights,
                           java.util.Random rand)
Standard sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K.

Parameters:
weights - weight factors of multinomial masses [I x K]
rand - random sampler
Returns:
a sample as index of weights

sampleIdx

public static int sampleIdx(double[][] weights,
                            double[] abcnorm,
                            int[] idx,
                            java.util.Random rand)
Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K. Further, the cube norm of each of the factors is given in pownorm (without the inverse exponential operation), as well as the indexes of the values of abc ordered decending (by one of the factors).

Parameters:
weights - weight factors of multinomial masses [I x K]
pownorm - cube norms of factors [I]
idx - indices of weights in descending order of weight values
rand - random sampler
Returns:
a sample as index of idx, use sample() to get the original index of abc

sampleIdx2

public static int sampleIdx2(double[][] weights,
                             double[] abcnorm,
                             int[] idx,
                             java.util.Random rand)
Fast sampling from a distribution with weights in factors abc, each element of which is a vector of the size of the sampling space K. Further, the cube norm of each of the factors is given in pownorm (without the inverse exponential operation), as well as the indexes of the values of abc ordered decending (by one of the factors).

Parameters:
weights - weight factors of multinomial masses [I x K]
pownorm - cube norms of factors [I]
idx - descending order of weight factors
rand - random sampler
Returns:
a sample as index of idx, use sample() to get the original index of abc

indexsort

public static void indexsort(double[] x,
                             int[][] idx)
sort idx according to reverse order of elements of x, leaving x untouched.

Parameters:
x - [in] unsorted array
idx - [in/out] index array reordered so x[idx[0..K-1]] has descending order.

reorder

public static void reorder(double[] x,
                           int[] idx,
                           int kinc,
                           int kdec)
reorder the index of a sorted array after element kinc had been incremented and kdec decremented (referring to the indices in idx). This is a minimal form of quicksort.

Parameters:
x - weights array
idx - indices from x to idx
kinc - element in idx just incremented
kdec - element in idx just decremented

cube

public static double cube(double x)
cube of the argument

Parameters:
x -
Returns:

cuberoot

public static double cuberoot(double x)
cube root of argument

Parameters:
x -
Returns:

sumcube

public static double sumcube(double[] x)
calculate sum of cubes of argument

Parameters:
x -
Returns:
sum x^3