org.knowceans.dirichlet.atm
Class AtmTopicSimilarities

java.lang.Object
  extended by org.knowceans.dirichlet.lda.LdaTopicSimilarities
      extended by org.knowceans.dirichlet.atm.AtmTopicSimilarities

public class AtmTopicSimilarities
extends LdaTopicSimilarities

This class calculates similarities between terms, documents and authors, both known and unknown. It is the interface for topic-model queries once the topics of an unknown string have been determined. This implementation supports both the Jensen-Shannon distance (a symmetrised variant of the KL divergence) and a predictive likelihood.
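The Jensen-Shannon measure can be illustrated with a short sketch. It is not the library's jsDistance method: the class JensenShannonSketch is hypothetical, natural logarithms are assumed, and both inputs are assumed to be equal-length distributions that sum to one. Whether the library takes a square root (a distance rather than a divergence) is not stated on this page.

```java
// Hypothetical sketch: Jensen-Shannon divergence between two discrete distributions.
// Assumes equal-length arrays that each sum to 1; uses natural logarithms.
final class JensenShannonSketch {

    // KL(p || m); terms with p[i] == 0 contribute 0 by convention.
    static double kl(double[] p, double[] m) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / m[i]);
            }
        }
        return sum;
    }

    // JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m = (p + q) / 2.
    static double jsDivergence(double[] p, double[] q) {
        double[] m = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            m[i] = 0.5 * (p[i] + q[i]);
        }
        return 0.5 * kl(p, m) + 0.5 * kl(q, m);
    }

    public static void main(String[] args) {
        double[] a = {0.7, 0.2, 0.1};
        double[] b = {0.1, 0.2, 0.7};
        // Symmetric: both calls print the same value.
        System.out.println(jsDivergence(a, b));
        System.out.println(jsDivergence(b, a));
    }
}
```

Because m averages p and q, m[i] is positive wherever p[i] is, so the sketch never divides by zero, and the result lies between 0 and ln 2.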

By convention, conditional likelihoods are normalised along rows, i.e., p(col|row) = double[row][col]. If distributions are along columns, some methods provide a transposed flag.
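A small sketch of the row convention: a matrix m with p(col|row) = m[row][col] has rows that each sum to one, and a column-normalised matrix can be brought into this convention by transposing it. The helper names are hypothetical, and the library's transposed flag is not modelled here.

```java
// Hypothetical helpers for the row convention p(col | row) = m[row][col].
final class RowConventionSketch {

    // True if every row sums to 1 within tol, i.e., each row is a distribution over columns.
    static boolean isRowNormalised(double[][] m, double tol) {
        for (double[] row : m) {
            double sum = 0.0;
            for (double v : row) {
                sum += v;
            }
            if (Math.abs(sum - 1.0) > tol) {
                return false;
            }
        }
        return true;
    }

    // Transpose, e.g., to turn a column-normalised matrix into the row convention.
    static double[][] transpose(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        double[][] t = new double[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Columns sum to 1 here, so p(row | col) = byColumn[row][col].
        double[][] byColumn = {{0.5, 0.25}, {0.5, 0.75}};
        System.out.println(isRowNormalised(byColumn, 1e-9));            // false
        System.out.println(isRowNormalised(transpose(byColumn), 1e-9)); // true
    }
}
```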

Author:
heinrich

Field Summary
 
Fields inherited from class org.knowceans.dirichlet.lda.LdaTopicSimilarities
phi, phiPost, theta, thetaPost
 
Constructor Summary
AtmTopicSimilarities(AtmGibbsSampler lda, boolean terms, boolean authors, boolean pl, boolean js)
          Initialise topic similarities using an existing ATM Gibbs sampler, whose phi and theta values are shared.
AtmTopicSimilarities(java.lang.String atmbase, boolean terms, boolean authors, boolean pl, boolean js)
          Construct an AtmTopicSimilarities object from a path base, with flags selecting which matrices to load (terms, authors) and which similarity measures to configure (pl, js).
 
Method Summary
 org.knowceans.map.IndexRanking authorAuthors(int author, boolean mutLik, int max)
          Get the most similar authors for the author.
 org.knowceans.map.IndexRanking docDocs(int doc, boolean mutLik, int max)
          TODO: implement the search for documents.
 org.knowceans.map.IndexRanking docTerms(int doc, boolean mutLik, int max)
          Get the most similar terms for the doc.
 org.knowceans.map.IndexRanking queryAuthors(double[] topics, boolean mutLik, int max)
          Get the most similar authors for the query, expressed as a distribution over z.
 org.knowceans.map.IndexRanking queryDocs(double[] topics, boolean mutLik, int max)
          TODO: implement the search for documents.
 org.knowceans.map.IndexRanking[] queryTerms(double[][] topics, boolean mutLik, int max)
          Get the most similar terms for the queries, expressed as an array of distributions over z.
 org.knowceans.map.IndexRanking termDocs(int term, boolean mutLik, int max)
          Get the most similar docs for the term.
 org.knowceans.map.IndexRanking termTerms(int term, boolean mutLik, int max)
          Get the most similar terms for the term.
 
Methods inherited from class org.knowceans.dirichlet.lda.LdaTopicSimilarities
bestJsMatches, bestJsMatches, bestMutLikMatches, bestMutLikMatches, getPhi, getPhiPost, getTheta, getThetaPost, jsDistance, jsDistance, klDivergence, klDivergence, mutualLikelihood, mylog, posterior, queryDocs, queryTerms
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AtmTopicSimilarities

public AtmTopicSimilarities(java.lang.String atmbase,
                            boolean terms,
                            boolean authors,
                            boolean pl,
                            boolean js)
                     throws java.io.IOException
Construct an AtmTopicSimilarities object from a path base, with flags selecting which matrices to load (terms, authors) and which similarity measures to configure (pl, js).

Parameters:
atmbase - path base of the ATM parameter set (path + filename, excluding the extensions .phi.zip and .theta.zip)
terms - load the term matrix (phi)
authors - load the author matrix (theta)
pl - configure for use with predictive likelihoods (also called mutual likelihood, because it appears to be symmetric)
js - configure for use with the Jensen-Shannon distance
Throws:
java.io.IOException
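The atmbase parameter is a prefix to which the extensions .phi.zip and .theta.zip are appended. The sketch below derives the files a call would read, assuming (as the parameter descriptions suggest) that terms selects the phi file and authors selects the theta file. The class and method names are hypothetical, and no library code is called.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives the parameter files implied by an atmbase prefix.
final class AtmBasePathsSketch {

    static final String PHI_EXTENSION = ".phi.zip";     // term matrix (phi)
    static final String THETA_EXTENSION = ".theta.zip"; // author matrix (theta)

    // Files needed for the given flags, assuming terms -> phi and authors -> theta.
    static List<Path> requiredFiles(String atmbase, boolean terms, boolean authors) {
        List<Path> files = new ArrayList<>();
        if (terms) {
            files.add(Paths.get(atmbase + PHI_EXTENSION));
        }
        if (authors) {
            files.add(Paths.get(atmbase + THETA_EXTENSION));
        }
        return files;
    }

    // The subset of requiredFiles that does not exist on disk.
    static List<Path> missingFiles(String atmbase, boolean terms, boolean authors) {
        List<Path> missing = new ArrayList<>();
        for (Path p : requiredFiles(atmbase, terms, authors)) {
            if (!Files.exists(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "models/example" is a made-up prefix for illustration.
        System.out.println(requiredFiles("models/example", true, true));
    }
}
```

Checking missingFiles before calling new AtmTopicSimilarities(atmbase, terms, authors, pl, js) can give a clearer message than the IOException the constructor declares.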

AtmTopicSimilarities

public AtmTopicSimilarities(AtmGibbsSampler lda,
                            boolean terms,
                            boolean authors,
                            boolean pl,
                            boolean js)
Initialise topic similarities using an existing ATM Gibbs sampler, whose phi and theta values are shared.

Parameters:
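lda - existing ATM Gibbs sampler whose phi and theta values are shared
terms - use the term matrix (phi)
authors - use the author matrix (theta)
pl - configure for use with predictive likelihoods
js - configure for use with the Jensen-Shannon distance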
Method Detail

queryAuthors

public org.knowceans.map.IndexRanking queryAuthors(double[] topics,
                                                   boolean mutLik,
                                                   int max)
Get the most similar authors for the query, expressed as a distribution over z.

Parameters:
topics - distribution over z (one element per topic)
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:
ranking of the most similar authors, with at most max matches
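With mutLik set to false, the ranking is based on the Jensen-Shannon measure. A minimal sketch of such a ranking, assuming theta is a row-normalised author-topic matrix (p(z|a) = theta[a][z]) and returning plain author indices instead of an IndexRanking, whose API is not described on this page. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a Jensen-Shannon author ranking for a query topic distribution.
final class QueryAuthorsSketch {

    // KL(p || m); zero entries of p contribute 0.
    static double kl(double[] p, double[] m) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / m[i]);
            }
        }
        return sum;
    }

    // Jensen-Shannon divergence with m = (p + q) / 2.
    static double js(double[] p, double[] q) {
        double[] m = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            m[i] = 0.5 * (p[i] + q[i]);
        }
        return 0.5 * kl(p, m) + 0.5 * kl(q, m);
    }

    // theta[a][z] = p(z | a); returns up to max author indices, closest (smallest JS) first.
    static int[] rankAuthors(double[][] theta, double[] query, int max) {
        double[] dist = new double[theta.length];
        Integer[] order = new Integer[theta.length];
        for (int a = 0; a < theta.length; a++) {
            dist[a] = js(theta[a], query);
            order[a] = a;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> dist[idx]));
        int n = Math.max(0, Math.min(max, order.length));
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = order[i];
        }
        return top;
    }

    public static void main(String[] args) {
        double[][] theta = {{0.9, 0.1}, {0.1, 0.9}, {0.5, 0.5}};
        double[] query = {0.8, 0.2};
        System.out.println(Arrays.toString(rankAuthors(theta, query, 2))); // [0, 2]
    }
}
```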

queryDocs

public org.knowceans.map.IndexRanking queryDocs(double[] topics,
                                                boolean mutLik,
                                                int max)
TODO: implement the search for documents.

Overrides:
queryDocs in class LdaTopicSimilarities
Parameters:
topics - distribution over z (one element per topic)
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:

queryTerms

public org.knowceans.map.IndexRanking[] queryTerms(double[][] topics,
                                                   boolean mutLik,
                                                   int max)
Get the most similar terms for the queries, expressed as an array of distributions over z.

Overrides:
queryTerms in class LdaTopicSimilarities
Parameters:
topics - array of distributions over z, one per query
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:
one ranking of the most similar terms per query, each with at most max matches
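For the predictive-likelihood variant, one common score for a term w given a query topic distribution q is p(w|q) = sum_z q[z] phi[z][w], where phi[z][w] = p(w|z) under the row convention. The sketch below ranks terms this way for each query. It is an assumption about what mutLik computes, not the library's implementation, and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: rank terms per query by p(w | q) = sum_z q[z] * phi[z][w].
final class QueryTermsSketch {

    // phi[z][w] = p(w | z) (row convention); q[z] is the query's distribution over topics.
    static double[] termLikelihoods(double[][] phi, double[] q) {
        double[] scores = new double[phi[0].length];
        for (int z = 0; z < phi.length; z++) {
            for (int w = 0; w < scores.length; w++) {
                scores[w] += q[z] * phi[z][w];
            }
        }
        return scores;
    }

    // One ranking per query: up to max term indices, highest likelihood first.
    static int[][] rankTerms(double[][] phi, double[][] queries, int max) {
        int[][] rankings = new int[queries.length][];
        for (int i = 0; i < queries.length; i++) {
            double[] scores = termLikelihoods(phi, queries[i]);
            Integer[] order = new Integer[scores.length];
            for (int w = 0; w < scores.length; w++) {
                order[w] = w;
            }
            Arrays.sort(order, (x, y) -> Double.compare(scores[y], scores[x])); // descending
            int n = Math.max(0, Math.min(max, order.length));
            rankings[i] = new int[n];
            for (int k = 0; k < n; k++) {
                rankings[i][k] = order[k];
            }
        }
        return rankings;
    }

    public static void main(String[] args) {
        double[][] phi = {{0.7, 0.2, 0.1}, {0.1, 0.2, 0.7}};
        double[][] queries = {{1.0, 0.0}, {0.0, 1.0}};
        System.out.println(Arrays.deepToString(rankTerms(phi, queries, 1))); // [[0], [2]]
    }
}
```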

authorAuthors

public org.knowceans.map.IndexRanking authorAuthors(int author,
                                                    boolean mutLik,
                                                    int max)
Get the most similar authors for the author.

Parameters:
author - author index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:
ranking of the most similar authors, with at most max matches
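When comparing an author with all other authors, the query author itself has to be skipped, and only the best max matches need to be kept. A sketch using a bounded max-heap (java.util.PriorityQueue) and the Jensen-Shannon divergence between rows of a row-normalised author-topic matrix theta. The names are hypothetical, and the library may select matches differently.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical sketch: the max most similar other authors, kept in a bounded max-heap.
final class AuthorAuthorsSketch {

    // KL(p || m); zero entries of p contribute 0.
    static double kl(double[] p, double[] m) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / m[i]);
            }
        }
        return sum;
    }

    // Jensen-Shannon divergence with m = (p + q) / 2.
    static double js(double[] p, double[] q) {
        double[] m = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            m[i] = 0.5 * (p[i] + q[i]);
        }
        return 0.5 * kl(p, m) + 0.5 * kl(q, m);
    }

    // theta[a][z] = p(z | a); returns up to max indices b != author, smallest JS first.
    static int[] similarAuthors(double[][] theta, int author, int max) {
        if (max <= 0) {
            return new int[0];
        }
        // Entries are {distance, index}; the head is the worst of the current candidates.
        PriorityQueue<double[]> heap =
                new PriorityQueue<>((x, y) -> Double.compare(y[0], x[0]));
        for (int b = 0; b < theta.length; b++) {
            if (b == author) {
                continue; // never report the query author itself
            }
            double d = js(theta[author], theta[b]);
            if (heap.size() < max) {
                heap.add(new double[] {d, b});
            } else if (d < heap.peek()[0]) {
                heap.poll();
                heap.add(new double[] {d, b});
            }
        }
        // Polling yields the largest distance first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = (int) heap.poll()[1];
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] theta = {{0.9, 0.1}, {0.1, 0.9}, {0.8, 0.2}, {0.5, 0.5}};
        System.out.println(Arrays.toString(similarAuthors(theta, 0, 2))); // [2, 3]
    }
}
```

The heap holds at most max entries, so selection costs O(A log max) for A authors instead of sorting all of them.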

docDocs

public org.knowceans.map.IndexRanking docDocs(int doc,
                                              boolean mutLik,
                                              int max)
TODO: implement the search for documents.

Overrides:
docDocs in class LdaTopicSimilarities
Parameters:
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:

termTerms

public org.knowceans.map.IndexRanking termTerms(int term,
                                                boolean mutLik,
                                                int max)
Get the most similar terms for the term.

Overrides:
termTerms in class LdaTopicSimilarities
Parameters:
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:
ranking of the most similar terms, with at most max matches
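Comparing terms requires a distribution over topics for each term, whereas phi stores p(w|z). One way to obtain p(z|w) is Bayes' rule with a topic prior p(z). The sketch below assumes that prior is given (for example uniform) and returns a term-topic matrix that again follows the row convention. Which estimate the library actually uses (for instance in its posterior method or phiPost field) is not stated on this page, and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: p(z | w) from phi[z][w] = p(w | z) and an assumed topic prior p(z).
final class TermTopicPosteriorSketch {

    // Returns post[w][z] = p(z | w), so each term row is a distribution over topics.
    static double[][] termTopicPosterior(double[][] phi, double[] prior) {
        int topics = phi.length;
        int terms = phi[0].length;
        double[][] post = new double[terms][topics];
        for (int w = 0; w < terms; w++) {
            double norm = 0.0;
            for (int z = 0; z < topics; z++) {
                post[w][z] = phi[z][w] * prior[z]; // p(w | z) p(z)
                norm += post[w][z];
            }
            for (int z = 0; z < topics; z++) {
                // Fall back to uniform if the term has zero probability under every topic.
                post[w][z] = norm > 0.0 ? post[w][z] / norm : 1.0 / topics;
            }
        }
        return post;
    }

    public static void main(String[] args) {
        double[][] phi = {{0.5, 0.5}, {0.9, 0.1}};
        double[] uniform = {0.5, 0.5};
        System.out.println(Arrays.deepToString(termTopicPosterior(phi, uniform)));
    }
}
```

The rows of the result can then be compared pairwise, e.g., with the Jensen-Shannon divergence, to rank terms by topical similarity.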

docTerms

public org.knowceans.map.IndexRanking docTerms(int doc,
                                               boolean mutLik,
                                               int max)
Get the most similar terms for the doc.

Overrides:
docTerms in class LdaTopicSimilarities
Parameters:
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:
ranking of the most similar terms, with at most max matches

termDocs

public org.knowceans.map.IndexRanking termDocs(int term,
                                               boolean mutLik,
                                               int max)
Get the most similar docs for the term.

Overrides:
termDocs in class LdaTopicSimilarities
Parameters:
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
Returns:
ranking of the most similar documents, with at most max matches
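A sketch of ranking documents for a term by p(w|d) = sum_z theta[d][z] phi[z][w], assuming a row-normalised document-topic matrix theta and topic-term matrix phi. This is one plausible scoring, not necessarily what termDocs computes, and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: rank documents for term w by p(w | d) = sum_z theta[d][z] * phi[z][w].
final class TermDocsSketch {

    // theta[d][z] = p(z | d), phi[z][w] = p(w | z); returns up to max docs, highest p(w | d) first.
    static int[] docsForTerm(double[][] theta, double[][] phi, int w, int max) {
        double[] score = new double[theta.length];
        Integer[] order = new Integer[theta.length];
        for (int d = 0; d < theta.length; d++) {
            order[d] = d;
            for (int z = 0; z < phi.length; z++) {
                score[d] += theta[d][z] * phi[z][w];
            }
        }
        Arrays.sort(order, (x, y) -> Double.compare(score[y], score[x])); // descending
        int n = Math.max(0, Math.min(max, order.length));
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = order[i];
        }
        return top;
    }

    public static void main(String[] args) {
        double[][] phi = {{0.7, 0.2, 0.1}, {0.1, 0.2, 0.7}};
        double[][] theta = {{1.0, 0.0}, {0.0, 1.0}, {0.5, 0.5}};
        System.out.println(Arrays.toString(docsForTerm(theta, phi, 2, 2))); // [1, 2]
    }
}
```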