java.lang.Object
org.knowceans.dirichlet.lda.LdaTopicSimilarities
org.knowceans.dirichlet.atm.AtmTopicSimilarities
public class AtmTopicSimilarities
FmLdaSimilarities calculates similarities between terms and documents, both known and unknown. This is the interface for LDA queries, once the topics of an unknown string have been determined. This implementation supports both the symmetrised KL-divergence (= Jensen-Shannon distance) and a predictive likelihood.
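As an illustration of the first measure, the following is a minimal sketch of the textbook Jensen-Shannon divergence between two topic distributions. The class and method names are hypothetical and the code does not call this library; the inherited jsDistance may differ in detail (for example in log base, or by taking a square root).

```java
/** Hypothetical stand-alone sketch; not part of org.knowceans. */
final class JsDivergenceSketch {

    /** Kullback-Leibler divergence KL(p || q) in nats; entries with p[i] == 0 contribute nothing. */
    static double klDivergence(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return sum;
    }

    /** Jensen-Shannon divergence: mean KL divergence of p and q from their midpoint m. */
    static double jsDivergence(double[] p, double[] q) {
        double[] m = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            m[i] = 0.5 * (p[i] + q[i]);
        }
        // m[i] > 0 wherever p[i] > 0 or q[i] > 0, so no division by zero occurs.
        return 0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m);
    }

    public static void main(String[] args) {
        double[] p = {0.7, 0.2, 0.1};
        double[] q = {0.1, 0.2, 0.7};
        System.out.println("JS(p, q) = " + jsDivergence(p, q));
        System.out.println("JS(q, p) = " + jsDivergence(q, p)); // same value: the measure is symmetric
        System.out.println("JS(p, p) = " + jsDivergence(p, p)); // zero for identical distributions
    }
}
```

With natural logarithms the value lies between 0 and ln 2, which makes it convenient for ranking: smaller means more similar.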
By convention, conditional likelihoods are normalised along rows, i.e., p(col|row) = double[row][col]. If distributions are stored along columns, some methods provide a transposed flag.
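The row convention can be illustrated with a small stand-alone sketch. The class, the helper names and the count matrix are hypothetical; the sketch only shows what "normalised along rows" and the transposed layout mean for a double[][].

```java
/** Hypothetical stand-alone sketch; not part of org.knowceans. */
final class RowConventionSketch {

    /** Scales every row to sum to 1, so that p[row][col] reads as p(col | row). Assumes non-zero row sums. */
    static double[][] normaliseRows(double[][] counts) {
        double[][] p = new double[counts.length][];
        for (int r = 0; r < counts.length; r++) {
            double sum = 0.0;
            for (double v : counts[r]) {
                sum += v;
            }
            p[r] = new double[counts[r].length];
            for (int c = 0; c < counts[r].length; c++) {
                p[r][c] = counts[r][c] / sum;
            }
        }
        return p;
    }

    /** Swaps rows and columns of a rectangular matrix, giving the column-wise layout. */
    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < a[r].length; c++) {
                t[c][r] = a[r][c];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 2 topics (rows) x 3 terms (columns).
        double[][] counts = {{2, 1, 1}, {0, 3, 1}};
        double[][] phi = normaliseRows(counts);
        System.out.println("p(term 2 | topic 0) = " + phi[0][2]);
        // In the transposed layout the same value sits at [term][topic].
        System.out.println("transposed entry    = " + transpose(phi)[2][0]);
    }
}
```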
| Field Summary |
|---|
| Fields inherited from class org.knowceans.dirichlet.lda.LdaTopicSimilarities |
|---|
| phi, phiPost, theta, thetaPost |
| Constructor Summary | |
|---|---|
| AtmTopicSimilarities(AtmGibbsSampler lda, boolean terms, boolean authors, boolean pl, boolean js) | Initialise topic similarities using an existing LDA Gibbs sampler, whose phi and theta values are shared. |
| AtmTopicSimilarities(java.lang.String atmbase, boolean terms, boolean authors, boolean pl, boolean js) | Construct an LdaSimilaritiesCps object with path bases and action indicators for terms and documents processing. |
| Method Summary | |
|---|---|
| org.knowceans.map.IndexRanking | authorAuthors(int author, boolean mutLik, int max) Get the most similar authors for the author. |
| org.knowceans.map.IndexRanking | docDocs(int doc, boolean mutLik, int max) TODO: implement the search for documents. |
| org.knowceans.map.IndexRanking | docTerms(int doc, boolean mutLik, int max) Get the most similar terms for the doc. |
| org.knowceans.map.IndexRanking | queryAuthors(double[] topics, boolean mutLik, int max) Get the most similar authors for the query expressed as distribution over z. |
| org.knowceans.map.IndexRanking | queryDocs(double[] topics, boolean mutLik, int max) TODO: implement the search for documents. |
| org.knowceans.map.IndexRanking[] | queryTerms(double[][] topics, boolean mutLik, int max) Get the most similar terms for the queries expressed as array of distributions over z. |
| org.knowceans.map.IndexRanking | termDocs(int term, boolean mutLik, int max) Get the most similar docs for the term. |
| org.knowceans.map.IndexRanking | termTerms(int term, boolean mutLik, int max) Get the most similar terms for the term. |
| Methods inherited from class org.knowceans.dirichlet.lda.LdaTopicSimilarities |
|---|
| bestJsMatches, bestJsMatches, bestMutLikMatches, bestMutLikMatches, getPhi, getPhiPost, getTheta, getThetaPost, jsDistance, jsDistance, klDivergence, klDivergence, mutualLikelihood, mylog, posterior, queryDocs, queryTerms |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public AtmTopicSimilarities(java.lang.String atmbase,
boolean terms,
boolean authors,
boolean pl,
boolean js)
throws java.io.IOException
Parameters:
atmbase - path base of LDA parameter set (path + filename excluding extensions .phi.zip and .theta.zip)
terms - load term matrix (phi)
authors - load document matrix (theta)
pl - configure for use with predictive likelihoods (syn. mutual likelihood, because it appears to be symmetric)
js - configure for use with Jensen-Shannon distance
Throws:
java.io.IOException
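The atmbase convention can be sketched as follows: the two parameter files are named by appending the documented extensions to the path base. The class, the helper names and the example base are hypothetical, and how the constructor actually opens the files is not shown in this page.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-alone sketch; not part of org.knowceans. */
final class AtmBasePathSketch {

    /** File expected to hold the term matrix (phi) for the given path base. */
    static Path phiFile(String atmbase) {
        return Paths.get(atmbase + ".phi.zip");
    }

    /** File expected to hold the theta matrix for the given path base. */
    static Path thetaFile(String atmbase) {
        return Paths.get(atmbase + ".theta.zip");
    }

    public static void main(String[] args) {
        String atmbase = "models/atm-run1"; // hypothetical path base, without extension
        System.out.println(phiFile(atmbase));
        System.out.println(thetaFile(atmbase));
    }
}
```

Checking that both files exist before constructing the object is one way to surface a wrong path base earlier than the IOException would.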
public AtmTopicSimilarities(AtmGibbsSampler lda,
boolean terms,
boolean authors,
boolean pl,
boolean js)
Parameters:
lda -
terms -
authors -
pl -
js -
| Method Detail |
|---|
public org.knowceans.map.IndexRanking queryAuthors(double[] topics,
boolean mutLik,
int max)
Parameters:
topics - distribution over z. multiple elements
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
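The query methods share one pattern: score every candidate distribution against the query and keep the best max matches. The sketch below illustrates this for the Jensen-Shannon case only (mutLik = false). It is an assumption-laden stand-in: the class and method names are hypothetical, a plain int[] of indices replaces IndexRanking (whose API is not shown here), and the predictive likelihood branch is not sketched.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical stand-alone sketch; not part of org.knowceans. */
final class QueryRankingSketch {

    /** Textbook Jensen-Shannon divergence in nats; zero entries contribute nothing. */
    static double jsDivergence(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            double m = 0.5 * (p[k] + q[k]);
            if (p[k] > 0.0) {
                sum += 0.5 * p[k] * Math.log(p[k] / m);
            }
            if (q[k] > 0.0) {
                sum += 0.5 * q[k] * Math.log(q[k] / m);
            }
        }
        return sum;
    }

    /** Returns the indices of at most max rows of theta, ordered from closest to farthest. */
    static int[] rank(double[] query, double[][] theta, int max) {
        double[] dist = new double[theta.length];
        Integer[] order = new Integer[theta.length];
        for (int i = 0; i < theta.length; i++) {
            dist[i] = jsDivergence(query, theta[i]);
            order[i] = i;
        }
        // Sort candidate indices by ascending divergence from the query.
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));
        int n = Math.min(max, order.length);
        int[] best = new int[n];
        for (int i = 0; i < n; i++) {
            best[i] = order[i];
        }
        return best;
    }

    public static void main(String[] args) {
        double[] query = {0.7, 0.2, 0.1}; // distribution over 3 topics
        double[][] theta = {              // one row per author, normalised along rows
            {0.1, 0.1, 0.8},
            {0.7, 0.2, 0.1},
            {0.5, 0.3, 0.2}
        };
        System.out.println(Arrays.toString(rank(query, theta, 2)));
    }
}
```

Sorting all candidates is the simplest choice; for large matrices a bounded priority queue of size max would avoid the full sort.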
public org.knowceans.map.IndexRanking queryDocs(double[] topics,
boolean mutLik,
int max)
Overrides:
queryDocs in class LdaTopicSimilarities
Parameters:
topics - distribution over z. multiple elements
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking[] queryTerms(double[][] topics,
boolean mutLik,
int max)
Overrides:
queryTerms in class LdaTopicSimilarities
Parameters:
topics - distribution over z.
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking authorAuthors(int author,
boolean mutLik,
int max)
Parameters:
author - author index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking docDocs(int doc,
boolean mutLik,
int max)
Overrides:
docDocs in class LdaTopicSimilarities
Parameters:
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking termTerms(int term,
boolean mutLik,
int max)
Overrides:
termTerms in class LdaTopicSimilarities
Parameters:
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking docTerms(int doc,
boolean mutLik,
int max)
Overrides:
docTerms in class LdaTopicSimilarities
Parameters:
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking termDocs(int term,
boolean mutLik,
int max)
Overrides:
termDocs in class LdaTopicSimilarities
Parameters:
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches