org.knowceans.corpus
Interface ITermCorpus

All Known Subinterfaces:
IRandomAccessTermCorpus
All Known Implementing Classes:
AmqCorpus, LuceneCorpus, LuceneMapCorpus, TermCorpus

public interface ITermCorpus

ITermCorpus is the interface of a term corpus. It provides lookup functionality for terms and documents (between their string form and their integer id, which serves as matrix index) as well as access to the term frequencies of each document and the word vectors of the corpus.
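
As an illustration of the contract described above, the following sketch shows a minimal in-memory corpus. It is not one of the implementing classes listed on this page. The interface is re-declared locally as TermCorpusSketch so that the example compiles without the library, getDocWords is left out, and InMemoryCorpus with its addDocument method is hypothetical. That ids are contiguous indices starting at 0, and that lookupDoc uses the same null / -1 sentinels as lookup, are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for ITermCorpus (without getDocWords), so the sketch is self-contained.
interface TermCorpusSketch {
    String lookup(int term);
    int lookup(String term);
    String lookupDoc(int doc);
    int lookupDoc(String doc);
    Map<Integer, Integer> getDocTerms(int doc);
    int getNdocs();
    int getNterms();
}

// Hypothetical in-memory corpus; ids are list positions (an assumption).
class InMemoryCorpus implements TermCorpusSketch {
    private final List<String> terms = new ArrayList<>();
    private final Map<String, Integer> termIds = new HashMap<>();
    private final List<String> docNames = new ArrayList<>();
    private final Map<String, Integer> docIds = new HashMap<>();
    private final List<Map<Integer, Integer>> docTerms = new ArrayList<>();

    // Adds a document from its name and tokens; returns the new document id.
    int addDocument(String name, String... tokens) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (String token : tokens) {
            Integer id = termIds.get(token);
            if (id == null) {            // first occurrence: assign the next free id
                id = terms.size();
                terms.add(token);
                termIds.put(token, id);
            }
            freq.merge(id, 1, Integer::sum);
        }
        docIds.put(name, docNames.size());
        docNames.add(name);
        docTerms.add(freq);
        return docNames.size() - 1;
    }

    public String lookup(int term) {
        return term >= 0 && term < terms.size() ? terms.get(term) : null;
    }

    public int lookup(String term) {
        return termIds.getOrDefault(term, -1);
    }

    // Sentinels for unknown documents mirror those of lookup (an assumption).
    public String lookupDoc(int doc) {
        return doc >= 0 && doc < docNames.size() ? docNames.get(doc) : null;
    }

    public int lookupDoc(String doc) {
        return docIds.getOrDefault(doc, -1);
    }

    public Map<Integer, Integer> getDocTerms(int doc) {
        return docTerms.get(doc);
    }

    public int getNdocs() {
        return docNames.size();
    }

    public int getNterms() {
        return terms.size();
    }
}

class InMemoryCorpusDemo {
    public static void main(String[] args) {
        InMemoryCorpus corpus = new InMemoryCorpus();
        corpus.addDocument("doc-a", "topic", "model", "topic");
        System.out.println(corpus.lookup("model"));   // prints 1
        System.out.println(corpus.lookup(0));         // prints topic
        System.out.println(corpus.lookup("missing")); // prints -1
    }
}
```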

Author:
gregor

Method Summary
 java.util.Map<java.lang.Integer,java.lang.Integer> getDocTerms(int doc)
          Get the terms of a document as a frequency map (term id -> frequency).
 int[][] getDocWords(java.util.Random rand)
          Get word vectors for corpus.
 int getNdocs()
          Number of documents in the corpus.
 int getNterms()
          Number of terms in the corpus.
 java.lang.String lookup(int term)
          Look up the term for an id.
 int lookup(java.lang.String term)
          Look up the id for a term.
 java.lang.String lookupDoc(int doc)
          Get document name from id.
 int lookupDoc(java.lang.String doc)
          Get document id from name.
 

Method Detail

lookup

java.lang.String lookup(int term)
Look up the term for an id.

Parameters:
term - the term id
Returns:
term string or null if unknown.

lookup

int lookup(java.lang.String term)
Look up the id for a term.

Parameters:
term - the term string
Returns:
term id or -1 if unknown.
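
Because the two lookup overloads signal unknown entries with sentinel values (null for an unknown id, -1 for an unknown term), callers need to check for them. The sketch below shows one way to do so. TermLookup is a hypothetical local stand-in for the two methods, and toIds and of are illustrative helpers that are not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two lookup overloads of ITermCorpus.
interface TermLookup {
    String lookup(int term);
    int lookup(String term);
}

class TermLookupDemo {
    // Maps terms to ids, skipping terms the corpus does not know (id -1).
    static List<Integer> toIds(TermLookup corpus, List<String> terms) {
        List<Integer> ids = new ArrayList<>();
        for (String t : terms) {
            int id = corpus.lookup(t);
            if (id != -1) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Toy lookup backed by a fixed vocabulary, for demonstration only.
    static TermLookup of(String... vocabulary) {
        List<String> vocab = Arrays.asList(vocabulary);
        return new TermLookup() {
            public String lookup(int term) {
                return term >= 0 && term < vocab.size() ? vocab.get(term) : null;
            }

            public int lookup(String term) {
                return vocab.indexOf(term); // indexOf returns -1 if absent
            }
        };
    }

    public static void main(String[] args) {
        TermLookup corpus = of("latent", "topic", "model");
        System.out.println(toIds(corpus, Arrays.asList("topic", "unknown", "latent"))); // prints [1, 0]
    }
}
```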

lookupDoc

java.lang.String lookupDoc(int doc)
Get document name from id.

Parameters:
doc - the document id
Returns:
the document name

lookupDoc

int lookupDoc(java.lang.String doc)
Get document id from name.

Parameters:
doc - the document name
Returns:
the document id

getDocTerms

java.util.Map<java.lang.Integer,java.lang.Integer> getDocTerms(int doc)
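Get the terms of a document as a frequency map (term id -> frequency).

Parameters:
doc - the document id
Returns:
map from term id to the frequency of that term in the document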
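
To illustrate how such a frequency map can be consumed, the following sketch sums the frequencies to obtain the document length in tokens. DocTermsDemo and documentLength are illustrative names, and the map is built by hand here as a stand-in for the result of getDocTerms.

```java
import java.util.HashMap;
import java.util.Map;

class DocTermsDemo {
    // Total number of tokens in a document: the sum of all term frequencies.
    static int documentLength(Map<Integer, Integer> docTerms) {
        int length = 0;
        for (int frequency : docTerms.values()) {
            length += frequency;
        }
        return length;
    }

    public static void main(String[] args) {
        // Stand-in for corpus.getDocTerms(doc): term id -> frequency.
        Map<Integer, Integer> docTerms = new HashMap<>();
        docTerms.put(0, 3);
        docTerms.put(4, 1);
        docTerms.put(7, 2);
        System.out.println(documentLength(docTerms)); // prints 6
    }
}
```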

getNdocs

int getNdocs()
Number of documents in the corpus.

Returns:
the number of documents

getNterms

int getNterms()
Number of terms in the corpus.

Returns:
the number of terms

getDocWords

int[][] getDocWords(java.util.Random rand)
Get word vectors for the corpus. (Having this in the interface might be a bit constraining.)

Parameters:
rand - random number generator
Returns:
the word vectors of the corpus
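
This page does not say what rand is used for or how the returned array is laid out. Under the assumption that each row holds the term ids of one document, repeated according to their frequency and placed in random order, a single row could be derived from a frequency map roughly as sketched below. WordVectorDemo and toWordVector are hypothetical and not part of the library.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class WordVectorDemo {
    // Expands term id -> frequency into a shuffled array of term ids (assumed layout).
    static int[] toWordVector(Map<Integer, Integer> docTerms, Random rand) {
        int length = 0;
        for (int frequency : docTerms.values()) {
            length += frequency;
        }
        int[] words = new int[length];
        int pos = 0;
        for (Map.Entry<Integer, Integer> e : docTerms.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                words[pos++] = e.getKey();
            }
        }
        // Fisher-Yates shuffle driven by the supplied generator.
        for (int i = words.length - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            int tmp = words[i];
            words[i] = words[j];
            words[j] = tmp;
        }
        return words;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> docTerms = new TreeMap<>();
        docTerms.put(0, 3);
        docTerms.put(4, 1);
        docTerms.put(7, 2);
        // The order of the six ids depends on the seed.
        System.out.println(Arrays.toString(toWordVector(docTerms, new Random(42))));
    }
}
```

Passing a seeded Random, as in main, makes the shuffled order reproducible across runs.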