java.lang.Object
org.knowceans.corpus.NumCorpus
org.knowceans.corpus.LabelNumCorpus
public class LabelNumCorpus
Represents a corpus of documents, using numerical data only.
| Field Summary | |
|---|---|
| static java.lang.String[] | EXTENSIONS |
| Fields inherited from interface org.knowceans.corpus.ILabelCorpus |
|---|
| LAUTHORS, LCATEGORIES, LDOCS, LREFERENCES, LTAGS, LTERMS, LVOLS, LYEARS |
| Constructor Summary | |
|---|---|
| LabelNumCorpus() | |
| LabelNumCorpus(NumCorpus corp) | create label corpus from standard one |
| LabelNumCorpus(java.lang.String dataFilebase) | |
| LabelNumCorpus(java.lang.String dataFilebase, boolean parmode) | |
| LabelNumCorpus(java.lang.String dataFilebase, int readlimit, boolean parmode) | |
| Method Summary | |
|---|---|
| int[][] | getDocLabels(int kind)<br>loads and returns the document labels of given kind |
| int | getLabelsMaxN(int kind)<br>return the maximum number of labels in any document |
| int | getLabelsV(int kind)<br>get the number of distinct labels in the label field |
| int | getLabelsW(int kind)<br>get the number of tokens in the label field |
| static void | main(java.lang.String[] args)<br>test corpus reading and splitting |
| void | split(int order, int split, java.util.Random rand)<br>splits two child corpora of size 1/nsplit off the original corpus, which itself is left unchanged (except for storing the splits). |
| void | write(java.lang.String pathbase)<br>write the corpus to a file. |
| Methods inherited from class org.knowceans.corpus.NumCorpus |
|---|
| getDoc, getDocParBounds, getDocs, getDocTermsFreqs, getDocWordParBounds, getDocWords, getDocWords, getNumDocs, getNumTerms, getNumTerms, getNumWords, getNumWords, getOrigDocIds, getTestCorpus, getTrainCorpus, mergeDocPars, read, reduce, setDoc, setDocs, toString |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface org.knowceans.corpus.ICorpus |
|---|
| getDocWords, getDocWords, getNumDocs, getNumTerms, getNumWords |
| Field Detail |
|---|
public static final java.lang.String[] EXTENSIONS
| Constructor Detail |
|---|
public LabelNumCorpus()
public LabelNumCorpus(java.lang.String dataFilebase)
dataFilebase - (filename without extension)
public LabelNumCorpus(java.lang.String dataFilebase,
boolean parmode)
dataFilebase - (filename without extension)
parmode - if true read paragraph corpus
public LabelNumCorpus(java.lang.String dataFilebase,
int readlimit,
boolean parmode)
dataFilebase - (filename without extension)
readlimit - number of docs to reduce corpus when reading (-1 = unlimited)
parmode - if true read paragraph corpus
public LabelNumCorpus(NumCorpus corp)
corp -
| Method Detail |
|---|
public int[][] getDocLabels(int kind)
getDocLabels in interface ILabelCorpus
kind - of labels
public int getLabelsMaxN(int kind)
kind -
public int getLabelsW(int kind)
ILabelCorpus
getLabelsW in interface ILabelCorpus
public int getLabelsV(int kind)
ILabelCorpus
getLabelsV in interface ILabelCorpus
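As a rough illustration of how the three label statistics relate to the array returned by getDocLabels(kind), the following sketch computes them from a plain int[][] holding one row of label ids per document. The class LabelStatsSketch and its method names are hypothetical and not part of org.knowceans.corpus. Counting the distinct ids that actually occur is only an assumption about what getLabelsV reports; the library may derive that number differently, for example from corpus metadata.

```java
// Hypothetical helper for illustration only; not part of org.knowceans.corpus.
final class LabelStatsSketch {

    // Total number of label tokens over all documents (cf. getLabelsW).
    static int countTokens(int[][] docLabels) {
        int w = 0;
        for (int[] labels : docLabels) {
            w += labels.length;
        }
        return w;
    }

    // Number of distinct label ids that occur at least once (cf. getLabelsV).
    // Assumption: ids that never occur are not counted.
    static int countDistinct(int[][] docLabels) {
        java.util.Set<Integer> seen = new java.util.HashSet<Integer>();
        for (int[] labels : docLabels) {
            for (int label : labels) {
                seen.add(label);
            }
        }
        return seen.size();
    }

    // Largest number of labels attached to any single document (cf. getLabelsMaxN).
    static int maxPerDoc(int[][] docLabels) {
        int max = 0;
        for (int[] labels : docLabels) {
            max = Math.max(max, labels.length);
        }
        return max;
    }
}
```

With a real corpus, the input would be the result of getDocLabels(kind) for one of the kind constants inherited from ILabelCorpus (for example LAUTHORS).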
public void split(int order,
int split,
java.util.Random rand)
NumCorpus
split in interface ISplitCorpus
split in class NumCorpus
order - number of partitions
split - 0-based split of corpus returned
rand - random source (null for reusing existing splits)
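The parameters of split suggest a fold-style partition: documents are shuffled with rand, cut into order partitions, and partition number split is set aside. The sketch below shows that idea on document indices only. It is an assumption about the behaviour, not the library's implementation: SplitSketch is a hypothetical class, it returns index arrays instead of storing child corpora, and it does not model the "null for reusing existing splits" case, so rand must be non-null here.

```java
// Hypothetical sketch of a fold-style split; not the NumCorpus implementation.
final class SplitSketch {

    // Returns { trainIds, testIds } for a corpus of numDocs documents.
    // order: number of partitions; split: 0-based partition used as the test part.
    static int[][] split(int numDocs, int order, int split, java.util.Random rand) {
        if (order <= 0 || split < 0 || split >= order) {
            throw new IllegalArgumentException("need 0 <= split < order");
        }
        // Shuffle the document indices with the given random source.
        java.util.List<Integer> ids = new java.util.ArrayList<Integer>();
        for (int m = 0; m < numDocs; m++) {
            ids.add(m);
        }
        java.util.Collections.shuffle(ids, rand);

        // Positions [begin, end) of the shuffled list form the test partition.
        int begin = split * numDocs / order;
        int end = (split + 1) * numDocs / order;
        int[] test = new int[end - begin];
        int[] train = new int[numDocs - test.length];
        int te = 0;
        int tr = 0;
        for (int i = 0; i < numDocs; i++) {
            if (i >= begin && i < end) {
                test[te++] = ids.get(i);
            } else {
                train[tr++] = ids.get(i);
            }
        }
        return new int[][] { train, test };
    }
}
```

Because the result is a set of document indices, the same indices can select rows from both the document array and a label array such as the one returned by getDocLabels(kind), which keeps labels aligned with their documents in each part.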
public void write(java.lang.String pathbase)
throws java.io.IOException
NumCorpus
write in class NumCorpus
java.io.IOException
public static void main(java.lang.String[] args)
args -