java.lang.Object
  org.knowceans.corpus.TermCorpus
    org.knowceans.corpus.AmqCorpus
public class AmqCorpus
extends TermCorpus
AmqCorpus implements an actor-media (AMQ) corpus, i.e., a document corpus with added functionality for authors and queriers. Cf. AuthorTermCorpus
Field Summary | |
---|---|
protected java.util.Vector<java.lang.String> | allActors: all authors |
protected java.util.Vector<java.util.Vector<java.lang.Integer>> | mediaActors: each document's authors |
protected java.util.Vector<java.lang.String> | mediaComments |
private int[] | mediaRelationCounts: number of instances of each relation type |
protected java.util.Vector<java.lang.Integer> | mediaRelations: assigns a relation to each medium |
private int | nactors: number of actors |
static int | REL_AUTHOR: an authorship relation |
static int | REL_QUERY: a query relation |
static int | REL_RECOMMEND: a recommendation relation |
Fields inherited from class org.knowceans.corpus.TermCorpus |
---|
cats, curDoc, docCategories, docFreqs, docNames, docTerms, docTermsFiltered, ignoreFiltered, maxId, minDf, minDl, minTf, ndocs, nterms, ntermsTotal, nwords, OFFSET, progress, termFreqs, termIndex |
Constructor Summary | |
---|---|
AmqCorpus() | |
AmqCorpus(ICategories cats) | |
AmqCorpus(int mindf, int mintf) | Initialiser with size filters. |
AmqCorpus(java.lang.String fileroot, boolean readUnique) | Creates an actor-media corpus from files: for the given corpus root name, the files *.vocab, *.docs, *.actors and *.corpus are all read. |
AmqCorpus(java.lang.String fileroot, boolean readLowFreq, ICategories cats) | Creates an actor-media corpus from files: for the given corpus root name, the files *.vocab, *.docs, *.actors and *.corpus are all read. |
Method Summary | |
---|---|
void | finaliseDocument(java.lang.String key, java.util.Vector<java.lang.Integer> categories, java.util.Vector<java.lang.String> authors, int relation, java.lang.String comment): Finalises the current document with a name (useful to identify documents), its categories (leave null if unused) and authors (leave null if unused). |
int[] | getActorDocs(int author): Get the documents related to the actor. |
java.util.Vector<java.lang.String> | getActors() |
int[] | getDocActors(int doc): Get the actors for the document. |
int | getMaxRelationIndex(): The highest relation index. |
int[][] | getMediaActors() |
java.util.Vector<java.util.Vector<java.lang.Integer>> | getMediaActorsVector() |
int[] | getMediaRelationCounts() |
int[] | getMediaRelations() |
java.util.Vector<java.lang.Integer> | getMediaRelationsVector() |
int | getNactors() |
boolean | hasRelations(): Check whether the corpus has actors and relation data. |
private java.util.Vector<java.lang.Integer> | identifyActors(java.util.Vector<java.lang.String> actors): Checks whether each actor is already in the actor list and assigns its number; otherwise the actor is added to the list. |
java.lang.String | lookupActor(int id): Look up the actor for an id. |
void | readActorList(java.lang.String file): Read actor information from a file with format name, |
void | readDocList(java.lang.String file): Reads the document list. |
(package private) void | readQueryList(java.lang.String file): Read query information from a file with format name : \n query1 \n query2 etc., i.e., an actor followed by a list of query lines (which must not contain the " :" string). |
void | readRelations(java.lang.String file): As an alternative to reading relations from the documents list, this version works with a separate file with line format rel_id : (actor_id)+, where the line number - 1 is the 0-based document index. |
void | setActorList(java.util.Vector<java.lang.String> actors) |
void | writeActorList(java.lang.String file): Write the author list to a file with format id = lastname firstinitials : id (on each line) in alphabetical order. |
void | writeDocList(java.lang.String file): Write the author list to a file with format id = firstname(s) ; lastname ; group (on each line). |
Methods inherited from class org.knowceans.corpus.TermCorpus |
---|
add, docCategoriesToString, docToString, finaliseDocument, getDocCategories, getDocNames, getDocTerms, getDocTerms, getDocTermsFiltered, getDocTermsFiltered, getDocWords, getNdocs, getNterms, getNtermsFiltered, getNwords, getNwordsFiltered, getTermIndex, lookup, lookup, lookupDoc, lookupDoc, parseQuery, readCorpus, readVocabulary, reorderCorpus, reorderCorpus0, writeCorpus, writeVocabulary |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int REL_AUTHOR
public static final int REL_QUERY
public static final int REL_RECOMMEND
protected java.util.Vector<java.util.Vector<java.lang.Integer>> mediaActors
protected java.util.Vector<java.lang.Integer> mediaRelations
protected java.util.Vector<java.lang.String> allActors
protected java.util.Vector<java.lang.String> mediaComments
private int[] mediaRelationCounts
private int nactors
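To illustrate how the relation fields fit together, the following sketch derives per-type counts (cf. mediaRelationCounts) and the highest relation index (cf. getMaxRelationIndex()) from a per-document relation list shaped like mediaRelations. It is not the AmqCorpus implementation: RelationCountSketch and its methods are hypothetical, and the values 0, 1, 2 for the REL_* constants are assumptions, since this page does not list the actual values.

```java
import java.util.List;

// Hypothetical illustration; not part of org.knowceans.corpus.
class RelationCountSketch {
    // Assumed values: the real constants are not shown on this page.
    static final int REL_AUTHOR = 0;
    static final int REL_QUERY = 1;
    static final int REL_RECOMMEND = 2;

    /** Highest relation index in the list, or -1 if the list is empty. */
    static int maxRelationIndex(List<Integer> mediaRelations) {
        int max = -1;
        for (int rel : mediaRelations) {
            max = Math.max(max, rel);
        }
        return max;
    }

    /** counts[r] = number of documents whose relation type is r (r >= 0 assumed). */
    static int[] relationCounts(List<Integer> mediaRelations) {
        int[] counts = new int[maxRelationIndex(mediaRelations) + 1];
        for (int rel : mediaRelations) {
            counts[rel]++;
        }
        return counts;
    }
}
```

Entry d of the input list is the relation type of document d, so the counts array has one slot per relation type that occurs.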
Constructor Detail |
---|
public AmqCorpus()
public AmqCorpus(ICategories cats)
Parameters:
cats
public AmqCorpus(java.lang.String fileroot, boolean readLowFreq, ICategories cats)
Parameters:
fileroot - the root name of all files to be read into the corpus
readLowFreq
public AmqCorpus(int mindf, int mintf)
Parameters:
mindf - use minimum document frequency when reordering
mintf - use minimum term frequency when reordering
public AmqCorpus(java.lang.String fileroot, boolean readUnique)
Parameters:
fileroot - the root name of all files to be read into the corpus
readUnique
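As a rough illustration of the file-based constructors, the sketch below lists the four files that belong to one corpus root name. It assumes that each file name is simply fileroot followed by the extension; this page only gives the patterns *.vocab, *.docs, *.actors and *.corpus, so the exact naming scheme is an assumption, and CorpusFilesSketch is a hypothetical helper.

```java
// Hypothetical helper; not part of org.knowceans.corpus.
class CorpusFilesSketch {
    // Extensions named in the constructor description.
    static final String[] EXTENSIONS = {".vocab", ".docs", ".actors", ".corpus"};

    /** Assumption: each corpus file is named fileroot + extension. */
    static String[] fileNames(String fileroot) {
        String[] names = new String[EXTENSIONS.length];
        for (int i = 0; i < EXTENSIONS.length; i++) {
            names[i] = fileroot + EXTENSIONS[i];
        }
        return names;
    }
}
```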
Method Detail |
---|
public void finaliseDocument(java.lang.String key, java.util.Vector<java.lang.Integer> categories, java.util.Vector<java.lang.String> authors, int relation, java.lang.String comment)
private java.util.Vector<java.lang.Integer> identifyActors(java.util.Vector<java.lang.String> actors)
Parameters:
actors
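The behaviour described for identifyActors (reuse the number of a known actor, otherwise append the actor to the list) can be sketched as follows. This is a minimal illustration under the assumption that an actor's number is its position in the actor list; ActorIndexSketch is a hypothetical class and not the library's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the lookup-or-add behaviour.
class ActorIndexSketch {
    // Plays the role of allActors: position in the list = actor id (assumption).
    final List<String> allActors = new ArrayList<>();

    /** Returns one id per name, adding names that are not yet known. */
    List<Integer> identifyActors(List<String> actors) {
        List<Integer> ids = new ArrayList<>();
        for (String actor : actors) {
            int id = allActors.indexOf(actor);
            if (id < 0) {
                // Unknown actor: append it and use its new position as id.
                id = allActors.size();
                allActors.add(actor);
            }
            ids.add(id);
        }
        return ids;
    }
}
```

The linear indexOf lookup keeps the sketch short; for large actor lists an additional name-to-id map would avoid the repeated scans.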
public void setActorList(java.util.Vector<java.lang.String> actors)
public void writeActorList(java.lang.String file) throws java.io.IOException
Parameters:
file
Throws:
java.io.IOException
public void readActorList(java.lang.String file) throws java.io.IOException
Parameters:
file
Throws:
java.io.IOException
java.lang.NumberFormatException
public void readRelations(java.lang.String file)
This deletes all existing actor and relation information.
Parameters:
file
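The relations file uses the line format rel_id : (actor_id)+, with line k (1-based) describing document k - 1. A minimal sketch of parsing one such line follows; it assumes that actor ids are separated by whitespace, which this page does not state, and RelationLineSketch is a hypothetical class.

```java
// Hypothetical parser for one line of the relations file.
class RelationLineSketch {
    /**
     * Parses "rel_id : actor_id actor_id ..." into an array whose first
     * element is the relation id, followed by the actor ids.
     */
    static int[] parseLine(String line) {
        String[] parts = line.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("missing ':' in line: " + line);
        }
        int relation = Integer.parseInt(parts[0].trim());
        // Assumption: actor ids are whitespace-separated.
        String[] actorTokens = parts[1].trim().split("\\s+");
        int[] result = new int[actorTokens.length + 1];
        result[0] = relation;
        for (int i = 0; i < actorTokens.length; i++) {
            result[i + 1] = Integer.parseInt(actorTokens[i]);
        }
        return result;
    }
}
```

A line without any actor id fails with a NumberFormatException here, matching the requirement of at least one actor per line.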
void readQueryList(java.lang.String file) throws java.io.IOException
readQueryList ignores unknown terms and actors, so it MUST be called as the last loader method (i.e., after the vocabulary, actors and corpus have been loaded). Queries are further restricted to non-unique terms (this can be changed in the code).
Parameters:
file
Throws:
java.io.IOException
java.lang.NumberFormatException
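The query list format (an actor line of the form name : followed by that actor's query lines, none of which may contain the " :" string) can be grouped as in the sketch below. It only illustrates the file layout: it does not filter unknown terms or actors as readQueryList is described to do, and QueryListSketch is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical grouping of query lines by actor.
class QueryListSketch {
    /** Maps each actor name to its query lines, in file order. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> queries = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            int sep = line.indexOf(" :");
            if (sep >= 0) {
                // A line containing " :" starts a new actor block.
                String actor = line.substring(0, sep).trim();
                current = queries.computeIfAbsent(actor, k -> new ArrayList<>());
            } else if (current != null && !line.trim().isEmpty()) {
                // Any other non-empty line is a query of the current actor.
                current.add(line.trim());
            }
        }
        return queries;
    }
}
```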
public void writeDocList(java.lang.String file) throws java.io.IOException
Overrides:
writeDocList in class TermCorpus
Parameters:
file
Throws:
java.io.IOException
public void readDocList(java.lang.String file) throws java.io.IOException
Overrides:
readDocList in class TermCorpus
Parameters:
file
Throws:
java.io.IOException
java.lang.NumberFormatException
public java.lang.String lookupActor(int id)
Parameters:
id
public int getNactors()
public int[] getMediaRelationCounts()
public java.util.Vector<java.util.Vector<java.lang.Integer>> getMediaActorsVector()
public int[][] getMediaActors()
public java.util.Vector<java.lang.String> getActors()
public java.util.Vector<java.lang.Integer> getMediaRelationsVector()
public int[] getMediaRelations()
public int getMaxRelationIndex()
public boolean hasRelations()
public int[] getActorDocs(int author)
Parameters:
author
public int[] getDocActors(int doc)
Parameters:
doc
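getActorDocs and getDocActors are two views of the same document-actor relation. As a sketch of how the actor view could be derived from a per-document array shaped like the result of getMediaActors(), the code below scans every document for a given actor id. It assumes that mediaActors[d] holds the actor ids of document d; ActorDocsSketch is a hypothetical class, and the library may compute this differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical inversion of the document-to-actors mapping.
class ActorDocsSketch {
    /** Returns the indices of all documents that list the given actor. */
    static int[] actorDocs(int[][] mediaActors, int actor) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < mediaActors.length; doc++) {
            for (int a : mediaActors[doc]) {
                if (a == actor) {
                    docs.add(doc);
                    break; // count each document once
                }
            }
        }
        int[] result = new int[docs.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = docs.get(i);
        }
        return result;
    }
}
```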