AmqCorpus

org.knowceans.corpus
Class AmqCorpus

java.lang.Object
  org.knowceans.corpus.TermCorpus
      org.knowceans.corpus.AmqCorpus

All Implemented Interfaces: IRandomAccessTermCorpus, IRandomAccessTermCorpusFiltered, ITermCorpus, ITermCorpusFiltered

public class AmqCorpus
extends TermCorpus

ActorMediaCorpus implements an AMQ corpus, i.e., a document corpus with added functionality for authors and queriers. Cf. AuthorTermCorpus.

Author: heinrich

Field Summary
`protected java.util.Vector<java.lang.String>`	`allActors` all authors
`protected java.util.Vector<java.util.Vector<java.lang.Integer>>`	`mediaActors` each document's authors
`protected java.util.Vector<java.lang.String>`	`mediaComments`
`private int[]`	`mediaRelationCounts` number of instances of each relation type
`protected java.util.Vector<java.lang.Integer>`	`mediaRelations` assigns a relation to each medium.
`private int`	`nactors` Number of actors.
`static int`	`REL_AUTHOR` an authorship relation
`static int`	`REL_QUERY` a query relation
`static int`	`REL_RECOMMEND` a recommendation relation

Fields inherited from class org.knowceans.corpus.TermCorpus
`cats, curDoc, docCategories, docFreqs, docNames, docTerms, docTermsFiltered, ignoreFiltered, maxId, minDf, minDl, minTf, ndocs, nterms, ntermsTotal, nwords, OFFSET, progress, termFreqs, termIndex`

Constructor Summary
`AmqCorpus()`
`AmqCorpus(ICategories cats)`
`AmqCorpus(int mindf, int mintf)` Initialiser with size filters.
`AmqCorpus(java.lang.String fileroot, boolean readUnique)` create an actor-media corpus from files, which means for the corpus root name, all files are read: .vocab, .docs, .actors, .corpus.
`AmqCorpus(java.lang.String fileroot, boolean readLowFreq, ICategories cats)` create an actor-media corpus from files, which means for the corpus root name, all files are read: .vocab, .docs, .actors, .corpus.

Method Summary
`void`	`finaliseDocument(java.lang.String key, java.util.Vector<java.lang.Integer> categories, java.util.Vector<java.lang.String> authors, int relation, java.lang.String comment)` finalises the current document with a name (useful to identify documents), its categories (leave null if unused) and authors (leave null if unused).
`int[]`	`getActorDocs(int author)` Get the documents related to the actor.
`java.util.Vector<java.lang.String>`	`getActors()`
`int[]`	`getDocActors(int doc)` Get the actors for the document
`int`	`getMaxRelationIndex()` the highest relation index
`int[][]`	`getMediaActors()`
`java.util.Vector<java.util.Vector<java.lang.Integer>>`	`getMediaActorsVector()`
`int[]`	`getMediaRelationCounts()`
`int[]`	`getMediaRelations()`
`java.util.Vector<java.lang.Integer>`	`getMediaRelationsVector()`
`int`	`getNactors()`
`boolean`	`hasRelations()` Check whether the corpus has actors and relation data.
`private java.util.Vector<java.lang.Integer>`	`identifyActors(java.util.Vector<java.lang.String> actors)` Check whether actors are already in the list and assign the number, otherwise add to actors list.
`java.lang.String`	`lookupActor(int id)` look up actor name for id.
`void`	`readActorList(java.lang.String file)` read actor information from a file with format name,
`void`	`readDocList(java.lang.String file)` reads the document list.
`(package private) void`	`readQueryList(java.lang.String file)` read query information from a file with format name : \n query1 \n query2 etc., i.e., an actor followed by a list of query lines (which must not contain the " :" string).
`void`	`readRelations(java.lang.String file)` As an alternative to reading relations from the documents list, this version works with a separate file with line format: rel_id : (actor_id)+, where the line number - 1 is the 0-based document index.
`void`	`setActorList(java.util.Vector<java.lang.String> actors)`
`void`	`writeActorList(java.lang.String file)` write the author list in a file with format id = lastname firstinitials : id (on each line) in alphabetical order.
`void`	`writeDocList(java.lang.String file)` write the author list in a file with format id = firstname(s) ; lastname ; group (on each line).

Methods inherited from class org.knowceans.corpus.TermCorpus
`add, docCategoriesToString, docToString, finaliseDocument, getDocCategories, getDocNames, getDocTerms, getDocTerms, getDocTermsFiltered, getDocTermsFiltered, getDocWords, getNdocs, getNterms, getNtermsFiltered, getNwords, getNwordsFiltered, getTermIndex, lookup, lookup, lookupDoc, lookupDoc, parseQuery, readCorpus, readVocabulary, reorderCorpus, reorderCorpus0, writeCorpus, writeVocabulary`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

REL_AUTHOR

public static final int REL_AUTHOR

an authorship relation

See Also: Constant Field Values

REL_QUERY

public static final int REL_QUERY

a query relation

See Also: Constant Field Values

REL_RECOMMEND

public static final int REL_RECOMMEND

a recommendation relation

See Also: Constant Field Values

mediaActors

protected java.util.Vector<java.util.Vector<java.lang.Integer>> mediaActors

each document's authors

mediaRelations

protected java.util.Vector<java.lang.Integer> mediaRelations

assigns a relation to each medium. TODO: media with several relations (to several actors, consider the case of recommendation).

allActors

protected java.util.Vector<java.lang.String> allActors

all authors

mediaComments

protected java.util.Vector<java.lang.String> mediaComments

mediaRelationCounts

private int[] mediaRelationCounts

number of instances of each relation type

nactors

private int nactors

Number of actors.

Constructor Detail

AmqCorpus

public AmqCorpus()

AmqCorpus

public AmqCorpus(ICategories cats)

Parameters: cats -

AmqCorpus

public AmqCorpus(java.lang.String fileroot,
                 boolean readLowFreq,
                 ICategories cats)

Create an actor-media corpus from files: for the given corpus root name, all files are read (*.vocab, *.docs, *.actors, *.corpus). If the readLowFreq flag (named readUnique in the two-argument constructor) is set, the *.vocab2 and *.corpus2 files are read as well, i.e., unique terms are included.

Parameters: fileroot - the root name of all files to be read into the corpus; readLowFreq -

AmqCorpus

public AmqCorpus(int mindf,
                 int mintf)

Initialiser with size filters.

Parameters: mindf - use minimum document frequency when reordering; mintf - use minimum term frequency when reordering

AmqCorpus

public AmqCorpus(java.lang.String fileroot,
                 boolean readUnique)

Parameters: fileroot - the root name of all files to be read into the corpus; readUnique -

Method Detail

finaliseDocument

public void finaliseDocument(java.lang.String key,
                             java.util.Vector<java.lang.Integer> categories,
                             java.util.Vector<java.lang.String> authors,
                             int relation,
                             java.lang.String comment)

finalises the current document with a name (useful to identify documents), its categories (leave null if unused) and authors (leave null if unused).

identifyActors

private java.util.Vector<java.lang.Integer> identifyActors(java.util.Vector<java.lang.String> actors)

Check whether actors are already in the list and assign the number, otherwise add to actors list.

Parameters: actors -
Returns:
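
The id assignment described for identifyActors can be sketched as follows. This is a minimal illustration under assumptions, not the AmqCorpus source: the class name ActorIndexSketch is hypothetical, and it assumes an actor's id is simply its position in the allActors list.

```java
import java.util.Vector;

/** Hypothetical stand-in for the actor bookkeeping; not the AmqCorpus source. */
final class ActorIndexSketch {
    // Plays the role of the allActors field; assumption: list position == actor id.
    private final Vector<String> allActors = new Vector<String>();

    /** Map each name to its id, appending names that were not seen before. */
    Vector<Integer> identifyActors(Vector<String> actors) {
        Vector<Integer> ids = new Vector<Integer>();
        for (String actor : actors) {
            int id = allActors.indexOf(actor); // -1 if the actor is unknown
            if (id < 0) {
                id = allActors.size();
                allActors.add(actor);
            }
            ids.add(id);
        }
        return ids;
    }

    /** Mirrors the documented lookupActor contract: name for id, or null if unknown. */
    String lookupActor(int id) {
        return (id >= 0 && id < allActors.size()) ? allActors.get(id) : null;
    }

    /** Number of distinct actors seen so far. */
    int getNactors() {
        return allActors.size();
    }
}
```

The linear indexOf scan keeps the sketch short; for a large actor list, a name-to-id map would avoid the repeated scans.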

setActorList

public void setActorList(java.util.Vector<java.lang.String> actors)

writeActorList

public void writeActorList(java.lang.String file)
                    throws java.io.IOException

write the author list in a file with format id = lastname firstinitials : id (on each line) in alphabetical order.

Parameters: file -
Throws: java.io.IOException

readActorList

public void readActorList(java.lang.String file)
                   throws java.io.IOException

read actor information from a file with format name,

Parameters: file -
Throws: java.io.IOException; java.lang.NumberFormatException

readRelations

public void readRelations(java.lang.String file)

As an alternative to reading relations from the documents list, this version works with a separate file with line format: rel_id : (actor_id)+, where the line number - 1 is the 0-based document index.

This deletes all existing actor and relation information

Parameters: file -
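
To make the line format rel_id : (actor_id)+ concrete, here is a small parsing sketch. It is not the library's reader: the class RelationLineSketch is hypothetical, and it assumes the actor ids are separated by whitespace, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for one "rel_id : (actor_id)+" line; not the AmqCorpus source. */
final class RelationLineSketch {
    final int relation;
    final int[] actors;

    private RelationLineSketch(int relation, int[] actors) {
        this.relation = relation;
        this.actors = actors;
    }

    static RelationLineSketch parse(String line) {
        int sep = line.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("missing ':' in: " + line);
        }
        int relation = Integer.parseInt(line.substring(0, sep).trim());
        String rest = line.substring(sep + 1).trim();
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("no actor ids in: " + line);
        }
        // Assumption: actor ids are whitespace-separated.
        String[] fields = rest.split("\\s+");
        int[] actors = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            actors[i] = Integer.parseInt(fields[i]);
        }
        return new RelationLineSketch(relation, actors);
    }

    /** List position i holds the entry for the 0-based document index i (line i + 1). */
    static List<RelationLineSketch> parseAll(List<String> lines) {
        List<RelationLineSketch> result = new ArrayList<RelationLineSketch>();
        for (String line : lines) {
            result.add(parse(line));
        }
        return result;
    }
}
```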

readQueryList

void readQueryList(java.lang.String file)
             throws java.io.IOException

read query information from a file with format name : \n query1 \n query2 etc., i.e., an actor followed by a list of query lines (which must not contain the " :" string).

readQueryList will ignore unknown terms and actors. Therefore this method MUST be called as the last loader method (i.e., after vocabulary, actors and corpus have been loaded). Queries further are restricted to non-unique terms (can be changed in the code).

Parameters: file -
Throws: java.io.IOException; java.lang.NumberFormatException
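
The query file layout (an actor header line followed by that actor's query lines) can be illustrated with the sketch below. It is an assumption-laden illustration rather than the package-private reader itself: QueryListSketch is a hypothetical name, a line is treated as an actor header whenever it contains " :", and the term and actor filtering described above is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical grouping of query lines by actor; not the AmqCorpus source. */
final class QueryListSketch {
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> queries = new LinkedHashMap<String, List<String>>();
        List<String> current = null; // queries of the most recent actor header
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int sep = line.indexOf(" :");
            if (sep >= 0) {
                // Assumption: any line containing " :" starts a new actor block.
                String actor = line.substring(0, sep).trim();
                current = queries.get(actor);
                if (current == null) {
                    current = new ArrayList<String>();
                    queries.put(actor, current);
                }
            } else if (current != null) {
                current.add(line); // a query line for the current actor
            }
        }
        return queries;
    }
}
```

Query lines that appear before the first actor header are dropped in this sketch; whether the real reader does the same is not stated here.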

writeDocList

public void writeDocList(java.lang.String file)
                  throws java.io.IOException

write the author list in a file with format id = firstname(s) ; lastname ; group (on each line).

Overrides: writeDocList in class TermCorpus

Parameters: file -
Throws: java.io.IOException

readDocList

public void readDocList(java.lang.String file)
                 throws java.io.IOException

reads the document list. Format for author -- topic corpus is: docname : categories : authors : relation # comment

Overrides: readDocList in class TermCorpus

Parameters: file -
Throws: java.io.IOException; java.lang.NumberFormatException
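
As an illustration of the document-list format docname : categories : authors : relation # comment, the sketch below splits one line into its raw fields. The class DocLineSketch is hypothetical; it assumes that document names contain no ':' and that nothing before the comment contains '#'. Because the separators inside the categories and authors fields are not specified above, those fields are kept as unparsed strings.

```java
/** Hypothetical splitter for one document-list line; not the AmqCorpus source. */
final class DocLineSketch {
    final String name;
    final String categories; // raw field, inner separator unspecified
    final String authors;    // raw field, inner separator unspecified
    final String relation;   // raw field, empty if absent
    final String comment;    // null when the line has no '#'

    private DocLineSketch(String name, String categories, String authors,
                          String relation, String comment) {
        this.name = name;
        this.categories = categories;
        this.authors = authors;
        this.relation = relation;
        this.comment = comment;
    }

    static DocLineSketch parse(String line) {
        String comment = null;
        int hash = line.indexOf('#');
        if (hash >= 0) {
            comment = line.substring(hash + 1).trim();
            line = line.substring(0, hash);
        }
        // Limit -1 keeps trailing empty fields, e.g. "doc : : heinrich :".
        String[] f = line.split(":", -1);
        return new DocLineSketch(field(f, 0), field(f, 1), field(f, 2), field(f, 3), comment);
    }

    /** Trimmed field i, or the empty string if the line has fewer fields. */
    private static String field(String[] f, int i) {
        return i < f.length ? f[i].trim() : "";
    }
}
```

Missing trailing fields come back as empty strings, which loosely corresponds to a .docs file that carries no actor or relation data.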

lookupActor

public java.lang.String lookupActor(int id)

look up actor name for id.

Parameters: id -
Returns: actor name or null if unknown.

getNactors

public int getNactors()

Returns:

getMediaRelationCounts

public int[] getMediaRelationCounts()

Returns:

getMediaActorsVector

public java.util.Vector<java.util.Vector<java.lang.Integer>> getMediaActorsVector()

Returns:

getMediaActors

public int[][] getMediaActors()

Returns:

getActors

public java.util.Vector<java.lang.String> getActors()

Returns:

getMediaRelationsVector

public java.util.Vector<java.lang.Integer> getMediaRelationsVector()

Returns:

getMediaRelations

public int[] getMediaRelations()

Returns:

getMaxRelationIndex

public int getMaxRelationIndex()

the highest relation index

Returns:
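
How the per-relation counts and the highest relation index relate to the per-medium relation list can be sketched like this. The class RelationCountSketch is hypothetical, and the sketch assumes relation indices are non-negative integers; the actual values of REL_AUTHOR, REL_QUERY and REL_RECOMMEND are not given on this page.

```java
import java.util.List;

/** Hypothetical relation statistics over a mediaRelations-like list; not the AmqCorpus source. */
final class RelationCountSketch {
    /** Highest relation index in the list, or -1 for an empty list. */
    static int maxRelationIndex(List<Integer> mediaRelations) {
        int max = -1;
        for (int rel : mediaRelations) {
            max = Math.max(max, rel);
        }
        return max;
    }

    /** counts[r] = number of media whose relation index is r (assumes r >= 0). */
    static int[] relationCounts(List<Integer> mediaRelations) {
        int[] counts = new int[maxRelationIndex(mediaRelations) + 1];
        for (int rel : mediaRelations) {
            counts[rel]++;
        }
        return counts;
    }
}
```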

hasRelations

public boolean hasRelations()

Check whether the corpus has actors and relation data. This can be used to check if the .docs file had sufficient data and load relations using

Returns:

getActorDocs

public int[] getActorDocs(int author)

Get the documents related to the actor.

Parameters: author -
Returns: int[document] // TODO: int[document][relation]
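
One way to derive an actor's documents is to scan the per-document actor arrays, as in the sketch below. This is only an illustration of the lookup direction: ActorDocsSketch is a hypothetical name, the input has the int[][] shape returned by getMediaActors(), and the real method may well use a precomputed index instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical inversion of document-to-actor data; not the AmqCorpus source. */
final class ActorDocsSketch {
    /** Indices of all documents whose actor array contains the given actor id. */
    static int[] actorDocs(int[][] mediaActors, int actor) {
        List<Integer> docs = new ArrayList<Integer>();
        for (int doc = 0; doc < mediaActors.length; doc++) {
            for (int a : mediaActors[doc]) {
                if (a == actor) {
                    docs.add(doc);
                    break; // count each document once
                }
            }
        }
        int[] result = new int[docs.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = docs.get(i);
        }
        return result;
    }
}
```

An actor that appears in no document yields an empty array rather than null in this sketch.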

getDocActors

public int[] getDocActors(int doc)

Get the actors for the document

Parameters: doc -
Returns:
