org.knowceans.corpus.util
Class CorpusIo

java.lang.Object
  extended by org.knowceans.util.ArrayIo
      extended by org.knowceans.corpus.util.CorpusIo

public class CorpusIo
extends org.knowceans.util.ArrayIo

Reads and writes binary matrices.

Author:
heinrich

Field Summary
(package private) static java.lang.String shades
           
 
Constructor Summary
CorpusIo()
           
 
Method Summary
static char charDouble(double d, double max)
          Creates a character representation whose grey value indicates the magnitude of the value, using one character per element.
private static double[][] read(java.lang.String filename)
           
(package private) static java.util.Vector<java.lang.String> readActorsList(java.lang.String file)
          Reads the actors list from a file with one "name : id" entry per line.
(package private) static java.util.Vector<java.lang.String> readDocList(java.lang.String file)
          Reads the document names from a file.
static java.util.Vector<java.lang.String> readList(java.lang.String filename)
          Reads a list from the given file; the format is interpreted according to the type of file.
(package private) static java.util.HashMap<java.lang.Integer,java.lang.String> readVocabulary(java.lang.String file)
          Reads the vocabulary from a file with one "id = termstring" entry per line.
static void saveReadable(java.lang.String gibbsmodel)
           
static void saveShades(java.lang.String filename, double[][] a, double max, java.lang.String comment, boolean transposed, java.lang.String additional)
          Saves the matrix as a "shade" file, with optional additional information on the non-topics (which should run along the rows).
 
Methods inherited from class org.knowceans.util.ArrayIo
closeInputStream, closeOutputStream, formatDouble, loadBinaryMatrix, openInputStream, openOutputStream, padSpace, readAscii, readDoubleMatrix, readDoubleVector, readFloatMatrix, readFloatVector, readIntMatrix, readIntVector, saveAscii, saveBinaryMatrix, writeDoubleMatrix, writeDoubleVector, writeFloatMatrix, writeFloatVector, writeIntMatrix, writeIntVector
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

shades

static java.lang.String shades
Constructor Detail

CorpusIo

public CorpusIo()
Method Detail

saveReadable

public static void saveReadable(java.lang.String gibbsmodel)

read

private static double[][] read(java.lang.String filename)
Parameters:
filename -

charDouble

public static char charDouble(double d,
                              double max)
Creates a character representation whose grey value indicates the magnitude of the value, using one character per element.

Parameters:
d - value
max - maximum value
Returns:
the character whose grey value represents d relative to max
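
The mapping from a value to a grey-value character could look roughly like the sketch below. It is not the actual implementation: the contents of the shades field are not documented here, so the ramp string SHADES, the class name ShadeSketch, and the clamping and rounding rules are all assumptions made for illustration.

```java
final class ShadeSketch {
    // Hypothetical ramp from light to dark; the real value of CorpusIo.shades is not documented.
    static final String SHADES = " .:-=+*#%@";

    // Maps d in [0, max] to one character of the ramp (assumed behaviour).
    static char charDouble(double d, double max) {
        // Non-positive or undefined input falls back to the lightest shade.
        if (max <= 0 || Double.isNaN(d) || d <= 0) {
            return SHADES.charAt(0);
        }
        // Clamp to 1.0 so that values above max use the darkest shade.
        double ratio = Math.min(d / max, 1.0);
        int index = (int) Math.round(ratio * (SHADES.length() - 1));
        return SHADES.charAt(index);
    }
}
```

With this ramp, a value of 0 gives a blank and a value equal to max gives the darkest character, so a row of a matrix can be read as a strip of grey values.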

saveShades

public static void saveShades(java.lang.String filename,
                              double[][] a,
                              double max,
                              java.lang.String comment,
                              boolean transposed,
                              java.lang.String additional)
Saves the matrix as a "shade" file, with optional additional information on the non-topics (which should run along the rows). The method can currently save theta (with a ".doc" file in the additional argument) and phi (with the isPhi flag set and a ".vocab" file in the additional argument).

Parameters:
filename - target file name
a - matrix
max - normalisation
additional - information about non-topics (put to row starts)
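
To illustrate what a "shade" file with labels at the row starts might contain, the following sketch renders a matrix to text. It is an assumption-laden illustration, not the method's real output format: the class ShadeFileSketch, the ramp string RAMP, and the layout (label, space, one character per element) are hypothetical, and the comment and transposed arguments are not modelled.

```java
import java.util.List;

final class ShadeFileSketch {
    // Hypothetical light-to-dark ramp; the real shades string is not documented.
    static final String RAMP = " .:-=+*#%@";

    // Maps a value to one ramp character, normalised by max (assumed behaviour).
    static char shade(double d, double max) {
        if (max <= 0 || Double.isNaN(d) || d <= 0) {
            return RAMP.charAt(0);
        }
        double ratio = Math.min(d / max, 1.0);
        return RAMP.charAt((int) Math.round(ratio * (RAMP.length() - 1)));
    }

    // Renders one text line per matrix row, with an optional label at the row start.
    static String render(double[][] a, double max, List<String> rowLabels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < a.length; i++) {
            if (rowLabels != null && i < rowLabels.size()) {
                sb.append(rowLabels.get(i)).append(' ');
            }
            for (double v : a[i]) {
                sb.append(shade(v, max));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

The returned string could then be written to the target file, for example with java.nio.file.Files.write; the row labels would come from the ".doc" or ".vocab" file named in the additional argument.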

readList

public static java.util.Vector<java.lang.String> readList(java.lang.String filename)
Reads a list from the given file; the format is interpreted according to the type of file.

Parameters:
filename -
Returns:

readDocList

static java.util.Vector<java.lang.String> readDocList(java.lang.String file)
                                               throws java.io.IOException
Reads the document names from a file.

Throws:
java.io.IOException

readActorsList

static java.util.Vector<java.lang.String> readActorsList(java.lang.String file)
                                                  throws java.io.IOException
Reads the actors list from a file with one "name : id" entry per line.

Throws:
java.io.IOException
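
A minimal sketch of parsing the "name : id" format is shown below. The class ActorsListSketch and the method parseActorLines are hypothetical, and several details are assumptions: names are returned in file order, the id is ignored, the last colon on the line is taken as the separator, and lines without a colon are skipped. The real method may behave differently, for example by ordering names by id.

```java
import java.util.List;
import java.util.Vector;

final class ActorsListSketch {
    // Extracts the name part of each "name : id" line, in file order (assumed behaviour).
    static Vector<String> parseActorLines(List<String> lines) {
        Vector<String> names = new Vector<>();
        for (String line : lines) {
            // Use the last colon so that names containing a colon stay intact.
            int sep = line.lastIndexOf(':');
            if (sep < 0) {
                continue; // skip blank or malformed lines
            }
            String name = line.substring(0, sep).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

The lines themselves could be obtained with java.nio.file.Files.readAllLines, which throws java.io.IOException on read failures, matching the exception declared above.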

readVocabulary

static java.util.HashMap<java.lang.Integer,java.lang.String> readVocabulary(java.lang.String file)
                                                                     throws java.lang.NumberFormatException,
                                                                            java.io.IOException
Reads the vocabulary from a file with one "id = termstring" entry per line.

Throws:
java.lang.NumberFormatException
java.io.IOException
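
The "id = termstring" format and the declared NumberFormatException suggest a parser along the lines of the sketch below. The class VocabularySketch and the method parseVocabulary are hypothetical; splitting at the first '=' and skipping lines without one are assumptions, not documented behaviour.

```java
import java.util.HashMap;
import java.util.List;

final class VocabularySketch {
    // Builds an id -> term map from "id = termstring" lines (assumed behaviour).
    static HashMap<Integer, String> parseVocabulary(List<String> lines) {
        HashMap<Integer, String> vocab = new HashMap<>();
        for (String line : lines) {
            // Split at the first '=' so that terms containing '=' stay intact.
            int sep = line.indexOf('=');
            if (sep < 0) {
                continue; // skip blank or malformed lines
            }
            // Integer.parseInt throws NumberFormatException for a non-numeric id.
            int id = Integer.parseInt(line.substring(0, sep).trim());
            String term = line.substring(sep + 1).trim();
            vocab.put(id, term);
        }
        return vocab;
    }
}
```

A line such as "x = term" would raise NumberFormatException in this sketch, which is consistent with the throws clause listed for readVocabulary.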