gregor :: arbylon . net

knowceans.org hosts resources for "knowledge networks".

A knowledge network is defined as a structure consisting of the content and the people within a "virtual" community, along with their interrelations. The current focus is on probabilistic topic extraction from text to determine relationships among and between items and actors, including semantic similarity, expertise, interest and recommendation, as well as interest/expertise matching and the extraction of community-specific ontologies.

On this site you will currently find some general-purpose tools; cf. the SourceForge project knowceans with its CVS resources. Contact me with feedback or about more specialised solutions and cooperation.

[ consider this under construction :) ]

experimental software

Actor-media embedded search is a method to explore knowledge structures.

  • Freshmind, a knowledge network visualisation and editing tool in which navigation combines searching and browsing. The force-directed layout and view-layer data structure are based on Touchgraph. Retrieval is based on Lucene, and graph structuring / similarity-based search uses a model similar to Latent Dirichlet Allocation (see below).
Latent Dirichlet Allocation is a powerful probabilistic method for topic extraction from text data.
Markov Clustering implements the MCL algorithm in Java and Matlab.
Statistics base classes are helpers to implement sampling-based algorithms in Java.
  • arms-java (version 20060516) provides a Java port of the adaptive rejection Metropolis sampler (ARMS), which can sample from virtually any univariate distribution.
  • Samplers and densities / likelihood functions of various probability distributions as well as a Java port of the Mersenne Twister random generator can be found in the package knowceans-tools.jar (see below).
Java helpers came into existence when I tried to shortcut recurring tasks.
  • NEW: knowceans-tools (version 20090727), many Java helper classes I frequently use: a command line parser, a runtime stop watch, Perl-like regular expression usage (reduces Java coding), special invertible, regex and many-to-many implementations of the Map interface, data output formatters specialised for command line output (like histograms and dot-encoded numbers), and many more.
  • knowceans-corpus, a text corpus extraction toolset.
  • BibFileMod is a Java class that takes a LaTeX document, collects all references cited in it from a list of BibTeX database files, and writes the result into a new BibTeX database. (This seemed more useful for organising my documents than bibtool from CTAN etc.)
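
To illustrate the first step of what a tool like BibFileMod has to do, here is a minimal sketch of collecting the cited keys from LaTeX source. It is not BibFileMod's actual code: the class name CiteKeySketch, the method citedKeys and the example keys are hypothetical, and the regular expression is an assumption that covers common \cite variants only (it ignores \nocite and commented-out lines).

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of BibFileMod or knowceans-tools.
public class CiteKeySketch {

    // Matches \cite{...}, \citep{...}, \citet[p. 3]{...} and similar;
    // group 1 holds the comma-separated key list inside the braces.
    private static final Pattern CITE =
        Pattern.compile("\\\\cite[a-zA-Z]*\\*?(?:\\[[^\\]]*\\])*\\{([^}]*)\\}");

    // Returns the distinct citation keys in order of first appearance.
    public static Set<String> citedKeys(String latex) {
        Set<String> keys = new LinkedHashSet<>();
        Matcher m = CITE.matcher(latex);
        while (m.find()) {
            for (String key : m.group(1).split(",")) {
                String k = key.trim();
                if (!k.isEmpty()) {
                    keys.add(k);
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Example keys are made up for this sketch.
        String doc = "LDA \\cite{blei03} and MCL \\citep[ch. 2]{vandongen00, blei03}.";
        System.out.println(citedKeys(doc)); // prints [blei03, vandongen00]
    }
}
```

The resulting key set could then be used to filter the entries of the BibTeX database files; that part is left out here because it depends on how the databases are parsed.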

(c) 2004-6 arbylon.net