arbylon projects: knowceans

knowceans.org (gregor :: arbylon.net)
knowceans.org hosts resources for "knowledge networks". A knowledge network is a structure consisting of the content and the people within a "virtual" community, together with the relations among them. The current focus is on probabilistic topic extraction from text to determine relationships among and between items and actors, including semantic similarity, expertise, interest and recommendation, as well as interest/expertise matching and the extraction of community-specific ontologies. This site currently offers some general-purpose tools; cf. the SourceForge project knowceans with its CVS resources. Contact me for feedback, more specialised solutions and cooperation. [ consider this under construction :) ]
experimental software
Actor-media embedded search is a method to explore knowledge structures.
Freshmind is a knowledge network visualisation and editing tool in which navigation combines searching and browsing. The force-directed layout and the view-layer data structure are based on Touchgraph. Retrieval is based on Lucene, and graph structuring / similarity-based search uses a model similar to Latent Dirichlet Allocation (see below). See Touchgraph. See Lucene.
Latent Dirichlet Allocation is a powerful probabilistic method for topic extraction from text data.
lda-j (version 20050325) is a Java 1.5 port of David Blei's lda-c. See the javadoc. See the C implementation lda-c.
LdaGibbsSampler.java is a working "hack" of the MCMC algorithm for LDA in one Java class. See the primer on parameter estimation for text.
lda.odc is a WinBUGS script to run LDA and an author-topic model with Gibbs sampling. See WinBUGS.
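The Gibbs sampling approach to LDA can be sketched in a few dozen lines. The following is a minimal collapsed Gibbs sampler, assuming symmetric Dirichlet hyperparameters alpha and beta and documents given as arrays of term indices. It is an independent illustration, not the code of LdaGibbsSampler.java or lda-j; the class name LdaGibbsSketch and all field and method names are hypothetical.

```java
import java.util.Random;

/** Illustrative collapsed Gibbs sampler for LDA; all names here are hypothetical. */
final class LdaGibbsSketch {
    final int[][] docs;       // docs[m][n]: term index of the n-th word in document m
    final int V, K;           // vocabulary size and number of topics
    final double alpha, beta; // symmetric Dirichlet hyperparameters (an assumption)
    final int[][] z;          // z[m][n]: topic currently assigned to word n of document m
    final int[][] nw;         // nw[t][k]: how often term t is assigned to topic k
    final int[][] nd;         // nd[m][k]: number of words in document m assigned to topic k
    final int[] nwsum;        // nwsum[k]: total number of words assigned to topic k
    final Random rnd;

    LdaGibbsSketch(int[][] docs, int V, int K, double alpha, double beta, long seed) {
        this.docs = docs; this.V = V; this.K = K;
        this.alpha = alpha; this.beta = beta;
        this.rnd = new Random(seed);
        z = new int[docs.length][];
        nw = new int[V][K];
        nd = new int[docs.length][K];
        nwsum = new int[K];
        // Start from a uniformly random topic assignment for every word.
        for (int m = 0; m < docs.length; m++) {
            z[m] = new int[docs[m].length];
            for (int n = 0; n < docs[m].length; n++) {
                int k = rnd.nextInt(K);
                z[m][n] = k;
                nw[docs[m][n]][k]++; nd[m][k]++; nwsum[k]++;
            }
        }
    }

    /** Runs the given number of full sweeps over all words. */
    void run(int iterations) {
        double[] p = new double[K];
        for (int it = 0; it < iterations; it++) {
            for (int m = 0; m < docs.length; m++) {
                for (int n = 0; n < docs[m].length; n++) {
                    int t = docs[m][n];
                    int old = z[m][n];
                    // Remove the current word from the counts.
                    nw[t][old]--; nd[m][old]--; nwsum[old]--;
                    // Full conditional p(z = k | rest), up to a constant factor.
                    double sum = 0.0;
                    for (int k = 0; k < K; k++) {
                        p[k] = (nw[t][k] + beta) / (nwsum[k] + V * beta) * (nd[m][k] + alpha);
                        sum += p[k];
                    }
                    // Draw the new topic from the unnormalised distribution p.
                    double u = rnd.nextDouble() * sum;
                    int k = 0;
                    double acc = p[0];
                    while (acc < u && k < K - 1) { k++; acc += p[k]; }
                    z[m][n] = k;
                    nw[t][k]++; nd[m][k]++; nwsum[k]++;
                }
            }
        }
    }

    /** Document-topic estimates from the current counts. */
    double[][] theta() {
        double[][] theta = new double[docs.length][K];
        for (int m = 0; m < docs.length; m++)
            for (int k = 0; k < K; k++)
                theta[m][k] = (nd[m][k] + alpha) / (docs[m].length + K * alpha);
        return theta;
    }

    /** Topic-term estimates from the current counts. */
    double[][] phi() {
        double[][] phi = new double[K][V];
        for (int k = 0; k < K; k++)
            for (int t = 0; t < V; t++)
                phi[k][t] = (nw[t][k] + beta) / (nwsum[k] + V * beta);
        return phi;
    }
}
```

A typical call would construct the sampler with a corpus of term indices, call run(...) for a number of sweeps and then read theta() and phi(). This sketch reads a single sample after the last sweep; a practical sampler would also handle burn-in and average over several samples.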
Markov Clustering implements the MCL algorithm in Java and Matlab.
knowceans-mcl (version 20060805) provides a Java implementation of Markov graph clustering (MCL), which finds hard clusters in a graph. See the javadoc. See Stijn van Dongen's (2000) PhD thesis. See the faster (but much more complex) C implementation.
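For readers unfamiliar with MCL, the core loop alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalising columns) until the matrix stops changing. The sketch below uses dense arrays and is not taken from knowceans-mcl; the class name MclSketch, the threshold EPS and the convergence tolerance are assumptions chosen for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative dense-matrix MCL; names and tolerances are hypothetical. */
final class MclSketch {
    private static final double EPS = 1e-6; // entries below this count as zero

    /** Clusters the graph given by a symmetric, non-negative adjacency matrix. */
    static Set<Set<Integer>> cluster(double[][] adj, double inflation, int maxIter) {
        int n = adj.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = adj[i].clone();
            m[i][i] = 1.0; // add self-loops, as is customary for MCL
        }
        normalizeColumns(m);
        for (int it = 0; it < maxIter; it++) {
            // Expansion: next = m * m (two steps of the random walk).
            double[][] next = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    for (int j = 0; j < n; j++)
                        next[i][j] += m[i][k] * m[k][j];
            // Inflation: elementwise power, then renormalise each column.
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    next[i][j] = Math.pow(next[i][j], inflation);
            normalizeColumns(next);
            // Stop once the matrix no longer changes noticeably.
            double diff = 0.0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    diff = Math.max(diff, Math.abs(next[i][j] - m[i][j]));
            m = next;
            if (diff < 1e-9) break;
        }
        // Rows with a positive diagonal are attractors; the non-zero
        // columns of an attractor row form its cluster.
        Set<Set<Integer>> clusters = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            if (m[i][i] > EPS) {
                Set<Integer> c = new TreeSet<>();
                for (int j = 0; j < n; j++)
                    if (m[i][j] > EPS) c.add(j);
                clusters.add(c);
            }
        }
        return clusters;
    }

    private static void normalizeColumns(double[][] m) {
        int n = m.length;
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) sum += m[i][j];
            if (sum > 0.0)
                for (int i = 0; i < n; i++) m[i][j] /= sum;
        }
    }
}
```

Calling MclSketch.cluster(adj, 2.0, 100) returns the clusters as sets of node indices. The dense representation costs cubic time per iteration, so it is only suitable for small graphs; practical implementations use sparse matrices and pruning.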
Statistics base classes are helpers to implement sampling-based algorithms in Java.
arms-java (version 20060516) provides a Java port of the adaptive rejection Metropolis sampler (ARMS), which can sample from virtually any univariate distribution. See the javadoc. See the original C/Fortran implementation by Wally Gilks. See the CVS on the SourceForge project knowceans. Samplers and densities / likelihood functions of various probability distributions, as well as a Java port of the Mersenne Twister random generator, can be found in the package knowceans-tools.jar (see below).
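ARMS is too long to reproduce here, but the accept/reject step it builds on is easy to show. The sketch below is plain rejection sampling with a fixed uniform envelope on a bounded interval. It is not ARMS (which adapts its envelope and adds a Metropolis step) and not the arms-java API; the class name, method name and parameters are hypothetical.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/** Illustrative rejection sampler with a constant envelope; not ARMS itself. */
final class RejectionSamplerSketch {
    /**
     * Draws one sample from the (possibly unnormalised) density on [lo, hi].
     * Assumes 0 <= density(x) <= bound everywhere on the interval and that
     * the density is positive somewhere, otherwise the loop never ends.
     */
    static double sample(DoubleUnaryOperator density, double lo, double hi,
                         double bound, Random rnd) {
        while (true) {
            double x = lo + (hi - lo) * rnd.nextDouble(); // proposal from the envelope
            double u = bound * rnd.nextDouble();          // uniform height under the envelope
            if (u < density.applyAsDouble(x)) return x;   // accept if below the density
        }
    }
}
```

For example, RejectionSamplerSketch.sample(x -> x, 0.0, 1.0, 1.0, new Random()) draws from the density proportional to x on the unit interval. A constant envelope wastes many proposals when the density is peaked, which is the inefficiency that adaptive envelopes are meant to reduce.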
Java helpers came into existence when I tried to shortcut recurring tasks.
NEW: knowceans-tools (version 20090727) contains many Java helper classes I frequently use: a command line parser, a runtime stop watch, perl-like regular expression usage (reduces Java coding), special invertible, regex and many-to-many implementations of the Map interface, data output formatters specialised to command line output (like histograms and dot-encoded numbers) and many more. See the javadoc. See the CVS on the SourceForge project knowceans.
knowceans-corpus is a text corpus extraction toolset. See the snapshot. See the javadoc. See the snapshot of knowceans-corpus-base (version 20060805); these are base classes on which the text mining algorithms in knowceans-corpus-dirichlet (not uploaded yet) depend.
BibFileMod is a Java class that takes a LaTeX document, collects from a list of BibTeX database files all references cited in the document and writes the result into a new BibTeX database. (This seemed more useful for organising my documents than bibtool from CTAN etc.)
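The idea behind BibFileMod can be sketched with two regular expressions and a brace counter. The code below is an illustration only and does not reproduce BibFileMod; the class name BibFilterSketch and its methods are hypothetical. It assumes citations use \cite-style commands with keys in braces and that BibTeX entries have the form @type{key, ...}.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative citation filter; names and regular expressions are assumptions. */
final class BibFilterSketch {
    // Matches \cite, \citep, \citet*, ... with optional [..] arguments.
    private static final Pattern CITE =
        Pattern.compile("\\\\cite[a-zA-Z]*\\*?(?:\\[[^\\]]*\\])*\\{([^}]*)\\}");
    // Matches the start of a BibTeX entry: @type{key,
    private static final Pattern ENTRY =
        Pattern.compile("@(\\w+)\\s*\\{\\s*([^,\\s]+)\\s*,");

    /** Collects all citation keys used in a LaTeX source, in order of first use. */
    static Set<String> citedKeys(String latex) {
        Set<String> keys = new LinkedHashSet<>();
        Matcher m = CITE.matcher(latex);
        while (m.find()) {
            for (String k : m.group(1).split(",")) {
                if (!k.trim().isEmpty()) keys.add(k.trim());
            }
        }
        return keys;
    }

    /** Returns only those BibTeX entries whose key is in the given set. */
    static String filter(String bibtex, Set<String> keys) {
        StringBuilder out = new StringBuilder();
        Matcher m = ENTRY.matcher(bibtex);
        int pos = 0;
        while (m.find(pos)) {
            int end = entryEnd(bibtex, m.start());
            if (keys.contains(m.group(2))) {
                out.append(bibtex, m.start(), end).append("\n\n");
            }
            pos = end; // continue after this entry, skipping nested braces
        }
        return out.toString();
    }

    /** Index just past the brace that closes the first '{' at or after start. */
    private static int entryEnd(String s, int start) {
        int depth = 0;
        for (int i = s.indexOf('{', start); i >= 0 && i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') depth++;
            else if (c == '}' && --depth == 0) return i + 1;
        }
        return s.length(); // unbalanced entry: take the rest of the input
    }
}
```

To merge several database files, one could call filter(...) on each file's content with the same key set and concatenate the results. The sketch ignores @string macros, cross-references and entries delimited by parentheses, which a complete tool would need to handle.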
optimised for firefox.	(c) 2004-6 arbylon.net
