The UCS toolkit
The UCS toolkit is a collection of libraries and scripts for the statistical analysis of cooccurrence data. Data sets, each containing a list of word pairs together with their joint and marginal frequencies, are stored in a tabular format in plain text files (optionally compressed). With the UCS/Perl system, they can be viewed, printed, manipulated in various ways, annotated with association scores from a wide range of built-in measures, ranked, and sorted. The UCS/R system provides additional functionality for the graphical evaluation of association measures in a collocation extraction task (cf. Evert & Krenn, 2001).
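As a rough illustration of what annotating a word pair with association scores involves, the sketch below computes two widely used measures, pointwise mutual information and the log-likelihood ratio, from a pair's joint frequency, its two marginal frequencies, and the sample size. It is written in Python and is not part of UCS: the function names and the argument names `f`, `f1`, `f2`, `N` are assumptions made for this example, and the toolkit's built-in measures may differ in detail.

```python
import math


def contingency(f, f1, f2, N):
    """Return observed and expected 2x2 tables for one word pair.

    f  = joint frequency of the pair
    f1 = marginal frequency of the first word
    f2 = marginal frequency of the second word
    N  = sample size (total number of pair tokens)
    The argument names are illustrative assumptions, not a UCS API.
    Cells are ordered O11, O12, O21, O22.
    """
    observed = [f, f1 - f, f2 - f, N - f1 - f2 + f]
    row = [f1, N - f1]  # row totals
    col = [f2, N - f2]  # column totals
    # Expected frequency under independence: row total * column total / N
    expected = [row[i] * col[j] / N for i in range(2) for j in range(2)]
    return observed, expected


def pointwise_mi(f, f1, f2, N):
    """Pointwise mutual information: log2(O11 / E11). Requires f > 0."""
    observed, expected = contingency(f, f1, f2, N)
    return math.log2(observed[0] / expected[0])


def log_likelihood(f, f1, f2, N):
    """Log-likelihood ratio: 2 * sum of O * ln(O / E) over the four cells."""
    observed, expected = contingency(f, f1, f2, N)
    # Cells with O = 0 contribute nothing (limit of x * ln x as x -> 0).
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


if __name__ == "__main__":
    # Hypothetical frequencies, chosen only to exercise the functions.
    print(pointwise_mi(10, 100, 200, 10000))
    print(log_likelihood(10, 100, 200, 10000))
```

With the hypothetical values f = 10, f1 = 100, f2 = 200 and N = 10000, the expected joint frequency is 100 × 200 / 10000 = 2, so the pair occurs five times more often than independence would predict.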
Download UCS version 0.5 (pre-release) (UCS-0.5-prerelease.tar.gz, 1.9M) - What's new?
On-line documentation:
UCS/Perl documentation - UCS/R documentation
On-line tutorials:
UCS/Perl tutorial - UCS/R tutorial - Viktor Trón's UCS quickstart (a one-minute guide for programmers)
Requirements
Expect
Pod::Perldoc (Perl versions prior to 5.8.1)
Tk::Pod (recommended)
Term::ReadKey (recommended)
a2ps (recommended)
NB: Future releases of the UCS toolkit are expected to require Perl version 5.8.0 or newer (for Unicode support) and may also require R version 1.9.0 or newer.
Supported and tested platforms
Copyright © 2004-2006 by Stefan Evert.
Footnote: The UCS toolkit has been designed for scientific research on the properties of statistical association measures and the relation between cooccurrences and collocations. In my terminology, this involves a close look at the data and a thorough understanding of the theoretical and methodological background. Flexibility is more important than either frills or speed. Therefore, the UCS system is not intended as a number cruncher that extracts and processes cooccurrences from several hundred million words of text in a few minutes. Nor is it a black box that accepts text files from a word processor and produces a list of collocation candidates at the push of a button.
Archive: UCS-0.4.tar.gz (1.7M) - UCS-0.3.2.tar.gz (1.6M) - UCS-0.3.1.tar.gz (465k) - UCS-0.3.tar.gz (463k) - UCS-0.2.tar.gz (440k)