UCS {UCS} | R Documentation |
UCS/R consists of a set of R libraries related to the visualisation of cooccurrence data and the evaluation of association measures. The current functionality includes evaluation graphs for association measures (in terms of precision and recall), measures of inter-annotator agreement, and two population models for word frequency distributions.
source("/path/to/UCS/System/R/lib/ucs.R")
ucs.library()
UCS/R is initialised by sourcing the file ‘ucs.R’ in the ‘lib/’ subdirectory of the UCS/R directory tree. This will make the UCS/R documentation available in the R process and provide the ucs.library command, which is used to load individual UCS/R modules. Enter ucs.library() now to display a list of available modules (see the ucs.library manpage for details).
Currently, the following modules are available. The listing below also indicates the most important manpages for each module. Throughout the documentation, it is assumed that you are familiar with the UCS/Perl naming conventions and data set file format.
sfunc: Special Mathematical Functions
Convenience interfaces to the Gamma function (Cgamma), the incomplete (and regularized) Gamma function and its inverse (Igamma, Rgamma), the Beta function (Cbeta), the incomplete (and regularized) Beta function and its inverse (Ibeta, Rbeta), and binomial confidence intervals (binom.conf.interval). All these functions are computed from the pgamma and pbeta distribution functions (and the corresponding quantile functions) in the standard library of R.
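The relationship to pgamma and pbeta can be illustrated with a minimal base R sketch. All helper names below are hypothetical and are not the UCS/R functions, whose signatures are documented on their own manpages. The Clopper-Pearson construction is only one common way to obtain a binomial confidence interval from the Beta quantile function; it is an assumption here, not necessarily the method used by binom.conf.interval.

```r
# Hypothetical helper names; these are NOT the UCS/R functions.

# Regularized lower incomplete Gamma function P(a, x).
reg_gamma_lower <- function(a, x) pgamma(x, shape = a)

# Unnormalised lower incomplete Gamma function: gamma(a) * P(a, x).
inc_gamma_lower <- function(a, x) gamma(a) * pgamma(x, shape = a)

# Inverse of P(a, x) with respect to x, via the Gamma quantile function.
reg_gamma_lower_inv <- function(a, p) qgamma(p, shape = a)

# Regularized incomplete Beta function I_x(a, b).
reg_beta <- function(x, a, b) pbeta(x, a, b)

# Two-sided Clopper-Pearson interval for k successes in n trials
# (assumed method, chosen for illustration only).
clopper_pearson <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  lower <- if (k == 0) 0 else qbeta(alpha / 2, k, n - k + 1)
  upper <- if (k == n) 1 else qbeta(1 - alpha / 2, k + 1, n - k)
  c(lower = lower, upper = upper)
}
```

Delegating to pgamma, pbeta and their quantile functions avoids hand-written series expansions and inherits the numerical care of the standard library.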
base: Basic Functions for Loading and Managing UCS Data Sets
This module provides functions for loading UCS data set files (read.ds.gz), listing annotated association measures (ds.find.am, am.key2var), ranking by association scores (order.by.am, add.ranks), and computing precision/recall tables for the evaluation of association measures (precision.recall). The module also includes a listing of all built-in association measures in the UCS/Perl system, including add-on packages (builtin.ams).
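The idea behind a precision/recall table can be sketched in base R as follows. This is an illustration under stated assumptions, not the UCS/R implementation: the helper pr_table and its arguments are hypothetical, higher scores are assumed to indicate stronger association, and ties are left in their original order.

```r
# Hypothetical helper, NOT the UCS/R precision.recall function.
# scores: numeric association scores (assumed: higher = stronger association)
# is.tp:  logical vector marking the true positives
pr_table <- function(scores, is.tp) {
  stopifnot(length(scores) == length(is.tp))
  idx <- order(scores, decreasing = TRUE)  # rank candidates by score
  tp.cum <- cumsum(is.tp[idx])             # true positives among the n best
  n <- seq_along(idx)
  data.frame(
    n = n,
    precision = tp.cum / n,          # share of true positives in the n-best list
    recall = tp.cum / sum(is.tp)     # share of all true positives recovered
  )
}

# Toy data: four candidates, two of which are true positives.
scores <- c(5, 3, 4, 1)
is.tp <- c(TRUE, FALSE, TRUE, FALSE)
print(pr_table(scores, is.tp))
```

Each row describes the n-best list for one cutoff, which is the quantity plotted in precision and recall graphs.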
plots: Evaluation Graphs for Association Measures
This module plots precision, recall, and precision-by-recall graphs for the empirical evaluation of association measures (all combined in a single function, evaluation.plot). The graphs are highly configurable, either locally in each function call or by setting global defaults (ucs.par). The evaluation.plot function supports confidence intervals, significance tests for result differences, and evaluation based on random samples (see Evert, 2004, Ch. 5).
iaa: Measures of Inter-Annotator Agreement
Computes Cohen's kappa statistic with standard deviation (Fleiss, Cohen & Everitt, 1969) or a confidence interval for the proportion of true agreement (Krenn, Evert & Zinsmeister, 2004) from a 2-by-2 contingency table (see iaa.kappa and iaa.pta).
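For orientation, the point estimate of Cohen's kappa for a 2-by-2 contingency table can be computed in a few lines of base R. The helper cohen_kappa is hypothetical and is not the iaa.kappa function; it omits the standard deviation and the confidence interval for true agreement, which follow the cited papers.

```r
# Hypothetical helper, NOT the UCS/R iaa.kappa function.
# tab: 2-by-2 matrix of counts (rows = annotator A, columns = annotator B)
cohen_kappa <- function(tab) {
  stopifnot(is.matrix(tab), all(dim(tab) == c(2, 2)))
  n <- sum(tab)
  p.obs <- sum(diag(tab)) / n                       # observed agreement
  p.exp <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (p.obs - p.exp) / (1 - p.exp)
}

# Toy table: the annotators agree on 20 + 15 of 50 items.
tab <- matrix(c(20, 5, 10, 15), nrow = 2)
print(cohen_kappa(tab))
```

Kappa rescales the observed agreement so that 0 corresponds to chance-level agreement and 1 to perfect agreement.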
lexstats: Interface to the lexstats Software
These are the beginnings of a rudimentary interface to the lexstats software for the analysis of word frequency distributions (Baayen, 2001). Currently, only the read.spectrum and spectrum.plot functions are useful.
zm: The Zipf-Mandelbrot (ZM) Population Model
This module implements a simple population model for word frequency distributions (Baayen, 2001) based on the Zipf-Mandelbrot law. See Evert (2004a) for details. Relevant help pages are zm, EV, EVm, write.lexstats, and lnre.goodness.of.fit.
fzm: The Finite Zipf-Mandelbrot (fZM) Population Model
This module implements the finite Zipf-Mandelbrot model, an extension of the ZM model (Evert, 2004a). Relevant help pages are fzm, EV, EVm, write.lexstats, and lnre.goodness.of.fit.
The command help(package=UCS) will give you a full index of available UCS/R help pages. Use help.search() for full-text search.
The correct source path for the file ‘ucs.R’ can be set automatically with the UCS/R tool ucs-config. Simply insert the statement
source("ucs.R")
on a separate line in your R script file (say, ‘my-script.R’) and run the shell command
ucs-config my-script.R
Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.
Evert, Stefan (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. PhD Thesis, IMS, University of Stuttgart.
Evert, Stefan (2004a). A simple LNRE model for random character sequences. In Proceedings of JADT 2004, Louvain-la-Neuve, Belgium, pages 411–422.
Fleiss, Joseph L.; Cohen, Jacob; Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5), 323–327.
Krenn, Brigitte; Evert, Stefan; Zinsmeister, Heike (2004). Determining intercoder agreement for a collocation identification task. In preparation.
ucs.library, the UCS/R tutorial (‘tutorial.R’ in the ‘script/’ subdirectory), and the UCS/Perl documentation.