by Viktor Trón
UCS crucially relies on Perl and R (a language for statistical computing). UCS/Perl uses R as a back end: important statistical functions provided by R are made available through a Perl module.
UCS will carp about any further missing dependencies.
tar xzvf UCS-0.3.2.tar.gz
cd UCS
perl System/install.perl
export UCS=`System/bin/ucs-config --base-dir`
export PATH=$PATH:$UCS/bin
ucsdoc ProgramName|ModuleName
For the graphical Tk::Pod viewer (if it was available at installation) use:
ucsdoc -tk ProgramName|ModuleName
Fundamental objects of the UCS toolkit are frequency data extracted from a given corpus for a given type of cooccurrences. They are stored in data set files (extension .ds), see [ucsfile].
Data set files are processed in gzipped form (.ds.gz).
Examples are in DataSet/Distrib/
ucs-info -v glaw.ds.gz
zmore $UCS/DataSet/Distrib/dickens.ds.gz
ucs-print -i dickens.ds.gz
ucs-print dickens.ds.gz
ucs-select f FROM glaw.ds.gz INTO ranks.ds.gz
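Because a data set file is gzipped text, it can also be inspected with standard tools. The following is a minimal sketch, not part of UCS: the function name ds_columns is hypothetical, and it assumes that lines starting with '#' are comments or header variables and that the first remaining line is a tab-separated row of column names. The authoritative description of the format is [ucsfile].

```shell
# Sketch: list the column names of a gzipped data set file.
# Assumption: lines starting with '#' are comments or header variables, and
# the first remaining non-empty line is a tab-separated row of column names.
ds_columns() {
    gzip -dc < "$1" | awk -F'\t' '
        /^#/ || NF == 0 { next }                       # skip comments and blank lines
        { for (i = 1; i <= NF; i++) print $i; exit }'  # print header fields, then stop
}

# Toy file with invented contents, only to exercise the function.
tmp=$(mktemp)
printf '# toy data set\nl1\tl2\tf\nold\tman\t3\n' | gzip -c > "$tmp"
ds_columns "$tmp"
rm -f "$tmp"
```

On a real installation the same function could be pointed at $UCS/DataSet/Distrib/dickens.ds.gz; ucs-info and ucs-print remain the supported way to look at a data set.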
If you have an extraction tool (YourExtractionTool) that prints the instances to standard output (in the format ITEM1 TAB ITEM2 NEWLINE, each line representing a pair token), you can construct your data set with
YourExtractionTool | ucs-make-tables -v
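As an illustration of the expected input format, here is a minimal sketch of what such an extraction tool could look like. It is a hypothetical stand-in, not part of UCS: the function name extract_bigrams is invented, and it simply pairs adjacent words of plain text, ignoring sentence boundaries and any linguistic filtering a real extraction would apply.

```shell
# Hypothetical stand-in for YourExtractionTool: reads plain text on stdin and
# prints one pair token per line in the format ITEM1 <TAB> ITEM2.
extract_bigrams() {
    tr -cs '[:alpha:]' '\n' |       # one word per line, punctuation dropped
    tr '[:upper:]' '[:lower:]' |    # normalize case
    awk 'NF { if (prev != "") printf "%s\t%s\n", prev, $0; prev = $0 }'
}

# Demo on a toy sentence: prints six adjacent-word pairs.
printf 'The old man saw the old house.\n' | extract_bigrams
```

The output of such a function can be piped into ucs-make-tables in place of YourExtractionTool. Repeated pairs (here "the old") are printed once per occurrence, which is what "pair token" means.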
An example script for extracting A+N (adjective+noun) cooccurrences from the IMS Corpus Workbench (CWB) is included.
With the CWB/Perl modules and the demo corpus installed, one can re-create the Dickens data set with
$UCS/Perl/tools/ucs-adj-n-from-cwb.perl penn DICKENS \
  | ucs-make-tables -v -f 3 my-dickens.ds.gz
If bigrams.cnt was created with NSP's count.pl tool, create the UCS data set from it with
$UCS/Perl/tools/nsp2ucs.perl -v bigrams.cnt bigrams.ds.gz
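To check a count file before converting it, it can help to view it as plain columns. The sketch below is not how nsp2ucs.perl works internally; the function name nsp_to_tsv is hypothetical. It assumes the default bigram output of count.pl: a first line holding the total number of bigrams, followed by lines of the form word1<>word2<>joint_freq freq_word1 freq_word2.

```shell
# Sketch: turn NSP bigram counts into tab-separated columns for inspection.
# Assumed input layout (count.pl default for bigrams):
#   line 1:      total number of bigram tokens
#   other lines: word1<>word2<>joint_freq freq_word1 freq_word2
nsp_to_tsv() {
    awk -F'<>' 'NR > 1 && NF >= 3 {
        split($3, n, " ")    # n[1] = joint, n[2] = first word, n[3] = second word
        printf "%s\t%s\t%s\t%s\t%s\n", $1, $2, n[1], n[2], n[3]
    }'
}

# Toy input with invented counts, only to show the shape of the data.
printf '1000\nstrong<>tea<>12 40 55 \nof<>the<>90 300 450 \n' | nsp_to_tsv
```

On a real file this would be run as nsp_to_tsv < bigrams.cnt; the total on the first line is skipped here and would have to be read separately.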
ucs-summarize -v
ucs-sort -v dickens.ds.gz BY f+ -r INTO sorted.ds.gz
ucs-add -v am.t.score am.log.likelihood TO dickens.ds.gz INTO scores.ds.gz
ucs-add -v 'r.%' TO scores.ds.gz INTO ranks.ds.gz
ucs-select -v --count FROM ranks.ds.gz WHERE '%f% >= 10'
ucs-select -v '*' 'r.%' FROM ranks.ds.gz WHERE '%l2% =~ /ness$/' | ucs-sort by l2 l1 | ucs-print -i
ucs-join -v fr-pnv.ds.gz pnv.adb.gz
ucs-join -v fr-pnv.ds.gz WITH b.figur b.fvg FROM pnv.adb.gz INTO fr-annotated.ds.gz
ucs-add -v -m 'b.TP := %b.figur% or %b.fvg%' \
  TO fr-annotated.ds.gz INTO fr-annotated.ds.gz
ucs-select -v --count FROM fr-annotated.ds.gz \
  WHERE '%b.TP% and %r.log.likelihood% <= 500'