fzm {UCS} | R Documentation |
Object constructor for a finite Zipf-Mandelbrot (fZM) LNRE model with parameters α, A and B (see Evert, 2004a for details). Either the parameters are specified explicitly, or one or more of them can be estimated from an observed frequency spectrum.
fzm(alpha, A, B)
fzm(alpha, A, N, V)
fzm(alpha, N, V, spc, m.max=15, stepmax=10, debug=FALSE)
fzm(N, V, spc, m.max=15, stepmax=10, debug=FALSE)
alpha |
a number in the range (0,1), the shape parameter
α of the fZM model. alpha can automatically be
estimated from N, V, and spc. |
A |
a small positive number A << 1, the parameter
A of the fZM model. A can automatically be estimated
from N, V, and spc. |
B |
a large positive number B >> 1, the parameter
B of the fZM model. B can automatically be estimated
from N and V. |
N |
the sample size, i.e. number of observed tokens |
V |
the vocabulary size, i.e. the number of observed types |
spc |
a vector of non-negative integers representing the class
sizes V_m of the observed frequency spectrum. The vector is
usually read from a file in lexstats format with the
read.spectrum function. |
m.max |
the number of ranks from spc that will be used to
estimate the α parameter |
stepmax |
maximal step size of the nlm function used for
parameter estimation. It should not be necessary to change the
default value. |
debug |
if TRUE, print debugging information during the
parameter estimation process. This feature can be useful to find
out why parameter estimation fails. |
The fZM model with parameters α in (0,1) and 0 < A < B is defined by the type density function
g(p) := C * p^(-alpha - 1)
for A <= p <= B. The normalisation constant C is determined from the other parameters by the condition
integral_A^B p * g(p) dp = 1
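Since p * g(p) = C * p^(-alpha) can be integrated in closed form, the condition above yields C = (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha)). The following base R sketch computes C from this expression and checks the normalisation condition numerically with integrate. It does not use the UCS library: the helper name fzm_norm_const is hypothetical, and the parameter values are arbitrary illustrations chosen to match the ranges given in the argument descriptions.

```r
# Closed-form normalisation constant of the fZM type density.
# Hypothetical helper for illustration, not part of the UCS library.
fzm_norm_const <- function(alpha, A, B) {
  stopifnot(alpha > 0, alpha < 1, A > 0, B > A)
  (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))
}

# Illustrative parameter values (assumptions, not estimates from data).
alpha <- 0.5
A <- 1e-4
B <- 100

C_norm <- fzm_norm_const(alpha, A, B)

# Type density g(p), defined for A <= p <= B.
g <- function(p) C_norm * p^(-alpha - 1)

# Numerical check of the condition integral_A^B p * g(p) dp = 1.
check <- integrate(function(p) p * g(p), lower = A, upper = B)
print(C_norm)
print(check$value)
```

The numerical integral should agree with 1 up to the tolerance of integrate.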
The parameters α and A are estimated simultaneously
by nonlinear minimisation (nlm) of a multinomial chi-squared
statistic for the observed against the expected frequency spectrum.
Note that this is different from the multivariate chi-squared test
used to measure the goodness-of-fit of the final model (Baayen, 2001,
Sec. 3.3).
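The general shape of such an estimation can be sketched in base R. This is an illustration under several assumptions, not the code used by fzm. The expected spectrum is computed under the usual Poisson approximation, E[V_m] = integral_A^B (N p)^m / m! * exp(-N p) * g(p) dp, rewritten with the incomplete gamma function. The cost is a plain Pearson-type sum over the first m.max spectrum elements, which may differ from the multinomial statistic used by the library. B is held fixed rather than estimated from N and V. The helper names fzm_expected_vm and chisq_cost are hypothetical, and the "observed" spectrum is synthetic.

```r
# Expected spectrum elements E[V_m] of an fZM model (Poisson approximation),
# expressed through the regularised incomplete gamma function pgamma().
# Hypothetical helper, not part of the UCS library.
fzm_expected_vm <- function(m, N, alpha, A, B) {
  C_norm <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))
  log_coef <- log(C_norm) + alpha * log(N) + lgamma(m - alpha) - lgamma(m + 1)
  exp(log_coef) *
    (pgamma(N * B, shape = m - alpha) - pgamma(N * A, shape = m - alpha))
}

# Pearson-type chi-squared cost over the first m.max spectrum elements
# (an assumed stand-in for the statistic minimised by fzm).
# theta = c(qlogis(alpha), log(A)) keeps alpha in (0, 1) and A > 0.
chisq_cost <- function(theta, spc, N, B, m.max) {
  alpha <- plogis(theta[1])
  A <- exp(theta[2])
  if (A >= B) return(1e100)  # penalise invalid parameter combinations
  m <- seq_len(m.max)
  E <- fzm_expected_vm(m, N, alpha, A, B)
  value <- sum((spc[m] - E)^2 / E)
  if (is.finite(value)) value else 1e100
}

# Synthetic "observed" spectrum: the expected values of a known model
# (real spectra contain integer counts; these values are illustrative).
N <- 1e6
B <- 100
m.max <- 15
spc <- fzm_expected_vm(seq_len(m.max), N, alpha = 0.6, A = 1e-6, B = B)

# Starting point deliberately away from the generating parameters.
start <- c(qlogis(0.5), log(1e-7))
cost_start <- chisq_cost(start, spc, N, B, m.max)

# Nonlinear minimisation with the same stepmax default as documented above.
fit <- nlm(function(theta) chisq_cost(theta, spc, N, B, m.max),
           p = start, stepmax = 10)

cat("cost at start:", cost_start, "\n")
cat("cost at fit:  ", fit$minimum, "\n")
cat("alpha:", plogis(fit$estimate[1]), " A:", exp(fit$estimate[2]), "\n")
```

Optimising over transformed parameters (logit of alpha, log of A) is a design choice of this sketch that keeps nlm, an unconstrained optimiser, inside the valid parameter range. Whether the estimates approach the generating values depends on the starting point and on how strongly the spectrum depends on A.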
An object of class "fzm" with the following components:
alpha |
value of the α parameter |
A |
value of the A parameter |
B |
value of the B parameter |
C |
value of the normalisation constant C |
N |
number of observed tokens (if specified) |
V |
number of observed types (if specified) |
spc |
observed frequency spectrum (if specified) |
The print method for this object shows a short summary, including the
population size S and a comparison of the first ranks of the observed and
expected frequency spectrum (if available).
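In LNRE models the population size is the total mass of the type density, S = integral_A^B g(p) dp, which for the density above evaluates to S = C * (A^(-alpha) - B^(-alpha)) / alpha. A minimal base R sketch of this calculation follows. The helper name fzm_pop_size and the parameter values are hypothetical, and the sketch does not claim to reproduce how the print method computes S.

```r
# Population size S of an fZM model from its parameters.
# Hypothetical helper for illustration, not part of the UCS library.
fzm_pop_size <- function(alpha, A, B) {
  stopifnot(alpha > 0, alpha < 1, A > 0, B > A)
  # Normalisation constant C from integral_A^B p * g(p) dp = 1.
  C_norm <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))
  # S = integral_A^B C * p^(-alpha - 1) dp.
  C_norm * (A^(-alpha) - B^(-alpha)) / alpha
}

# Illustrative parameter values (assumptions, not estimates from data).
S <- fzm_pop_size(alpha = 0.5, A = 1e-4, B = 100)
print(S)
```

Because S grows with A^(-alpha), the lower cutoff A is what keeps the population finite: as A approaches 0, S increases without bound.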
Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.
Evert, Stefan (2004a). A simple LNRE model for random character sequences. In Proceedings of JADT 2004, Louvain-la-Neuve, Belgium, pages 411–422.
zm, EV, EVm, write.lexstats, read.spectrum, and spectrum.plot