zm {UCS} | R Documentation |

Object constructor for a Zipf-Mandelbrot (ZM) LNRE model with
parameters *α* and *C* (Evert, 2004a).
Either the parameters are specified explicitly, or one or both of them
can be estimated from an observed frequency spectrum.

zm(alpha=NULL, C=NULL, N=NULL, V=NULL, spc=NULL, m.max=15, stepmax=10, debug=FALSE)

zm(alpha, C)

zm(alpha, N, V)

zm(N, V, spc, m.max=15, stepmax=10, debug=FALSE)

`alpha` |
a number in the range (0,1), the shape parameter
α of the ZM model. `alpha` can automatically be
estimated from `N` , `V` , and `spc` . |

`C` |
a positive number, the parameter C of the ZM model.
`C` can automatically be estimated from `N` and `V` . |

`N` |
the sample size, i.e. the number of observed tokens |

`V` |
the vocabulary size, i.e. the number of observed types |

`spc` |
a vector of non-negative integers representing the class
sizes V_m of the observed frequency spectrum. The vector is
usually read from a file in `lexstats` format with the
`read.spectrum` function. |

`m.max` |
the number of ranks from `spc` that will be used to
estimate the α parameter |

`stepmax` |
maximal step size of the `nlm` function used for
parameter estimation. It should not be necessary to change the
default value. |

`debug` |
if `TRUE` , print debugging information during the
parameter estimation process. This feature can be useful to find
out why parameter estimation fails. |
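
To illustrate how `N`, `V`, and `spc` relate to each other, the following base R sketch derives all three from a toy sample. The vector `tokens` and the variable names are hypothetical and serve only as an illustration; in practice the spectrum would usually be read with `read.spectrum`.

```r
# Hypothetical toy sample of tokens (not taken from any real corpus)
tokens <- c("a", "b", "a", "c", "a", "b", "d")

# Frequency of each type: a = 3, b = 2, c = 1, d = 1
type_freq <- as.integer(table(tokens))

N <- length(tokens)      # sample size: number of observed tokens
V <- length(type_freq)   # vocabulary size: number of observed types

# Class sizes V_m: number of types that occur exactly m times,
# for m = 1, ..., max(type_freq)
spc <- tabulate(type_freq)

# Consistency checks: the spectrum sums to V, and sum(m * V_m) equals N
stopifnot(sum(spc) == V, sum(seq_along(spc) * spc) == N)
```

The two checks at the end hold for any frequency spectrum, so they are a cheap way to validate input before passing it to `zm`.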

The ZM model with parameters *α ∈ (0,1)* and *C > 0* is
defined by the type density function

*g(p) := C * p^(-alpha - 1)*

for *0 <= p <= B*, where the upper bound *B*
is determined from *C* by the normalisation condition

*integral_0^Inf p * g(p) dp = 1*
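
Because *g(p)* is defined only on *0 <= p <= B*, the normalisation integral can be evaluated in closed form, which gives *B* directly in terms of *α* and *C*. The base R sketch below works under that reading (the density is taken to be zero beyond *B*). The function names `zm_upper_bound` and `zm_density` and the parameter values are hypothetical, and the closed form is derived here rather than taken from the UCS source.

```r
# Closed-form upper bound B implied by the normalisation condition:
#   integral_0^B p * C * p^(-alpha - 1) dp = C * B^(1 - alpha) / (1 - alpha) = 1
#   =>  B = ((1 - alpha) / C)^(1 / (1 - alpha))
zm_upper_bound <- function(alpha, C) {
  stopifnot(alpha > 0, alpha < 1, C > 0)
  ((1 - alpha) / C)^(1 / (1 - alpha))
}

# Type density g(p) on the interval [0, B]
zm_density <- function(p, alpha, C) C * p^(-alpha - 1)

alpha <- 0.5   # hypothetical shape parameter
C <- 5         # hypothetical value of C
B <- zm_upper_bound(alpha, C)

# Numerical check: the integral of p * g(p) over [0, B] should be close to 1
total_mass <- integrate(function(p) p * zm_density(p, alpha, C),
                        lower = 0, upper = B)$value
```

The integrand has an integrable singularity at *p = 0*, which `integrate` handles because it evaluates the function at interior points only.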

The parameter *α* is estimated by nonlinear minimisation
(`nlm`) of a multinomial chi-squared statistic
for the observed against the expected frequency spectrum. Note that
this is different from the multivariate chi-squared test used to
measure the goodness-of-fit of the final model (Baayen, 2001,
Sec. 3.3).
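
The general idea of fitting *α* to an observed spectrum can be sketched in base R under several simplifying assumptions, all of which are mine and not confirmed by the UCS source. Poisson sampling is assumed, so that E[V_m] = C * N^alpha * gamma(m - alpha, N*B) / m!, with gamma(a, x) the lower incomplete gamma function. A plain Pearson-type statistic stands in for the multinomial chi-squared statistic. *C* is held fixed instead of being estimated from `N` and `V`. The bounded one-dimensional optimiser `optimize` replaces `nlm`, which keeps *α* inside (0,1) without reparametrisation. All function and variable names are hypothetical.

```r
# Expected class sizes E[V_m], m = 1..m.max, under Poisson sampling (assumption)
zm_expected_spectrum <- function(alpha, C, N, m.max = 15) {
  B <- ((1 - alpha) / C)^(1 / (1 - alpha))   # from the normalisation condition
  m <- 1:m.max
  # lower incomplete gamma(a, x) = pgamma(x, shape = a) * gamma(a);
  # gamma(m - alpha) / m! is computed on the log scale for stability
  C * N^alpha * exp(lgamma(m - alpha) - lfactorial(m)) *
    pgamma(N * B, shape = m - alpha)
}

# Pearson-type cost: a simplified stand-in for the statistic used by zm()
chisq_cost <- function(alpha, C, N, observed) {
  expected <- zm_expected_spectrum(alpha, C, N, m.max = length(observed))
  sum((observed - expected)^2 / expected)
}

# Synthetic "observed" spectrum generated from known parameters
N <- 1e5
C <- 2
true_alpha <- 0.6
observed <- zm_expected_spectrum(true_alpha, C, N, m.max = 15)

# Bounded one-dimensional minimisation over alpha (swapped in for nlm)
fit <- optimize(chisq_cost, interval = c(0.1, 0.9),
                C = C, N = N, observed = observed)
alpha_hat <- fit$minimum
```

Since the synthetic spectrum is generated from the model itself, `alpha_hat` should land close to `true_alpha`; with real data the minimum of the cost will be greater than zero.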

See Evert (2004, Ch. 4) for further mathematical details, especially concerning the expected vocabulary size, frequency spectrum and conditional parameter distribution, as well as their variances.

An object of class `"zm"` with the following components:

`alpha` |
value of the α parameter |

`B` |
value of the upper bound B (a normalisation device) |

`C` |
value of the C parameter |

`N` |
number of observed tokens (if specified) |

`V` |
number of observed types (if specified) |

`spc` |
observed frequency spectrum (if specified) |

The `print` method gives a short summary of the model, including a comparison
of the first ranks of the observed and expected frequency spectrum (if
available).

Baayen, R. Harald (2001). *Word Frequency Distributions.* Kluwer,
Dordrecht.

Evert, Stefan (2004). *The Statistics of Word Cooccurrences: Word
Pairs and Collocations.* PhD Thesis, IMS, University of Stuttgart.

Evert, Stefan (2004a). A simple LNRE model for random character
sequences. In *Proceedings of JADT 2004*, Louvain-la-Neuve,
Belgium, pages 411–422.

`fzm`, `EV`, `EVm`, `VV`, `VVm`, `write.lexstats`, `lnre.goodness.of.fit`,
`read.spectrum`, and `spectrum.plot`

[Package *UCS* version 0.5 Index]