
NAME

UCS::AM::HTest - More association measures based on hypothesis tests

SYNOPSIS

  use UCS;
  use UCS::AM::HTest;

  @htest_AMs = UCS::AM_Keys();

  # z.score.pv
  # z.score.corr.pv
  # t.score.pv
  # chi.squared.tt
  # chi.squared.tt.pv
  # chi.squared.corr.tt
  # chi.squared.corr.tt.pv
  # chi.squared.pv
  # chi.squared.corr.pv
  # log.likelihood.tt
  # log.likelihood.tt.pv
  # log.likelihood.pv
  # binomial.pv
  # multinomial.likelihood.pv
  # hypergeometric.likelihood.pv
  # binomial.likelihood.pv
  # Poisson.likelihood.pv
  # Poisson.likelihood.Perl.pv

DESCRIPTION

This module contains some further association measures based on statistical hypothesis tests, most of which are variants of measures defined in the UCS::AM module. There are also several likelihood measures, which compute the probability of the observed contingency table rather than applying a full hypothesis test. The association measures defined in this module are intended mainly for a detailed comparative study of the properties of the significance-of-association class of AMs. Casual users should stick with the variants found in the UCS::AM module.

The following section gives a full listing of the association measures defined in the UCS::AM::HTest module with short explanations. Please refer to http://www.collocations.de/AM/ for the full equations and references. When the module is imported, the additional measures are registered with the UCS core library (see the UCS manpage for details on how to access registered association measures).

The association scores of measures with the suffix .pv can be interpreted as probabilities (i.e. the likelihood of the observed data or the p-value of a statistical hypothesis test). Such probabilities are given as negative base-10 logarithms, ranging from 0 to +inf (+inf is represented by the return value of the built-in inf function; see the UCS::Expression::Func manpage). Measures with the suffix .tt (for two-tailed) are derived from two-sided statistical hypothesis tests. One-sided versions of these tests are provided under the same name without the suffix.
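The negative base-10 logarithm convention can be illustrated with a minimal sketch. The helper names neg_log10 and pv_to_prob are hypothetical and not part of UCS; Perl's own floating-point infinity stands in for the inf function mentioned above.

```perl
use strict;
use warnings;

# Convert a probability to the negative base-10 logarithm used for .pv scores.
# Hypothetical helper, not a UCS function.
sub neg_log10 {
    my ($p) = @_;
    die "probability must lie in [0, 1]\n" if $p < 0 || $p > 1;
    return 9**9**9 if $p == 0;    # floating-point overflow yields +inf
    return 0       if $p == 1;    # avoid returning a negative zero
    return -log($p) / log(10);
}

# Inverse conversion: recover the probability from a .pv score.
sub pv_to_prob {
    my ($score) = @_;
    return 10 ** (-$score);
}

printf "p = 0.05 -> score %.4f\n", neg_log10(0.05);
printf "p = 1e-6 -> score %.4f\n", neg_log10(1e-6);
printf "score 3  -> p = %g\n",     pv_to_prob(3);
```

Larger scores therefore indicate smaller probabilities, i.e. stronger evidence against the null hypothesis of independence.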

ASSOCIATION MEASURES

z.score.pv

The significance (one-sided p-value) corresponding to z.score, obtained from the distribution function of the standard normal distribution. (The z.score measure computes a z-score for the observed cooccurrence frequency O11 compared to the expected frequency E11; see the UCS::AM manpage for details.)
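As a rough sketch of this conversion, the upper tail of the standard normal distribution can be obtained from the complementary error function. The code below is independent of UCS: the function names are hypothetical, the formula z = (O11 - E11) / sqrt(E11) is an assumption of the example (the authoritative definition is in the UCS::AM manpage), and POSIX::erfc requires Perl 5.22 or later.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # C99 function, exported by POSIX since Perl 5.22

# Upper-tail probability P(Z >= z) of the standard normal distribution.
sub normal_upper_tail {
    my ($z) = @_;
    return 0.5 * erfc($z / sqrt(2));
}

# One-sided p-value as a negative base-10 logarithm (the .pv convention).
sub z_to_pv {
    my ($z) = @_;
    my $p = normal_upper_tail($z);
    return $p > 0 ? -log($p) / log(10) : 9**9**9;    # +inf if p underflows
}

# Hypothetical data: observed cooccurrence frequency 10, expected frequency 2.
# The z-score formula used here is an assumption of this sketch.
my ($O11, $E11) = (10, 2);
my $z = ($O11 - $E11) / sqrt($E11);
printf "z = %.3f, -log10(p) = %.3f\n", $z, z_to_pv($z);
```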

z.score.corr.pv

The significance (one-sided p-value) corresponding to z.score.corr, a z-score for O11 against E11 with Yates' continuity correction applied.

t.score.pv

The significance (one-sided p-value) corresponding to t.score, obtained from the distribution function of the standard normal distribution. Since the number of degrees of freedom is very large, the t-distribution of the test statistic is practically identical to the standard normal distribution (t-distribution with df=inf). (The t.score measure is an application of Student's t-test to the comparison of O11 against E11; see the UCS::AM manpage for details.)

chi.squared.tt

Pearson's chi-squared test for independence of rows and columns in a 2x2 contingency table. The equation used in this implementation is derived from the homogeneity version of the chi-squared test (for equality of the success probabilities of two independent binomial distributions), and is fully equivalent to that of the independence test. Note that Pearson's chi-squared test is two-sided.

chi.squared.tt.pv

The significance (two-sided p-value) corresponding to chi.squared.tt, obtained from the chi-squared distribution with one degree of freedom.

chi.squared.corr.tt

Pearson's chi-squared test for independence of rows and columns in a 2x2 contingency table, with Yates' continuity correction applied (two-sided test).

chi.squared.corr.tt.pv

The significance (two-sided p-value) corresponding to chi.squared.corr.tt.
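The two-sided chi-squared measures above can be sketched as follows. This is not the UCS implementation: it uses the standard closed form of Pearson's statistic for 2x2 tables, which may be laid out differently from the homogeneity-based equation mentioned above, and all function names are hypothetical. Clamping the corrected difference at zero is a choice made for this sketch. POSIX::erfc requires Perl 5.22 or later.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # C99 function, exported by POSIX since Perl 5.22

# Pearson's chi-squared statistic for a 2x2 table, optionally with
# Yates' continuity correction (hypothetical helper, not a UCS function).
sub chi_squared_2x2 {
    my ($O11, $O12, $O21, $O22, $yates) = @_;
    my $N = $O11 + $O12 + $O21 + $O22;
    my ($R1, $R2) = ($O11 + $O12, $O21 + $O22);    # row sums
    my ($C1, $C2) = ($O11 + $O21, $O12 + $O22);    # column sums
    my $diff = abs($O11 * $O22 - $O12 * $O21);
    if ($yates) {
        $diff -= $N / 2;
        $diff = 0 if $diff < 0;    # sketch choice: never overshoot zero
    }
    return $N * $diff**2 / ($R1 * $R2 * $C1 * $C2);
}

# Two-sided p-value: upper tail of the chi-squared distribution with one
# degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
sub chi_squared_df1_upper_tail {
    my ($x) = @_;
    return erfc(sqrt($x / 2));
}

# Hypothetical contingency table (O11, O12, O21, O22).
my @table = (10, 90, 190, 9710);
for my $yates (0, 1) {
    my $x2 = chi_squared_2x2(@table, $yates);
    my $p  = chi_squared_df1_upper_tail($x2);
    printf "%-12s X2 = %8.3f, -log10(p) = %.3f\n",
        ($yates ? "corrected:" : "uncorrected:"), $x2, -log($p) / log(10);
}
```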

chi.squared.pv

The significance (one-sided p-value) corresponding to chi.squared, the one-sided version of Pearson's test for the independence of rows and columns (see the UCS::AM manpage for details). The p-value is obtained from the standard normal distribution (since the signed square root of the chi-squared test statistic has a standard normal distribution).

chi.squared.corr.pv

The significance (one-sided p-value) corresponding to chi.squared.corr, the one-sided version of Pearson's chi-squared test with Yates' continuity correction applied. Again, the p-value is obtained from the standard normal distribution.
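The signed square root argument used by the one-sided variants can be sketched as below. The function name is hypothetical, and the sign convention (positive when O11 exceeds E11, i.e. for positive association) is an assumption of this example rather than something stated here. POSIX::erfc requires Perl 5.22 or later.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # C99 function, exported by POSIX since Perl 5.22

# One-sided p-value from a two-sided test statistic with one degree of
# freedom: take the square root, attach a sign according to the direction
# of the deviation, and look up the standard normal upper tail.
sub one_sided_p {
    my ($stat, $O11, $E11) = @_;
    my $z = sqrt($stat);
    $z = -$z if $O11 < $E11;    # assumed convention: negative association
    return 0.5 * erfc($z / sqrt(2));
}

# Hypothetical values: statistic 3.84 with O11 above and below E11.
printf "O11 > E11: p = %.4f\n", one_sided_p(3.84, 10, 2);
printf "O11 < E11: p = %.4f\n", one_sided_p(3.84, 1, 2);
```

For a positive deviation the one-sided p-value is half the two-sided one; for a negative deviation it is close to 1, so such pairs are never reported as significant.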

log.likelihood.tt

The log-likelihood statistic suggested by Dunning (1993), a likelihood ratio test for independence of rows and columns in a 2x2 contingency table. (Dunning introduced the statistic as a test for homogeneity of the table columns, i.e. equal success probabilities of two independent binomial distributions.) Note that all likelihood ratio tests are two-sided tests.

log.likelihood.tt.pv

The significance (two-sided p-value) corresponding to log.likelihood.tt, obtained from the chi-squared distribution with one degree of freedom.
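A minimal sketch of the statistic and its two-sided p-value, using the common form G2 = 2 * sum of O * log(O / E) over the four cells, with empty cells contributing zero. The function names are hypothetical and the code is not taken from UCS. POSIX::erfc requires Perl 5.22 or later.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # C99 function, exported by POSIX since Perl 5.22

# Log-likelihood ratio statistic G2 for a 2x2 contingency table.
sub log_likelihood_2x2 {
    my @O = @_;                                  # (O11, O12, O21, O22)
    my $N = $O[0] + $O[1] + $O[2] + $O[3];
    my ($R1, $R2) = ($O[0] + $O[1], $O[2] + $O[3]);
    my ($C1, $C2) = ($O[0] + $O[2], $O[1] + $O[3]);
    # Expected frequencies under independence.
    my @E = ($R1 * $C1 / $N, $R1 * $C2 / $N, $R2 * $C1 / $N, $R2 * $C2 / $N);
    my $sum = 0;
    for my $i (0 .. 3) {
        next if $O[$i] == 0;                     # 0 * log(0) is taken as 0
        $sum += $O[$i] * log($O[$i] / $E[$i]);
    }
    return 2 * $sum;
}

# Upper tail of the chi-squared distribution with one degree of freedom.
sub chi_squared_df1_upper_tail {
    my ($x) = @_;
    return erfc(sqrt($x / 2));
}

# Hypothetical contingency table (O11, O12, O21, O22).
my $g2 = log_likelihood_2x2(10, 90, 190, 9710);
my $p  = chi_squared_df1_upper_tail($g2);
printf "G2 = %.3f, -log10(p) = %.3f\n", $g2, -log($p) / log(10);
```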

log.likelihood.pv

The significance (one-sided p-value) corresponding to log.likelihood, the one-sided version of Dunning's likelihood ratio test (see the UCS::AM manpage for details). The p-value is obtained from the standard normal distribution (since the signed square root of the log-likelihood statistic has a standard normal distribution).

binomial.pv

Significance (one-sided p-value) of an exact binomial test for the observed cooccurrence frequency O11 compared to the expected frequency E11 under the point null hypothesis of independence. This test is computationally expensive and may be numerically unstable, so use with caution. (This is also the reason why it is not included in the UCS::AM module.)
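To see why the exact test is expensive, consider a direct summation of the binomial upper tail, sketched below under the assumption that the success probability is E11 / N and that the sample size is N. The function names are hypothetical and this is not the UCS implementation. Terms are computed in log space with POSIX::lgamma (Perl 5.22 or later); the loop runs over all frequencies from O11 up to N.

```perl
use strict;
use warnings;
use POSIX qw(lgamma);    # C99 function, exported by POSIX since Perl 5.22

# Logarithm of the binomial probability P(X = k) for n trials with success
# probability p, where 0 < p < 1.
sub log_binomial_pmf {
    my ($k, $n, $p) = @_;
    return lgamma($n + 1) - lgamma($k + 1) - lgamma($n - $k + 1)
         + $k * log($p) + ($n - $k) * log(1 - $p);
}

# One-sided p-value P(X >= k_obs), summed term by term.
sub binomial_upper_tail {
    my ($k_obs, $n, $p) = @_;
    my $sum = 0;
    $sum += exp(log_binomial_pmf($_, $n, $p)) for $k_obs .. $n;
    return $sum > 1 ? 1 : $sum;    # guard against rounding above 1
}

# Hypothetical data: N = 10000, O11 = 10, E11 = 2.
my ($N, $O11, $E11) = (10000, 10, 2);
my $p = binomial_upper_tail($O11, $N, $E11 / $N);
printf "-log10(p) = %.3f\n", -log($p) / log(10);
```

When every term underflows to zero the sum is 0 and the score becomes +inf, which is one way the numerical instability mentioned above can show up.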

multinomial.likelihood.pv

Likelihood of the observed contingency table under the point null hypothesis of independence (i.e. with expected frequencies E11, E12, E21, and E22 estimated from the observed table).

hypergeometric.likelihood.pv

Likelihood of the observed contingency table under the null hypothesis of independence of rows and columns, with all marginal frequencies fixed to the observed values.
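The multinomial and hypergeometric table likelihoods can be compared with a small sketch. Both functions are hypothetical illustrations based on the textbook probability formulas, not the UCS code; they return plain probabilities, which the example converts to the negative base-10 logarithm. POSIX::lgamma requires Perl 5.22 or later.

```perl
use strict;
use warnings;
use POSIX qw(lgamma);    # C99 function, exported by POSIX since Perl 5.22

# Logarithm of the binomial coefficient "n choose k".
sub log_choose {
    my ($n, $k) = @_;
    return lgamma($n + 1) - lgamma($k + 1) - lgamma($n - $k + 1);
}

# Multinomial probability of the table, with cell probabilities E_ij / N
# estimated from the observed margins.
sub multinomial_likelihood {
    my @O = @_;                                  # (O11, O12, O21, O22)
    my $N = $O[0] + $O[1] + $O[2] + $O[3];
    my ($R1, $R2) = ($O[0] + $O[1], $O[2] + $O[3]);
    my ($C1, $C2) = ($O[0] + $O[2], $O[1] + $O[3]);
    my @E = ($R1 * $C1 / $N, $R1 * $C2 / $N, $R2 * $C1 / $N, $R2 * $C2 / $N);
    my $log_p = lgamma($N + 1);
    for my $i (0 .. 3) {
        next if $O[$i] == 0;                     # empty cells contribute 1
        $log_p += $O[$i] * log($E[$i] / $N) - lgamma($O[$i] + 1);
    }
    return exp($log_p);
}

# Hypergeometric probability of the table with all margins fixed.
sub hypergeometric_likelihood {
    my ($O11, $O12, $O21, $O22) = @_;
    my $N  = $O11 + $O12 + $O21 + $O22;
    my $R1 = $O11 + $O12;
    my ($C1, $C2) = ($O11 + $O21, $O12 + $O22);
    return exp(log_choose($C1, $O11) + log_choose($C2, $O12)
             - log_choose($N, $R1));
}

# Hypothetical contingency table (O11, O12, O21, O22).
my @table = (10, 90, 190, 9710);
printf "multinomial:    -log10(L) = %.3f\n",
    -log(multinomial_likelihood(@table)) / log(10);
printf "hypergeometric: -log10(L) = %.3f\n",
    -log(hypergeometric_likelihood(@table)) / log(10);
```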

binomial.likelihood.pv

Binomial likelihood of the observed cooccurrence frequency O11 under the point null hypothesis (with expected frequency E11 estimated from the observed table). This function is relatively slow and may be numerically unstable, so use with caution.

Poisson.likelihood.pv

Poisson approximation of the binomial likelihood binomial.likelihood.pv, which is numerically and analytically more manageable.

Poisson.likelihood.Perl.pv

Alternative version of Poisson.likelihood.pv, based on a direct Perl implementation of the naive multiplicative algorithm.
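What a naive multiplicative computation of the Poisson likelihood might look like is sketched below, next to a log-space version for comparison. The source does not spell out the algorithm, so this reading (multiplying exp(-E11) by E11 / k for k = 1 .. O11) is an assumption, and both function names are hypothetical. POSIX::lgamma requires Perl 5.22 or later.

```perl
use strict;
use warnings;
use POSIX qw(lgamma);    # C99 function, exported by POSIX since Perl 5.22

# Poisson probability P(X = k) with mean lambda, built up factor by factor.
# Assumed interpretation of the "naive multiplicative algorithm".
sub poisson_naive {
    my ($k, $lambda) = @_;
    my $p = exp(-$lambda);
    $p *= $lambda / $_ for 1 .. $k;    # empty loop when k is 0
    return $p;
}

# The same probability computed in log space, for lambda > 0.
sub poisson_log_space {
    my ($k, $lambda) = @_;
    return exp(-$lambda + $k * log($lambda) - lgamma($k + 1));
}

# Hypothetical data: O11 = 10 observed, E11 = 2 expected.
my ($O11, $E11) = (10, 2);
printf "naive:     -log10(L) = %.6f\n", -log(poisson_naive($O11, $E11)) / log(10);
printf "log space: -log10(L) = %.6f\n", -log(poisson_log_space($O11, $E11)) / log(10);
```

The multiplicative version starts from exp(-E11), which underflows to zero for expected frequencies above roughly 745 in double precision, so it is only suitable for moderate values.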

COPYRIGHT

Copyright 2003 Stefan Evert.

This software is provided AS IS and the author makes no warranty as to its use and performance. You may use the software, redistribute and modify it under the same terms as Perl itself.
