
# NAME

UCS::SFunc - Special functions and statistical distributions

# SYNOPSIS

```perl
use UCS::SFunc;

# special functions (all logarithms are base 10)
$c     = choose($n, $k);       # binomial coefficient
$log_c = lchoose($n, $k);

$y     = gamma($a);            # Gamma function
$log_y = lgamma($a);
$y     = igamma($a, $x [, $upper]);  # incomplete Gamma functions
$log_y = ligamma($a, $x [, $upper]);
$y     = rgamma($a, $x [, $upper]);  # regularised Gamma functions
$log_y = lrgamma($a, $x [, $upper]);
$x     = igamma_inv($a, $y [, $upper]); # inverse Gamma functions
$x     = ligamma_inv($a, $log_y [, $upper]);
$x     = rgamma_inv($a, $y [, $upper]);
$x     = lrgamma_inv($a, $log_y [, $upper]);

$y     = beta($a, $b);         # Beta function
$log_y = lbeta($a, $b);
$y     = ibeta($x, $a, $b);    # incomplete Beta function
$log_y = libeta($x, $a, $b);
$y     = rbeta($x, $a, $b);    # regularised Beta function
$log_y = lrbeta($x, $a, $b);
$x     = ibeta_inv($y, $a, $b);        # inverse Beta functions
$x     = libeta_inv($log_y, $a, $b);
$x     = rbeta_inv($y, $a, $b);
$x     = lrbeta_inv($log_y, $a, $b);

# binomial distribution (density, tail probabilities, quantiles)
$d  = dbinom($k, $size, $prob);
$ld = ldbinom($k, $size, $prob);
$p  = pbinom($k, $size, $prob [, $upper]);
$lp = lpbinom($k, $size, $prob [, $upper]);
$k  = qbinom($p, $size, $prob [, $upper]);
$k  = lqbinom($lp, $size, $prob [, $upper]);

# Poisson distribution (density, tail probabilities, quantiles)
$d  = dpois($k, $lambda);
$ld = ldpois($k, $lambda);
$p  = ppois($k, $lambda [, $upper]);
$lp = lppois($k, $lambda [, $upper]);
$k  = qpois($p, $lambda [, $upper]);
$k  = lqpois($lp, $lambda [, $upper]);

# normal distribution (density, tail probabilities, quantiles)
$d  = dnorm($x, $mu, $sigma);
$ld = ldnorm($x, $mu, $sigma);
$p  = pnorm($x, $mu, $sigma [, $upper]);
$lp = lpnorm($x, $mu, $sigma [, $upper]);
$x  = qnorm($p, $mu, $sigma [, $upper]);
$x  = lqnorm($lp, $mu, $sigma [, $upper]);

# chi-squared distribution (density, tail probabilities, quantiles)
$d  = dchisq($x, $df);
$ld = ldchisq($x, $df);
$p  = pchisq($x, $df [, $upper]);
$lp = lpchisq($x, $df [, $upper]);
$x  = qchisq($p, $df [, $upper]);
$x  = lqchisq($lp, $df [, $upper]);

# hypergeometric distribution (density and tail probabilities)
$d  = dhyper($k, $R1, $R2, $C1, $C2);
$ld = ldhyper($k, $R1, $R2, $C1, $C2);
$p  = phyper($k, $R1, $R2, $C1, $C2 [, $upper]);
$lp = lphyper($k, $R1, $R2, $C1, $C2 [, $upper]);
```

# DESCRIPTION

This module provides special functions and common statistical distributions. Currently, all functions are imported from the UCS/R system (using the UCS::R interface).

# SPECIAL FUNCTIONS

UCS::SFunc currently provides the following special mathematical functions: binomial coefficients, the Gamma function, the incomplete Gamma functions and their inverses, the regularised Gamma functions and their inverses, the Beta function, the incomplete Beta function and its inverse, and the regularised Beta function and its inverse. Note that all logarithmic versions return base 10 logarithms!

\$coef = choose(\$n, \$k);
\$log_coef = lchoose(\$n, \$k);

The binomial coefficient "\$n over \$k", and its logarithm.
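To illustrate why a logarithmic version is useful, here is a minimal plain-Perl sketch that computes the base 10 logarithm of a binomial coefficient as a sum of logarithms. It does not call UCS::SFunc, and the helper name `log10_choose` is a hypothetical one introduced only for this example; it is not a description of how `lchoose` is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of UCS::SFunc: base 10 logarithm of the
# binomial coefficient "n over k", built up as a sum of logarithms so the
# coefficient itself is never formed.
sub log10_choose {
    my ($n, $k) = @_;
    return if $k < 0 || $k > $n;       # coefficient is 0, no finite logarithm
    $k = $n - $k if $k > $n - $k;      # symmetry: choose(n, k) = choose(n, n-k)
    my $sum = 0;
    $sum += log(($n - $k + $_) / $_) for 1 .. $k;   # natural logarithms
    return $sum / log(10);             # convert to base 10
}

printf "%.4f\n", log10_choose(5, 2);          # choose(5, 2) = 10, prints 1.0000
printf "%.2f\n", log10_choose(10000, 5000);   # the coefficient itself would overflow a double
```

The second call shows the point of working in log space: the coefficient has more than 3000 decimal digits, yet its logarithm is an ordinary floating-point number.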

\$y = gamma(\$a);
\$log_y = lgamma(\$a);

The (complete) Gamma function with argument \$a, and its logarithm. Note that the factorial n! is equal to `gamma(n+1)`.
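The factorial identity can be checked without the module. The following plain-Perl sketch computes log10(n!) directly, which is the value one would expect from `lgamma(n+1)` given the base 10 convention stated above; `log10_factorial` is a hypothetical helper name used only here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of UCS::SFunc: log10(n!) as a sum of
# logarithms, i.e. the value one would expect from lgamma(n + 1).
sub log10_factorial {
    my ($n) = @_;
    my $sum = 0;
    $sum += log($_) for 2 .. $n;       # ln(n!) = ln 2 + ln 3 + ... + ln n
    return $sum / log(10);             # convert to base 10
}

printf "%.4f\n", log10_factorial(5);     # log10(120)
printf "%.2f\n", log10_factorial(1000);  # 1000! itself overflows a double
```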

\$y = igamma(\$a, \$x [, \$upper]);
\$log_y = ligamma(\$a, \$x [, \$upper]);

The incomplete Gamma function with arguments \$a and \$x, and its logarithm. If \$upper is specified and true, the upper incomplete Gamma function is computed, otherwise the lower incomplete Gamma function. It is recommended to set \$upper to the string constant `'upper'` as a reminder of its function.
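For reference, and assuming the module follows the standard textbook definitions (the text above does not spell them out), the lower and upper incomplete Gamma functions are

$$
\gamma(a, x) = \int_0^x t^{a-1} e^{-t}\,dt, \qquad
\Gamma(a, x) = \int_x^\infty t^{a-1} e^{-t}\,dt, \qquad
\gamma(a, x) + \Gamma(a, x) = \Gamma(a).
$$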

\$x = igamma_inv(\$a, \$y [, \$upper]);
\$x = ligamma_inv(\$a, \$log_y [, \$upper]);

The inverse of the incomplete Gamma function, as well as the inverse of its logarithm.

\$y = rgamma(\$a, \$x [, \$upper]);
\$log_y = lrgamma(\$a, \$x [, \$upper]);

The regularised Gamma function with arguments \$a and \$x, and its logarithm. If \$upper is specified and true, the upper regularised Gamma function is computed, otherwise the lower regularised Gamma function. It is recommended to set \$upper to the string constant `'upper'` as a reminder of its function.
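Under the standard definitions, which the module is assumed to follow, the regularised functions are the incomplete Gamma functions divided by the complete Gamma function. Writing $\gamma(a, x)$ and $\Gamma(a, x)$ for the lower and upper incomplete Gamma functions:

$$
P(a, x) = \frac{\gamma(a, x)}{\Gamma(a)}, \qquad
Q(a, x) = \frac{\Gamma(a, x)}{\Gamma(a)}, \qquad
P(a, x) + Q(a, x) = 1.
$$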

\$x = rgamma_inv(\$a, \$y [, \$upper]);
\$x = lrgamma_inv(\$a, \$log_y [, \$upper]);

The inverse of the regularised Gamma function, as well as the inverse of its logarithm.

\$beta = beta(\$a, \$b);
\$log_beta = lbeta(\$a, \$b);

The (complete) Beta function with arguments \$a and \$b, and its logarithm.

\$y = ibeta(\$x, \$a, \$b);
\$log_y = libeta(\$x, \$a, \$b);

The incomplete Beta function with arguments \$x, \$a, and \$b, and its logarithm.

\$x = ibeta_inv(\$y, \$a, \$b);
\$x = libeta_inv(\$log_y, \$a, \$b);

The inverse of the incomplete Beta function, as well as the inverse of its logarithm.

\$y = rbeta(\$x, \$a, \$b);
\$log_y = lrbeta(\$x, \$a, \$b);

The regularised Beta function with arguments \$x, \$a, and \$b, and its logarithm.
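For reference, the standard definitions of the incomplete and regularised Beta functions for $0 \le x \le 1$ are given below; that the module uses exactly these conventions is an assumption.

$$
B(x; a, b) = \int_0^x t^{a-1} (1 - t)^{b-1}\,dt, \qquad
I_x(a, b) = \frac{B(x; a, b)}{B(a, b)}, \qquad
B(a, b) = B(1; a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}.
$$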

\$x = rbeta_inv(\$y, \$a, \$b);
\$x = lrbeta_inv(\$log_y, \$a, \$b);

The inverse of the regularised Beta function, as well as the inverse of its logarithm.

# STATISTICAL DISTRIBUTIONS

UCS::SFunc computes densities, tail probabilities (= distribution function), and quantiles for the following statistical distributions: binomial distribution, Poisson distribution, normal distribution, chi-squared distribution, hypergeometric distribution. The function names are the common abbreviations used, e.g., in the R language, with additional logarithmic versions that start with the letter `l` (these correspond to the `log=TRUE` and `log.p=TRUE` parameters in R).

Note that logarithmic probabilities are always given as negative base 10 logarithms. The logarithmic density and tail probability functions return such logarithmic p-values, and the quantile functions expect them in their first argument.
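The convention can be made concrete with a few plain-Perl conversions. The helper names below are hypothetical and not part of UCS::SFunc; they only show how an ordinary p-value, a negative base 10 logarithm, and a natural-log probability (as R returns with `log.p=TRUE`) relate to each other.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of UCS::SFunc.

# p-value -> negative base 10 logarithm (p must be > 0)
sub p_to_neglog10 { my ($p) = @_; return -log($p) / log(10) }

# negative base 10 logarithm -> p-value
sub neglog10_to_p { my ($lp) = @_; return 10 ** (-$lp) }

# natural-log probability -> negative base 10 logarithm
sub ln_to_neglog10 { my ($ln_p) = @_; return -$ln_p / log(10) }

printf "%.2f\n", p_to_neglog10(0.001);        # prints 3.00
printf "%g\n",   neglog10_to_p(3);            # prints 0.001
printf "%.2f\n", ln_to_neglog10(log(0.001));  # prints 3.00
```

Larger values therefore mean smaller probabilities: a logarithmic p-value of 3 corresponds to p = 0.001.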

## The Binomial Distribution

Binomial distribution with parameters \$size (= number of trials) and \$prob (= success probability in single trial). E[X] = \$size * \$prob, V[X] = \$size * \$prob * (1 - \$prob).

\$d = dbinom(\$k, \$size, \$prob);
\$ld = ldbinom(\$k, \$size, \$prob);

Density P(X = \$k) and its negative base 10 logarithm.

\$p = pbinom(\$k, \$size, \$prob [, \$upper]);
\$lp = lpbinom(\$k, \$size, \$prob [, \$upper]);

Tail probabilities P(X <= \$k) and P(X > \$k) (if \$upper is specified and true), and their negative base 10 logarithms. It is recommended to set \$upper to the string `'upper'` as a reminder of its meaning.

The R implementation of binomial tail probabilities underflows for very small probabilities (even in the logarithmic version), as of R version 2.1. Therefore, these functions use a mixture of R and Perl code to compute upper tail probabilities for large samples (which are most likely to lead to underflow problems for cooccurrence data).
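The text does not describe how the module's workaround is implemented. As an illustration only, the following plain-Perl sketch shows one common way to avoid underflow in an upper tail probability: the density terms are computed as logarithms and summed relative to the largest term. The helper names `ln_dbinom` and `neglog10_upper_binom` are hypothetical, and this is not claimed to be the approach taken by UCS::SFunc.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helpers, not part of UCS::SFunc.

# Natural logarithm of the binomial density P(X = k), for 0 < prob < 1.
sub ln_dbinom {
    my ($k, $size, $prob) = @_;
    my $ln = $k * log($prob) + ($size - $k) * log(1 - $prob);
    $ln += log(($size - $k + $_) / $_) for 1 .. $k;   # + ln choose(size, k)
    return $ln;
}

# Negative base 10 logarithm of the upper tail P(X > k).
sub neglog10_upper_binom {
    my ($k, $size, $prob) = @_;
    my @ln = map { ln_dbinom($_, $size, $prob) } ($k + 1) .. $size;
    return unless @ln;                  # empty tail: probability 0
    my $max = max(@ln);
    my $sum = 0;
    $sum += exp($_ - $max) for @ln;     # largest term contributes exp(0) = 1
    return -($max + log($sum)) / log(10);
}

printf "%.4f\n", neglog10_upper_binom(8, 10, 0.5);        # P(X > 8) = 11/1024
printf "%.1f\n", neglog10_upper_binom(1500, 2000, 0.01);  # tail far below the smallest positive double
```

Because the sum is taken over rescaled terms, the result stays finite even when the tail probability itself cannot be represented as a floating-point number.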

\$k = qbinom(\$p, \$size, \$prob [, \$upper]);
\$k = lqbinom(\$lp, \$size, \$prob [, \$upper]);

Lower and upper quantiles. The lower quantile is the smallest value \$k with P(X <= \$k) >= \$p. The upper quantile (which is computed when \$upper is specified and true) is the largest value \$k with P(X > \$k) >= \$p. In the logarithmic version, \$lp must be the negative base 10 logarithm of the desired p-value.

Note that these functions use the R implementation directly without a workaround for underflow problems. The quantiles returned for very small p-values (especially when using lqbinom) are therefore unreliable and should be used with caution.

## The Poisson Distribution

Poisson distribution with parameter \$lambda (= expectation); E[X] = V[X] = \$lambda.

\$d = dpois(\$k, \$lambda);
\$ld = ldpois(\$k, \$lambda);

Density P(X = \$k) and its negative base 10 logarithm.

\$p = ppois(\$k, \$lambda [, \$upper]);
\$lp = lppois(\$k, \$lambda [, \$upper]);

Tail probabilities P(X <= \$k) and P(X > \$k) (if \$upper is specified and true), and their negative base 10 logarithms. It is recommended to set \$upper to the string `'upper'` as a reminder of its meaning.

\$k = qpois(\$p, \$lambda [, \$upper]);
\$k = lqpois(\$lp, \$lambda [, \$upper]);

Lower and upper quantiles. The lower quantile is the smallest value \$k with P(X <= \$k) >= \$p. The upper quantile (which is computed when \$upper is specified and true) is the largest value \$k with P(X > \$k) >= \$p. In the logarithmic version, \$lp must be the negative base 10 logarithm of the desired p-value.
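The definition of the lower quantile can be followed literally in a few lines of plain Perl. The sketch below accumulates P(X <= k) until it reaches the requested probability; `poisson_lower_quantile` is a hypothetical helper, not part of UCS::SFunc, and it assumes a moderate \$lambda for which exp(-\$lambda) does not underflow.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of UCS::SFunc: smallest k with
# P(X <= k) >= p for a Poisson distribution with expectation lambda.
# Assumes a moderate lambda, so that exp(-lambda) is not rounded to 0.
sub poisson_lower_quantile {
    my ($p, $lambda) = @_;
    die "p must satisfy 0 <= p < 1\n" unless $p >= 0 && $p < 1;
    my $k    = 0;
    my $term = exp(-$lambda);           # P(X = 0)
    my $cdf  = $term;                   # P(X <= 0)
    while ($cdf < $p) {
        $k++;
        $term *= $lambda / $k;          # P(X = k) from P(X = k - 1)
        last if $term == 0;             # rounding guard: cdf cannot grow any further
        $cdf += $term;
    }
    return $k;
}

# For lambda = 1: P(X <= 0) = 0.3679, P(X <= 1) = 0.7358, P(X <= 2) = 0.9197
print poisson_lower_quantile(0.3, 1), "\n";   # prints 0
print poisson_lower_quantile(0.5, 1), "\n";   # prints 1
print poisson_lower_quantile(0.9, 1), "\n";   # prints 2
```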

## The Normal Distribution

Normal distribution with parameters \$mu (= expectation) and \$sigma (= standard deviation). Unspecified parameters default to \$mu = 0 and \$sigma = 1. E[X] = \$mu, V[X] = \$sigma ** 2.

\$d = dnorm(\$x, \$mu, \$sigma);
\$ld = ldnorm(\$x, \$mu, \$sigma);

Density function f(x) and its negative base 10 logarithm.
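As a sketch of what the logarithmic density represents, the negative base 10 logarithm of the normal density can be written out directly from the density formula. The helper `neglog10_dnorm` below is hypothetical and plain Perl; it mirrors the documented defaults for \$mu and \$sigma but is not the module's implementation.

```perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# Hypothetical helper, not part of UCS::SFunc: negative base 10 logarithm
# of the normal density, computed without ever forming the density itself.
sub neglog10_dnorm {
    my ($x, $mu, $sigma) = @_;
    $mu    = 0 unless defined $mu;      # same defaults as documented above
    $sigma = 1 unless defined $sigma;
    my $z = ($x - $mu) / $sigma;
    my $ln_density = -0.5 * $z * $z - log($sigma * sqrt(2 * PI));
    return -$ln_density / log(10);      # natural log -> negative base 10 log
}

printf "%.4f\n", neglog10_dnorm(0);     # standard normal at its mode, prints 0.3991
printf "%.1f\n", neglog10_dnorm(60);    # the density itself underflows to 0 as a double
```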

\$p = pnorm(\$x, \$mu, \$sigma [, \$upper]);
\$lp = lpnorm(\$x, \$mu, \$sigma [, \$upper]);

Tail probabilities P(X <= \$x) and P(X >= \$x) (if \$upper is specified and true), and their negative base 10 logarithms. It is recommended to set \$upper to the string `'upper'` as a reminder of its meaning.

\$x = qnorm(\$p, \$mu, \$sigma [, \$upper]);
\$x = lqnorm(\$lp, \$mu, \$sigma [, \$upper]);

Lower and upper quantiles. The lower quantile is the smallest value \$x with P(X <= \$x) >= \$p. The upper quantile (which is computed when \$upper is specified and true) is the largest value \$x with P(X >= \$x) >= \$p. In the logarithmic version, \$lp must be the negative base 10 logarithm of the desired p-value.

## The Chi-Squared Distribution

Chi-squared distribution with parameter \$df (= degrees of freedom); E[X] = \$df, V[X] = 2 * \$df.

\$d = dchisq(\$x, \$df);
\$ld = ldchisq(\$x, \$df);

Density function f(x) and its negative base 10 logarithm.

\$p = pchisq(\$x, \$df [, \$upper]);
\$lp = lpchisq(\$x, \$df [, \$upper]);

Tail probabilities P(X <= \$x) and P(X >= \$x) (if \$upper is specified and true), and their negative base 10 logarithms. It is recommended to set \$upper to the string `'upper'` as a reminder of its meaning.
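The chi-squared tail probabilities are closely related to the regularised Gamma functions. With $\gamma$ and $\Gamma(\cdot,\cdot)$ denoting the lower and upper incomplete Gamma functions, the standard identities are

$$
\Pr(X \le x) = \frac{\gamma(\mathit{df}/2,\; x/2)}{\Gamma(\mathit{df}/2)}, \qquad
\Pr(X \ge x) = \frac{\Gamma(\mathit{df}/2,\; x/2)}{\Gamma(\mathit{df}/2)}.
$$

Assuming `rgamma` implements the standard regularised Gamma function, `pchisq(\$x, \$df)` should therefore agree with `rgamma(\$df / 2, \$x / 2)` up to numerical precision.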

\$x = qchisq(\$p, \$df [, \$upper]);
\$x = lqchisq(\$lp, \$df [, \$upper]);

Lower and upper quantiles. The lower quantile is the smallest value \$x with P(X <= \$x) >= \$p. The upper quantile (which is computed when \$upper is specified and true) is the largest value \$x with P(X >= \$x) >= \$p. In the logarithmic version, \$lp must be the negative base 10 logarithm of the desired p-value.

## The Hypergeometric Distribution

Hypergeometric distribution of the upper left-hand corner X in a 2x2 contingency table with fixed marginals \$R1, \$R2, \$C1, and \$C2, where \$R1 + \$R2 and \$C1 + \$C2 must both equal the sample size N. \$k represents the observed value of X and must be in the admissible range max(0, \$R1 - \$C2) <= \$k <= min(\$R1, \$C1); otherwise the density will be given as 0 and the tail probabilities as 1 or 0, respectively. E[X] = \$R1 * \$C1 / N, V[X] = \$R1 * \$R2 * \$C1 * \$C2 / (N^2 * (N-1)).
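The quantities in this paragraph can be worked through for a small table in plain Perl. The sketch below derives the admissible range, expectation, variance, and density from the four marginals; `choose_float` and `hyper_summary` are hypothetical helpers, not part of UCS::SFunc, and the direct product formula is only suitable for small tables with N > 1.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical helpers, not part of UCS::SFunc.

# Binomial coefficient as a floating-point product (small arguments only).
sub choose_float {
    my ($n, $k) = @_;
    return 0 if $k < 0 || $k > $n;
    my $c = 1;
    $c *= ($n - $k + $_) / $_ for 1 .. $k;
    return $c;
}

# Admissible range, moments, and density of the upper left-hand cell X.
sub hyper_summary {
    my ($R1, $R2, $C1, $C2) = @_;
    my $N = $R1 + $R2;
    die "marginals do not add up to the same sample size\n" unless $N == $C1 + $C2;
    my ($lo, $hi) = (max(0, $R1 - $C2), min($R1, $C1));
    my %density;
    for my $k ($lo .. $hi) {
        $density{$k} = choose_float($C1, $k) * choose_float($C2, $R1 - $k)
                     / choose_float($N, $R1);
    }
    return {
        range    => [$lo, $hi],
        mean     => $R1 * $C1 / $N,
        variance => $R1 * $R2 * $C1 * $C2 / ($N ** 2 * ($N - 1)),
        density  => \%density,
    };
}

my $h = hyper_summary(3, 7, 4, 6);      # N = 10
printf "range %d..%d, E[X] = %.2f, V[X] = %.2f\n",
    @{ $h->{range} }, $h->{mean}, $h->{variance};
printf "P(X = %d) = %.4f\n", $_, $h->{density}{$_}
    for sort { $a <=> $b } keys %{ $h->{density} };
```

For these marginals the admissible range is 0..3, with E[X] = 1.20 and V[X] = 0.56, and the densities over the range sum to 1.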

For R versions before 2.0, the upper tail probabilities are computed with a mixture of R and Perl code to circumvent a cancellation problem in the R implementation and achieve better precision. For this reason, the quantile functions are currently not supported (but may be added once the UCS toolkit requires R version 2.0 or newer).

\$d = dhyper(\$k, \$R1, \$R2, \$C1, \$C2);
\$ld = ldhyper(\$k, \$R1, \$R2, \$C1, \$C2);

Density P(X = \$k) and its negative base 10 logarithm.

\$p = phyper(\$k, \$R1, \$R2, \$C1, \$C2 [, \$upper]);
\$lp = lphyper(\$k, \$R1, \$R2, \$C1, \$C2 [, \$upper]);

Tail probabilities P(X <= \$k) and P(X > \$k) (if \$upper is specified and true), and their negative base 10 logarithms. It is recommended to set \$upper to the string `'upper'` as a reminder of its meaning.