Jackknife Estimation

April 1st, 2008 by admin

The jackknife estimation method was first used by Quenouille (1956) (link to the paper: http://links.jstor.org/sici?sici=0006-3444%28195612%2943%3A3%2F4%3C353%3ANOBIE%3E2.0.CO%3B2-4) and Jones (1956) (http://links.jstor.org/sici?sici=0162-1459%28195603%2951%3A273%3C54%3AITPOAS%3E2.0.CO%3B2-O).

A simple case

Suppose we have a sample x=(x_1,x_2,...,x_n) and an estimator \hat{\theta}=s(x). The jackknife
uses the samples that leave out one observation at a time:
x_{(i)}=(x_1,x_2,...,x_{i-1}, x_{i+1},...,x_n),

which are called jackknife samples. The ith jackknife sample consists of the data set
with the ith observation removed. Let \hat{\theta}_{(i)}=s(x_{(i)}) be the estimator computed on the ith jackknife sample. The jackknife estimator of \theta is then

\hat{\theta}_{jack}=n\hat{\theta}-(n-1)\hat{\theta}_{(.)}

and the jackknife standard error is

\widehat{s.e.}_{jack}(\hat{\theta})=\sqrt{\frac{n-1}{n}\sum_{i=1}^n(\hat{\theta}_{(i)}-\hat{\theta}_{(.)})^2}

with \hat{\theta}_{(.)}=\sum_{i=1}^n\hat{\theta}_{(i)}/n.
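The standard error formula above can be sketched in a few lines of Python. This is only an illustration: the function name jackknife_se and the data are made up, and the statistic s is passed in as an ordinary function. For the sample mean, the jackknife standard error should agree with the usual s/\sqrt{n}, which the last line compares.

```python
import math
from statistics import mean, stdev

def jackknife_se(x, s):
    """Leave-one-out jackknife standard error of the statistic s on the list x."""
    n = len(x)
    # theta_(i): the statistic recomputed with the i-th observation removed
    theta_i = [s(x[:i] + x[i + 1:]) for i in range(n)]
    # theta_(.): the average of the n leave-one-out estimates
    theta_dot = sum(theta_i) / n
    return math.sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in theta_i))

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # made-up data for illustration
se = jackknife_se(x, mean)
print(se, stdev(x) / math.sqrt(len(x)))   # the two values should agree
```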

The General Case

Divide the sample of size n into g groups of size m each, so n = mg. (Often m = 1 and g = n.) Let \hat{\theta}_{(j)} be the estimator of \theta obtained by leaving out the jth group and using only the other g-1 groups.

The Jackknife estimator is \hat{\theta}_{jack}=g\hat{\theta}-(g-1)\hat{\theta}_{(.)}, where \hat{\theta}_{(.)}=\sum_{j=1}^g\hat{\theta}_{(j)}/g.

The benefit of the jackknife estimator is that it reduces the bias from order 1/n to order 1/n^2.
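As a rough sketch of the general case, the code below forms the groups from consecutive observations (an assumption; any partition into g groups of size m would do) and applies g\hat{\theta}-(g-1)\hat{\theta}_{(.)}. The names jackknife_estimate and plugin_variance are hypothetical. The plug-in variance (divisor n) is used as the example statistic because its bias is of order 1/n, and with m = 1 the jackknife turns it into the usual unbiased sample variance.

```python
from statistics import variance

def jackknife_estimate(x, s, m=1):
    """Delete-a-group jackknife estimate g*theta_hat - (g-1)*theta_(.)."""
    n = len(x)
    if n % m != 0:
        raise ValueError("sample size n must be a multiple of the group size m")
    g = n // m
    theta_hat = s(x)
    # theta_(j): the statistic computed with the j-th block of m observations left out
    theta_j = [s(x[:j * m] + x[(j + 1) * m:]) for j in range(g)]
    theta_dot = sum(theta_j) / g
    return g * theta_hat - (g - 1) * theta_dot

def plugin_variance(x):
    """Variance with divisor n, which is biased by a term of order 1/n."""
    xbar = sum(x) / len(x)
    return sum((v - xbar) ** 2 for v in x) / len(x)

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]                       # made-up data
corrected = jackknife_estimate(x, plugin_variance)       # m = 1, g = n
grouped = jackknife_estimate(x, plugin_variance, m=2)    # g = 3 groups of size 2
print(plugin_variance(x), corrected, variance(x))
```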

Newton-Raphson method

April 1st, 2008 by admin

The Newton-Raphson method, also called Newton’s method, is a root-finding algorithm that uses the first terms of the Taylor series of a function f(x) in the vicinity of a suspected root. Given an initial guess x_0 of the root, the Taylor series of f(x) about the point x_0, evaluated at x=x_0+\varepsilon_0, is given by

f(x_0+\varepsilon_0)=f(x_0)+f\prime(x_0)\varepsilon_0+...

If x=x_0+\varepsilon_0 is the root, then f(x_0+\varepsilon_0)=0. Keeping only the terms up to first order in \varepsilon_0, we get

\varepsilon_0=-\frac{f(x_0)}{f\prime(x_0)}.

By letting x_1=x_0+\varepsilon_0, we can calculate a new \varepsilon_1, and so on. At the nth step, we can get

x_n=x_{n-1}-\frac{f(x_{n-1})}{f\prime(x_{n-1})}.
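The iteration can be sketched as follows. The function name newton_raphson, the stopping rule (stop when the step is smaller than tol) and the example f(x)=x^2-2 are choices made for this illustration only.

```python
def newton_raphson(f, fprime, x0, tol=1e-12, max_iter=50):
    """Iterate x_n = x_{n-1} - f(x_{n-1}) / f'(x_{n-1}) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        d = fprime(x)
        if d == 0:
            raise ZeroDivisionError("derivative vanished; choose another starting point")
        step = -f(x) / d          # epsilon at the current iterate
        x += step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence within max_iter iterations")

# Root of f(x) = x^2 - 2, starting from the guess x_0 = 1
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

The method is not guaranteed to converge from every starting point, which is why the sketch caps the number of iterations.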

Newton-Raphson can be used to obtain maximum likelihood estimates for a statistical model. For MLE, after we get the log-likelihood function, we take its first derivative and set it to 0. This amounts to finding the root of a function (the first derivative of the log-likelihood), so the Newton-Raphson method can be applied directly, with the second derivative of the log-likelihood playing the role of f\prime.
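As a small illustration, take an exponential sample with rate \lambda, whose log-likelihood is n\log\lambda-\lambda\sum x_i. This model is chosen here only because its MLE has the closed form n/\sum x_i, so the iteration can be checked; the function name exponential_mle and the data are made up.

```python
def exponential_mle(data, lam0, tol=1e-12, max_iter=100):
    """Newton-Raphson on the score function of an exponential(rate=lam) sample."""
    n, total = len(data), sum(data)
    lam = lam0
    for _ in range(max_iter):
        score = n / lam - total        # first derivative of the log-likelihood
        hessian = -n / lam ** 2        # second derivative of the log-likelihood
        step = -score / hessian
        lam += step
        if abs(step) < tol:
            return lam
    raise RuntimeError("no convergence within max_iter iterations")

data = [0.5, 1.2, 0.3, 2.0, 1.0]       # made-up waiting times
lam_hat = exponential_mle(data, lam0=0.5)
print(lam_hat, len(data) / sum(data))  # iterative and closed-form MLE
```

For this model the starting value matters: the iteration converges only when lam0 lies between 0 and twice the closed-form MLE.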

Simple linear regression

April 1st, 2008 by admin

In statistics, linear regression is a method of estimating the conditional expected value of one variable y given the values of some other variable or variables x.

A linear regression model is typically stated in the form

y=\alpha+\beta*x+\varepsilon.

Usually, we assume x is deterministic and \varepsilon\sim N(0,\sigma_e^2). Conditionally on x,

y|x\sim N(\alpha+\beta*x,\sigma_e^2).

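However, if x is itself random, say normally distributed with mean \mu_x and variance \sigma_x^2, then marginally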

y\sim N(\alpha+\beta*\mu_x, \beta^2*\sigma_x^2+\sigma_e^2).

The variance can be obtained using the law of total variance:

var(y)=var[E(y|x)] + E[var(y|x)].

var(y_i|x_i)=\sigma_e^2, thus E[var(y_i|x_i)]=\sigma_e^2.

E(y_i|x_i)=\alpha+\beta*x_i, thus

var[E(y_i|x_i)]=var(\alpha+\beta*x_i)=\beta^2*\sigma_x^2.
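A quick simulation can be used to check the marginal mean and variance. The parameter values below are arbitrary, and x is assumed to be normal, independent of the error term.

```python
import random

random.seed(0)
alpha, beta = 1.0, 2.0                      # arbitrary regression coefficients
mu_x, sigma_x, sigma_e = 3.0, 1.5, 1.0      # arbitrary moments of x and of the error
n = 100_000

x = [random.gauss(mu_x, sigma_x) for _ in range(n)]
y = [alpha + beta * xi + random.gauss(0.0, sigma_e) for xi in x]

theory_mean = alpha + beta * mu_x                       # alpha + beta*mu_x
theory_var = beta ** 2 * sigma_x ** 2 + sigma_e ** 2    # beta^2*sigma_x^2 + sigma_e^2

sample_mean = sum(y) / n
sample_var = sum((v - sample_mean) ** 2 for v in y) / n
print(theory_mean, sample_mean)
print(theory_var, sample_var)
```

The simulated values should be close to the theoretical ones, up to sampling noise.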

R square, which represents the proportion of the variance in y that can be explained by x, is equal to

R^2=\frac{\beta^2*\sigma_x^2}{\beta^2*\sigma_x^2+\sigma_e^2}.

Adjusted R square =1-(1-R^2)\frac{n-1}{n-k-1}, where n is the sample size and k is the number of predictors (k=1 for simple regression).
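In a sample, R square is usually computed as 1 - SSE/SST from the least-squares fit. The sketch below does this for made-up data and also computes the adjusted version, taking k to be the number of predictors (the usual convention, assumed here). The function names are hypothetical.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the simple regression of y on x."""
    a, b = simple_ols(x, y)
    ybar = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k=1):
    """Adjusted R^2 with n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # made-up data
y = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]
r2 = r_squared(x, y)
adj = adjusted_r_squared(r2, n=len(x), k=1)
print(r2, adj)
```

The sample R square equals b^2*Sxx/Syy, the sample analogue of the population formula \beta^2\sigma_x^2/(\beta^2\sigma_x^2+\sigma_e^2).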

R square is sometimes used to judge how well x can predict y. A large R square means that x is a good predictor of y. A small R square means we may need other variables to predict y well.

R square is not a test of model fit. For simple regression, the F-test is equivalent to the t-test of H_0: \beta=0. If this test is significant, there is evidence of a linear relationship between y and x. Whether the F/t-test is significant is not determined by the magnitude of R square alone, because it also depends on the sample size: with enough data, even a small R square can be significant. However, if R square is very small, it usually means x is not a good predictor of y.
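The equivalence of the two tests can be checked numerically: in simple regression F = t^2, and F = (n-2)R^2/(1-R^2), so for a fixed R square the test statistic grows with n. The sketch below uses made-up data and a hypothetical function name, and computes the statistics only (no p-values).

```python
import math

def slope_tests(x, y):
    """Return (t, F, R^2) for H0: beta = 0 in the simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - 2)              # residual variance estimate
    t = b / math.sqrt(mse / sxx)     # t statistic with n-2 degrees of freedom
    f = (sst - sse) / mse            # F statistic with (1, n-2) degrees of freedom
    return t, f, 1 - sse / sst

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up data
y = [2.1, 1.9, 3.8, 3.2, 5.9, 4.7]
t, f, r2 = slope_tests(x, y)
print(t ** 2, f, (len(x) - 2) * r2 / (1 - r2))   # all three should agree
```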

A related discussion of R square can be found at http://www.statisticalexperts.com/jianxu/2006/10/08/r2-confusion/.

Any comments are welcome.

Law of total variance

April 1st, 2008 by admin

In probability theory, the law of total variance (conditional variance formula) states that if X and Y are random variables on the same probability space, and the variance of Y is finite, then

var(Y)=var[E(Y|X)] + E[var(Y|X)]

This can be proved easily.

var(Y) = E(Y^2) - [E(Y)]^2
= E[E(Y^2|X)] - [E(E(Y|X))]^2
= E[var(Y|X)] + E[(E(Y|X))^2] - [E(E(Y|X))]^2
= E[var(Y|X)] + var[E(Y|X)].

The term E[var(Y|X)] is the unexplained component of the variance; the term var[E(Y|X)] is the explained component. The conditional expected value E(Y|X) is a random variable in its own right, whose value depends on the value of X. Notice that the conditional expected value of Y given the event X = x is a function of x.
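The decomposition can be verified exactly on a small discrete example. The joint distribution below is invented for illustration, and fractions are used so that the two sides match without rounding error.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), chosen only for illustration.
joint = {
    (0, 1): Fraction(1, 4), (0, 3): Fraction(1, 4),
    (1, 2): Fraction(1, 8), (1, 6): Fraction(3, 8),
}

def expect(h):
    """E[h(X, Y)] under the joint pmf."""
    return sum(p * h(x, y) for (x, y), p in joint.items())

def cond_moments(x0):
    """Return P(X = x0), E(Y | X = x0) and var(Y | X = x0)."""
    px = sum(p for (x, _), p in joint.items() if x == x0)
    m1 = sum(p * y for (x, y), p in joint.items() if x == x0) / px
    m2 = sum(p * y * y for (x, y), p in joint.items() if x == x0) / px
    return px, m1, m2 - m1 ** 2

mean_y = expect(lambda x, y: y)
var_y = expect(lambda x, y: y * y) - mean_y ** 2

parts = [cond_moments(x0) for x0 in sorted({x for x, _ in joint})]
unexplained = sum(px * v for px, _, v in parts)                 # E[var(Y|X)]
explained = sum(px * (m - mean_y) ** 2 for px, m, _ in parts)   # var[E(Y|X)]
print(var_y, unexplained, explained)
```

Here var(Y) = 17/4 splits into E[var(Y|X)] = 2 and var[E(Y|X)] = 9/4.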

This formula can be applied widely. For an example, see http://www.statisticalexperts.com/statexp/2006/10/10/simple-linear-regression/

Greek letters in LaTeX

April 1st, 2008 by admin

Command        Symbol    Command        Symbol
\alpha         α         \rho           ρ
\beta          β         \varrho        ϱ
\gamma         γ         \tau           τ
\delta         δ         \upsilon       υ
\epsilon       ϵ         \phi           ϕ
\varepsilon    ε         \varphi        φ
\zeta          ζ         \chi           χ
\eta           η         \psi           ψ
\theta         θ         \omega         ω
\vartheta      ϑ         \Gamma         Γ
\iota          ι         \Delta         Δ
\kappa         κ         \Theta         Θ
\lambda        λ         \Lambda        Λ
\mu            μ         \Xi            Ξ
\nu            ν         \Pi            Π
\xi            ξ         \Sigma         Σ
o              o         \Upsilon       Υ
\pi            π         \Phi           Φ
\varpi         ϖ         \Psi           Ψ
\sigma         σ         \Omega         Ω
\varsigma      ς
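
The commands above work only in math mode. A minimal document that uses a few of them might look like the following sketch; the sentence and formulas in it are just filler for illustration.

```latex
\documentclass{article}
\begin{document}
% Greek letters are math-mode commands, so they go inside $...$ or a display.
The model is $y = \alpha + \beta x + \varepsilon$ with
$\varepsilon \sim N(0, \sigma_e^2)$.

% Upper-case commands exist only for letters that differ from Latin capitals.
\[ \Gamma \quad \Delta \quad \Theta \quad \Lambda \quad \Xi \quad \Pi
   \quad \Sigma \quad \Upsilon \quad \Phi \quad \Psi \quad \Omega \]

% Omicron has no command of its own; the Latin letter o is used instead.
$o$ versus $\omega$
\end{document}
```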