Statistical Inference I (Unit 2: Cramer-Rao Inequality, Method of Moments, Maximum Likelihood Estimator)
I. Introduction
We have seen that from two distinct unbiased estimators of θ we can construct infinitely many unbiased estimators of θ, and among these we look for the best estimator of θ by comparing their variances or mean square errors. In some examples, however, more than one natural estimator is available, as in the following case.
For the normal distribution:
If X1, X2, …, Xn is a random sample from a normal distribution with mean μ and variance σ², then T1 = x̄ (the sample mean) and T2 = the sample median are both unbiased estimators of the parameter μ. Checking sufficiency, T1 is a sufficient estimator for μ, and hence it is the best estimator of μ.
Thus, to find the best estimator, we check whether the estimator is sufficient. We are also interested in the variance of the best estimator; this is discussed in this article.
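The variance comparison above can be illustrated with a short simulation. The sketch below is written in Python with illustrative values μ = 5, σ = 2, n = 50 (assumptions made here, not part of the example); it estimates the variances of the sample mean and the sample median for normal samples, and the sample mean comes out with the smaller variance.

```python
# Monte Carlo sketch (assumed setup): compare the variance of the sample mean
# and the sample median as estimators of mu for N(mu, sigma^2) samples.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 50, 20_000

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print("Var(sample mean)   ~", means.var())    # close to sigma^2 / n = 0.08
print("Var(sample median) ~", medians.var())  # close to (pi/2)*sigma^2/n ~ 0.126
```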
II. Properties of Probability Mass Function (p.m.f.) or Probability Density Function (p.d.f)
If X1, X2, …, Xn are observations from any p.d.f. or p.m.f. f(x, θ), θ ∈ Θ, then the following are the properties of the p.d.f. or p.m.f.:
1. $\int_{-\infty}^{\infty} f(x,\theta)\,dx = 1$
Proof: This is the total probability condition satisfied by every p.d.f. (for a p.m.f., the sum over the support equals 1).
2. $\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty} f(x,\theta)\,dx = 0$
Proof: Differentiating property 1 with respect to θ, the right-hand side is the derivative of the constant 1, which is 0.
3. $E\left[\frac{\partial}{\partial\theta}\log f(x,\theta)\right] = 0$
Proof: From property 2, interchanging differentiation and integration,
$\int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} f(x,\theta)\,dx = \int_{-\infty}^{\infty} \frac{\partial}{\partial\theta}\log f(x,\theta)\, f(x,\theta)\,dx = E\left[\frac{\partial}{\partial\theta}\log f(x,\theta)\right] = 0.$
4. $E\left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\right] = -E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right]$
Proof: Differentiating property 3 with respect to θ and again interchanging differentiation and integration,
$\int_{-\infty}^{\infty} \left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\, f(x,\theta) + \left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2 f(x,\theta)\right] dx = 0,$
which gives $E\left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\right] + E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right] = 0$.
5. $\operatorname{Var}\left[\frac{\partial}{\partial\theta}\log f(x,\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\right]$
Proof: We have
$\operatorname{Var}\left[\frac{\partial}{\partial\theta}\log f(x,\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right] - \left(E\left[\frac{\partial}{\partial\theta}\log f(x,\theta)\right]\right)^2$
$= E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right] - 0 \qquad \text{(from property 3)}$
$= E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\right] \qquad \text{(from property 4)}$
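Properties 3, 4 and 5 can be checked numerically. The sketch below is an assumed illustration (not from the text), using a single observation from an exponential distribution with mean θ = 2: the simulated mean of the score should be near 0, and the simulated values of $E[(\partial\log f/\partial\theta)^2]$ and $-E[\partial^2\log f/\partial\theta^2]$ should agree, both near 1/θ² = 0.25.

```python
# Numerical sketch (illustrative, not from the text): check properties 3-5 for
# one observation from an Exponential distribution with mean theta,
# f(x, theta) = (1/theta) * exp(-x/theta).
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0
x = rng.exponential(scale=theta, size=1_000_000)

score = -1.0 / theta + x / theta**2              # d/dtheta log f(x, theta)
second = 1.0 / theta**2 - 2.0 * x / theta**3     # d^2/dtheta^2 log f(x, theta)

print("E[score]          ~", score.mean())       # property 3: close to 0
print("E[score^2]        ~", (score**2).mean())  # properties 4-5: both close to
print("-E[second deriv.] ~", -second.mean())     # 1/theta^2 = 0.25
```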
Fisher Information Function:
Definition:
- The Fisher information measure (or the amount of information) about the parameter θ contained in the random variable X is denoted by $I(\theta)$ and is defined as
$I(\theta) = \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log f(x,\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\right]$
- The Fisher information measure (or the amount of information) about the parameter θ contained in a random sample X1, X2, …, Xn of size n is denoted by $I_n(\theta)$ and is defined in terms of the likelihood L(θ) as
$I_n(\theta) = \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log L(\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right]$
- Let X1, X2, …, Xn be a random sample from the distribution of the random variable X, let T = T(X1, X2, …, Xn) be any statistic for the parameter θ, and let g(t, θ) be its probability function; then the Fisher information function (or the amount of information) about θ contained in the statistic T is denoted by $I_T(\theta)$ and is given as
$I_T(\theta) = \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log g(t,\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log g(t,\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log g(t,\theta)\right]$
Properties of Fisher Information Function
Result 1:
Let X1, X2, …, Xn be a random sample from the distribution f(x, θ); then $I_n(\theta) = n\,I(\theta)$.
Proof:
Let X1,X2,…,Xn be a random sample from the distribution f(x,θ), θ∈Θ.
The Fisher information measure (or the amount of information) about the parameter θ contained in the random variable X is denoted by $I(\theta)$ and is defined as:
$I(\theta) = \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log f(x,\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log f(x,\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\right]$
Now consider the joint probability function, i.e. the likelihood function, of the random variables X1, X2, …, Xn:
$L(\theta) = f(X_1, X_2, \ldots, X_n;\theta) = \prod_{i=1}^{n} f(x_i,\theta)$
Taking the logarithm on both sides:
$\log L(\theta) = \log\left(\prod_{i=1}^{n} f(x_i,\theta)\right) = \sum_{i=1}^{n}\log f(x_i,\theta)$
Differentiating with respect to θ:
$\frac{\partial}{\partial\theta}\log L(\theta) = \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log f(x_i,\theta) \qquad \ldots(i)$
The Fisher information measure (or the amount of information) about θ contained in the sample X1, X2, …, Xn is denoted by $I_n(\theta)$ and is defined as:
$I_n(\theta) = \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log L(\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right]$
Taking the variance of both sides of (i) and using the independence of X1, X2, …, Xn,
$I_n(\theta) = \operatorname{Var}\left[\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log f(x_i,\theta)\right] = \sum_{i=1}^{n}\operatorname{Var}\left[\frac{\partial}{\partial\theta}\log f(x_i,\theta)\right] = n\,I(\theta)$
So $I_n(\theta) = n\,I(\theta)$, hence proved.
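A quick numerical check of Result 1 (an assumed illustration, using the exponential distribution with mean θ that also appears in Example 1 below): the variance of the sample score for n observations should be about n times the single-observation information 1/θ².

```python
# Sketch (assumed example distribution): check I_n(theta) = n * I(theta) by
# simulating the sample score for n i.i.d. Exponential(mean theta) observations.
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 10, 200_000

x = rng.exponential(scale=theta, size=(reps, n))
sample_score = (-1.0 / theta + x / theta**2).sum(axis=1)  # d/dtheta log L(theta)

print("Var(sample score) ~", sample_score.var())  # ~ n / theta^2 = 2.5
print("n * I(theta)      =", n / theta**2)
```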
Result 2:
Show that for any statistic T, $I_n(\theta) \ge I_T(\theta)$.
Proof:
Let X1, X2, …, Xn be a random sample from the distribution of the random variable X, let T = T(X1, X2, …, Xn) be any statistic for the parameter θ, and let g(t, θ) be its probability function.
The Fisher information function (or the amount of information) about θ contained in the statistic T is denoted by $I_T(\theta)$ and is given as:
$I_T(\theta) = \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log g(t,\theta)\right] = E\left[\left(\frac{\partial}{\partial\theta}\log g(t,\theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log g(t,\theta)\right]$
Now, factor the joint probability function (likelihood function) of X1, X2, …, Xn through the statistic T:
$L(\theta) = f(X_1, X_2, \ldots, X_n;\theta) = g(t,\theta)\cdot h(x \mid t,\theta)$
where $h(x \mid t,\theta)$ is the conditional probability function of the sample given T = t.
Taking the logarithm of L(θ):
$\log L(\theta) = \log g(t,\theta) + \log h(x \mid t,\theta)$
Differentiating with respect to θ:
$\frac{\partial}{\partial\theta}\log L(\theta) = \frac{\partial}{\partial\theta}\log g(t,\theta) + \frac{\partial}{\partial\theta}\log h(x \mid t,\theta) \qquad \ldots(ii)$
The conditional score $\frac{\partial}{\partial\theta}\log h(x \mid t,\theta)$ has mean zero given T = t, so it is uncorrelated with $\frac{\partial}{\partial\theta}\log g(t,\theta)$. Taking variances on both sides of (ii):
$\operatorname{Var}\left[\frac{\partial}{\partial\theta}\log L(\theta)\right] = \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log g(t,\theta)\right] + \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log h(x \mid t,\theta)\right] \ge \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log g(t,\theta)\right]$
$I_n(\theta) \ge I_T(\theta)$
Remark:
If T is a sufficient statistic for θ, then $h(x \mid t)$ does not depend on θ, the second term in (ii) vanishes, and hence $I_n(\theta) = I_T(\theta)$.
Example 1: Fisher Information Function for Exponential Distribution
Let X1, X2, …, Xn be a random sample from the exponential distribution with parameter 1/θ (i.e. mean θ). The probability density function is:
$f(x,\theta) = \begin{cases} \dfrac{1}{\theta}\, e^{-x/\theta}, & x \ge 0,\ \theta > 0 \\ 0, & \text{otherwise} \end{cases}$
Likelihood Function
The likelihood function of the sample X1,X2,…,Xn is given by:
$L(\theta) = \prod_{i=1}^{n} f(x_i,\theta) = \left(\frac{1}{\theta}\right)^{n} e^{-\sum x_i/\theta}$
Taking the logarithm on both sides:
$\log L(\theta) = -n\log\theta - \frac{\sum x_i}{\theta}$
Derivative with Respect to θ
Differentiating with respect to θ:
$\frac{\partial}{\partial\theta}\log L(\theta) = -\frac{n}{\theta} + \frac{\sum x_i}{\theta^2}$
Second derivative with respect to θ:
$\frac{\partial^2}{\partial\theta^2}\log L(\theta) = \frac{n}{\theta^2} - \frac{2\sum x_i}{\theta^3}$
Fisher Information Function
By definition of the Fisher information function:
$I_n(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right] = -E\left[\frac{n}{\theta^2} - \frac{2\sum x_i}{\theta^3}\right] = -\frac{n}{\theta^2} + \frac{2\,E\left(\sum x_i\right)}{\theta^3}$
Since n is a constant and $E\left(\sum x_i\right) = n\theta$, we have:
$I_n(\theta) = -\frac{n}{\theta^2} + \frac{2n\theta}{\theta^3} = -\frac{n}{\theta^2} + \frac{2n}{\theta^2} = \frac{n}{\theta^2}$
Answer:
So, $I_n(\theta) = \dfrac{n}{\theta^2}$.
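The same calculation can be reproduced symbolically. The sketch below uses sympy (an assumed tool, not part of the text); the symbol S stands for ∑xᵢ, and E(∑xᵢ) = nθ is substituted before negating the second derivative.

```python
# Symbolic sketch using sympy (assumed tool): reproduce the exponential-sample
# calculation I_n(theta) = n / theta^2.
import sympy as sp

theta, n, S = sp.symbols('theta n S', positive=True)  # S stands for sum(x_i)

logL = -n * sp.log(theta) - S / theta                  # log L(theta)
d2 = sp.diff(logL, theta, 2)                           # second derivative

# Take the expectation by substituting E(sum x_i) = n*theta, then negate.
I_n = sp.simplify(-d2.subs(S, n * theta))
print(I_n)                                             # prints n/theta**2
```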
Example 2: Fisher Information Function for Poisson Distribution
Let X1,X2,…,Xn be a random sample from the Poisson distribution with parameter λ. The probability mass function is:
$f(x,\lambda) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots,\ \lambda > 0 \\ 0, & \text{otherwise} \end{cases}$
Likelihood Function
The likelihood function is given by:
$L(\lambda) = \prod_{i=1}^{n} f(x_i,\lambda) = \frac{e^{-n\lambda}\,\lambda^{\sum x_i}}{\prod x_i!}$
Taking the logarithm on both sides:
$\log L(\lambda) = -n\lambda + \sum x_i \log\lambda - \sum \log(x_i!)$
Derivative with Respect to λ
Differentiating with respect to λ:
$\frac{\partial}{\partial\lambda}\log L(\lambda) = -n + \frac{\sum x_i}{\lambda}$
Second derivative with respect to λ:
$\frac{\partial^2}{\partial\lambda^2}\log L(\lambda) = -\frac{\sum x_i}{\lambda^2}$
Fisher Information Function
By definition of the Fisher information function:
$I_n(\lambda) = -E\left[\frac{\partial^2}{\partial\lambda^2}\log L(\lambda)\right] = -E\left[-\frac{\sum x_i}{\lambda^2}\right] = \frac{E\left(\sum x_i\right)}{\lambda^2}$
Since $E\left(\sum x_i\right) = n\lambda$, we have:
$I_n(\lambda) = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda}$
Answer:
So, $I_n(\lambda) = \dfrac{n}{\lambda}$.
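As a numerical check (illustrative values λ = 4 and n = 20, chosen here as assumptions), the variance of the sample score $-n + \sum x_i/\lambda$ should be close to n/λ.

```python
# Monte Carlo sketch (illustrative values): check I_n(lambda) = n / lambda for a
# Poisson(lambda) sample, using the variance of the sample score -n + sum(x)/lambda.
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 4.0, 20, 200_000

x = rng.poisson(lam, size=(reps, n))
sample_score = -n + x.sum(axis=1) / lam   # d/dlambda log L(lambda)

print("Var(sample score) ~", sample_score.var())  # ~ n / lambda = 5.0
print("n / lambda        =", n / lam)
```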
Example 3: Fisher Information Function for Normal Distribution
Let X1, X2, …, Xn be a random sample from the normal distribution with mean μ and variance σ². The probability density function is:
$f(x;\mu,\sigma^2) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma^2 > 0$
1. I(μ)
We have:
$\log f(x,\theta) = \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\right) = -\log\sqrt{2\pi} - \log\sigma - \frac{1}{2\sigma^2}(x-\mu)^2 \qquad \ldots(i)$
Differentiating with respect to μ:
$\frac{\partial}{\partial\mu}\log f(x,\theta) = \frac{\partial}{\partial\mu}\left(-\log\sqrt{2\pi} - \log\sigma - \frac{1}{2\sigma^2}(x-\mu)^2\right) = 0 - 0 - \frac{2(x-\mu)(-1)}{2\sigma^2} = \frac{x-\mu}{\sigma^2}$
Second derivative with respect to μ:
$\frac{\partial^2}{\partial\mu^2}\log f(x,\theta) = -\frac{1}{\sigma^2}$
So, $I(\mu) = -E\left[\frac{\partial^2}{\partial\mu^2}\log f(x,\theta)\right] = -E\left[-\frac{1}{\sigma^2}\right] = \frac{1}{\sigma^2}$.
2. I(σ)
Differentiating equation (i) with respect to σ:
$\frac{\partial}{\partial\sigma}\log f(x,\theta) = \frac{\partial}{\partial\sigma}\left(-\log\sqrt{2\pi} - \log\sigma - \frac{1}{2\sigma^2}(x-\mu)^2\right) = 0 - \frac{1}{\sigma} + \frac{(x-\mu)^2}{\sigma^3} = -\frac{1}{\sigma} + \frac{(x-\mu)^2}{\sigma^3}$
Second derivative with respect to σ:
$\frac{\partial^2}{\partial\sigma^2}\log f(x,\theta) = \frac{1}{\sigma^2} - \frac{3(x-\mu)^2}{\sigma^4}$
So, using $E\left[(x-\mu)^2\right] = \sigma^2$, $I(\sigma) = -E\left[\frac{\partial^2}{\partial\sigma^2}\log f(x,\theta)\right] = -E\left[\frac{1}{\sigma^2} - \frac{3(x-\mu)^2}{\sigma^4}\right] = -\frac{1}{\sigma^2} + \frac{3\sigma^2}{\sigma^4} = \frac{2}{\sigma^2}$.
3. I(σ²)
Let θ = σ², and rewrite equation (i):
$\log f(x,\theta) = -\log\sqrt{2\pi} - \frac{1}{2}\log\theta - \frac{1}{2\theta}(x-\mu)^2$
Differentiating with respect to θ:
$\frac{\partial}{\partial\theta}\log f(x,\theta) = \frac{\partial}{\partial\theta}\left(-\log\sqrt{2\pi} - \frac{1}{2}\log\theta - \frac{1}{2\theta}(x-\mu)^2\right) = 0 - \frac{1}{2\theta} + \frac{(x-\mu)^2}{2\theta^2} = -\frac{1}{2\theta} + \frac{(x-\mu)^2}{2\theta^2}$
Second derivative with respect to θ:
$\frac{\partial^2}{\partial\theta^2}\log f(x,\theta) = \frac{1}{2\theta^2} - \frac{(x-\mu)^2}{\theta^3}$
So, using $E\left[(x-\mu)^2\right] = \sigma^2 = \theta$, $I(\sigma^2) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\right] = -E\left[\frac{1}{2\theta^2} - \frac{(x-\mu)^2}{\theta^3}\right] = -\frac{1}{2\theta^2} + \frac{\theta}{\theta^3} = \frac{1}{2\theta^2}$.
To summarize:
$I(\mu) = \dfrac{1}{\sigma^2}$
$I(\sigma) = \dfrac{2}{\sigma^2}$
$I(\sigma^2) = \dfrac{1}{2\sigma^4}$
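These three results can be verified by simulation. The sketch below (illustrative values μ = 1 and σ = 1.5, chosen as assumptions) computes the variance of each score for a single normal observation, which by property 5 equals the corresponding information.

```python
# Monte Carlo sketch (illustrative mu, sigma): check I(mu), I(sigma), I(sigma^2)
# by computing the variance of the corresponding score for one N(mu, sigma^2) draw.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 1.0, 1.5
x = rng.normal(mu, sigma, size=2_000_000)

score_mu = (x - mu) / sigma**2
score_sigma = -1.0 / sigma + (x - mu)**2 / sigma**3
score_var = -1.0 / (2 * sigma**2) + (x - mu)**2 / (2 * sigma**4)

print("I(mu)      ~", score_mu.var(),    " vs", 1 / sigma**2)        # 1/sigma^2
print("I(sigma)   ~", score_sigma.var(), " vs", 2 / sigma**2)        # 2/sigma^2
print("I(sigma^2) ~", score_var.var(),   " vs", 1 / (2 * sigma**4))  # 1/(2 sigma^4)
```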
Cramer-Rao Inequality
Regularity Conditions:
- The parameter space Θ is an open interval.
- The support or range of the distribution is independent of θ.
- For every x and θ, $\frac{\partial}{\partial\theta} f(x,\theta)$ and $\frac{\partial^2}{\partial\theta^2} f(x,\theta)$ exist and are finite.
- The statistic T has finite mean and variance.
- Differentiation under the integral sign is permissible, i.e., $\frac{\partial}{\partial\theta}\int T\,L(x,\theta)\,dx = \int T\,\frac{\partial}{\partial\theta}L(x,\theta)\,dx$.
Cramer-Rao Inequality Statement:
Let X1, X2, …, Xn be a random sample from any probability density function (p.d.f.) or probability mass function (p.m.f.) f(x, θ), where θ ∈ Θ. If T = T(X1, X2, …, Xn) is an unbiased estimator of ϕ(θ) and the regularity conditions hold, then:
$\operatorname{Var}(T) \ge \dfrac{\left(\frac{\partial}{\partial\theta}\phi(\theta)\right)^2}{I_n(\theta)} \quad\text{or equivalently}\quad \operatorname{Var}(T) \ge \dfrac{\left(\phi'(\theta)\right)^2}{n\,I(\theta)}$
Proof:
Let X be a random variable following the p.d.f. or p.m.f. f(x, θ), θ ∈ Θ, and let L(θ) be the likelihood function of a random sample X1, X2, …, Xn from the distribution. Then:
$L(\theta) = \prod_{i=1}^{n} f(x_i,\theta) = f(X_1, X_2, \ldots, X_n;\theta)$
Since the likelihood integrates to 1 over the sample space,
$\int L(\theta)\,dx = 1$
Differentiating with respect to θ:
$\frac{\partial}{\partial\theta}\int L(\theta)\,dx = 0$
$\int \frac{\partial}{\partial\theta} L(\theta)\,dx = 0$
$\int \frac{1}{L(\theta)}\frac{\partial}{\partial\theta} L(\theta)\cdot L(\theta)\,dx = 0$
$\int \frac{\partial}{\partial\theta}\log L(\theta)\cdot L(\theta)\,dx = 0$
$E\left[\frac{\partial}{\partial\theta}\log L(\theta)\right] = 0 \qquad \ldots(ii)$
And we know that T is an unbiased estimator of ϕ(θ), so:
$E[T(X)] = \phi(\theta)$
$\int T\cdot L(\theta)\,dx = \phi(\theta)$
Now, differentiating with respect to θ:
$\frac{\partial}{\partial\theta}\int T\cdot L(\theta)\,dx = \frac{\partial}{\partial\theta}\phi(\theta)$
$\int T\cdot\frac{\partial}{\partial\theta} L(\theta)\,dx = \phi'(\theta)$
$\int T\cdot\frac{1}{L(\theta)}\frac{\partial}{\partial\theta} L(\theta)\cdot L(\theta)\,dx = \phi'(\theta)$
$\int T\cdot\frac{\partial}{\partial\theta}\log L(\theta)\cdot L(\theta)\,dx = \phi'(\theta)$
$E\left[T\cdot\frac{\partial}{\partial\theta}\log L(\theta)\right] = \phi'(\theta) \qquad \ldots(iii)$
And, using (ii) and (iii), we have:
$\operatorname{Cov}\left(\frac{\partial}{\partial\theta}\log L(\theta),\, T\right) = E\left[T\cdot\frac{\partial}{\partial\theta}\log L(\theta)\right] - E\left[\frac{\partial}{\partial\theta}\log L(\theta)\right] E(T) = \phi'(\theta) - 0\cdot E(T) = \phi'(\theta) \qquad \ldots(iv)$
By the Cauchy-Schwarz inequality for covariance, we have:
$\left(\operatorname{Cov}\left(\frac{\partial}{\partial\theta}\log L(\theta),\, T\right)\right)^2 \le \operatorname{Var}\left[\frac{\partial}{\partial\theta}\log L(\theta)\right]\cdot \operatorname{Var}(T)$
$\left(\phi'(\theta)\right)^2 \le I_n(\theta)\cdot \operatorname{Var}(T)$
$\operatorname{Var}(T) \ge \dfrac{\left(\phi'(\theta)\right)^2}{I_n(\theta)}$
This lower bound is given by the Cramer-Rao inequality and is known as the Cramer-Rao Lower Bound.
Remark: If T is an unbiased estimator of θ itself, then ϕ(θ) = θ and ϕ'(θ) = 1. In this case, the Cramer-Rao Lower Bound is:
$\operatorname{Var}(T) \ge \dfrac{1}{I_n(\theta)}$
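As an illustration of the bound (an assumed example, not from the text): for a Poisson(λ) sample, x̄ is unbiased for λ, $I_n(\lambda) = n/\lambda$ from Example 2, and $\operatorname{Var}(\bar{x}) = \lambda/n$, so the sample mean attains the Cramer-Rao lower bound. The sketch below checks this by simulation with illustrative values λ = 4, n = 25.

```python
# Monte Carlo sketch (illustrative values): for a Poisson(lambda) sample the
# sample mean is unbiased for lambda, I_n(lambda) = n/lambda, and Var(x-bar)
# attains the Cramer-Rao lower bound 1/I_n(lambda) = lambda/n.
import numpy as np

rng = np.random.default_rng(5)
lam, n, reps = 4.0, 25, 200_000

xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print("Var(x-bar)  ~", xbar.var())   # close to lambda/n = 0.16
print("CRLB 1/I_n  =", lam / n)
```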