
Statistical Inference I: (Cramer-Rao Inequality, Method Of Moment, Maximum Likelihood Estimator)

I. Introduction

We have seen that two distinct unbiased estimators of θ give rise to infinitely many unbiased estimators of θ, and among these estimators we find the best estimator for the parameter θ by comparing their variances or mean square errors. In some examples, however, more than one reasonable estimator is available, as in the following case.

For the Normal distribution:

If $X_1, X_2, \ldots, X_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, then $T_1 = \bar{x}$ and $T_2 =$ the sample median are both unbiased estimators of the parameter $\mu$. Checking sufficiency, $T_1$ is a sufficient statistic for $\mu$; hence it is the best estimator for the parameter $\mu$.

Thus, to find the best estimator, we check whether the estimator is sufficient or not.

Now we are interested in how small the variance of the best estimator can be; this is discussed in this article.

II. Properties of the Probability Mass Function (p.m.f.) or Probability Density Function (p.d.f.)

If $X_1, X_2, \ldots, X_n$ are observations from any p.d.f. or p.m.f. $f(x,\theta)$, $\theta \in \Theta$, then the following properties hold:


1. $\int f(x,\theta)\,dx = 1$

Proof: Since $f(x,\theta)$ is a p.d.f. (or p.m.f.), the total probability is one:

$\int f(x,\theta)\,dx = 1 \quad \ldots (1)$

(For a p.m.f., the integral is replaced by a sum throughout.)

2. $\int \frac{\partial}{\partial \theta} f(x,\theta)\,dx = 0$

Proof: Differentiating (1) with respect to $\theta$, and interchanging differentiation and integration,

$\frac{\partial}{\partial \theta}\int f(x,\theta)\,dx = \int \frac{\partial}{\partial \theta} f(x,\theta)\,dx = 0 \quad \ldots (2)$

3. $E\left[\frac{\partial}{\partial \theta}\log f(x,\theta)\right] = 0$

Proof: We have

$E\left[\frac{\partial}{\partial \theta}\log f(x,\theta)\right] = \int \frac{\partial}{\partial \theta}\log f(x,\theta)\, f(x,\theta)\,dx = \int \frac{1}{f(x,\theta)}\frac{\partial f(x,\theta)}{\partial \theta}\, f(x,\theta)\,dx = \int \frac{\partial}{\partial \theta} f(x,\theta)\,dx = 0$ from (2) $\quad \ldots (3)$

4. $E\left[-\frac{\partial^2}{\partial \theta^2}\log f(x,\theta)\right] = E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^2\right]$

Proof: Differentiating (3), i.e. $\int \frac{\partial}{\partial \theta}\log f(x,\theta)\, f(x,\theta)\,dx = 0$, with respect to $\theta$,

$\int \frac{\partial^2}{\partial \theta^2}\log f(x,\theta)\, f(x,\theta)\,dx + \int \frac{\partial}{\partial \theta}\log f(x,\theta)\,\frac{\partial}{\partial \theta} f(x,\theta)\,dx = 0$

Since $\frac{\partial}{\partial \theta} f(x,\theta) = f(x,\theta)\,\frac{\partial}{\partial \theta}\log f(x,\theta)$, this gives

$E\left[\frac{\partial^2}{\partial \theta^2}\log f(x,\theta)\right] + E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^2\right] = 0$

$E\left[-\frac{\partial^2}{\partial \theta^2}\log f(x,\theta)\right] = E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^2\right] \quad \ldots (4)$

5. $\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log f(x,\theta)\right] = E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^2\right] = E\left[-\frac{\partial^2}{\partial \theta^2}\log f(x,\theta)\right]$

Proof: We have

$\operatorname{Var}\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right) = E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^2\right] - \left(E\left[\frac{\partial}{\partial \theta}\log f(x,\theta)\right]\right)^2$

$= E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^2\right] - 0$ from (3)

$= E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^2\right] = E\left[-\frac{\partial^2}{\partial \theta^2}\log f(x,\theta)\right]$ from (4)
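As a quick numerical sanity check of properties (3), (4) and (5), the short sketch below (an illustration only, assuming NumPy is available; the exponential density with mean $\theta=2$ is chosen purely as an example) estimates the relevant expectations by Monte Carlo averaging.

```python
import numpy as np

# Monte Carlo check of properties (3)-(5) for the exponential density
# f(x, theta) = (1/theta) * exp(-x/theta): the score and its derivative
# are known in closed form, so the expectations can be estimated by
# averaging over a large simulated sample.

rng = np.random.default_rng(0)
theta = 2.0
x = rng.exponential(scale=theta, size=1_000_000)

score = -1.0 / theta + x / theta**2             # d/dtheta log f(x, theta)
second = 1.0 / theta**2 - 2.0 * x / theta**3    # d^2/dtheta^2 log f(x, theta)

print(score.mean())        # ~ 0            (property 3)
print((score**2).mean())   # ~ 1/theta^2    (property 5)
print((-second).mean())    # ~ 1/theta^2    (property 4)
```

Up to simulation noise, the last two printed values agree with $1/\theta^{2}=0.25$, and the first with $0$.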

Fisher Information Function:

Definition:

  1. The Fisher Information Measure (or the amount of information) about the parameter $\theta$ contained in the random variable $x$ is denoted by $I(\theta)$ and is defined as

$I(\theta)=\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log f(x,\theta)\right]=E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^{2}\right]=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log f(x,\theta)\right]$

  2. The Fisher Information Measure (or the amount of information) about the parameter $\theta$ contained in a random sample $X_1,X_2,\ldots,X_n$ of size $n$ is denoted by $I_n(\theta)$ and is defined as

$I_n(\theta)=\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log L(\theta)\right]=E\left[\left(\frac{\partial}{\partial \theta}\log L(\theta)\right)^{2}\right]=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log L(\theta)\right]$

  3. Let $X_1,X_2,\ldots,X_n$ be a random sample from the distribution of the random variable $x$, let $T=T(X_1,X_2,\ldots,X_n)$ be any statistic for the parameter $\theta$, and let $g(t,\theta)$ be its probability function. Then the Fisher information function (or the amount of information) about the parameter $\theta$ contained in the statistic $T$ is denoted by $I_T(\theta)$ and is given by

$I_T(\theta)=\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log g(t,\theta)\right]=E\left[\left(\frac{\partial}{\partial \theta}\log g(t,\theta)\right)^{2}\right]=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log g(t,\theta)\right]$

Properties of Fisher Information Function

Result 1:

Let $X_1,X_2,\ldots,X_n$ be a random sample from the distribution $f(x,\theta)$; then $I_n(\theta)=nI(\theta)$.

Proof:

Let $X_1,X_2,\ldots,X_n$ be a random sample from the distribution $f(x,\theta)$, $\theta\in\Theta$.

The Fisher Information Measure (or the amount of information) about the parameter $\theta$ contained in the random variable $x$ is denoted by $I(\theta)$ and is defined as:

$I(\theta)=\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log f(x,\theta)\right]=E\left[\left(\frac{\partial}{\partial \theta}\log f(x,\theta)\right)^{2}\right]=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log f(x,\theta)\right]$

Now consider the joint probability function, i.e. the likelihood function, of the random variables $X_1,X_2,\ldots,X_n$: $L(\theta)=f(X_1,X_2,\ldots,X_n,\theta)$.

$L(\theta)=\prod_{i=1}^{n} f(x_i,\theta)$

Taking the logarithm on both sides:

$\log L(\theta)=\log\left(\prod_{i=1}^{n} f(x_i,\theta)\right)$

$\log L(\theta)=\sum_{i=1}^{n}\log f(x_i,\theta)$

Differentiating with respect to $\theta$:

$\frac{\partial}{\partial \theta}\log L(\theta)=\sum_{i=1}^{n}\frac{\partial}{\partial \theta}\log f(x_i,\theta) \quad \ldots (i)$

The Fisher Information Measure (or the amount of information) about the parameter $\theta$ contained in the random sample $X_1,X_2,\ldots,X_n$ is denoted by $I_n(\theta)$ and is defined as:

$I_n(\theta)=\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log L(\theta)\right]=E\left[\left(\frac{\partial}{\partial \theta}\log L(\theta)\right)^{2}\right]=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log L(\theta)\right]$

Taking the variance on both sides of (i), and using the fact that the $X_i$ are independent and identically distributed,

$\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log L(\theta)\right]=\sum_{i=1}^{n}\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log f(x_i,\theta)\right]=nI(\theta)$

So, $I_n(\theta)=nI(\theta)$; hence proved.
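Result 1 can be checked numerically. The sketch below (a rough Monte Carlo illustration, assuming NumPy; it uses the exponential distribution with mean $\theta$, whose single-observation information $I(\theta)=1/\theta^{2}$ is derived in Example 1 below, and the values $\theta=2$, $n=10$ are arbitrary) compares the variance of the sample score with $nI(\theta)$.

```python
import numpy as np

# Monte Carlo check of Result 1, I_n(theta) = n * I(theta), for an
# exponential(mean theta) sample, where I(theta) = 1/theta^2.

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 10, 200_000

samples = rng.exponential(scale=theta, size=(reps, n))
# Score of the whole sample = sum of the per-observation scores.
sample_score = (-1.0 / theta + samples / theta**2).sum(axis=1)

print(sample_score.var())   # ~ n * I(theta) = 10 / 4 = 2.5
print(n / theta**2)         # 2.5
```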

Result 2:

Show that for any statistic $T$, $I_n(\theta)\ge I_T(\theta)$.

Proof:

Let $X_1,X_2,\ldots,X_n$ be a random sample from the distribution of the random variable $x$, let $T=T(X_1,X_2,\ldots,X_n)$ be any statistic for the parameter $\theta$, and let $g(t,\theta)$ be its probability function.

The Fisher information function (or the amount of information) about the parameter $\theta$ contained in the statistic $T$ is denoted by $I_T(\theta)$ and is given by:

$I_T(\theta)=\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log g(t,\theta)\right]=E\left[\left(\frac{\partial}{\partial \theta}\log g(t,\theta)\right)^{2}\right]=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log g(t,\theta)\right]$

Now, consider the joint probability function, i.e. the likelihood function, of the random variables $X_1,X_2,\ldots,X_n$:

$L(\theta)=f(X_1,X_2,\ldots,X_n,\theta)$

Factorizing the joint distribution through the statistic $T$,

$L(\theta)=g(t,\theta)\,h(x\mid t,\theta)$

where $h(x\mid t,\theta)$ is the conditional distribution of the sample given $T=t$.

Taking the logarithm of $L(\theta)$:

$\log L(\theta)=\log g(t,\theta)+\log h(x\mid t,\theta)$

Differentiating with respect to $\theta$:

$\frac{\partial}{\partial \theta}\log L(\theta)=\frac{\partial}{\partial \theta}\log g(t,\theta)+\frac{\partial}{\partial \theta}\log h(x\mid t,\theta) \quad \ldots (ii)$

Taking the variance of both sides of (ii): the two terms on the right are uncorrelated (the conditional score $\frac{\partial}{\partial \theta}\log h(x\mid t,\theta)$ has expectation zero given $T=t$), and variances are non-negative, so

$\operatorname{Var}\left[\frac{\partial}{\partial \theta}\log L(\theta)\right]\ge \operatorname{Var}\left[\frac{\partial}{\partial \theta}\log g(t,\theta)\right]$

$I_n(\theta)\ge I_T(\theta)$

Remark:

If $T$ is a sufficient statistic for $\theta$, then $I_n(\theta)=I_T(\theta)$.
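This remark can also be illustrated numerically. For the exponential sample of Example 1 below, $T=\sum_{i=1}^{n}x_i$ is sufficient for $\theta$ and follows a Gamma distribution with shape $n$ and scale $\theta$, so $I_T(\theta)$ can be estimated from the variance of the score of $T$ and compared with $I_n(\theta)=n/\theta^{2}$. A minimal sketch (assuming NumPy; the parameter values are illustrative):

```python
import numpy as np

# Monte Carlo check of the remark: for an exponential(mean theta) sample,
# T = sum(x_i) is sufficient and T ~ Gamma(shape=n, scale=theta), with
# log g(t, theta) = const - n*log(theta) - t/theta, so the score of T is
# -n/theta + t/theta^2.  Its variance estimates I_T(theta).

rng = np.random.default_rng(5)
theta, n, reps = 2.0, 10, 200_000

t = rng.gamma(shape=n, scale=theta, size=reps)
score_t = -n / theta + t / theta**2

print(score_t.var())    # ~ I_T(theta) = n / theta^2 = 2.5
print(n / theta**2)     # I_n(theta) = 2.5
```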

Example 1: Fisher Information Function for Exponential Distribution

Let $X_1,X_2,\ldots,X_n$ be a random sample from the Exponential distribution with parameter $\frac{1}{\theta}$ (i.e. mean $\theta$). The probability density function is:

$f(x,\theta)=\begin{cases}\dfrac{1}{\theta}e^{-x/\theta}, & x\ge 0,\ \theta>0\\[4pt] 0, & \text{otherwise}\end{cases}$

Likelihood Function

The likelihood function of the sample $X_1,X_2,\ldots,X_n$ is given by:

$L(\theta)=\prod_{i=1}^{n} f(x_i,\theta)=\left(\frac{1}{\theta}\right)^{n} e^{-\sum_{i=1}^{n} x_i/\theta}$

Taking the logarithm on both sides:

$\log L(\theta)=-n\log(\theta)-\frac{\sum_{i=1}^{n} x_i}{\theta}$

Derivative with Respect to θ

Differentiating with respect to $\theta$:

$\frac{\partial}{\partial \theta}\log L(\theta)=-\frac{n}{\theta}+\frac{\sum_{i=1}^{n} x_i}{\theta^{2}}$

Second derivative with respect to $\theta$:

$\frac{\partial^{2}}{\partial \theta^{2}}\log L(\theta)=\frac{n}{\theta^{2}}-\frac{2\sum_{i=1}^{n} x_i}{\theta^{3}}$

Fisher Information Function

By the definition of the Fisher information function:

$I_n(\theta)=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log L(\theta)\right]=E\left[-\frac{n}{\theta^{2}}+\frac{2\sum_{i=1}^{n} x_i}{\theta^{3}}\right]=-\frac{n}{\theta^{2}}+\frac{2E\left(\sum_{i=1}^{n} x_i\right)}{\theta^{3}}$

Since $E\left(\sum_{i=1}^{n} x_i\right)=n\theta$, we have:

$I_n(\theta)=-\frac{n}{\theta^{2}}+\frac{2n\theta}{\theta^{3}}=-\frac{n}{\theta^{2}}+\frac{2n}{\theta^{2}}=\frac{n}{\theta^{2}}$

Answer:

So, $I_n(\theta)=\dfrac{n}{\theta^{2}}$.
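The algebra above can also be reproduced symbolically. The sketch below (assuming SymPy is available; the symbol `S` stands for $\sum_{i=1}^{n}x_i$ and is introduced only for this illustration) differentiates $\log L(\theta)$ twice and substitutes $E(S)=n\theta$, which yields the expectation here because the second derivative is linear in $S$.

```python
import sympy as sp

# Symbolic check of Example 1: log L(theta) = -n*log(theta) - S/theta,
# where S = sum of the observations and E(S) = n*theta.

theta, n, S = sp.symbols('theta n S', positive=True)
logL = -n * sp.log(theta) - S / theta

second = sp.diff(logL, theta, 2)
# -d^2/dtheta^2 log L is linear in S, so its expectation follows from S -> n*theta.
info = sp.simplify((-second).subs(S, n * theta))
print(info)   # n/theta**2
```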

Example 2: Fisher Information Function for Poisson Distribution

Let $X_1,X_2,\ldots,X_n$ be a random sample from the Poisson distribution with parameter $\lambda$. The probability mass function is:

$f(x,\lambda)=\begin{cases}\dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x=0,1,2,\ldots,\ \lambda>0\\[4pt] 0, & \text{otherwise}\end{cases}$

Likelihood Function

The likelihood function is given by:

$L(\lambda)=\prod_{i=1}^{n} f(x_i,\lambda)=\frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$

Taking the logarithm on both sides:

$\log L(\lambda)=-n\lambda+\sum_{i=1}^{n} x_i\log\lambda-\sum_{i=1}^{n}\log(x_i!)$

Derivative with Respect to λ

Differentiating with respect to $\lambda$:

$\frac{\partial}{\partial \lambda}\log L(\lambda)=-n+\frac{\sum_{i=1}^{n} x_i}{\lambda}$

Second derivative with respect to $\lambda$:

$\frac{\partial^{2}}{\partial \lambda^{2}}\log L(\lambda)=-\frac{\sum_{i=1}^{n} x_i}{\lambda^{2}}$

Fisher Information Function

By the definition of the Fisher information function:

$I_n(\lambda)=E\left[-\frac{\partial^{2}}{\partial \lambda^{2}}\log L(\lambda)\right]=E\left[\frac{\sum_{i=1}^{n} x_i}{\lambda^{2}}\right]=\frac{E\left(\sum_{i=1}^{n} x_i\right)}{\lambda^{2}}$

Since $E\left(\sum_{i=1}^{n} x_i\right)=n\lambda$, we have:

$I_n(\lambda)=\frac{n\lambda}{\lambda^{2}}=\frac{n}{\lambda}$

Answer:

So, $I_n(\lambda)=\dfrac{n}{\lambda}$.
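As a rough numerical check of this result (an illustration only, assuming NumPy; $\lambda=3$ and $n=8$ are arbitrary values), averaging the observed information $-\frac{\partial^{2}}{\partial\lambda^{2}}\log L(\lambda)=\sum_{i=1}^{n}x_i/\lambda^{2}$ over many simulated Poisson samples should recover $I_n(\lambda)=n/\lambda$.

```python
import numpy as np

# Monte Carlo check of Example 2: the average observed information
# sum(x_i)/lambda^2 over many Poisson(lambda) samples of size n should
# be close to I_n(lambda) = n/lambda.

rng = np.random.default_rng(2)
lam, n, reps = 3.0, 8, 200_000

samples = rng.poisson(lam, size=(reps, n))
observed_info = samples.sum(axis=1) / lam**2   # -d^2/dlambda^2 log L

print(observed_info.mean())   # ~ n / lambda = 2.666...
print(n / lam)                # 2.666...
```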

Example 3: Fisher Information Function for Normal Distribution

Let $X_1,X_2,\ldots,X_n$ be a random sample from the Normal distribution with mean $\mu$ and variance $\sigma^{2}$. The probability density function is:

$f(x,\theta)=\begin{cases}\dfrac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}, & -\infty<x<\infty,\ -\infty<\mu<\infty,\ \sigma^{2}>0\\[4pt] 0, & \text{otherwise}\end{cases}$

1. $I(\mu)$

We have:

$\log f(x,\theta)=\log\left(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}\right)=-\log(\sqrt{2\pi})-\log(\sigma)-\frac{1}{2\sigma^{2}}(x-\mu)^{2} \quad \ldots (i)$

Differentiating with respect to $\mu$:

$\frac{\partial}{\partial \mu}\log f(x,\theta)=\frac{\partial}{\partial \mu}\left(-\log(\sqrt{2\pi})-\log(\sigma)-\frac{1}{2\sigma^{2}}(x-\mu)^{2}\right)=0-0-\frac{2}{2\sigma^{2}}(x-\mu)(-1)=\frac{x-\mu}{\sigma^{2}}$

Second derivative with respect to $\mu$:

$\frac{\partial^{2}}{\partial \mu^{2}}\log f(x,\theta)=-\frac{1}{\sigma^{2}}$

So, $I(\mu)=E\left[-\frac{\partial^{2}}{\partial \mu^{2}}\log f(x,\theta)\right]=E\left[\frac{1}{\sigma^{2}}\right]=\frac{1}{\sigma^{2}}$.

2. $I(\sigma)$

Differentiating equation (i) with respect to $\sigma$:

$\frac{\partial}{\partial \sigma}\log f(x,\theta)=\frac{\partial}{\partial \sigma}\left(-\log(\sqrt{2\pi})-\log(\sigma)-\frac{1}{2\sigma^{2}}(x-\mu)^{2}\right)=0-\frac{1}{\sigma}+\frac{(x-\mu)^{2}}{\sigma^{3}}$

Second derivative with respect to $\sigma$:

$\frac{\partial^{2}}{\partial \sigma^{2}}\log f(x,\theta)=\frac{1}{\sigma^{2}}-\frac{3(x-\mu)^{2}}{\sigma^{4}}$

So, since $E[(x-\mu)^{2}]=\sigma^{2}$,

$I(\sigma)=E\left[-\frac{\partial^{2}}{\partial \sigma^{2}}\log f(x,\theta)\right]=E\left[-\frac{1}{\sigma^{2}}+\frac{3(x-\mu)^{2}}{\sigma^{4}}\right]=-\frac{1}{\sigma^{2}}+\frac{3\sigma^{2}}{\sigma^{4}}=\frac{2}{\sigma^{2}}$.

3. $I(\sigma^{2})$

Let $\theta=\sigma^{2}$, and rewrite equation (i):

$\log f(x,\theta)=-\log(\sqrt{2\pi})-\frac{1}{2}\log(\theta)-\frac{1}{2\theta}(x-\mu)^{2}$

Differentiating with respect to $\theta$:

$\frac{\partial}{\partial \theta}\log f(x,\theta)=\frac{\partial}{\partial \theta}\left(-\log(\sqrt{2\pi})-\frac{1}{2}\log(\theta)-\frac{1}{2\theta}(x-\mu)^{2}\right)=0-\frac{1}{2\theta}+\frac{(x-\mu)^{2}}{2\theta^{2}}$

Second derivative with respect to $\theta$:

$\frac{\partial^{2}}{\partial \theta^{2}}\log f(x,\theta)=\frac{1}{2\theta^{2}}-\frac{(x-\mu)^{2}}{\theta^{3}}$

So, since $E[(x-\mu)^{2}]=\sigma^{2}=\theta$,

$I(\sigma^{2})=E\left[-\frac{\partial^{2}}{\partial \theta^{2}}\log f(x,\theta)\right]=E\left[-\frac{1}{2\theta^{2}}+\frac{(x-\mu)^{2}}{\theta^{3}}\right]=-\frac{1}{2\theta^{2}}+\frac{\theta}{\theta^{3}}=\frac{1}{2\theta^{2}}=\frac{1}{2\sigma^{4}}$.

In summary:

$I(\mu)=\dfrac{1}{\sigma^{2}}$

$I(\sigma)=\dfrac{2}{\sigma^{2}}$

$I(\sigma^{2})=\dfrac{1}{2\sigma^{4}}$
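A small Monte Carlo sketch (an illustration only, assuming NumPy; $\mu=1$ and $\sigma=2$ are arbitrary values) checks the single-observation results $I(\mu)=1/\sigma^{2}$ and $I(\sigma^{2})=1/(2\sigma^{4})$ by averaging the squared scores.

```python
import numpy as np

# Monte Carlo check of Example 3 for a single observation:
# E[(d/dmu log f)^2]       should be ~ 1/sigma^2
# E[(d/dsigma^2 log f)^2]  should be ~ 1/(2*sigma^4)

rng = np.random.default_rng(3)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=2_000_000)

score_mu = (x - mu) / sigma**2
score_var = -1.0 / (2 * sigma**2) + (x - mu)**2 / (2 * sigma**4)

print((score_mu**2).mean(), 1 / sigma**2)          # both ~ 0.25
print((score_var**2).mean(), 1 / (2 * sigma**4))   # both ~ 0.03125
```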

Cramer-Rao Inequality

Regularity Conditions:

  • The parameter space $\Theta$ is an open interval.
  • The support or range of the distribution is independent of $\theta$.
  • For every $x$ and $\theta$, $\frac{\partial}{\partial \theta}f(x,\theta)$ and $\frac{\partial^{2}}{\partial \theta^{2}}f(x,\theta)$ exist and are finite.
  • The statistic $T$ has finite mean and variance.
  • Differentiation under the integral sign is permissible, i.e., $\frac{\partial}{\partial \theta}\int T\,L(x,\theta)\,dx=\int T\,\frac{\partial}{\partial \theta}L(x,\theta)\,dx$.


Cramer-Rao Inequality Statement:

Let $X_1,X_2,\ldots,X_n$ be a random sample from any probability density function (p.d.f.) or probability mass function (p.m.f.) $f(x,\theta)$, where $\theta\in\Theta$. If $T=T(X_1,X_2,\ldots,X_n)$ is an unbiased estimator of $\phi(\theta)$, then under the regularity conditions:

$\operatorname{Var}(T)\ge\frac{\left[\frac{\partial}{\partial \theta}\phi(\theta)\right]^{2}}{I_n(\theta)}\quad\text{or}\quad\operatorname{Var}(T)\ge\frac{[\phi'(\theta)]^{2}}{nI(\theta)}$

Proof:

Let $x$ be a random variable following the p.d.f. or p.m.f. $f(x,\theta)$, $\theta\in\Theta$, and let $L(\theta)$ be the likelihood function of a random sample $X_1,X_2,\ldots,X_n$ from this distribution. Then:

$L(\theta)=\prod_{i=1}^{n} f(x_i,\theta)=f(X_1,X_2,\ldots,X_n,\theta)$

$\int L(\theta)\,dx=1$

Differentiating with respect to $\theta$ (interchanging differentiation and integration):

$\frac{\partial}{\partial \theta}\int L(\theta)\,dx=0$

$\int \frac{\partial}{\partial \theta}L(\theta)\,dx=0$

$\int \frac{1}{L}\frac{\partial}{\partial \theta}L(\theta)\,L\,dx=0$

$\int \frac{\partial}{\partial \theta}\log L(\theta)\,L\,dx=0$

$E\left[\frac{\partial}{\partial \theta}\log L(\theta)\right]=0 \quad \ldots (ii)$

And we know that $T$ is an unbiased estimator of $\phi(\theta)$, so that:

$E(T(X))=\phi(\theta)$

$\int T\,L(\theta)\,dx=\phi(\theta)$

Now, differentiating with respect to $\theta$:

$\frac{\partial}{\partial \theta}\int T\,L(\theta)\,dx=\frac{\partial}{\partial \theta}\phi(\theta)$

$\int T\,\frac{\partial}{\partial \theta}L(\theta)\,dx=\phi'(\theta)$

$\int T\,\frac{1}{L}\frac{\partial}{\partial \theta}L(\theta)\,L\,dx=\phi'(\theta)$

$\int T\,\frac{\partial}{\partial \theta}\log L(\theta)\,L\,dx=\phi'(\theta)$

$E\left[T\,\frac{\partial}{\partial \theta}\log L(\theta)\right]=\phi'(\theta) \quad \ldots (iii)$

And we have:

$\operatorname{Cov}\left(T,\frac{\partial}{\partial \theta}\log L(\theta)\right)=E\left[T\,\frac{\partial}{\partial \theta}\log L(\theta)\right]-E(T)\,E\left[\frac{\partial}{\partial \theta}\log L(\theta)\right]=E\left[T\,\frac{\partial}{\partial \theta}\log L(\theta)\right]-E(T)\cdot 0$

$\operatorname{Cov}\left(T,\frac{\partial}{\partial \theta}\log L(\theta)\right)=\phi'(\theta) \quad \ldots (iv)$ from (ii) and (iii)

By the Cauchy-Schwarz inequality for covariance, we have:

$\left(\operatorname{Cov}\left(T,\frac{\partial}{\partial \theta}\log L(\theta)\right)\right)^{2}\le \operatorname{Var}\left(\frac{\partial}{\partial \theta}\log L(\theta)\right)\operatorname{Var}(T)$

$[\phi'(\theta)]^{2}\le I_n(\theta)\operatorname{Var}(T)$

$\operatorname{Var}(T)\ge\frac{[\phi'(\theta)]^{2}}{I_n(\theta)}$

This is the lower bound given by the Cramer-Rao inequality, known as the Cramer-Rao Lower Bound.

Remark: If the estimator $T$ is unbiased for $\theta$ itself, then $\phi(\theta)=\theta$ and $\phi'(\theta)=1$. In this case the Cramer-Rao Lower Bound is:

$\operatorname{Var}(T)\ge\dfrac{1}{I_n(\theta)}$
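To see the bound in action, the following sketch (an illustration only, assuming NumPy; the exponential case of Example 1 with $\theta=2$, $n=10$ is a convenient choice) simulates the sample mean, which is unbiased for $\theta$, and compares its variance with the Cramer-Rao Lower Bound $1/I_n(\theta)=\theta^{2}/n$. For this model the bound is attained exactly.

```python
import numpy as np

# Monte Carlo illustration of the Cramer-Rao lower bound: for an
# exponential(mean theta) sample, the sample mean is unbiased for theta
# and its variance equals the bound 1/I_n(theta) = theta^2/n.

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 10, 200_000

xbar = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)

print(xbar.var())      # ~ theta^2 / n = 0.4
print(theta**2 / n)    # 0.4  (Cramer-Rao lower bound)
```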
