Component
Faculty of Science
Description
Statistical science is important to scientific discovery and industrial advancement because it makes it possible to draw inferences about phenomena of interest to which risks of error, or degrees of confidence, can be attached. The calculation of these error risks rests on probability theory, but the principles and methods for attaching them to inferences form a theoretical corpus that underpins all statistical methodology.
This module is intended to provide a fairly comprehensive presentation of these basic principles and of the mathematical tools, results and theorems used in inferential statistics. It covers point and interval estimation, hypothesis testing, and fundamental concepts such as exponential families, the maximum likelihood principle and the use of p-values.
For practical applications, we will present the relevant tools from the R software environment.
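As a hedged illustration of the kind of R tools meant here (the sample, its size and the parameter values are invented for the example), point estimation, a Student-based confidence interval and a simple test can be obtained in a few lines:

```r
# Illustrative sketch only: estimation and testing on a simulated iid sample.
set.seed(1)
x <- rnorm(20, mean = 5, sd = 2)  # 20 iid observations from N(5, 4)

mean(x)  # point estimate of the mean
var(x)   # unbiased estimate of the variance

# 95% Student confidence interval and two-sided test of H0: mu = 5
t.test(x, mu = 5, conf.level = 0.95)
```

The `t.test` output reports the estimate, the confidence interval and the p-value discussed in the module.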
Objectives
At the end of this module, students should be able to develop optimal statistical methodologies for estimation and hypothesis testing within certain parametric families of probability distributions. They should understand the limits of the inferences produced and be able to communicate those limits to users. When faced with small datasets, they should be able to choose the most appropriate approach and carry out the necessary calculations using the R software.
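For instance (a hedged sketch with invented numbers), with a small sample an exact test is often preferable to its asymptotic approximation, and R provides both:

```r
# Small-sample illustration: exact vs. asymptotic test of H0: p = 0.5
successes <- 3
n <- 10

binom.test(successes, n, p = 0.5)  # exact binomial test
prop.test(successes, n, p = 0.5)   # asymptotic chi-squared approximation
```

With n = 10 the normal approximation behind `prop.test` is rough, which is exactly the kind of judgment the objective refers to.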
Teaching hours
- Inferential statistics - Lectures (CM): 21h
- Inferential statistics - Tutorials (TD): 21h
Necessary prerequisites
A Bachelor's-level probability calculus course.
Recommended prerequisites
A Bachelor's-level course in descriptive statistics would be an asset.
Knowledge control
Continuous assessment (CC) and a final exam (CT), combined as: final grade = max(CT, (CC + CT)/2).
Syllabus
1. Parametric statistical model
a) Parametric statistical model;
b) iid sampling model;
c) Asymptotic theorems (LLN, CLT, delta method);
d) Notion of a statistic: empirical characteristics of a sample & their asymptotic distributions.
2. Exponential family
a) Definition;
b) Moments.
3. Fisher score and information
a) Score;
b) Fisher information;
c) The case of the exponential family.
4. Sufficient statistics
a) Sufficiency & characterization;
b) Minimal sufficient statistics; complete statistics.
5. Point estimation
a) Risk; quadratic risk = bias² + variance; ordering of estimators; absence of a uniformly optimal estimator.
b) Unbiased estimators: Fréchet inequality (Cramér-Rao bound); efficient estimation & exponential families; Rao-Blackwell improvement; optimal unbiased estimators & the Lehmann-Scheffé theorem.
c) Maximum likelihood estimation and its asymptotic properties.
d) Method-of-moments estimation and its asymptotic properties.
6. Set estimation (confidence regions)
a) Confidence regions;
b) Pivotal quantities;
c) Asymptotic confidence regions.
7. Hypothesis testing
a) The testing problem: hypotheses, errors, losses, associated risks, level and power; test functions; non-randomized vs. randomized tests.
b) Absence of a uniformly optimal test; unbiased tests; consistent tests.
c) Neyman's principle.
d) Simple hypotheses: the Neyman-Pearson most powerful (MP) test.
e) One-sided hypotheses: families with monotone likelihood ratio & uniformly most powerful (UMP) tests.
f) Two-sided hypotheses: exponential families & UMP unbiased (UMPU) tests.
g) Link between test acceptance regions and confidence regions.
h) Asymptotic tests: Wald test, Rao score test, likelihood ratio test.
8. Two-sample problems
Parameter comparison: estimation and testing.
9. Goodness-of-fit tests
a) Chi-squared test & application to testing independence.
b) Kolmogorov-Smirnov test.
c) Shapiro-Wilk normality test.
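The goodness-of-fit tests of point 9 are all available in base R. A hedged sketch on simulated data (the sample and the contingency table are invented for the example):

```r
# Illustrative sketch only: goodness-of-fit tests on simulated data.
set.seed(2)
x <- rnorm(50)

ks.test(x, "pnorm")  # Kolmogorov-Smirnov test against N(0, 1)
shapiro.test(x)      # Shapiro-Wilk normality test

# Chi-squared test of independence on a hypothetical 2x2 contingency table
tab <- matrix(c(20, 10, 15, 25), nrow = 2)
chisq.test(tab)
```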
Further information
Timetable:
CM: 21h
TD: 21h
TP:
Field: