Training structure
Faculty of Science
Description
The importance of statistical science in scientific discovery and industrial advancement lies in its ability to draw inferences about phenomena of interest together with quantified risks of error or degrees of confidence. The calculation of these error risks rests on probability theory, but the principles and methods used to attach these risks to inferences form a theoretical corpus that underpins all statistical methodologies.
This module aims to provide a fairly comprehensive presentation of these basic principles and the mathematical tools, results, and theorems used in inferential statistics. It develops the concepts of point and interval estimation, hypothesis testing, and fundamental concepts such as exponential families, the maximum likelihood principle, and the use of p-values.
To implement certain applications, the relevant tools from the R software will be presented.
Objectives
At the end of this module, students will be able to develop optimal statistical methodologies for estimation and hypothesis testing in certain families of parametric probability distributions. They will understand the limitations of the inferences produced and be able to communicate them to users. When dealing with small data sets, they will be able to choose the best approach in a reasoned manner and perform the necessary calculations using the R software.
Mandatory prerequisites
A bachelor's level probability theory course.
Recommended prerequisites
A bachelor's-level course in descriptive statistics would be an asset.
Knowledge assessment
Continuous assessment (CC) and a final exam (CT), combined by the formula: final grade = max(CT, (CC + CT)/2).
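A hypothetical numerical illustration of this grading rule (the grades below are invented for the example):

```python
def final_grade(cc: float, ct: float) -> float:
    """Apply the module's rule: continuous assessment (CC) can only
    raise the final-exam grade (CT), never lower it."""
    return max(ct, (cc + ct) / 2)

print(final_grade(16, 10))  # (16 + 10) / 2 = 13.0, which beats CT = 10
print(final_grade(6, 12))   # averaging would give 9.0, so CT = 12 stands
```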
Syllabus
1. Parametric statistical model
a) Parametric statistical model;
b) iid sampling model;
c) Review of asymptotic theorems (law of large numbers, central limit theorem, delta method);
d) Concept of a statistic; empirical characteristics of a sample and their asymptotic distributions.
2. Exponential family
a) Definition
b) Moments.
3. Score and Fisher information
a) Score;
b) Fisher information;
c) Case of the exponential family.
4. Sufficient statistics
a) Sufficiency & characterizations;
b) Minimal sufficient statistics; complete statistics.
5. Point estimation
a) Risk; quadratic risk = bias² + variance; ordering of estimators; non-existence of a uniformly optimal estimator.
b) Unbiased estimators: the Fréchet-Darmois-Cramér-Rao inequality; efficient estimation & the exponential family; Rao-Blackwell improvement; optimal unbiased estimators (UMVUE) & the Lehmann-Scheffé theorem.
c) Maximum likelihood estimation, asymptotic properties.
d) Estimation using the method of moments, asymptotic properties.
6. Set estimation (confidence regions)
a) Confidence region.
b) Pivot.
c) Asymptotic confidence region.
7. Hypothesis testing
a) The testing problem: hypotheses, errors, losses and associated risks, level, and power. Test function. Non-randomized vs. randomized tests.
b) Non-existence of a uniformly optimal test. Unbiased tests. Consistent tests.
c) Neyman's principle.
d) Simple vs. simple hypotheses: the Neyman-Pearson most powerful (MP) test.
e) One-sided hypotheses: families with monotone likelihood ratio & UMP tests.
f) Two-sided hypotheses: exponential families & UMP unbiased (UMPU) tests.
g) Duality between the acceptance regions of tests and confidence regions.
h) Asymptotic tests: Wald test, Rao score test, likelihood ratio test.
8. Two-sample problems
Comparison of parameters: estimation and testing.
9. Goodness-of-fit tests
a) Chi-square goodness-of-fit test & application to testing independence.
b) Kolmogorov-Smirnov test.
c) Shapiro-Wilk normality test.
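The decomposition "quadratic risk = bias² + variance" from item 5a can be checked numerically. Although the course uses R, the sketch below uses only Python's standard library; the simulation settings (a normal sample of size 10, estimated by the sample mean) are chosen purely for illustration:

```python
import random
import statistics

# Monte Carlo check of: quadratic risk = bias^2 + variance,
# for the sample mean of n iid N(mu, sigma^2) observations.
random.seed(0)
mu, sigma, n, reps = 2.0, 1.0, 10, 20000

estimates = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    estimates.append(statistics.fmean(sample))

bias = statistics.fmean(estimates) - mu
variance = statistics.pvariance(estimates)
risk = statistics.fmean((e - mu) ** 2 for e in estimates)

# The decomposition holds exactly for the simulated values, and the
# empirical variance should be close to the theoretical sigma^2 / n = 0.1.
print(abs(risk - (bias ** 2 + variance)) < 1e-9)
print(abs(variance - sigma ** 2 / n) < 0.01)
```

The same experiment repeated with a biased estimator (e.g. a shrunken mean) would show the bias² term becoming the dominant contribution to the risk as n grows.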
Additional information
Hours:
Lectures (CM): 21 hours
Tutorials (TD): 21 hours
Practicals (TP):
Fieldwork: