ECTS
5 credits
Component
Faculty of Science
Description
Statistical data sets keep growing in size, and so does the richness with which the statistical units are described. However, classical linear modeling breaks down in high dimension, i.e. when the number of variables exceeds the number of statistical units: the cross-product matrix X'X is then singular, so ordinary least squares no longer has a unique solution. This course presents the most common techniques used to regularize linear models in high dimension.
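As a purely illustrative aside (not part of the official course description), the sketch below, assuming Python with NumPy and scikit-learn and using simulated data, shows the failure mode just described: with more variables than units the least-squares problem has no unique solution, while a ridge penalty (one of the regularization techniques of Part I) makes it well posed again.

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    n, p = 30, 100                          # fewer statistical units than variables
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = 1.0                          # only five variables truly matter
    y = X @ beta + 0.1 * rng.normal(size=n)

    print("rank of X:", np.linalg.matrix_rank(X))   # at most n = 30 < p = 100
    ols = LinearRegression().fit(X, y)      # X'X is singular: the OLS solution is not unique
    ridge = Ridge(alpha=10.0).fit(X, y)     # the penalty makes the problem well posed
    print("largest |OLS coefficient|  :", np.abs(ols.coef_).max())
    print("largest |ridge coefficient|:", np.abs(ridge.coef_).max())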
Objectives
Training in univariate and multivariate high-dimensional linear modeling, i.e. in the various techniques used to regularize the classical linear model.
Necessary prerequisites
A course in multidimensional data analysis (PCA and correspondence analysis). Courses in Euclidean geometry, normed vector spaces and the reduction of endomorphisms (diagonalization).
Recommended prerequisites
Univariate and bivariate descriptive statistics. Good command of matrix algebra.
Syllabus
Introduction
Large-scale data. Dimension reduction and regularization.
I - Regularized linear modeling of a continuous variable.
1. The classical linear model.
a) Brief reminders.
b) Failures due to collinearity.
2. Principal component regression.
a) Method.
b) Strengths and weaknesses.
3. PLS regression.
a) Rank 1 criterion and program.
b) Criteria and program for subsequent ranks.
c) Why PLS regularizes.
d) Choice of number of components for prediction.
e) Continuum metric between OLS and PLS.
4. Penalized linear regressions.
a) Ridge regression.
b) LASSO.
c) Elastic net.
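The following sketch, which is hypothetical and not prescribed by the course, illustrates the Part I toolbox in Python with scikit-learn on simulated data: principal component regression, PLS regression and the ridge, LASSO and Elastic Net penalties, together with a cross-validated choice of the number of PLS components (item I.3.d). All tuning values (numbers of components, penalty strengths) are arbitrary placeholders.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.default_rng(1)
    n, p = 60, 200                                   # high dimension: p > n
    X = rng.normal(size=(n, p))
    y = X[:, :10] @ rng.normal(size=10) + 0.5 * rng.normal(size=n)

    models = {
        # Principal component regression: PCA scores, then ordinary least squares
        "PCR": make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression()),
        # PLS regression: components built to covary with the response
        "PLS": make_pipeline(StandardScaler(), PLSRegression(n_components=5)),
        # Penalized regressions
        "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=10.0)),
        "LASSO": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
        "Elastic Net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5)),
    }
    for name, model in models.items():
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(f"{name:>11s}: cross-validated R^2 = {r2:.2f}")

    # Choosing the number of PLS components by cross-validated prediction (I.3.d)
    search = GridSearchCV(make_pipeline(StandardScaler(), PLSRegression()),
                          {"plsregression__n_components": list(range(1, 11))}, cv=5)
    search.fit(X, y)
    print("selected number of PLS components:", search.best_params_)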
II - Regularized linear modeling of a group of continuous variables.
1. The multivariate linear Gaussian model.
a) The classical model.
b) The penalized model.
c) The MANOVA model.
2. Multivariate PLS regression.
a) Rank 1 criterion and program with any metric.
b) Special cases: Canonical Analysis, PCA on Instrumental Variables (PCAIV), PLS2 regression.
c) Criteria and program for subsequent ranks.
d) Prediction: choosing the optimum number of components.
e) Continuum metrics between Canonical Analysis, PCAIV and PLS.
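As a hedged illustration of Part II (again assuming scikit-learn; the simulated block of q = 4 responses and the numbers of components are placeholders), multivariate PLS (PLS2) builds X-components that predict a whole group of continuous variables at once, and the number of components can again be chosen by cross-validated prediction (item II.2.d).

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    n, p, q = 60, 150, 4                              # a group of q continuous responses
    X = rng.normal(size=(n, p))
    B = np.zeros((p, q))
    B[:8, :] = rng.normal(size=(8, q))
    Y = X @ B + 0.5 * rng.normal(size=(n, q))

    # PLS2: the same X-components are used to predict the whole block Y
    pls2 = PLSRegression(n_components=3).fit(X, Y)
    print("X-component scores:", pls2.transform(X).shape)    # (n, 3)
    print("coefficient matrix:", pls2.coef_.shape)

    # Choosing the number of components by cross-validated prediction (II.2.d)
    for k in range(1, 7):
        r2 = cross_val_score(PLSRegression(n_components=k), X, Y, cv=5).mean()
        print(f"{k} components: cross-validated R^2 = {r2:.2f}")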
III - Linear modeling of a nominal variable: linear discriminant analyses.
1. Discriminant factor analysis.
a) Criteria and program.
b) Components and discriminating powers.
2. PLS discriminant analysis.
a) Criteria and program.
b) Components and discriminating powers.
c) Barycentric discriminant analysis.
3. Decision-making aspects.
a) Decision (classification), losses, decision rules (allocation), risks.
b) Choosing the right number of components for the decision.
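To close Part III with a hedged sketch (scikit-learn is again an assumption, and coding the classes as an indicator matrix for PLS-DA is one standard device among several), the snippet below contrasts a shrinkage-regularized Fisher discriminant analysis with a PLS discriminant analysis on simulated high-dimensional data, allocating each unit to the class whose predicted indicator is largest.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    n, p = 90, 120                                     # p > n, three balanced classes
    y = np.repeat([0, 1, 2], n // 3)                   # the nominal variable to model
    X = rng.normal(size=(n, p)) + np.eye(3)[y] @ rng.normal(size=(3, p))

    # Fisher-type discriminant analysis needs regularization when p > n;
    # here the within-class covariance is shrunk (Ledoit-Wolf shrinkage).
    lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
    print("LDA cross-validated accuracy:", cross_val_score(lda, X, y, cv=5).mean().round(2))

    # PLS-DA: regress the class-indicator matrix on X, then allocate each unit
    # to the class with the largest predicted indicator value.
    Y = np.eye(3)[y]                                   # dummy coding of the classes
    plsda = PLSRegression(n_components=2).fit(X, Y)
    pred = plsda.predict(X).argmax(axis=1)
    print("PLS-DA training accuracy:", (pred == y).mean().round(2))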
Further information
Hourly volumes:
Lectures (CM): 21
Tutorials (TD):
Practicals (TP):
Fieldwork: