• ECTS: 5 credits

• Component: Faculty of Science

Description

The size of statistical data sets is constantly increasing, and in particular the richness with which statistical units are described. However, classical linear statistical modeling becomes invalid in high dimension, i.e. when the number of variables exceeds the number of statistical units. This course presents the most common techniques used to regularize linear models in high dimension.
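A minimal numerical sketch of the failure described above (variable names are illustrative, using NumPy): when the number of variables p exceeds the number of units n, the matrix X'X is singular, so ordinary least squares has no unique solution, whereas a ridge penalty restores invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                      # more variables (p) than statistical units (n)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# With p > n, the p x p matrix X'X has rank at most n, hence is singular:
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 20, far below p = 50
# so the OLS normal equations X'X b = X'y admit infinitely many solutions.

# Adding a ridge penalty lam * I makes the system invertible for any lam > 0:
lam = 1.0
b_ridge = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
print(b_ridge.shape)               # (50,)
```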


Objectives

Training in univariate and multivariate linear modeling in high dimension, i.e. in the various techniques used to regularize classical linear models.


Required prerequisites

A course in multidimensional data analysis (PCA & CA). Courses in Euclidean geometry, normed vector spaces, and reduction of endomorphisms.

Recommended prerequisites: Courses in univariate and bivariate descriptive statistics. Good command of matrix algebra.


Syllabus

Introduction

High-dimensional data. Dimension reduction and regularization.

I - Regularized linear modeling of a continuous variable.

  1. The classical linear model.

a) Brief refresher.

    b) Failures due to collinearities.

  2. Principal component regression.

    a) The method.

    b) Qualities and defects.

  3. PLS regression.

    a) Rank 1 criterion and program.

    b) Criteria and program for subsequent ranks.

    c) Why PLS regularizes.

    d) Choice of the number of components for the prediction.

    e) Metric of the continuum between OLS and PLS.

  4. Penalized linear regressions.

a) Ridge regression.

    b) LASSO.

    c) Elastic net.
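To contrast the penalized regressions of item 4 above, here is a hedged sketch (function names are illustrative): in the special case of an orthonormal design (X'X = I), ridge and LASSO act on the OLS coefficients in closed form, which makes their different behaviors visible at a glance.

```python
import numpy as np

# For an orthonormal design (X'X = I), the penalized estimators have
# closed forms in terms of the OLS coefficients b_ols:
#   ridge: b_ols / (1 + lam)             (proportional shrinkage, never zero)
#   LASSO: sign(b) * max(|b| - lam, 0)   (soft-thresholding, yields sparsity)
def ridge_orth(b_ols, lam):
    return b_ols / (1.0 + lam)

def lasso_orth(b_ols, lam):
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

b_ols = np.array([3.0, 0.5, -2.0, 0.1])
print(ridge_orth(b_ols, 1.0))   # [ 1.5   0.25 -1.    0.05]: all shrunk, none zero
print(lasso_orth(b_ols, 1.0))   # [ 2.  0. -1.  0.]: small coefficients set to 0
```

The elastic net combines both penalties, shrinking like ridge while still setting small coefficients exactly to zero like the LASSO.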

II - Regularized linear modeling of a group of continuous variables.

  1. The linear Gaussian multivariate model

    a) The classical model.

    b) The penalized model.

    c) The MANOVA model.

  2. Multivariate PLS regression.

a) Criterion and rank-1 program with arbitrary metrics.

    b) Special cases: canonical analysis, PCA on Instrumental Variables, PLS2 regression.

    c) Criteria and program for subsequent ranks.

    d) Prediction: choice of the optimal number of components.

e) Metrics of the continuum between canonical analysis, PCAIV and PLS.
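As an informal sketch of the rank-1 program of II.2.a in the special case of identity metrics (variable names are illustrative): the first pair of PLS weight vectors maximizes the covariance between the components Xw and Yc, and is given by the dominant singular triplet of the cross-product matrix X'Y.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))       # predictor block
Y = rng.normal(size=(30, 3))       # response block
X -= X.mean(axis=0)                # PLS operates on centered data
Y -= Y.mean(axis=0)

# Rank-1 program: maximize cov(Xw, Yc) over unit weight vectors w, c.
# Solution: dominant left/right singular vectors of X'Y.
U, s, Vt = np.linalg.svd(X.T @ Y)
w, c = U[:, 0], Vt[0, :]
t, u = X @ w, Y @ c                # first pair of PLS components
print(np.linalg.norm(w))           # 1.0: the weight vector is unit-norm
print(t @ u)                       # equals s[0], the maximal criterion value
```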

III - Linear modeling of a nominal variable: linear discriminant analyses.

  1. Discriminant factor analysis

    a) Criteria and program.

    b) Components and discriminating powers.

  2. PLS discriminant analysis.

    a) Criteria and program.

    b) Components and discriminating powers.

    c) Barycentric discriminant analysis.

  3. Decision-making aspects.

a) Decisions (classification), losses, decision rules (assignment), risks.

    b) Choosing the right number of components for the decision.
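A minimal sketch of the barycentric decision rule touched on in part III (the data are illustrative): each statistical unit is assigned to the class whose barycenter is nearest in the component space.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two illustrative classes of 2-D component scores
X0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(15, 2))
X1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(15, 2))
X = np.vstack([X0, X1])
labels = np.array([0] * 15 + [1] * 15)

# Class barycenters (means) computed on the training scores
barycenters = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

def assign(x):
    """Decision rule: assign to the nearest barycenter (Euclidean distance)."""
    d = np.linalg.norm(barycenters - x, axis=1)
    return int(np.argmin(d))

print(assign(np.array([0.2, -0.1])))  # 0: close to the first barycenter
print(assign(np.array([2.8, 3.1])))   # 1: close to the second barycenter
```

In the decision-theoretic setting of III.3, this rule corresponds to a 0-1 loss with equal class priors; other losses or priors shift the assignment boundary.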


Additional information

Hourly volumes:

            Lectures (CM): 21

            Tutorials (TD):

            Practicals (TP):

            Fieldwork (Terrain):
