• ECTS

    5 credits

  • Component

    Faculty of Science

Description

Statistical datasets keep growing in size. Before modeling them, it is essential to explore them and to reduce their dimension while losing as little information as possible. That is the objective of this course on multidimensional exploratory statistics. Methodologically, the tools used are essentially those of Euclidean geometry: statistical problems and notions are first translated into the language of Euclidean geometry and then treated in that framework. The course covers two families of exploratory methods:

1) automatic classification methods, which group observations into classes and reduce the disparity of the observations to the disparity between these classes;

2) component analysis methods, which search for the main directions of disparity between observations and provide interpretable images of this disparity in reduced dimension.

Objectives

Bridge the gap between Euclidean geometry and multidimensional exploratory statistics. Build a comprehensive skill set for exploring and analyzing large data tables prior to statistical modeling.

Required prerequisites

Courses in Euclidean geometry, normed vector spaces and reduction of endomorphisms.

Recommended prerequisites: Courses in univariate and bivariate descriptive statistics. Good command of matrix algebra.

Assessment

Continuous assessment (homework / mini-projects) + final assessment

Syllabus

I - Introduction

a) Multidimensional data, observations, variables, encodings.

b) Translation into point clouds in Euclidean metric spaces.

c) Need for dimension reduction: components / classes.

II - Geometric formulation of statistical quantities

Univariate description:

a) Mean, frequency.

b) Variance and standard deviation.

c) Centering and standardization of a variable.

Bivariate relationships:

a) Bivariate association & conditioning.

b) Covariance and correlation of two quantitative variables.

c) R² from the analysis of variance of a quantitative variable on a qualitative variable.

d) Φ² and T² of two qualitative variables.

e) Unified formulation of associations.

f) Limits of bivariate analysis & how to overcome them.

III - Automatic classification (clustering)

Dissimilarity and similarity.

a) Measures.

b) Partial vs. global similarity.

Partial similarity: logical/conceptual classification by Galois lattice.

Global similarity:

a) Partitioning in metric space: K-means method & refinements.

b) Hierarchical classification: indices, agglomerative hierarchical classification algorithm (AHC; CAH in French), criteria for choosing partitions.

c) Mixed classification.

d) Interpretation of the classes.

e) Classification of variables.

IV - Principal component analyses

Standard PCA

a) Cloud of individuals, inertia and direct PCA.

b) Variable cloud, inertia and dual PCA.

c) Duality relations and joint interpretation of the graphs.

d) Supplementary elements & duality relations.

e) The first component as an estimate of a continuous latent variable.

General PCA (with any metric)

a) Row cloud and direct PCA.

b) Application to multidimensional scaling.

c) Which PCA of the columns, for which duality relations?

d) Interpretation aids.

e) Supplementary elements & duality relations.

f) Reconstruction formula (singular value decomposition).

Binary correspondence analysis

a) Φ² as direct and dual inertia of the row-profile and column-profile clouds.

b) Which metrics for which duality relations: barycentric positioning.

c) Interpretation of the associated graphs.

d) Guttman effect.

e) Supplementary elements.

Multiple correspondence analysis.

a) Application of binary correspondence analysis to a complete disjunctive table.

b) Application of binary correspondence analysis to a Burt table; equivalence.

c) Barycentric relations between individuals and categories. Barycentric relations between categories.

d) Guttman effect.

e) Supplementary elements.

f) The first component as an estimate of a continuous latent variable.

The practice of multidimensional data analysis

a) Complementarity of factor analysis and automatic classification.

b) How to conduct a good multidimensional data analysis.

Additional information

Teaching hours:

            Lectures (CM): 21

            Tutorials (TD): 21

            Practical work (TP): 0

            Fieldwork: 0
