M2 - Statistics and Data Science (SSD) - BIOSTATS

Training structure: Faculty of Science

Presentation

This M2 is intended for students holding an M1 in Statistics and Data Science (SSD) or any other M1 in mathematics or equivalent with a strong specialization in probability and statistics.

This M2 is divided into two specialized sub-courses, which share part of their teaching.

- The first of these specializations is Biostatistics, which focuses on the analysis and modeling of data from the living world.

- The second is Information and Decision Management (MIND), which specializes in the analysis and modeling of economic data, as well as in decision management and the associated risks.

  • The SSD-Biostat course aims to meet the expectations of M1 SSD students who are drawn to modeling data from the life sciences or the environment. The statistical topics covered range from the modeling of living organisms to the most theoretical problems of statistics and stochastic modeling. Computational aspects feature prominently in the program and require a strong interest in computer programming.

The SSD-Biostat course is demanding because it focuses on concepts rather than techniques. In the field of data in the broadest sense, digital technologies are evolving rapidly with the advent of artificial intelligence, and are becoming outdated even faster. Future statistical engineers or researchers who work with data will have to keep learning new technologies throughout their careers, and will be all the better equipped to do so if they have received a solid initial conceptual training. The added value of the program lies precisely in providing a theoretical understanding of the statistical concepts that underlie automated algorithms. Graduates must also be able to keep up effectively with technological developments.

The SSD-Biostat course shares part of its M2 teaching with the Information and Decision Management course (SSD-MIND). However, SSD-Biostat has its own specialized courses in life and environmental data science, and is more oriented towards introduction to research (two courses per semester in the M2).

  • The SSD-MIND program aims to meet the expectations of M1 SSD students who are drawn to applying data science in business. Given the great diversity of companies and of the problems they face, this M2 program trains students in general-purpose data science applicable to all fields. It also provides training more specific to the corporate context and its economic and managerial issues (economic information, financial risk management, customers, corporate strategy, etc.).

The SSD-MIND program is a mathematical engineering program that emphasizes methodology and a thorough mastery of statistical concepts and models. Graduates will be able to handle all types of data and problems and to design a complete, often original, methodology for each problem: starting with the management and organization of the data, continuing with their exploration and targeted reduction, then modeling the phenomena of interest, and finally synthesizing the extracted information for decision-making. They must also know how to convey to the company the knowledge synthesized from the information extracted from the data. Each new dataset, and each question asked about it, is often a new problem for which applying a standard method is inadequate. Instead, one must write a mathematical model adapted to the data (one that reflects their complexity satisfactorily) and either cast it in a form suited to a standard estimation method or design and program a more specific method. Because the program emphasizes conceptual and mathematical mastery of the tools, its graduates acquire the strong capacity for adaptation and self-directed learning that the rapid evolution of data science demands.

The SSD-MIND program shares part of its M2 teaching with the biostatistics program (SSD-Biostat), which is more specialized in the analysis and modeling of data from the living world or the environment. SSD-MIND is run in partnership with the IAE, which provides the economics and management courses, and leads to a double degree.


Objectives

The program has several objectives:

  • To enable students to handle all types of data and problems, and to design a complete, often original, methodology for each problem.
  • To enable students to integrate quickly into any type of company by rapidly grasping its issues.
  • To bring students who wish to do so to a theoretical level that allows them to pursue a doctoral thesis in statistics.
  • To train future researchers and teacher-researchers in the mathematics of randomness: probability, and theoretical or applied statistics. After a doctorate, they can join laboratories in universities, engineering schools, or research organizations such as CNRS, INRAE, Inria, CIRAD, INSERM, etc. It is also possible to join a company or an industrial research laboratory directly after the M2.
  • To train high-level specialists in statistical data processing for research organizations and for companies in which statistics is now an indispensable tool, such as pharmaceutical laboratories, epidemiological surveillance institutes, air and water quality monitoring institutes, agri-food companies, biotechnology companies, and companies in the health sector (diagnostic support, personalized medicine).

Know-how and skills

  • Extract relevant data.
  • Pre-process data (cleaning and formatting as needed).
  • Conduct exploratory data analysis using visualization and dimension reduction tools.
  • Model a problem: master the standard methods of modern data science and propose the appropriate method(s) to solve the problem; write one or more mathematical models adapted to the problem and put them in a form suitable for processing by the standard methods of modern data science.
  • Implement methods computationally and propose model selection strategies.
  • Program efficiently in at least one language (Python, R).
  • Analyze and interpret the results, i.e., produce knowledge from the extracted information.
  • Link the knowledge produced to decision-making in order to inform and optimize decisions.
  • Communicate results in writing and orally.

Program

The M2 can be taken as a work-study program (alternance), alternating long periods of teaching and work experience.

First period: 7 weeks of teaching from September to the end of October.

Second period: from November to mid-January, 7 weeks in a company for work-study students, or a long tutored project in a laboratory for other students.

Third period: 7 weeks of teaching from mid-January to mid-March.

Fourth period: from March to August, work experience in a company for work-study students, or a 4- to 6-month internship for other students.

Internships and tutored projects: students not on a work-study contract complete a 7-week tutored project, with a report and an oral defense.

A 4- to 6-month internship takes place at the end of the M2.

  • Non-parametric estimation: 5 credits
  • Generalized linear models: 5 credits
  • English: 2 credits
  • Alternation project or defense: 3 credits
  • Bayesian Statistics: 5 credits
  • Multivariate analysis: 5 credits
  • Statistical learning: 5 credits
  • Lifetime analysis: 4 credits
  • Supplement 2: 4 credits
  • Supplement 1: 4 credits
  • Internship: 14 credits
  • Latent variable models: 4 credits

Admission

How to register

Applications are made on the following platforms: 


Target audience

This program is intended for students who have completed the M1 Math - Statistics - Data Science (SSD) or any other equivalent M1 in mathematics with a strong specialization in probability and statistics.


Necessary pre-requisites

M1 Math - Statistics - Data Science (SSD)


Recommended prerequisites

M1 Math - Statistics - Data Science (SSD)


And then

Further studies

Students may go on to a doctorate after the M2.


Professional integration

Careers: statistician, biostatistician, data scientist, data analyst, all at the engineering level.

All sectors of activity: industry, research and development, health, agronomy, banking and insurance, commerce, etc.
