M2 - Statistics and Data Science (SSD) - BIOSTATS

  • Training structure

    Faculty of Science

Presentation

This M2 program is aimed at students with an M1 in Statistics and Data Science (SSD) or any other M1 in mathematics or equivalent, with a strong specialization in probability and statistics.

This M2 is divided into two specialized pathways, which share some of their courses.

- The first of these specializations is Biostatistics, which focuses on the analysis and modeling of data from the living world.

- The second is Information and Decision Management (MIND), specializing in the analysis and modeling of economic data, as well as the management of associated decisions and risks.

  • The ambition of the SSD-Biostat pathway is to meet the expectations of M1 SSD students who are attracted by the modeling of data from the living world or the environment. The statistical aspects covered in this pathway range from the modeling of living systems to the most theoretical issues in statistics and stochastic modeling. Computational aspects feature heavily in this pathway and require a strong taste for computer programming.

The SSD-Biostat pathway is demanding because it focuses on concepts rather than techniques. In the field of data in the broadest sense, digital technologies are evolving fast with the advent of artificial intelligence, and becoming obsolete even faster. Future statistical engineers and researchers who work with data will be able to train in new technologies throughout their working lives, all the more easily if their initial conceptual training was solid. The added value of the program is precisely to provide a theoretical understanding of the statistical concepts underlying automated algorithms. Graduates must also be able to keep abreast of the latest technological developments.

In the second year, the SSD-Biostat pathway remains partly shared with the Information and Decision Management pathway (SSD-MIND). However, the SSD-Biostat pathway has its own specialization courses (two per M2 semester) in life and environmental data science, with an emphasis on an introduction to research.

  • The ambition of the SSD-MIND pathway is to meet the expectations of M1 SSD students who are attracted by the application of data science in business. Given the wide diversity of companies and the issues they face, this M2 program trains students in generalist, "all-terrain" data science. It also provides training more specific to the corporate context and its economic and managerial issues (economic information, financial risk management, customers, corporate strategy, etc.).

The SSD-MIND pathway is a mathematical engineering program, with a focus on methodology and a thorough mastery of statistical concepts and models. Graduates will be able to deal with all types of data and problems, and to design a complete and often original methodology to address them: starting with the management and organization of the data, continuing with its exploration and targeted reduction, then modeling the phenomena of interest, and finally synthesizing the extracted information for decision-making purposes. Graduates must also be able to pass on to the business the knowledge synthesized from that information. Each new data set, and each question asked about it, is often a new problem, so applying a standard method to the data as-is is often unsuitable. The aim is instead to write a mathematical model adapted to the data (in the sense that it expresses their complexity satisfactorily) and either cast it in a form amenable to a standard estimation method, or design and program a more specific method. The emphasis this pathway places on conceptual and mathematical mastery of the tools ensures that graduates are highly adaptable and able to train themselves, as the rapid evolution of data science requires.

In the second year, the SSD-MIND pathway remains partly shared with the Biostatistics pathway (SSD-Biostat), which is more specialized in the analysis and modeling of data from the living world or the environment. The SSD-MIND pathway is a double-degree program in partnership with the IAE, which teaches economics and management.

Objectives

The course has several objectives.

  • Enable students to deal with all types of data and problems, and to devise a comprehensive and often original methodology to address them.
  • Enable students to integrate quickly into any type of business by rapidly grasping its issues.
  • Bring students who wish to do so up to a theoretical level that enables them to write a doctoral thesis in statistics.
  • Train future researchers or teacher-researchers in the mathematics of randomness: probability, or theoretical or applied statistics. After a doctorate, they will be able to join laboratories in universities, engineering schools or research organizations such as CNRS, INRAE, Inria, CIRAD, INSERM, etc. It is also possible to join a company or an industrial research laboratory directly after the M2.
  • Train specialists in high-level statistical data processing for research organizations or companies for which statistics is now an indispensable tool, such as pharmaceutical laboratories, epidemiological monitoring institutes, air and water quality monitoring institutes, agri-food companies, biotechnology companies, healthcare sector companies (diagnostic assistance, personalized medicine), etc.

Know-how and skills

  • Extract relevant data.
  • Pre-process data (cleaning and formatting where necessary).
  • Conduct exploratory data analysis using visualization and dimension reduction tools.
  • Model a problem: master the usual methods of modern data science, propose the appropriate method(s) for the problem posed, write one or more mathematical models adapted to the problem, and put them into a form that those methods can process.
  • Implement the method computationally and propose model selection strategies.
  • Program efficiently in at least one language (Python, R).
  • Analyze and interpret results, i.e. produce knowledge from the extracted information.
  • Link the knowledge produced to the decision, so as to inform and optimize it.
  • Communicate results orally and in writing.

Program

The M2 is open to work-study (sandwich course) students, and alternates long periods of teaching with long periods of work experience.

First period: 7 weeks of teaching from September to the end of October.

Second period: 7 weeks in the company for work-study students, or a long tutored project in the laboratory for non-work-study students, from November to mid-January.

Third period: 7 weeks of teaching from mid-January to mid-March.

Fourth period: work-study placement in a company from March to August, or a 4- to 6-month internship for non-work-study students.

Internships and tutored projects: a 7-week tutored project for non-work-study students, with a report and oral presentation.

A 4- to 6-month internship at the end of the M2.

  • Non-parametric estimation

    5 credits
  • Generalized linear models

    5 credits
  • English

    2 credits
  • Work-study project or presentation

    3 credits
  • Bayesian statistics

    5 credits
  • Multivariate analysis

    5 credits
  • Statistical learning

    5 credits
  • Life cycle analysis

    4 credits
  • Addendum 2

    4 credits
  • Supplement 1

    4 credits
  • Internship

    14 credits
  • Latent variable models

    4 credits

Admission

How to register

Applications can be submitted on the following platforms: 

Target audience

This course is aimed at students holding the M1 Maths - Statistics Data Science (SSD) or any other equivalent M1 in mathematics with a strong specialization in probability and statistics.

Necessary prerequisites

M1 Maths - Statistics Data Science (SSD)

Recommended prerequisites

 M1 Maths - Statistics Data Science (SSD)

And then

Further studies

A doctorate is possible on completion of the M2.

Professional integration

Careers: statistician, biostatistician, data scientist, data analyst, all at engineering level.

All business sectors: industry, research and development, healthcare, agronomy, banking and insurance, commerce, etc.
