Centrale Lille Course Catalogue

#Start&Go Data Science

Course label : #Start&Go Data Science
Teaching department : /
Teaching manager : Mister PASCAL YIM
Education language : French
Potential ects : 4
Results grid :
Code and label (hp) : G1_S5_SG_DSC - #Start&Go Data Science

Education team

Teachers : Mister PASCAL YIM / Madam VERONIQUE LE COURTOIS / Mister ALEXANDRE MEGE REVIL / Mister DAVID BOULINGUEZ / Mister PHILIPPE QUAEGEBEUR / Mister PHILIPPE VANHEEGHE / Mister Sire de Marc EBODE ONANA / Mister SLIM HAMMADI
External contributors (business, research, secondary education): various temporary teachers

Summary

The main goal of this #Start&Go is to generate useful information from a set of raw data. Students will work with large amounts of data from various sectors (catalysis, economy, transport, etc.) and of various types (numerical, lexical, etc.). Without prior knowledge of the sector from which the data originates, they will have to extract relevant information for their "client". This #Start&Go should also be an opportunity for students to take an ethical look at the use of information.

Educational goals

The educational objectives below are common to the 5 variants of #Start&Go and can be supplemented by specific objectives.

At the end of the activity, the student will be able to :
- Perform a bibliographical search
- Understand and summarize reference documents
- Produce quality documents
- Use tools and apply a problem-solving methodology for which he or she does not necessarily have the prerequisites
- Turn his or her ideas into a functional demonstrator (which can be a model)
- Acquire knowledge, particularly independently, in a new field of activity
- Report on the knowledge gained
- Present and defend his or her work in a professional manner

At the end of the activity, the student will be aware of :
- Economic, societal and environmental constraints associated with the issue
- Complexity and the need to model systems
- The need to experimentally validate a model
- Issues and notions of open source and open hardware
- The study and production of documentation in English
- The importance of good specifications
- The cross-cutting nature of actual projects
- Time management
- The need to assess one's own knowledge and skills and to express one's training needs
- The value of helping one's team to improve its level of knowledge

At the end of the course, the student will be able to :
- Visualize a complex data set
- Understand the basic elements associated with the nature of the data being processed
- Analyze and prepare a data set in order to make it usable
- Apply machine learning methods to a data set
- Present the information extracted from the data
- Question the use of current information processing tools

Contribution of the course to the skills repository; at the end of the course, the student will have progressed in :
- Ability to collect and analyze information with logic and method
- Ability to mobilize a scientific/technical culture (transdisciplinarity and/or specialization)
- Ability to understand and formulate the problem (hypotheses, orders of magnitude, etc.)
- Ability to recognize the specific elements of a problem
- Ability to identify interactions between elements
- Ability to converge towards an acceptable solution (monitoring hypotheses, orders of magnitude, etc.)
- Ability to quickly gain in-depth knowledge of a field
- Ability to develop working methods, to organize

Sustainable development goals

Knowledge control procedures

Continuous Assessment
Comments:

Online resources

- A simple introduction to R: <https://www.fun-mooc.fr/c4x/UPSUD/42001S02/asset/labs.html>
- Memo R: <http://perso.unifr.ch/florence.yerly/Script/IntroR-handout.pdf>
- An example on the Titanic data (notably Random Forests): <https://www.kaggle.com/mrisdal/titanic/exploring-survival-on-the-titanic>
- Principal Component Analysis: <http://eric.univ-lyon2.fr/~ricco/course/didacticiels/R/acp_with_r.pdf>
- Using Jupyter with R: <http://earlglynn.github.io/kc-r-users-jupyter/Interactive-Jupyter-Notebooks-in-R.pdf>
- Video courses: <https://bigdatauniversity.com/courses/data-science-hands-open-source-tools/>
- Datasets by categories: <https://github.com/caesar0301/awesome-public-datasets>
- Kaggle competition: <https://www.kaggle.com/datasets>
- IBM: <https://my.datascientistworkbench.com/login?next=https%3A%2F%2Fmy.datascientistworkbench.com%2Ffind_data>
- Base Isidore: <http://www.rechercheisidore.fr/>

Pedagogy

- Work in groups of 4 students
- Possible participation in an international Kaggle-type challenge
- "Pragmatic" approach, covering the fundamental elements essential to understanding the situation
- Contribution of several disciplines to the understanding of different data sets

Sequencing / learning methods

Number of hours - Lectures : 0
Number of hours - Tutorial : 0
Number of hours - Practical work : 0
Number of hours - Seminar : 0
Number of hours - Half-group seminar : 0
Number of student hours in TEA (Autonomous learning) : 0
Number of student hours in TNE (Non-supervised activities) : 0
Number of hours in CB (Fixed exams) : 0
Number of student hours in PER (Personal work) : 0
Number of hours - Projects : 0

Prerequisites

Maximum number of registrants

Remarks