Hugo Tekeng
SDD1002 | Winter 2025 | Data

Modeling and Simulation

Detailed view of the course: the concepts studied, the technologies used, and the major academic work associated with it.

Code

SDD1002

Session

Winter 2025

Domain

Data

Overview

This course covered theoretical and practical techniques for modeling, simulation, and data analysis. I worked across several stages of a data science pipeline: data collection, cleaning, preparation, visualization, dimensionality reduction, and modeling, including the application of machine learning algorithms to real datasets.

Technologies used

Python, NumPy, Pandas, Scikit-learn, Matplotlib, Kaggle

Key concepts covered

  • Python review
  • Web scraping
  • Finding and selecting relevant datasets
  • Data cleaning and preparation
  • Data visualization
  • Using NumPy
  • Matrix operations
  • Dot product and cross product
  • Determinant, inverse matrix, and systems of linear equations
  • Eigenvalues and eigenvectors
  • Markov Chain Monte Carlo (MCMC)
  • Maximum likelihood
  • Expectation-maximization algorithm
  • Visualization with Matplotlib and boxplots
  • Gaussian processes
  • Using Pandas for data manipulation and cleaning
  • Introduction to Scikit-learn
  • Dimensionality reduction
  • Principal Component Analysis (PCA)
  • t-SNE
  • Cross-validation
  • Linear regression
  • Clustering
  • Classification
  • Statistical tests: p-value, t-test, chi-square, ANOVA (if time allows)
  • Real-world data science projects
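
Several of the concepts listed above connect directly: PCA can be computed from the eigenvalues and eigenvectors of a covariance matrix. The following is a minimal sketch of that link using only NumPy. It is not taken from the course material; the function name `pca_eig` and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def pca_eig(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components.

    Hypothetical helper for illustration: PCA via eigen-decomposition
    of the sample covariance matrix.
    """
    # Center each feature so the covariance is computed around the mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns are variables)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is suited to symmetric matrices; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_ratio = eigvals[order] / eigvals.sum()
    return X_centered @ components, components, explained_ratio


# Synthetic data (made up for this example): 200 points in 3D that vary
# mostly along a single direction, plus a small amount of noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

projected, components, explained_ratio = pca_eig(X, n_components=2)
print(projected.shape)
print(explained_ratio)
```

In practice, Scikit-learn's `PCA` class covers the same ground with more numerical safeguards; the manual version is only meant to show where the eigenvectors come in.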

Coursework and evaluated components

  • Lab 1 with presentation and report on a selected dataset
  • Midterm exam including a mini-project
  • Final project with oral presentation
  • Data cleaning, preparation, analysis, and visualization
  • Application of modeling and machine learning algorithms