Modeling and Simulation
Detailed view of the course: concepts studied, technologies used, and the major academic work completed for it.
Code: SDD1002
Session: Winter 2025
Domain: Data
Overview
This course covered theoretical and practical techniques for modeling, simulation, and data analysis. I worked through the main stages of a data science pipeline: data collection, cleaning, preparation, visualization, dimensionality reduction, and modeling, and I applied machine learning algorithms to real datasets.
Technologies used
Python, NumPy, Pandas, Scikit-learn, Matplotlib, Kaggle
Key concepts covered
- Python review
- Web scraping
- Finding and selecting relevant datasets
- Data cleaning and preparation
- Data visualization
- Using NumPy
- Matrix operations
- Dot product and cross product
- Determinant, inverse matrix, and systems of linear equations
- Eigenvalues and eigenvectors
- Markov Chain Monte Carlo (MCMC)
- Maximum likelihood
- Expectation-maximization algorithm
- Visualization with Matplotlib and boxplots
- Gaussian processes
- Using Pandas for data manipulation and cleaning
- Introduction to Scikit-learn
- Dimensionality reduction
- Principal Component Analysis (PCA)
- t-SNE
- Cross-validation
- Linear regression
- Clustering
- Classification
- Statistical tests: p-value, t-test, chi-square, ANOVA (if time allows)
- Real-world data science projects
Coursework and evaluated components
- Lab 1 with presentation and report on a selected dataset
- Midterm exam including a mini-project
- Final project with oral presentation
- Data cleaning, preparation, analysis, and visualization
- Application of modeling and machine learning algorithms