EEG Analysis and Causal Inference

Eye state prediction and causal effect estimation on EEG signals using deep learning and advanced causal methods.

Project overview

Full analysis of a temporal EEG recording to predict eye state (open/closed) and estimate causal effects. The project combines classical machine learning, sequence deep learning, and causal inference with causal trees and Double Machine Learning.

Current state

Temporal preprocessing and feature engineering completed.
Classical ML and deep learning (LSTM) models trained and evaluated.
Causal question formulated with DAG and identification assumptions.
Causal estimation via DML and causal trees being finalized.

Tech stack

Python, Scikit-learn, EconML, Pandas, Jupyter

Tags & Code

Data Science, Causal Inference, Deep Learning, Python

Private code (academic project)

Vision

Predict eye state from EEG signals while respecting the temporal structure of the data.
Estimate a causal effect of eye state on posterior EEG activity using causal trees and Double Machine Learning.
Clearly distinguish prediction from causality: an accurate model does not prove a causal effect.
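
To illustrate the last point, here is a minimal simulation, not taken from the project. The variables x, y, and z are hypothetical: z is a confounder that drives both x and y, while x has no effect on y at all. A model using x alone still predicts y well, and the apparent effect disappears once z is adjusted for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: z drives both x and y; x has NO causal effect on y.
z = rng.normal(size=n)
x = z + 0.3 * rng.normal(size=n)
y = 2.0 * z + 0.3 * rng.normal(size=n)

# Predictive view: x alone explains most of the variance in y.
slope_naive = np.polyfit(x, y, 1)[0]
r2_naive = np.corrcoef(x, y)[0, 1] ** 2

# Causal view: once z is included in the regression, the coefficient on x
# is close to zero, matching the way the data were generated.
design = np.column_stack([x, z, np.ones(n)])
coef_adjusted = np.linalg.lstsq(design, y, rcond=None)[0]

print(f"R^2 using x only:            {r2_naive:.2f}")
print(f"naive slope on x:            {slope_naive:.2f}")
print(f"slope on x, adjusted for z:  {coef_adjusted[0]:.2f}")
```

In real data the confounders are not known in advance, which is why the causal part of the project starts from an explicit DAG rather than from model accuracy.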

Architecture

Exploration and preprocessing: temporal index, sliding windows, derived features (mean, std, PCA).
Predictive models: logistic regression, random forest, gradient boosting, LSTM — temporal train/test split.
Causal question: DAG, backdoor criterion, adjustment variables.
Causal estimation: propensity score, causal trees (EconML), Double Machine Learning (5-fold cross-fitting).
Sensitivity analysis: robustness of results to confounder variations.
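
The first two steps can be sketched as follows, assuming a pandas DataFrame with one column per EEG channel and one row per sample. The project code is private, so the function names, window length, split fraction, and random stand-in data are all illustrative assumptions. The PCA step is omitted for brevity.

```python
import numpy as np
import pandas as pd

def window_features(signals: pd.DataFrame, window: int) -> pd.DataFrame:
    """Rolling mean and std per channel, using only current and past samples."""
    rolling = signals.rolling(window=window, min_periods=window)
    feats = pd.concat(
        [rolling.mean().add_suffix("_mean"), rolling.std().add_suffix("_std")],
        axis=1,
    )
    # The first (window - 1) rows have incomplete windows and are dropped.
    return feats.dropna()

def temporal_split(features, labels, train_fraction=0.8, gap=0):
    """Chronological split: earliest rows for training, latest for testing.

    `gap` rows after the cut are discarded so that overlapping windows
    do not share raw samples across the train/test boundary.
    """
    cut = int(len(features) * train_fraction)
    train_idx = features.index[:cut]
    test_idx = features.index[cut + gap:]
    return (features.loc[train_idx], labels.loc[train_idx],
            features.loc[test_idx], labels.loc[test_idx])

# Random stand-in data (hypothetical shapes, not the real recording).
rng = np.random.default_rng(0)
n_samples, n_channels, window = 1000, 4, 32
signals = pd.DataFrame(
    rng.normal(size=(n_samples, n_channels)),
    columns=[f"ch{i}" for i in range(n_channels)],
)
eye_state = pd.Series(rng.integers(0, 2, size=n_samples), name="eye_state")

features = window_features(signals, window)
labels = eye_state.loc[features.index]
X_train, y_train, X_test, y_test = temporal_split(
    features, labels, train_fraction=0.8, gap=window - 1
)
print(X_train.shape, X_test.shape)
```

Setting gap to window - 1 means the first test window starts strictly after the last sample used by any training window. This is one simple way to keep the split leak-free when windows overlap.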

Roadmap

Phase 1: exploration, preprocessing, and temporal feature engineering.
Phase 2: predictive models (classical ML + deep learning) with rigorous evaluation.
Phase 3: causal formulation, DAG, and estimation via causal trees and DML.
Phase 4: interpretation, sensitivity analysis, and final report.

Engineering decisions

Strict temporal train/test split to prevent any data leakage.
Double Machine Learning to relax the linearity assumption of classical regression.
5-fold cross-fitting so that nuisance models are never evaluated on the samples they were trained on, which limits overfitting bias in the effect estimate.
Explicit DAG to formalize causal assumptions and identify confounders.
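
The DML and cross-fitting decisions can be sketched as follows. This is a hand-rolled illustration using scikit-learn on simulated data, not the project's EconML code. It assumes a partially linear model y = theta * d + g(x) + noise with a binary treatment d. The function name dml_partially_linear, the random-forest nuisance models, and the simulated data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dml_partially_linear(y, d, x, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    y_res = np.zeros(len(y))
    d_res = np.zeros(len(y))
    # Folds are contiguous blocks (shuffle=False), in keeping with temporal data.
    for train, test in KFold(n_splits=n_folds, shuffle=False).split(x):
        # Nuisance models are fit on the other folds only (cross-fitting).
        outcome_model = RandomForestRegressor(
            n_estimators=100, min_samples_leaf=20, random_state=seed)
        propensity_model = RandomForestClassifier(
            n_estimators=100, min_samples_leaf=20, random_state=seed)
        outcome_model.fit(x[train], y[train])
        propensity_model.fit(x[train], d[train])
        # Out-of-fold residuals: y - E[y|x] and d - P(d=1|x).
        y_res[test] = y[test] - outcome_model.predict(x[test])
        d_res[test] = d[test] - propensity_model.predict_proba(x[test])[:, 1]
    # Residual-on-residual regression gives the effect estimate.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Plug-in standard error from the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / len(y)) / np.mean(d_res ** 2)
    return theta, se

# Simulated data with a known effect of 1.0 and nonlinear confounding.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))
propensity = 1.0 / (1.0 + np.exp(-x[:, 0]))
d = (rng.uniform(size=n) < propensity).astype(int)
y = 1.0 * d + np.sin(x[:, 0]) + x[:, 1] ** 2 + 0.5 * rng.normal(size=n)

theta_hat, se_hat = dml_partially_linear(y, d, x)
print(f"theta_hat = {theta_hat:.2f} (se = {se_hat:.2f}), true effect = 1.0")
```

Because each residual is computed by a model that never saw that sample, overfitting in the forests does not feed directly into the effect estimate. The final step is an ordinary least-squares slope on residuals.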

Possible improvements

Extend the analysis to multiple EEG recordings to improve generalization.
Test more complex deep learning architectures (GRU, 1D CNN).
Explore additional causal methods (DiD, IV).

Lessons learned

Temporal data structure imposes strong constraints on modeling choices.
Distinguishing prediction from causality is fundamental: they are two very different questions.
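Double Machine Learning provides solid theoretical guarantees (root-n consistency and valid confidence intervals, provided the nuisance models converge fast enough) without imposing a functional form on the nuisance functions.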
