In progress · 2026 · Full analysis: modeling, causal inference, interpretation
EEG Analysis and Causal Inference
Eye state prediction and causal effect estimation on EEG signals using deep learning and advanced causal methods.

Project overview
Full analysis of a temporal EEG recording to predict eye state (open/closed) and estimate causal effects. The project combines classical machine learning, sequence deep learning, and causal inference with causal trees and Double Machine Learning.
Current state
- Temporal preprocessing and feature engineering completed.
- Classical ML and deep learning (LSTM) models trained and evaluated.
- Causal question formulated with DAG and identification assumptions.
- Causal estimation via DML and causal trees being finalized.
Tech stack
Python, Scikit-learn, EconML, Pandas, Jupyter
Tags & Code
Data Science, Causal Inference, Deep Learning, Python
Private code (academic project)
Vision
- Predict eye state from EEG signals while respecting the temporal structure of the data.
- Estimate a causal effect of eye state on posterior EEG activity using causal trees and Double Machine Learning.
- Clearly distinguish prediction from causality: an accurate predictive model does not, by itself, establish a causal effect.
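The gap between prediction and causality can be illustrated with a small simulation. This is a sketch on synthetic data, not on the project's EEG recording: the confounder `z`, the treatment `t`, and the outcome `y` are hypothetical, and the true effect of `t` on `y` is set to zero by construction.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical confounder that drives both the treatment and the outcome.
z = rng.normal(size=n)
t = (z + rng.normal(size=n) > 0).astype(float)  # treatment depends on z
y = 2.0 * z + rng.normal(size=n)                # outcome depends on z only

# Naive contrast: difference in mean outcome between treated and untreated.
naive = y[t == 1].mean() - y[t == 0].mean()

# Backdoor adjustment via linear regression on [intercept, t, z].
design = np.column_stack([np.ones(n), t, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]  # coefficient on t after adjusting for z

print(f"naive difference: {naive:.3f}")
print(f"adjusted effect:  {adjusted:.3f}")
```

In this setup `t` is a useful predictor of `y`, yet the adjusted coefficient is close to zero while the naive difference is not. The linear adjustment works here only because the simulated outcome is linear in `z`.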
Architecture
- Exploration and preprocessing: temporal index, sliding windows, derived features (mean, std, PCA).
- Predictive models: logistic regression, random forest, gradient boosting, and LSTM, all evaluated with a temporal train/test split.
- Causal question: DAG, backdoor criterion, adjustment variables.
- Causal estimation: propensity score, causal trees (EconML), Double Machine Learning (5-fold cross-fitting).
- Sensitivity analysis: robustness of the estimated effect to changes in the set of confounders adjusted for.
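The preprocessing and split steps listed above could look roughly like the following. This is a minimal sketch under stated assumptions: the channel names, the window length of 32 samples, the 80/20 cut, and the helper names `window_features` and `temporal_split` are all illustrative and not taken from the project's private code. Random data stands in for the EEG recording.

```python
import numpy as np
import pandas as pd


def window_features(signals: pd.DataFrame, window: int) -> pd.DataFrame:
    """Trailing-window mean and std per channel; rows without a full window are dropped."""
    roll = signals.rolling(window=window, min_periods=window)
    feats = pd.concat(
        [roll.mean().add_suffix("_mean"), roll.std().add_suffix("_std")],
        axis=1,
    )
    return feats.dropna()


def temporal_split(X: pd.DataFrame, y: pd.Series, test_fraction: float = 0.2):
    """Split in time order: the earliest rows train, the latest rows test. No shuffling."""
    cut = int(len(X) * (1 - test_fraction))
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


# Synthetic stand-in for the recording (hypothetical channels and labels).
rng = np.random.default_rng(0)
signals = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["ch0", "ch1", "ch2"])
eye_closed = pd.Series(rng.integers(0, 2, size=1000), name="eye_closed")

X = window_features(signals, window=32)
y = eye_closed.loc[X.index]  # keep labels aligned with the surviving rows
X_train, X_test, y_train, y_test = temporal_split(X, y)

print(X.shape, len(X_train), len(X_test))
```

Trailing windows only look backward in time, but the first test windows still overlap the last training samples. Dropping one window length of rows at the boundary is one possible way to remove that overlap.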
Roadmap
- Phase 1: exploration, preprocessing, and temporal feature engineering.
- Phase 2: predictive models (classical ML + deep learning) with rigorous evaluation.
- Phase 3: causal formulation, DAG, and estimation via causal trees and DML.
- Phase 4: interpretation, sensitivity analysis, and final report.
Engineering decisions
- Strict temporal train/test split to prevent any data leakage.
- Double Machine Learning to relax the linearity assumption of classical regression.
- 5-fold cross-fitting so that nuisance models are never applied to the samples they were fit on, which limits overfitting bias in the effect estimate.
- Explicit DAG to formalize causal assumptions and identify confounders.
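To make the Double Machine Learning and cross-fitting decisions concrete, here is a hand-rolled sketch of the partially linear estimator with 5 folds. It is not the project's code and not EconML's implementation (EconML exposes comparable estimators in `econml.dml`). The function name `dml_partially_linear`, the random forest nuisance models, the contiguous folds, and the synthetic data with a true effect of 1.5 are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold


def dml_partially_linear(y, t, X, n_splits=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta in y = theta * t + g(X) + noise."""
    y_res = np.empty(len(y))
    t_res = np.empty(len(t))
    # Contiguous folds (shuffle=False) are an assumption suited to time-ordered data.
    for fit_idx, est_idx in KFold(n_splits=n_splits, shuffle=False).split(X):
        model_y = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=seed)
        model_t = RandomForestClassifier(n_estimators=50, min_samples_leaf=5, random_state=seed)
        model_y.fit(X[fit_idx], y[fit_idx])
        model_t.fit(X[fit_idx], t[fit_idx])
        # Residuals are computed only on samples the nuisance models did not see.
        y_res[est_idx] = y[est_idx] - model_y.predict(X[est_idx])
        t_res[est_idx] = t[est_idx] - model_t.predict_proba(X[est_idx])[:, 1]
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Plug-in standard error from the orthogonal score.
    psi = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(psi**2)) / np.mean(t_res**2) / np.sqrt(len(y))
    return theta, se


# Synthetic data: binary treatment, nonlinear confounding, true effect 1.5.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
y = 1.5 * t + np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)

theta, se = dml_partially_linear(y, t, X)
print(f"theta = {theta:.3f} (se = {se:.3f})")
```

The nuisance models can be nonlinear because only the residuals enter the final one-parameter regression. The sketch assumes a constant effect and that `X` contains every confounder, which on real data is exactly what the DAG is meant to justify.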
Possible improvements
- Extend the analysis to multiple EEG recordings to improve generalization.
- Test more complex deep learning architectures (GRU, 1D CNN).
- Explore additional causal methods (DiD, IV).
Lessons learned
- Temporal data structure imposes strong constraints on modeling choices.
- Distinguishing prediction from causality is fundamental: they are two very different questions.
- Double Machine Learning supports valid inference on the effect, under its identification and regularity assumptions, without imposing a functional form on the nuisance models.