In progress · 2026 · Full analysis (modeling, causal inference, interpretation)

EEG Analysis and Causal Inference

Eye state prediction and causal effect estimation on EEG signals using deep learning and advanced causal methods.

Project overview

Full analysis of a temporal EEG recording to predict eye state (open/closed) and estimate causal effects. The project combines classical machine learning, sequence deep learning, and causal inference with causal trees and Double Machine Learning.

Current state

  • Temporal preprocessing and feature engineering completed.
  • Classical ML and deep learning (LSTM) models trained and evaluated.
  • Causal question formulated with DAG and identification assumptions.
  • Causal estimation via DML and causal trees being finalized.

Tech stack

Python, Scikit-learn, EconML, Pandas, Jupyter

Tags & Code

Data Science, Causal Inference, Deep Learning, Python

Private code (academic project)

Vision

  • Predict eye state from EEG signals while respecting the temporal structure of the data.
  • Estimate a causal effect of eye state on posterior EEG activity using causal trees and Double Machine Learning.
  • Clearly distinguish prediction from causality — an accurate model does not prove a causal effect.
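
The gap between prediction and causality can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the project's EEG analysis: the variables z, t, and y are hypothetical, and the coefficients are chosen only so that t predicts y well while having no causal effect on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical confounder z drives both the "treatment" t and the outcome y.
z = rng.normal(size=n)
t = z + 0.5 * rng.normal(size=n)
y = 2.0 * z + 0.5 * rng.normal(size=n)  # by construction, t has no effect on y

# Naive regression of y on t alone: t looks strongly predictive of y.
naive_slope = np.polyfit(t, y, 1)[0]
r2_naive = np.corrcoef(t, y)[0, 1] ** 2

# Regressing y on [t, z, intercept] adjusts for the confounder
# and recovers the true (null) effect of t.
design = np.column_stack([t, z, np.ones(n)])
adjusted_slope = np.linalg.lstsq(design, y, rcond=None)[0][0]

print(f"naive slope:    {naive_slope:.2f} (R^2 = {r2_naive:.2f})")
print(f"adjusted slope: {adjusted_slope:.2f}")
```

The naive slope is large and the fit is good, yet the adjusted slope is close to zero: predictive accuracy alone says nothing about the causal effect.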

Architecture

  • Exploration and preprocessing: temporal index, sliding windows, derived features (mean, std, PCA).
  • Predictive models: logistic regression, random forest, gradient boosting, LSTM — temporal train/test split.
  • Causal question: DAG, backdoor criterion, adjustment variables.
  • Causal estimation: propensity score, causal trees (EconML), Double Machine Learning (5-fold cross-fitting).
  • Sensitivity analysis: robustness of the estimates to changes in the set of confounders adjusted for.
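
The windowing and temporal split steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's code: the helper names window_features and temporal_split, the window and step sizes, the 80/20 split, the 14-channel shape, and the random stand-in signal are all hypothetical. The gap parameter drops the windows that straddle the cut so that no raw sample is shared between train and test.

```python
import numpy as np


def window_features(signal, window, step):
    """Per-channel mean and std over sliding windows.

    signal: array of shape (n_samples, n_channels).
    Returns an array of shape (n_windows, 2 * n_channels).
    """
    starts = range(0, signal.shape[0] - window + 1, step)
    feats = [
        np.concatenate([signal[s:s + window].mean(axis=0),
                        signal[s:s + window].std(axis=0)])
        for s in starts
    ]
    return np.asarray(feats)


def temporal_split(X, y, train_fraction=0.8, gap=0):
    """Chronological split: earliest rows for training, latest for testing.

    `gap` rows right after the cut are discarded.
    """
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut + gap:], y[:cut], y[cut + gap:]


# Synthetic stand-in for a recording (shape and values are assumptions).
rng = np.random.default_rng(0)
eeg = rng.normal(size=(1000, 14))
eye_state = rng.integers(0, 2, size=1000)

window, step = 64, 16
X = window_features(eeg, window, step)

# Label each window with the eye state at its last sample.
ends = np.arange(window - 1, eeg.shape[0], step)
y = eye_state[ends]

# Windows closer than ceil(window / step) positions apart overlap,
# so that many minus one are dropped at the boundary.
gap = int(np.ceil(window / step)) - 1
X_train, X_test, y_train, y_test = temporal_split(X, y, 0.8, gap=gap)
print(X_train.shape, X_test.shape)
```

No shuffling happens anywhere, so every test window lies strictly after every training window in time.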

Roadmap

  • Phase 1: exploration, preprocessing, and temporal feature engineering.
  • Phase 2: predictive models (classical ML + deep learning) with rigorous evaluation.
  • Phase 3: causal formulation, DAG, and estimation via causal trees and DML.
  • Phase 4: interpretation, sensitivity analysis, and final report.

Engineering decisions

  • Strict temporal train/test split to prevent any data leakage.
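  • Double Machine Learning to relax the linearity assumption of classical regression: nuisance functions are fit with flexible ML models.
  • 5-fold cross-fitting so that nuisance predictions are always made out-of-fold, which keeps nuisance overfitting from biasing the effect estimate.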
  • Explicit DAG to formalize causal assumptions and identify confounders.
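
To make the mechanics of Double Machine Learning with 5-fold cross-fitting concrete, here is a minimal hand-written sketch of the partialling-out estimator using scikit-learn only. It is not EconML's implementation and not the project's code: the function name dml_partially_linear, the random forest settings, the synthetic data, and the true effect of 1.0 are assumptions for illustration. Folds are left contiguous (unshuffled), which is one plausible choice for ordered data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def dml_partially_linear(y, t, X, n_splits=5, seed=0):
    """Cross-fitted estimate of theta in y = theta * t + g(X) + noise."""
    y_res = np.zeros(len(y))
    t_res = np.zeros(len(t))
    # Contiguous folds; each row's nuisance prediction comes from
    # models that never saw that row.
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        model_y = RandomForestRegressor(
            n_estimators=100, min_samples_leaf=5, random_state=seed)
        model_t = RandomForestRegressor(
            n_estimators=100, min_samples_leaf=5, random_state=seed)
        model_y.fit(X[train_idx], y[train_idx])  # estimates E[y | X]
        model_t.fit(X[train_idx], t[train_idx])  # estimates E[t | X]
        y_res[test_idx] = y[test_idx] - model_y.predict(X[test_idx])
        t_res[test_idx] = t[test_idx] - model_t.predict(X[test_idx])

    # Residual-on-residual regression gives the effect estimate.
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Plug-in standard error from the estimating equation.
    psi = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(psi ** 2) / len(y)) / np.mean(t_res ** 2)
    return theta, se


# Synthetic data with a binary treatment and a known effect of 1.0.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
propensity = 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
t = rng.binomial(1, propensity).astype(float)
y = 1.0 * t + np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * rng.normal(size=n)

theta_hat, se_hat = dml_partially_linear(y, t, X)
print(f"theta_hat = {theta_hat:.3f} (se {se_hat:.3f})")
```

The outcome depends on X non-linearly, so a linear adjustment would be misspecified here, while the forest-based nuisance models need no functional form to be specified in advance.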

Possible improvements

  • Extend the analysis to multiple EEG recordings to improve generalization.
  • Test other deep learning architectures (GRU, 1D CNN).
  • Explore additional causal methods (DiD, IV).

Lessons learned

  • Temporal data structure imposes strong constraints on modeling choices.
  • Distinguishing prediction from causality is fundamental — two very different questions.
  • Double Machine Learning gives a root-n consistent, asymptotically normal effect estimate under its assumptions, without imposing a functional form on the nuisance functions.