The article examines how credible Random Forest Variable Importance is for identifying causal links in binary-outcome data. It contrasts this method with fitting a Logistic Regression model and examining its coefficients. The discussion highlights the difficulty of extracting causality from observational data without controlled experiments, emphasizing the role of domain knowledge and partial dependence plots in interpreting model results.
A team from MIT has developed a versatile algorithm that identifies causal links in complex systems. By analyzing data collected over time, it measures interactions between variables and estimates how a change in one variable affects another, producing a "causality map" that shows which variables are strongly linked.
The algorithm distinguishes two types of causality:
- **Synergistic:** One variable influences another only in combination with a second variable.
- **Redundant:** A change in one variable has the same effect as a change in another variable.
The algorithm also estimates "causal leakage," the degree to which some unknown, unmeasured influence is missing from the map.
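The article does not describe the MIT algorithm's internals. Purely to illustrate the idea of building a "causality map" from time-series interactions, here is a crude lagged-correlation sketch on synthetic data; the variable names, lag choice, and scoring are illustrative stand-ins, not the MIT method.

```python
# Illustrative only: a lagged-correlation "causality map" on synthetic
# time series, standing in for the interaction measurements described.
import numpy as np

rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()  # x drives y at lag 1
z = rng.normal(size=T)  # independent variable, no causal role

series = {"x": x, "y": y, "z": z}

def lagged_corr(a, b, lag=1):
    """Correlation between a[t - lag] and b[t]."""
    return float(np.corrcoef(a[:-lag], b[lag:])[0, 1])

# Directed link strengths at lag 1 between every ordered pair.
causality_map = {(src, dst): lagged_corr(series[src], series[dst])
                 for src in series for dst in series if src != dst}

for (src, dst), strength in sorted(causality_map.items()):
    print(f"{src} -> {dst}: {strength:+.2f}")
```

On this data the x → y link scores high while links involving z stay near zero, which is the qualitative behavior a causality map is meant to surface.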
This article explains how adding monotonic constraints to traditional ML models can make them more reliable for causal inference, illustrated with a real estate example.