This is an open, unconventional textbook covering mathematics, computing, and artificial intelligence from foundational principles. It's designed for practitioners seeking a deep understanding, moving beyond exam preparation and focusing on real-world application. The author, drawing from years of experience in AI/ML, has compiled notes that prioritize intuition, context, and clear explanations, avoiding dense notation and outdated material.
The compendium covers a broad range of topics, from vectors and matrices to machine learning, computer vision, and multimodal learning, with future chapters planned for areas like data structures and AI inference.
This article provides a comprehensive overview of advanced causal inference methods, moving beyond traditional statistical approaches. It emphasizes the importance of understanding causal relationships rather than just correlations for effective decision-making. It covers techniques like instrumental variables, regression discontinuity, difference-in-differences, and causal discovery algorithms.
It discusses the assumptions required for each method and how to validate them. The author stresses the need for careful consideration of confounding variables and potential biases when attempting to establish causality. Ultimately, the article aims to equip data scientists with the tools and knowledge to draw more meaningful and actionable insights from data.
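As a small illustration of one technique named above, the sketch below computes a difference-in-differences estimate from four group means. The function name and the numbers are hypothetical, made up for illustration and not taken from the article. The estimate only has a causal reading under the parallel-trends assumption, that is, that the treated group would have changed by the same amount as the control group had it not been treated.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four group means.

    Assumes parallel trends: without treatment, the treated group
    would have changed by the same amount as the control group.
    """
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical group means (illustrative values only).
effect = diff_in_diff(
    treated_pre=100.0,
    treated_post=130.0,
    control_pre=90.0,
    control_post=105.0,
)
print(effect)  # 15.0
```

The treated group rose by 30 and the control group by 15, so 15 is attributed to the treatment. If the two groups were already on different trajectories before treatment, this number is biased.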
Strong statistical understanding is crucial for data scientists to interpret results accurately, avoid misleading conclusions, and make informed decisions. It's a foundational skill that complements technical programming abilities.
* **Statistical vs. Practical Significance:** Don't automatically act on statistically significant results. Consider if the effect size is meaningful in a real-world context and impacts business goals.
* **Sampling Bias:** Be aware that your dataset is rarely a perfect representation of the population. Identify potential biases in data collection that could skew results.
* **Confidence Intervals:** Report ranges (confidence intervals) alongside point estimates to communicate the uncertainty of your estimates. Wide intervals mean the estimate is imprecise, which often signals a need for more data.
* **Interpreting P-Values:** A p-value is the probability of observing results at least as extreme as yours *if* the null hypothesis is true, *not* the probability that the null hypothesis is true. Always report p-values alongside effect sizes.
* **Type I & Type II Errors:** Understand the risks of false positives (Type I) and false negatives (Type II) in statistical testing. At a fixed significance level, a larger sample size reduces the likelihood of Type II errors.
* **Correlation vs. Causation:** Correlation does not equal causation. Identify potential confounding variables that might explain observed relationships. Randomized experiments (A/B tests) are best for establishing causation.
* **Curse of Dimensionality:** Adding more features doesn't always improve model performance. High dimensionality can lead to data sparsity, overfitting, and reduced model accuracy. Feature selection and dimensionality reduction techniques are important.
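The points about statistical versus practical significance, confidence intervals, and p-values can be seen together in a small sketch. It runs a two-sided z-test on a difference in means, assuming a known common standard deviation and equal group sizes. The function name `compare_means` and all input numbers are hypothetical, chosen only to show how a tiny effect becomes "significant" with enough data.

```python
import math
from statistics import NormalDist


def compare_means(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Two-sided z-test for a difference in means.

    Simplifying assumptions: both groups share a known standard
    deviation `sd` and have the same size `n_per_group`.
    """
    diff = mean_b - mean_a
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci = (diff - z_crit * se, diff + z_crit * se)
    cohens_d = diff / sd  # standardized effect size
    return {"diff": diff, "p_value": p_value, "ci": ci, "cohens_d": cohens_d}


# Same tiny difference (0.2 on a scale with sd = 10), two sample sizes.
large = compare_means(mean_a=50.0, mean_b=50.2, sd=10.0, n_per_group=100_000)
small = compare_means(mean_a=50.0, mean_b=50.2, sd=10.0, n_per_group=50)

for label, res in (("large sample", large), ("small sample", small)):
    low, high = res["ci"]
    print(
        f"{label}: p = {res['p_value']:.4g}, "
        f"95% CI = ({low:.3f}, {high:.3f}), d = {res['cohens_d']:.3f}"
    )
```

With the large sample the p-value is far below 0.05, yet the effect size stays at about 0.02 standard deviations, which may not matter in practice. With the small sample the interval is much wider, reflecting how little the data says about the true difference.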
A visual introduction to probability and statistics, covering basic probability, compound probability, probability distributions, frequentist inference, Bayesian inference, and regression analysis. Created by Daniel Kunin and team with interactive visualizations using D3.js.
A simple explanation of the Pearson correlation coefficient, with examples.
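For reference, the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it with the standard library only. The function name `pearson_r` and the sample data are illustrative and not taken from the linked explanation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the 1/(n-1)
    # factors cancel in the ratio, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # perfectly linear, increasing
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # perfectly linear, decreasing
```

The result always lies between -1 and 1. It measures only linear association, so a strong nonlinear relationship can still produce a value near 0.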
A step-by-step guide to catching real anomalies without drowning in false alerts.
A neofetch-style CLI tool for GitHub statistics. Display your GitHub profile and stats in a beautiful, colorful terminal interface.
This article details a hands-on approach to modeling rare events in time series data using Python. It covers data exploration, defining extreme events, fitting distributions (GEV, Weibull, Gumbel), and evaluating model performance using metrics like log-likelihood, AIC, and BIC. The example uses weather data and provides code snippets for implementation.
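To make the evaluation metrics concrete, here is a minimal standard-library sketch that fits a Gumbel distribution to block maxima and scores the fit with log-likelihood, AIC, and BIC. It is not the article's code. It uses the method of moments instead of maximum likelihood to stay dependency-free, and the function names and the annual-maximum values are made up for illustration. A library maximum-likelihood fit would generally be preferable in practice.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(data):
    """Method-of-moments estimates (loc, scale) for a Gumbel (maxima) fit."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    scale = math.sqrt(6.0 * var) / math.pi  # from Var = (pi^2 / 6) * scale^2
    loc = mean - EULER_GAMMA * scale        # from Mean = loc + gamma * scale
    return loc, scale


def gumbel_log_likelihood(data, loc, scale):
    """Sum of Gumbel log-densities: -log(scale) - z - exp(-z)."""
    total = 0.0
    for x in data:
        z = (x - loc) / scale
        total += -math.log(scale) - z - math.exp(-z)
    return total


def aic(log_lik, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical annual maximum temperatures (illustrative values only).
annual_maxima = [31.2, 33.5, 30.8, 35.1, 32.4, 34.0,
                 36.7, 31.9, 33.1, 38.2, 32.8, 34.6]

loc, scale = fit_gumbel_moments(annual_maxima)
log_lik = gumbel_log_likelihood(annual_maxima, loc, scale)
n_params = 2  # loc and scale

print(f"loc = {loc:.3f}, scale = {scale:.3f}")
print(f"log-likelihood = {log_lik:.3f}")
print(f"AIC = {aic(log_lik, n_params):.3f}")
print(f"BIC = {bic(log_lik, n_params, len(annual_maxima)):.3f}")
```

Lower AIC and BIC indicate a better trade-off between fit and complexity. Comparing candidates such as GEV, Weibull, and Gumbel means computing the same quantities for each fitted distribution on the same data.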
Understanding and Implementing Brant’s Tests in Ordinal Logistic Regression with Python. This article explains the proportional odds model for ordinal logistic regression and its assumptions, and shows how to assess the proportional odds assumption using likelihood ratio tests and the separate-fits approach, with Python implementation examples.