Version: 2.0
🕑 Estimated time for completion

This section takes about 1h30m to complete.

Jump into Data Science (Bonus)

AI vs Machine Learning

ml-vs-ai.png

Misconceptions

uncovering-ai-meme.png

what-x-thinks-i-do.png

data-engineer-data-scientist.png

tableau-data-science.png

Machine Learning

machine-learning.png

Source: https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781789537550/1/ch01lvl1sec10/differences-between-classification-and-regression

Feeling Lost? Machine Learning Estimators Map

Supervised Learning

  • Classification
    • Binary:
      • Should this loan be approved?
      • Is this a picture of a cat or a dog?
    • Multi-class: which bird species is in this picture?
  • Regression
    • Many-to-one:
      • Predicting the price of a second-hand car
    • Many-to-many:
      • Forecasting sales for the next 3 months

Unsupervised Learning

  • Clustering, Topic Modelling
  • Dimensionality Reduction
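To make clustering concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. The function name `kmeans_1d`, the toy points and the hand-picked starting centroids are illustrative assumptions. In practice you would use a library implementation such as scikit-learn's `KMeans`.

```python
def kmeans_1d(points, centroids, n_iter=10):
    """Illustrative 1-D k-means (Lloyd's algorithm); kmeans_1d is a hypothetical helper."""
    centroids = list(centroids)
    clusters = []
    for _ in range(n_iter):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]  # toy data with two obvious groups
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.5]
```

No labels were supplied. The algorithm discovered the two groups from the data alone, which is what makes it unsupervised.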

Optimization & Reinforcement Learning

  • Convex Optimization, Genetic Algorithms
  • Deep Reinforcement Learning
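Convex optimization is where the simplest optimizers shine. A convex function has no local minimum other than the global one, so plain gradient descent with a small enough step size walks straight to it. The sketch below assumes a hand-picked objective f(x) = (x - 3)^2 and learning rate. `gradient_descent` is a hypothetical helper, not a library function.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient; gradient_descent is a hypothetical helper."""
    x = x0
    for _ in range(n_steps):
        x -= learning_rate * grad(x)
    return x


# Convex objective f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))  # 3.0
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make the iterates diverge instead.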

Intuition

function.png decision-tree-regression.png

decision-tree-regression-profit-production-cost.png

What are some limitations of linear models (y = mx + b), or even of multiple linear regression (y = m1x1 + m2x2 + … + b)?
  • Assumes a monotonic relationship between the target variable (y) and any feature (e.g. x1): the slope (gradient) never changes sign
  • Assumes the slope is equally steep across the entire domain of that feature
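Both limitations are easy to demonstrate. The sketch below uses ordinary least squares to fit y = mx + b to toy data that follow y = x^2 on a symmetric range. That data is U-shaped, so the relationship is not monotonic. `fit_line` is a hypothetical helper that uses the textbook closed-form solution.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = m*x + b (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    b = mean_y - m * mean_x
    return m, b


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # U-shaped: y falls, then rises
m, b = fit_line(xs, ys)
print(m, b)  # 0.0 4.0
```

The fitted slope is exactly 0. The line claims that x has no influence on y, even though y is fully determined by x.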

Machine Learning models can often provide the flexibility to overcome these limitations. Think of them as automatic curve fitting plus if-else logic!

  • You don’t need to hard-code the predicates
  • The algorithm will determine and optimize your if-conditions

However, note the problem with the green line in the picture 😡
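The "automatic if-else" idea can be sketched with the simplest possible decision tree: a single split, or "stump". Instead of hard-coding `if x <= 3`, the code tries every candidate threshold and keeps the one with the lowest squared error. `fit_stump`, `predict_stump` and the toy data are illustrative assumptions. Real decision-tree libraries apply the same search recursively and across many features.

```python
def fit_stump(xs, ys):
    """Learn one split 'if x <= threshold' minimising squared error (hypothetical helper)."""

    def sse(values):
        # Sum of squared errors when predicting the mean of `values`
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    # Candidate thresholds: every distinct x except the largest (which would leave
    # the right branch empty); assumes at least two distinct x values
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, t, sum(left) / len(left), sum(right) / len(right))
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    # The learned if-else: the threshold was found from data, not hard-coded
    return left_value if x <= threshold else right_value


xs = [1, 2, 3, 4, 5, 6]
ys = [10, 11, 10, 30, 31, 30]  # toy data with a jump between x = 3 and x = 4
stump = fit_stump(xs, ys)
print(stump[0])  # 3
```

Stacking many such splits gives a full tree. Stacking them without limits is also how a tree ends up tracing every wiggle of the training data.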

Data Science

Trade-offs in modelling

underfitting-overfitting.png

accuracy-vs-intelligibility.png

  • Flexibility vs Generalization
  • Overfitting vs Underfitting
    • How can we avoid overfitting in particular?
  • Complexity vs Interpretability
    • Is that dichotomy still strictly true today?
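One standard answer to "how can we avoid overfitting?" starts with measuring it. Hold some data out, then compare the error on the training part with the error on the held-out part. A large gap signals overfitting. k-fold cross-validation repeats this k times so that every sample is held out exactly once. The index splitter below is a minimal sketch that assumes the samples are already shuffled. `k_fold_indices` is a hypothetical name; scikit-learn's `KFold` implements the same idea.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


for train, validation in k_fold_indices(10, k=3):
    # Fit on `train`, score on `validation`, then compare the two errors
    print(validation)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```

If validation error stays well above training error across folds, the usual levers are a simpler model, regularization, or more data.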

Modelling in the real world

confidence-intervals.png

  • ML & Deep Learning training
    • Can be expensive and difficult to debug: start simple!
  • Explainability
  • Are point estimates always enough?
    • Some stakeholders require uncertainties / confidence intervals
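One common way to turn a point estimate into an interval is the percentile bootstrap. Resample the data with replacement many times, recompute the statistic each time, and read off the middle 95% of the results. This is a swapped-in illustration, not necessarily how the figure above was produced. `bootstrap_ci` and the sample numbers are hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    # Mean of each resample drawn with replacement, sorted to read off percentiles
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


sales = [12.1, 9.8, 11.4, 13.0, 10.2, 12.7, 11.9, 10.8]  # toy numbers
estimate = statistics.fmean(sales)
low, high = bootstrap_ci(sales)
print(f"mean {estimate:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting "mean plus interval" tells stakeholders how much the estimate could move with different data. A bare point estimate hides that.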

What problem are you actually trying to solve?

roc-curve.png

true-class-predicted-class.png
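Both pictures come from the same four counts. The sketch below computes the confusion-matrix counts for a given decision threshold. It then sweeps the threshold to get the (false positive rate, true positive rate) points that make up an ROC curve. Which threshold to pick depends on the problem: is a false positive or a false negative more costly? `confusion_counts`, `roc_points` and the toy labels and scores are hypothetical. The code assumes binary 0/1 labels with both classes present.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (TP, FP, TN, FN) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every observed score."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(confusion_counts(y_true, scores, 0.5))  # (1, 0, 2, 1)
print(roc_points(y_true, scores))  # [(0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```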

Tools

tensor-flow.png pytorch.png xg-boost.png scikid-learn.png

  • Forecasting
  • Machine Learning
    • scikit-learn
      • Plenty of models and preprocessing methods to choose from (except for large-scale deep learning)
    • XGBoost (gradient-boosted decision trees: "random forests on steroids")
      • Popular choice for winning Kaggle competitions!
    • InterpretML (see Interpretability below)
  • Deep Learning
    • PyTorch
      • Very flexible: lets you build almost any model architecture
    • TensorFlow
      • The Keras API is very easy to use
      • If you are not using Keras, PyTorch is probably the better choice
  • MLOps, CD4ML

Interpretability

i-am-dog.png

Why is interpretability crucial?

variety-of-techniques glass-box

  • Christoph Molnar: [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/)
    • Great introduction to concepts and theory
  • Fundamentals: [LIME](https://christophm.github.io/interpretable-ml-book/lime.html) & [SHAP](https://shap.readthedocs.io/en/latest/index.html)
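SHAP builds on Shapley values from game theory. Each feature's attribution is its average marginal contribution over all orders in which the features could be "switched on". For a handful of features this can be computed exactly by brute force, as in the sketch below. The sketch makes several assumptions. An "absent" feature is replaced by a baseline value. The model is a hypothetical toy function with one interaction term. `shapley_values` is not the `shap` library's API; that library uses much faster estimators, because this brute force is exponential in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        # Evaluate the model with features outside the coalition set to the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z):
    # Hypothetical model: two additive effects plus an interaction between z[0] and z[2]
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # [3.5, 6.0, 1.5]
```

The interaction term contributes 1 × 3 = 3, and that amount is split evenly between features 0 and 2. The attributions sum to f(x) - f(baseline) = 11.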

Modern Toolkits

  • InterpretML
    • Intro Video, Deep Dive
    • Includes various explainability techniques
    • You can even build advanced yet interpretable models from the ground up
  • Fairlearn
    • Assess and (automatically) mitigate unfairness in your current models
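Unfairness has to be measured before it can be mitigated. A simple example metric is demographic parity: do different groups receive positive predictions at similar rates? Fairlearn ships metrics of this kind, for example `demographic_parity_difference` in `fairlearn.metrics`. The sketch below is a hand-rolled stand-in that shows the arithmetic; it is not Fairlearn's API. `selection_rates`, `selection_rate_gap` and the predictions and group labels are hypothetical.

```python
def selection_rates(y_pred, groups):
    """Fraction of positive (== 1) predictions within each group."""
    totals, positives = {}, {}
    for pred, group in zip(y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def selection_rate_gap(y_pred, groups):
    """Largest difference in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(y_pred, groups))  # {'a': 0.75, 'b': 0.25}
print(selection_rate_gap(y_pred, groups))  # 0.5
```

Demographic parity is only one of several fairness criteria. Which one is appropriate depends on the application.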

CD4ML - Business Applications