Version: 2.0

🕑 Estimated time for completion

This section takes about 1h30m to complete.

Pawarit Laosunthara

Curator

Jump into Data Science (Bonus)

AI vs Machine Learning

Misconceptions

Machine Learning

Source: https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781789537550/1/ch01lvl1sec10/differences-between-classification-and-regression

Feeling Lost? Machine Learning Estimators Map

Supervised Learning

Classification
- Binary:
  - Should this loan be approved?
  - Is this a picture of a cat or a dog?
- Multi-class: Which bird species is in this picture?
Regression
- Many-to-one:
  - Predicting the price of a second-hand car
- Many-to-many:
  - Forecasting sales for the next 3 months

Unsupervised Learning

Clustering, Topic Modelling
Dimensionality Reduction
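Clustering groups unlabelled points by similarity. The sketch below is a minimal k-means (Lloyd's algorithm) under a few assumptions: 1-D numeric data and plain Python with no libraries. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. The function name `kmeans_1d` is hypothetical.

```python
import random

def kmeans_1d(points, k, iterations=20, seed=0):
    """Minimal k-means (Lloyd's algorithm) on 1-D data (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 1.1, 9.8, 10.1, 10.0, 9.9]
result = kmeans_1d(data, k=2)
print(result)
```

With this toy data the two centroids should settle near 1.025 and 9.95, one per group. No labels were needed to find that structure.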

Optimization & Reinforcement Learning

Convex Optimization, Genetic Algorithms
Deep Reinforcement Learning

Intuition

What are some limitations of linear models (y = mx + b), or even of multiple linear regression (y = m1x1 + m2x2 + … + b)?

Assumes a monotonic relationship between the target variable (y) and any feature (e.g. x1): the slope never changes sign
Assumes the slope is equally steep between the target variable (y) and any feature (e.g. x1) across the entire domain of that feature
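The sketch below makes the first limitation concrete in plain Python, with no libraries assumed. It fits y = mx + b by ordinary least squares to data generated from y = x², which rises on both sides of zero. The relationship is not monotonic, so the best straight line is completely flat, even though x fully determines y.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x          # slope
    b = mean_y - m * mean_x     # intercept
    return m, b

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is fully determined by x, but not monotonic in x
m, b = fit_line(xs, ys)
print(m, b)  # 0.0 4.0
```

A slope of 0 would suggest "x doesn't matter", which is exactly the wrong conclusion here.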

Machine Learning models can often provide the flexibility to overcome these limitations. Think of them as automatic curve fitting and if-else rules!

You don’t need to hard-code the predicates
The algorithm will determine and optimize your if-conditions
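Here is a sketch of "the algorithm determines your if-conditions": a plain-Python decision stump, i.e. a regression tree with a single split. It tries every candidate threshold and keeps the split with the lowest squared error, and each side of the split predicts the mean of its training targets. The function names are hypothetical. Tree-based models such as random forests and XGBoost combine many learned splits like this one.

```python
def fit_stump(xs, ys):
    """Learn the single rule `x <= threshold` that minimises squared error."""
    best = None
    candidates = sorted(set(xs))
    for t in candidates[:-1]:  # splitting at the max value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict(x, threshold, left_value, right_value):
    # The learned model is literally an if-else
    return left_value if x <= threshold else right_value

xs = [1, 2, 3, 4, 5, 6]
ys = [10, 11, 10, 30, 31, 29]
threshold, left_value, right_value = fit_stump(xs, ys)
print(threshold, left_value, right_value)
```

Nobody hard-coded "3" as the cut-off. The search picked it because that split explains the jump in the data best.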

However, note the issue happening with the green line in the picture 😵

Data Science

Trade-offs in modelling

Flexibility vs Generalization
Overfitting vs Underfitting
- How can we avoid overfitting in particular?
Complexity vs Interpretability
- Is that dichotomy still strictly true today?
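One standard answer to "how can we avoid overfitting?" is to score models only on data they were not trained on. The plain-Python sketch below runs k-fold cross-validation for a k-nearest-neighbour regressor. The helper names are hypothetical, and the interleaved fold assignment is an assumption made for simplicity; scikit-learn offers production versions of these tools. A 1-neighbour model memorises its training set, so its training error is zero, and the cross-validated error exposes the gap.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, val_idx) for k interleaved folds: sample i goes to fold i % k."""
    for fold in range(k):
        val_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        yield train_idx, val_idx

def knn_predict(train_x, train_y, x, k):
    """k-nearest-neighbour regression: average the targets of the k closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_val_mse(xs, ys, k_neighbours, folds=5):
    """Average validation MSE over the folds; the model never sees its validation points."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(xs), folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        preds = [knn_predict(train_x, train_y, xs[i], k_neighbours) for i in val_idx]
        scores.append(mean_squared_error([ys[i] for i in val_idx], preds))
    return sum(scores) / len(scores)

# Toy data: a linear trend plus alternating +/-3 "noise"
xs = list(range(20))
ys = [2 * x + 3 * (-1) ** x for x in xs]

# 1-NN memorises the training set, so its training error is exactly zero...
train_mse = mean_squared_error(ys, [knn_predict(xs, ys, x, 1) for x in xs])
# ...but held-out folds reveal how well it actually generalises
cv_mse_1 = cross_val_mse(xs, ys, k_neighbours=1)
cv_mse_3 = cross_val_mse(xs, ys, k_neighbours=3)
print(train_mse, cv_mse_1, cv_mse_3)
```

Comparing the cross-validated scores, rather than the training scores, is how you would choose between the more flexible (k = 1) and smoother (k = 3) models.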

Modelling in the real world

ML & Deep Learning training
- Can be expensive and difficult to debug, so start simple!
Explainability
Are point estimates always enough?
- Some stakeholders require uncertainties / confidence intervals
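When a single number isn't enough, one simple way to attach an uncertainty to it is the percentile bootstrap. Resample the data with replacement many times, recompute the statistic each time, and read off the middle 95% of the results. The sketch below applies this to a model's mean absolute error in plain Python. The error values are made up for illustration, and `bootstrap_ci` is a hypothetical helper.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat` computed on `values`."""
    rng = random.Random(seed)
    # Recompute the statistic on many resamples drawn with replacement
    estimates = sorted(
        stat(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]

# Hypothetical absolute errors of a model on a small test set
abs_errors = [2.1, 3.4, 1.8, 5.0, 2.7, 4.2, 3.1, 2.2, 6.3, 2.9]
point = statistics.mean(abs_errors)
low, high = bootstrap_ci(abs_errors)
print(f"MAE = {point:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Reporting "MAE 3.37, 95% CI roughly x to y" tells stakeholders how much the headline number could move on a different test sample.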

What problem are you actually trying to solve?

Machine Learning models optimize for a single objective function
- Pick the appropriate objective!
  - Are False Positives or False Negatives more detrimental to the business and/or society?
  - If one is, weigh its cost against 'overall accuracy'
Communicate with business & stakeholders
Evaluate holistically, not just with one metric
- Classification examples
  - Confusion matrix
  - ROC and AUC (detailed, clearly explained video)
- Regression examples
  - Error distributions
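The sketch below ties these points together in plain Python. It builds a confusion matrix at two decision thresholds and prices the errors with hypothetical costs: a missed positive (FN) is assumed to cost 5x a false alarm (FP). It also computes AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half. The labels and scores are made up. For real work, scikit-learn provides `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_auc_score`.

```python
def confusion_matrix(y_true, y_pred):
    """Counts for a binary problem where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }

def roc_auc(y_true, scores):
    """AUC = P(random positive scored above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def error_cost(cm, cost_fp, cost_fn):
    """Total cost of the mistakes, given hypothetical per-error costs."""
    return cm["fp"] * cost_fp + cm["fn"] * cost_fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.45, 0.42, 0.1, 0.6]

# Compare two decision thresholds on accuracy AND on business cost
for threshold in (0.5, 0.35):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    cm = confusion_matrix(y_true, y_pred)
    accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
    print(threshold, cm, accuracy, error_cost(cm, cost_fp=1, cost_fn=5))

print("AUC:", roc_auc(y_true, scores))
```

Here the lower threshold has worse accuracy (0.625 vs 0.75) but a lower total cost (3 vs 6). That is exactly why one metric alone can mislead.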

Tools

Forecasting
- Prophet
Machine Learning
- scikit-learn
  - Plenty of models and preprocessing methods to choose from (except for large-scale deep learning)
- XGBoost (gradient-boosted decision trees, a popular and powerful alternative to random forests)
  - Popular choice for winning Kaggle competitions!
- InterpretML (see next slides)
Deep Learning
- PyTorch
  - Super flexible, create any model architecture
- TensorFlow
  - The Keras API is super easy to use
  - If you're not using Keras, PyTorch is probably the better choice
MLOps, CD4ML
- MLflow for experiment tracking, logging, model versioning
- DVC for data versioning
- Azure Machine Learning and/or AWS SageMaker
  - Cover full ML lifecycle management

Interpretability

Why is interpretability crucial?

Safety & accountability
Fairness & bias
Regulatory compliance
Security
- Adversarial attacks
- Must-watch: How many pixels does it take to fool a model?
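To show how little it can take to fool a model, here is a sketch of a different, well-known attack from the one in the video: the Fast Gradient Sign Method (FGSM), applied to a logistic-regression model in plain Python. For log-loss, the gradient of the loss with respect to the input is (p - y) * w. FGSM nudges every feature by a small epsilon in the sign of that gradient. The weights, input and epsilon are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Positive-class probability under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, epsilon):
    """Fast Gradient Sign Method against a logistic-regression model.

    For log-loss, d(loss)/dx = (p - y) * w, so every feature is shifted
    by epsilon in the direction that increases the loss.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical trained weights and an input correctly classified as positive (y = 1)
w, b = [2.0, -3.0, 1.5], 0.0
x, y = [0.5, 0.1, 0.2], 1

x_adv = fgsm(w, b, x, y, epsilon=0.2)
print(predict_proba(w, b, x))      # above 0.5: predicted positive
print(predict_proba(w, b, x_adv))  # below 0.5: prediction flipped
```

No feature moved by more than 0.2, yet the predicted class flipped. Deep networks are attacked with the same gradient idea, just through many more layers.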


Christoph Molnar: [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/)
- Great introduction to concepts and theory
Fundamentals: [LIME](https://christophm.github.io/interpretable-ml-book/lime.html) & [SHAP](https://shap.readthedocs.io/en/latest/index.html)
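SHAP builds on Shapley values from game theory. The plain-Python sketch below shows the underlying idea; it is not the `shap` library's API. It computes exact Shapley values for one prediction of a tiny hypothetical model by enumerating every coalition of features. Absent features are replaced by a baseline value, which is one common (assumed) definition of a "missing" feature.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction (cost grows as 2**n_features)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis

# A hypothetical model with an interaction between features 0 and 1
def model(z):
    return 3 * z[0] + 2 * z[1] + z[0] * z[1] - z[2]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(model, x, baseline)
print(phis)
```

The interaction term is split evenly between features 0 and 1, and the three values add up to model(x) - model(baseline). This additivity is what makes SHAP explanations easy to read.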

Modern Toolkits

InterpretML
- Intro Video, Deep Dive
- Includes various explainability techniques
- You can even build advanced yet interpretable models from the ground up
Fairlearn
- Assess and (automatically) mitigate unfairness in your current models
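One of the simplest fairness checks is demographic parity: does the model give positive outcomes (e.g. loan approvals) at similar rates across groups? Fairlearn ships a metric with this name. The version below is a from-scratch plain-Python sketch of the idea, not Fairlearn's implementation, and the predictions and group labels are made up.

```python
def selection_rates(y_pred, groups):
    """Fraction of positive predictions within each sensitive group."""
    rates = {}
    for g in sorted(set(groups)):
        preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    return rates

def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical loan-approval predictions (1 = approved) for two groups
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(selection_rates(y_pred, groups))
print(demographic_parity_difference(y_pred, groups))
```

A large gap doesn't prove the model is unfair on its own, but it is a signal to investigate with stakeholders. Mitigation tools such as Fairlearn's can then trade some accuracy for a smaller gap.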

CD4ML - Business Applications

Emily Gorcenski: Why bother with Continuous Delivery for Machine Learning
