Overview
AI vs Machine Learning
Misconceptions
Machine Learning
Feeling Lost? Machine Learning Estimators Map
Supervised Learning
- Classification
- Binary:
- Should this loan be approved?
- Is this a picture of a cat or a dog?
- Multi-class: what bird species is this picture?
- Regression
- Many-to-one:
- Predicting the price of a second-hand car
- Many-to-many:
- Forecasting sales for the next 3 months
Unsupervised Learning
- Clustering, Topic Modelling
- Dimensionality Reduction
Optimization & Reinforcement Learning
- Convex Optimization, Genetic Algorithms
- Deep Reinforcement Learning
Intuition
Machine Learning models can often provide the flexibility that hand-written rules lack. Think of it as automatic curve fitting and if-else!
- You don’t need to hard-code the predicates
- The algorithm will determine and optimize your if-conditions
However, note the issue happening with the green line in the picture 😵
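To make the "automatic if-else" idea concrete, here is a minimal sketch of a one-rule classifier (a decision stump) that picks its own threshold from labelled data. The income figures, the labels, and the `fit_stump` helper are hypothetical, invented purely for illustration.

```python
def fit_stump(xs, ys):
    """Search for the threshold t that minimizes the errors of the rule
    'predict 1 if x >= t, else 0' on the training data."""
    best_t, best_err = None, len(xs) + 1
    for t in sorted(set(xs)):  # every observed value is a candidate threshold
        err = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


# Hypothetical data: yearly income (in thousands) and whether the loan was approved.
incomes = [18, 22, 25, 31, 40, 45, 52, 60]
approved = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, errors = fit_stump(incomes, approved)
print(f"learned rule: approve if income >= {threshold} ({errors} training errors)")
```

Nobody wrote the `income >= 40` predicate by hand; the search found it. Tree-based models repeat this kind of search across many features and nested splits.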
Data Science
Trade-offs in modelling
- Flexibility vs Generalization
- Overfitting vs Underfitting
- How can we avoid overfitting in particular?
- Complexity vs Interpretability
- Is that dichotomy still strictly true today?
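One common answer to the overfitting question above is to hold out data the model never trained on and compare errors. The sketch below uses a hand-rolled k-nearest-neighbours regressor on synthetic data (a noisy sine curve, an assumption made only for this example): with k=1 the model memorizes the training set, so its training error is zero, which says nothing about how it behaves on new points.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)


def sample(xs):
    # Synthetic target: sin(x) plus Gaussian noise.
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


train = sample([i * 0.2 for i in range(30)])
valid = sample([i * 0.2 + 0.1 for i in range(30)])  # held-out points

results = {k: (mse(train, train, k), mse(train, valid, k)) for k in (1, 5, 15)}
for k, (train_err, valid_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  validation MSE={valid_err:.3f}")

# Pick the flexibility level by validation error, not training error.
best_k = min(results, key=lambda k: results[k][1])
print("k chosen on validation data:", best_k)
```

Small k is the flexible end of the trade-off and large k the rigid end; the validation column is what tells you where to stop.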
Modelling in the real world
- ML & Deep Learning training
- Can be expensive and difficult to debug, so start simple!
- Explainability
- Are point estimates always enough?
- Some stakeholders require uncertainty estimates / confidence intervals
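When a point estimate is not enough, one simple way to attach an interval to a metric is the percentile bootstrap. The following is a sketch under stated assumptions: the list of absolute errors is made up, and `bootstrap_ci` is a hypothetical helper, not part of any library mentioned here.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower = means[int(alpha / 2 * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical absolute prediction errors of a model on a test set.
abs_errors = [1.2, 0.8, 2.5, 0.3, 1.9, 1.1, 0.7, 3.2, 1.5, 0.9, 2.1, 1.4]

point = sum(abs_errors) / len(abs_errors)
low, high = bootstrap_ci(abs_errors)
print(f"mean absolute error: {point:.2f} (95% CI roughly {low:.2f} to {high:.2f})")
```

Reporting the interval alongside the point estimate lets stakeholders see how much the number could move with a different test sample.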
What problem are you actually trying to solve?
- Machine Learning models optimize for a single objective function
- Pick the appropriate objective!
- Are False Positives or False Negatives more detrimental to business and/or society?
- Weigh the costlier error type against ‘overall accuracy’
- Communicate with business & stakeholders
- Evaluate holistically, not just with one metric
- Classification examples
- Regression examples
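The point about not relying on a single metric can be illustrated with a small classification example. The labels and both sets of predictions below are hypothetical; the sketch only shows how accuracy can look good while the metric that matters (here, recall on a rare positive class) is poor.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def report(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical imbalanced problem: 5 positives out of 100 cases.
y_true = [1] * 5 + [0] * 95

always_negative = [0] * 100                    # never flags anything
cautious = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85  # flags more, some wrongly

print("always negative:", report(y_true, always_negative))
print("cautious model: ", report(y_true, cautious))
```

The first model has the higher accuracy yet misses every positive case; the second has lower accuracy but catches four of the five. Which one is "better" depends on the relative cost of False Positives and False Negatives.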
Tools
- Forecasting
- Machine Learning
- scikit-learn
- plenty of models and preprocessing methods to choose from (except for large-scale deep learning)
- XGBoost (gradient-boosted decision trees: tree ensembles on steroids)
- Popular choice for winning Kaggle competitions!
- InterpretML (see next slides)
- Deep Learning
- PyTorch
- Super flexible, create any model architecture
- TensorFlow
- the Keras API is super easy to use
- if not using Keras, probably better to go with PyTorch
- MLOps, CD4ML
- MLflow for experiment tracking, logging, model versioning
- DVC for data versioning
- Azure Machine Learning and/or Amazon SageMaker
- both cover full lifecycle management
Interpretability
Why is interpretability crucial?
- Safety & accountability
- Fairness & bias
- Regulatory compliance
- Security
- Adversarial attacks
- Must-watch: How many pixels does it take to fool a model?
Modern Toolkits
- InterpretML
- Intro Video, Deep Dive
- Includes various explainability techniques
- You can even build advanced yet interpretable models from the ground up
- Fairlearn
- Assess and (automatically) mitigate unfairness in your current models
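As a rough illustration of what "assessing unfairness" can mean, the sketch below computes per-group selection rates and their gap (often called the demographic parity difference) by hand. This is not Fairlearn's API; the library ships its own, more complete metrics and mitigation algorithms. The groups and predictions are hypothetical.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(y_pred, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical loan decisions (1 = approved) for two groups of applicants.
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = selection_rates(y_pred, groups)
gap = demographic_parity_gap(y_pred, groups)
print("selection rate per group:", rates)
print("demographic parity gap:", gap)
```

A gap of zero means every group is approved at the same rate. Whether that is the right fairness criterion for a given application is a separate question that no metric settles on its own.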
CD4ML - Business Applications
Emily Gorcenski: Why bother with Continuous Delivery for Machine Learning