Estimated time for completion
This section takes about 1h30m to complete.
Jump into Data Science (Bonus)
AI vs Machine Learning
Misconceptions
Machine Learning
Feeling Lost? Machine Learning Estimators Map
Supervised Learning
- Classification
- Binary:
- should this loan be approved?
- Is this a picture of a cat or a dog?
- Multi-class: what bird species is this picture?
- Regression
- Many-to-one:
- Predicting the price of a second-hand car
- Many-to-many:
- Forecasting sales for the next 3 months
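To make the classification/regression distinction concrete, here is a minimal Python sketch using k-nearest neighbours, which happens to handle both tasks: the same neighbour search either votes on a discrete label or averages a continuous target. The function name `knn_predict` and the toy loan and car data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict for `query` from its k nearest training points (Euclidean distance).

    task="classification" -> majority vote over discrete labels
    task="regression"     -> mean of continuous targets
    """
    # Sort training points by distance to the query and keep the k closest
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    targets = [y for _, y in neighbours]
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]
    return sum(targets) / len(targets)


# Hypothetical toy data: (income, existing debt), both in thousands -> loan decision
loans_X = [(20, 15), (25, 20), (30, 18), (60, 5), (70, 10), (80, 2)]
loans_y = ["reject", "reject", "reject", "approve", "approve", "approve"]

# Hypothetical toy data: (age in years, mileage in 10,000 km) -> price in thousands
cars_X = [(1, 1), (2, 3), (3, 4), (8, 12), (9, 15), (10, 16)]
cars_y = [25.0, 21.0, 19.0, 8.0, 6.0, 5.0]

loan_decision = knn_predict(loans_X, loans_y, (65, 8))  # binary classification
car_price = knn_predict(cars_X, cars_y, (2, 2), task="regression")  # many-to-one regression
print(loan_decision, round(car_price, 2))
```

Real projects would normally use a library implementation (e.g. scikit-learn's `KNeighborsClassifier` / `KNeighborsRegressor`) and scale the features first, since raw Euclidean distance is dominated by whichever feature has the largest range.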
Unsupervised Learning
- Clustering, Topic Modelling
- Dimensionality Reduction
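Unlike the supervised examples, clustering receives no labels at all. The sketch below is a plain k-means (Lloyd's algorithm) in Python, assuming 2-D points and deterministic initialisation from the first k points; dimensionality reduction is not shown. The function name `kmeans` and the data are hypothetical.

```python
import math


def kmeans(points, k, n_iter=10):
    """Plain k-means (Lloyd's algorithm): alternate assignment and update steps.

    Centroids start at the first k points for reproducibility; libraries
    typically use smarter seeding such as k-means++.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical unlabelled 2-D points forming two loose groups
points = [(1, 1), (9, 9), (1.5, 2), (2, 1), (8, 8), (9, 8.5)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 1, 0, 0, 1, 1]
print(centroids)
```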
Optimization & Reinforcement Learning
- Convex Optimization, Genetic Algorithms
- Deep Reinforcement Learning
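For the convex optimization bullet: for a smooth convex function, any local minimum is the global one, so plain gradient descent with a small enough step size converges to it. A minimal sketch, assuming a hand-written gradient and a fixed learning rate `lr` (both hypothetical choices); genetic algorithms and reinforcement learning are not covered here.

```python
def gradient_descent(grad, x0, lr=0.1, n_steps=200):
    """Minimise a differentiable function by repeatedly stepping against its gradient."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


# Convex example: f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, unique minimum at (3, -1)
def grad_f(v):
    x, y = v
    return [2 * (x - 3), 4 * (y + 1)]


minimum = gradient_descent(grad_f, [0.0, 0.0])
print(minimum)  # very close to [3.0, -1.0]
```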
Intuition
A linear model:
- Assumes a monotonic relationship between the target variable (y) and any feature (e.g. x1): the slope never changes sign
- Assumes the slope between the target variable (y) and any feature (e.g. x1) has the same steepness over that feature's entire domain
Machine Learning models can often provide the flexibility to overcome this. Think of it as automatic curve fitting and if-else!
- You don't need to hard-code the predicates
- The algorithm will determine and optimize your if-conditions
However, note the issue with the green line in the picture.
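The "automatic if-else" idea can be seen in its smallest form, a decision stump: the algorithm tries every candidate threshold and keeps the single `if x <= threshold` split with the lowest squared error. The data below is hypothetical and has a jump that no constant-slope line can represent. This is a sketch of the building block of decision trees, not how any particular library implements it; `fit_stump` is a hypothetical name.

```python
def fit_stump(xs, ys):
    """Learn one if-condition `x <= threshold` that best splits (xs, ys).

    Each side predicts the mean of its targets; the threshold with the lowest
    total squared error wins. Decision trees repeat this search recursively.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
            best = (error, threshold, sum(left) / len(left), sum(right) / len(right))
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


# Hypothetical data with a jump: no single constant-slope line fits it well
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 11, 10, 30, 31, 30]
threshold, low, high = fit_stump(xs, ys)


def predict(x):
    # The learned rule, written out as the if-else it represents
    return low if x <= threshold else high


print(threshold, predict(2), predict(5))
```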
Data Science
Trade-offs in modelling
- Flexibility vs Generalization
- Overfitting vs Underfitting
- How can we avoid overfitting in particular?
- Complexity vs Interpretability
- Is that dichotomy still strictly true today?
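One common answer to "how can we avoid overfitting?" is to judge every model on data it was not trained on. The sketch below fits two models to the same six noisy points: a straight line (ordinary least squares) and the degree-5 polynomial that passes through every point (Lagrange interpolation). The data and helper names are hypothetical.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 through all (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


# Hypothetical training data: y = 2x plus alternating noise of +/-0.5
train_x = [0, 1, 2, 3, 4, 5]
train_y = [2 * x + (0.5 if x % 2 else -0.5) for x in train_x]
test_x, test_y = 6, 12.0  # held-out point from the same underlying trend

a, b = fit_line(train_x, train_y)
line_train_err = max(abs(a * x + b - y) for x, y in zip(train_x, train_y))
poly_train_err = max(
    abs(lagrange_predict(train_x, train_y, x) - y) for x, y in zip(train_x, train_y)
)
line_test_err = abs(a * test_x + b - test_y)
poly_test_err = abs(lagrange_predict(train_x, train_y, test_x) - test_y)
print(f"line: train {line_train_err:.2f}, test {line_test_err:.2f}")
print(f"poly: train {poly_train_err:.2f}, test {poly_test_err:.2f}")
```

The polynomial reproduces the training points almost exactly but misses the held-out point at x = 6 by about 31.5, while the line misses it by about 0.3: maximal flexibility memorised the noise instead of the trend.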
Modelling in the real world
- ML & Deep Learning training
- can be expensive and difficult to debug: start simple!
- Explainability
- Are point estimates always enough?
- some stakeholders require uncertainties / confidence intervals
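When stakeholders need an uncertainty range rather than a single number, the bootstrap is one simple, assumption-light option: resample the data with replacement many times and read the interval off the spread of the recomputed statistic. A minimal percentile-bootstrap sketch for a mean, assuming a hypothetical list of validation errors; `bootstrap_ci` is a hypothetical helper, not a library function.

```python
import random
import statistics


def bootstrap_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    # Recompute the mean on many resamples drawn with replacement
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[round(tail * n_resamples)]
    upper = means[round((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical forecast errors (actual minus predicted units) on a validation set
errors = [1.2, -0.4, 0.8, 2.1, -1.5, 0.3, 0.9, -0.2, 1.7, 0.5]
point_estimate = statistics.fmean(errors)
low, high = bootstrap_ci(errors)
print(f"mean error {point_estimate:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```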
What problem are you actually trying to solve?
- Machine Learning models optimize for a single objective function
- Pick the appropriate objective!
- Are False Positives or False Negatives more detrimental to the business and/or society?
- If one of them is, weigh that cost against "overall accuracy"
- Communicate with business & stakeholders
- Evaluate holistically, not just with one metric
- Classification examples
- Regression examples
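The classification points above can be made concrete with a small Python sketch: it reports several metrics side by side and attaches an assumed price to each error type (`cost_fp=1`, `cost_fn=5` are hypothetical numbers a business would need to supply). On the imbalanced toy data, a model that always predicts "negative" still reaches 80% accuracy while catching no positives, which is why a single metric can mislead. All function names and data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def report(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Several metrics side by side, plus a cost using assumed per-error prices."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "cost": cost_fp * fp + cost_fn * fn,
    }


def best_threshold(y_true, scores, cost_fp=1.0, cost_fn=5.0):
    """Pick the score cut-off (predict 1 if score >= t) with the lowest total cost."""
    def cost_at(t):
        y_pred = [int(s >= t) for s in scores]
        return report(y_true, y_pred, cost_fp, cost_fn)["cost"]

    return min(sorted(set(scores)), key=cost_at)


# Hypothetical imbalanced data: 8 negatives, 2 positives, with model scores
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.45, 0.05, 0.6, 0.2, 0.55, 0.9]

print(report(y_true, [0] * len(y_true)))  # "always negative": 80% accuracy, zero recall
t = best_threshold(y_true, scores)
print(t, report(y_true, [int(s >= t) for s in scores]))
```

Here `best_threshold` returns 0.55: with false negatives priced five times higher, the cheapest cut-off accepts one false positive to avoid missing any positive.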
Tools
- Forecasting
- Machine Learning
- scikit-learn
- plenty of models and preprocessing methods to choose from (except for large-scale deep learning)
- XGBoost (gradient-boosted decision trees; think "Random Forests on steroids")
- Popular choice for winning Kaggle competitions!
- InterpretML (see next slides)
- Deep Learning
- PyTorch
- Super flexible, create any model architecture
- TensorFlow
- the Keras API is super easy to use
- if not using Keras, probably better to go with PyTorch
- MLOps, CD4ML
- MLflow for experiment tracking, logging, model versioning
- DVC for data versioning
- Azure Machine Learning and/or AWS SageMaker
- covers full lifecycle management
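In the spirit of "start simple", a scikit-learn baseline takes only a few lines. This sketch assumes scikit-learn is installed and uses a synthetic dataset from `make_classification`; the pipeline keeps preprocessing inside cross-validation so that scaling statistics never leak from the validation folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (no real dataset involved)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Preprocessing and model chained together, so scaling is re-fitted inside each CV fold
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validated accuracy: a cheap baseline that fancier models must beat
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because XGBoost's `XGBClassifier` follows the scikit-learn estimator interface, it should be possible to drop it into the same pipeline in place of `LogisticRegression` (assuming the `xgboost` package is installed) and compare the two under identical cross-validation.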
Interpretability
Why is interpretability crucial?
- Safety & accountability
- Fairness & bias
- Regulatory compliance
- Security
- Adversarial attacks
- Must-watch: How many pixels does it take to fool a model?
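A taste of how adversarial attacks work, using the fast gradient sign method (FGSM) idea on a linear classifier, where the gradient of the score with respect to the input is simply the weight vector `w`. With many input features (think pixels), shifting every feature by a tiny `eps` in the direction of its weight's sign adds up to a large change in the score. Weights, bias and input are hypothetical; attacking a deep network would additionally need automatic differentiation (e.g. in PyTorch or TensorFlow).

```python
def linear_score(w, b, x):
    """Raw score of a linear classifier: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Class 1 if the score is positive, else class 0."""
    return int(linear_score(w, b, x) > 0)


def fgsm_linear(w, b, x, eps):
    """FGSM-style perturbation for a linear model.

    d(score)/dx = w, so moving each feature by eps * sign(w_i) changes the score
    as fast as possible while no single feature moves by more than eps.
    """
    # Push the score down if currently class 1, up if currently class 0
    direction = -1 if linear_score(w, b, x) > 0 else 1
    return [xi + direction * eps * ((wi > 0) - (wi < 0)) for xi, wi in zip(x, w)]


# Hypothetical model with 100 small weights (think 100 pixels) and one input
w = [0.05 if i % 2 == 0 else -0.05 for i in range(100)]
b = 0.1
x = [0.5] * 100
x_adv = fgsm_linear(w, b, x, eps=0.05)

print(predict(w, b, x), predict(w, b, x_adv))  # class flips from 1 to 0
print(max(abs(a - o) for a, o in zip(x_adv, x)))  # largest per-feature change
```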
Modern Toolkits
- InterpretML
- Intro Video, Deep Dive
- Includes various explainability techniques
- You can even build advanced yet interpretable models from the ground up
- Fairlearn
- Assess and (automatically) mitigate unfairness in your current models
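Fairlearn exposes fairness metrics such as `demographic_parity_difference` in `fairlearn.metrics`. To show what that particular metric measures, the sketch below re-implements the idea by hand (it is not Fairlearn's code): compute each group's selection rate and report the largest gap. The decisions and group labels are hypothetical, and demographic parity is only one of several fairness criteria.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Share of positive predictions (e.g. approved loans) per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(y_pred, groups):
    """Largest difference in selection rate between any two groups (0 = parity)."""
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)


# Hypothetical loan decisions (1 = approved) and a sensitive attribute per applicant
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(selection_rates(y_pred, groups))  # → {'A': 0.6, 'B': 0.4}
print(demographic_parity_gap(y_pred, groups))
```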
CD4ML - Business Applications
Emily Gorcenski: Why bother with Continuous Delivery for Machine Learning