Space is limited

Interpreting Machine Learning Models

This course introduces techniques for interpreting machine learning models, specifically deep learning models that are pre-trained on large amounts of data using self-supervision. It covers a broad range of techniques for explaining the predictions of classifiers trained on language, vision, and tabular data. You will get hands-on experience with different interpretability algorithms and will benchmark widely used ML models for faithfulness and plausibility. The course also covers the basics of how explanations can help uncover artifacts, and how to use those observations to build robust ML models.

Nazneen Rajani
Robustness Research Lead at Hugging Face
Real-world projects that teach you industry skills.
Learn alongside a small group of your professional peers
Part-time program with 2 live events per week:
Tuesday @ 6:00 PM UTC
Project Session
Thursday @ 6:00 PM UTC
Next Cohort
August 28, 2023
4 weeks
US$ 400
or included with membership

Course taught by expert instructors


Nazneen Rajani

Robustness Research Lead at Hugging Face

Nazneen is a Robustness Research Lead at Hugging Face. She got her Ph.D. in Computer Science from UT Austin, where she was advised by Prof. Ray Mooney. More than 15 of her works have been published in top-tier AI conferences, including ACL, EMNLP, NAACL, NeurIPS, and ICLR. Nazneen was one of the finalists for the VentureBeat Transform 2020 Women in AI Research. Her work has been covered by various media outlets, including Quanta Magazine, VentureBeat, SiliconAngle, ZDNet, and Datanami. More details about her work can be found on her website.

The course

Learn and apply skills with real-world projects.

Who is it for?
  • Data scientists and ML practitioners with a background in Machine Learning who want to better understand deep learning models and interpret their predictions.

  • Domain experts in ethics, law, policy, and other regulators who would like to get a deeper understanding of how ML models work.

Prerequisites / Commitment
  • Ability to write Python and work with documented libraries (Scikit-learn, Captum, Numpy, Transformers)

  • Completed a course in foundational machine learning (CoRise Deep Learning Essentials or similar)

  • Foundational knowledge of statistics


  • What is interpretable ML?
  • Why explanations?
  • Types of explanations
  • Methods for interpreting ML models
  • Saliency based methods
  • Integrated gradients, SHAP, Grad-CAM, LIME
Generate and compare explanations for predictions of an image classifier
  • Train an image classifier of your choice (e.g., a food classifier)
  • Implement saliency based methods for explaining model predictions
  • Compare the methods – which one do you agree with most?
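
As a rough illustration of one of the saliency-based methods named above, here is a minimal sketch of integrated gradients in plain Python. It is not course material: the toy `model`, its weights, the all-zeros baseline, and the finite-difference `gradient` helper are all assumptions chosen to keep the example self-contained. In practice a library such as Captum would compute gradients through a real network.

```python
import math

def model(x):
    """Toy differentiable 'classifier': a logistic unit with made-up weights."""
    weights = [2.0, -1.0, 0.5]
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients.

    Averages the gradient along the straight line from baseline to x,
    then scales each component by (x_i - baseline_i).
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, gradient(f, point))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # assumed "uninformative" reference input
attributions = integrated_gradients(model, x, baseline)

# Completeness check: attributions should sum to f(x) - f(baseline).
gap = sum(attributions) - (model(x) - model(baseline))
print(attributions, gap)
```

A useful sanity check when comparing methods is the completeness property shown at the end: the attributions should add up to the difference between the model output at the input and at the baseline.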
  • Text explanations – leave-one-out, rationale
  • Open-ended explanations using pre-trained language models
  • Influence functions
Interpret the predictions of a pre-trained large language model
  • Fine-tune a language model on a task (e.g., sentiment)
  • Study model behavior using leave-one-out and rationales
  • Generate explanations via prompts using BLOOM/GPT-3
  • Implement influence functions for interpreting predictions
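
To make the leave-one-out idea concrete, the sketch below scores each token by how much the model output changes when that token is removed. The lexicon-based `sentiment_score` function is a hypothetical stand-in for a fine-tuned language model, used here only so the example runs without any dependencies.

```python
def sentiment_score(tokens):
    """Hypothetical stand-in for a sentiment model: sums made-up word weights."""
    lexicon = {"great": 2.0, "good": 1.0, "boring": -1.5}
    return sum(lexicon.get(tok, 0.0) for tok in tokens)

def leave_one_out(score_fn, tokens):
    """Importance of token i = score(all tokens) - score(tokens without i)."""
    full = score_fn(tokens)
    return [
        (tok, full - score_fn(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]

tokens = "the plot was boring but the acting was great".split()
importances = leave_one_out(sentiment_score, tokens)

for tok, delta in importances:
    print(f"{tok:>8}  {delta:+.1f}")
```

With a real model, `score_fn` would typically return the predicted probability or logit of the class being explained, and removal is often replaced by masking so that sequence length stays fixed.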
  • Use cases for different methods
  • Hard vs. soft rationales
  • Faithfulness vs. plausibility
  • Comprehensiveness and sufficiency metrics
Comprehensive evaluation of explanations
  • Generate rationales for a text classification model
  • Evaluate and compare using the ERASER or Ferret benchmarks
  • Analyze performance on different metrics
  • What can we infer about the methods based on these evaluations? Are some methods better suited to certain use cases?
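
The comprehensiveness and sufficiency metrics mentioned above can be sketched as follows, using the common definitions: comprehensiveness is the drop in predicted probability when the rationale tokens are removed, and sufficiency is the drop when only the rationale tokens are kept. The `predict_proba` function and its lexicon are assumptions for illustration, not a benchmark implementation.

```python
import math

def predict_proba(tokens):
    """Hypothetical P(positive): sigmoid over made-up word weights."""
    lexicon = {"great": 2.0, "moving": 1.0, "dull": -2.0}
    z = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-z))

def comprehensiveness(proba_fn, tokens, rationale_idx):
    """p(y | x) - p(y | x with rationale removed); higher means more faithful."""
    remaining = [t for i, t in enumerate(tokens) if i not in rationale_idx]
    return proba_fn(tokens) - proba_fn(remaining)

def sufficiency(proba_fn, tokens, rationale_idx):
    """p(y | x) - p(y | rationale only); closer to zero means more sufficient."""
    kept = [t for i, t in enumerate(tokens) if i in rationale_idx]
    return proba_fn(tokens) - proba_fn(kept)

tokens = "a truly great and moving film".split()
rationale = {2, 4}  # indices of "great" and "moving"

comp = comprehensiveness(predict_proba, tokens, rationale)
suff = sufficiency(predict_proba, tokens, rationale)
print(f"comprehensiveness={comp:.3f}  sufficiency={suff:.3f}")
```

In this toy case the rationale carries all of the signal, so removing it pushes the prediction back to 0.5 while keeping it alone changes nothing.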
  • Using explanations to 1) examine plausible spurious patterns and 2) uncover dataset artifacts
  • Data curation via counterfactuals for model training
  • Evaluate model for robustness
Applying lessons learned from interpreting models to make them robust
  • Generate counterfactual explanations for model predictions on tabular data OR
  • In-depth robustness evaluation using counterfactuals, subpopulations, and adversarial attacks OR
  • Robustify model via weak supervision instead of counterfactuals
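
For the tabular counterfactual option, one very simple approach is to change a single feature in fixed steps until the prediction flips. The sketch below assumes a hypothetical rule-based `approve` model with made-up integer weights and a naive one-feature search. Practical counterfactual methods also constrain plausibility and search over several features.

```python
def approve(applicant):
    """Hypothetical loan model: approves when a made-up linear score is >= 0."""
    score = 2 * applicant["income_k"] - 25 * applicant["open_debts"] - 50
    return score >= 0

def single_feature_counterfactual(predict, instance, feature, step, max_steps=100):
    """Step one feature until the prediction flips; return the changed copy or None."""
    original = predict(instance)
    candidate = dict(instance)  # copy so the input is left untouched
    for _ in range(max_steps):
        candidate[feature] += step
        if predict(candidate) != original:
            return candidate
    return None

applicant = {"income_k": 30, "open_debts": 2}
cf = single_feature_counterfactual(approve, applicant, "income_k", step=5)
print(applicant, "->", cf)
```

The result reads as an explanation of the form "the application would have been approved if income had been this much higher", which is the kind of statement counterfactual explanations aim to produce.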

Real-world projects

Work on projects that bring your learning to life.
Made to be directly applicable in your work.

Live access to experts

Sessions and Q&As with our expert instructors, along with real-world projects.

Network & community

Code reviews and study groups. Share experiences and learn alongside a global network of professionals.

Support & accountability

We have a system in place to make sure you complete the course, and to help nudge you along the way.

Get reimbursed by your company

More than half of learners get their Courses and Memberships reimbursed by their company.

Hundreds of companies have dedicated L&D and education budgets that have covered the costs.


Frequently Asked Questions

Still not sure?

Get in touch and we'll help you decide.


Questions? Ask us anything at

© 2021-2022 CoRise Education