
Interpreting Machine Learning Models

This course provides an introduction to interpreting machine learning models, specifically deep learning models that are pre-trained on large amounts of data using self-supervision. The course is designed to cover the breadth of different techniques for explaining predictions of classifiers trained on language, vision, and tabular data. You will get hands-on experience working with different algorithms for interpretability and benchmark widely used ML models for faithfulness and plausibility. The course will also cover the basics of how some explanations can help uncover artifacts and use those observations to build robust ML models.

Nazneen Rajani
Robustness Research Lead at Hugging Face
Real-world projects that teach you industry skills.
Learn alongside a small group of your professional peers
Part-time program with 2 live events per week
Next Cohort
January 23, 2023
4 weeks
US$ 400
or included with membership

Course taught by expert instructors


Nazneen Rajani

Robustness Research Lead at Hugging Face

Nazneen is a Robustness Research Lead at Hugging Face. She received her Ph.D. in Computer Science from UT Austin, where she was advised by Prof. Ray Mooney. She has published 15+ papers at top-tier AI conferences, including ACL, EMNLP, NAACL, NeurIPS, and ICLR. Nazneen was a finalist for VentureBeat Transform 2020 Women in AI Research. Her work has been covered by various media outlets, including Quanta Magazine, VentureBeat, SiliconANGLE, ZDNet, and Datanami. More details about her work can be found on her website.

The course

Learn and apply skills with real-world projects.

Generate and compare explanations for predictions of an image classifier
  • Train an image classifier of your choice (e.g., a food classifier)
  • Implement saliency-based methods for explaining model predictions
  • Compare the methods – which one do you agree with most?
  • What is interpretable ML?
  • Why explanations?
  • Types of explanations
  • Methods for interpreting ML models
  • Saliency-based methods
  • Integrated gradients, SHAP, Grad-CAM, LIME
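As a taste of the attribution methods listed above, here is a minimal sketch of integrated gradients, using a logistic-regression "classifier" as a stand-in for the image models in the project. All names and values (`w`, `b`, `x`, the zero baseline) are illustrative assumptions, not course materials; in the course you would use a library such as Captum instead of hand-rolling this.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    # Toy binary classifier: sigmoid of a linear score.
    return sigmoid(np.dot(w, x) + b)

def grad(x, w, b):
    # Gradient of the prediction w.r.t. the input x.
    p = predict(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, w, b, baseline=None, steps=50):
    # Average the gradient along the straight path from the baseline
    # to x, then scale by (x - baseline).
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean(
        [grad(baseline + a * (x - baseline), w, b) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad

w = np.array([2.0, -1.0, 0.0])  # illustrative weights
b = 0.1
x = np.array([1.0, 1.0, 5.0])
attr = integrated_gradients(x, w, b)
print(attr)
# Completeness check: attributions sum to roughly
# f(x) - f(baseline).
print(attr.sum(), predict(x, w, b) - predict(np.zeros(3), w, b))
```

Note the completeness property in the last line: the attributions approximately account for the full change in prediction relative to the baseline, which is one reason this method is popular.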
Interpret the predictions of a pre-trained large language model
  • Fine-tune a language model on a task (e.g., sentiment classification)
  • Study model behavior using leave-one-out and rationales
  • Generate explanations via prompts using BLOOM/GPT-3
  • Implement influence functions for interpreting predictions
  • Use cases for different methods
  • Text explanations – leave-one-out, rationale
  • Open-ended explanations using pre-trained language models
  • Influence functions
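The leave-one-out idea in this module can be sketched in a few lines: a token's importance is the drop in the model's score when that token is removed. The "model" below is a toy keyword scorer standing in for the fine-tuned sentiment model, and the word lists are illustrative assumptions.

```python
import math

# Toy sentiment lexicon (illustrative, not from the course).
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"awful", "hate", "boring"}

def score(tokens):
    # Probability-like positive score in (0, 1).
    raw = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1.0 / (1.0 + math.exp(-raw))

def leave_one_out(tokens):
    # Importance of token i = score(full input) - score(input without i).
    full = score(tokens)
    return {
        (i, t): full - score(tokens[:i] + tokens[i + 1:])
        for i, t in enumerate(tokens)
    }

tokens = "i love this great but boring movie".split()
for (i, t), imp in leave_one_out(tokens).items():
    print(f"{t:>8}  {imp:+.3f}")
```

With a real fine-tuned model the scoring call is a forward pass instead of a lexicon lookup, but the bookkeeping is the same; influence functions answer the analogous question for removing training examples rather than input tokens.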
Comprehensive evaluation of explanations
  • Generate rationales for a text classification model
  • Evaluate and compare using the ERASER or Ferret benchmarks
  • Analyze performance on different metrics
  • What can we infer about the methods based on these evaluations? Are some methods better suited to certain use cases?
  • Hard vs. soft rationales
  • Faithfulness vs. plausibility
  • Comprehensiveness and sufficiency metrics
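The comprehensiveness and sufficiency metrics above can be sketched concretely. The model here is again a toy keyword scorer (an assumption, not the benchmark's own model), and a hard rationale is the set of token indices the explainer kept.

```python
# Toy positive-word lexicon (illustrative).
POSITIVE = {"great", "love"}

def prob_positive(tokens):
    # Toy "probability" of the positive class in [0, 1).
    raw = sum(t in POSITIVE for t in tokens)
    return raw / (raw + 1.0)

def comprehensiveness(tokens, rationale):
    # Full prediction minus prediction WITHOUT the rationale tokens.
    # High value => the rationale tokens were actually needed (faithful).
    rest = [t for i, t in enumerate(tokens) if i not in rationale]
    return prob_positive(tokens) - prob_positive(rest)

def sufficiency(tokens, rationale):
    # Full prediction minus prediction on the rationale ALONE.
    # Low value => the rationale by itself supports the prediction.
    kept = [t for i, t in enumerate(tokens) if i in rationale]
    return prob_positive(tokens) - prob_positive(kept)

tokens = "a great film i love".split()
rationale = {1, 4}  # indices of "great" and "love"
print(comprehensiveness(tokens, rationale))  # high: removing both hurts
print(sufficiency(tokens, rationale))        # near zero: they suffice
```

Benchmarks such as ERASER compute exactly these quantities (averaged over deletion fractions for soft rationales) with a real classifier in place of the toy scorer.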
Applying lessons learned from interpreting models to make them robust
  • Generate counterfactual explanations for model predictions on tabular data OR
  • In-depth robustness evaluation using counterfactuals, subpopulations, and adversarial attacks OR
  • Robustify model via weak supervision instead of counterfactuals
  • Using explanations to 1) examine plausible spurious patterns and 2) uncover dataset artifacts
  • Data curation via counterfactuals for model training
  • Evaluate model for robustness
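A counterfactual explanation for tabular data can be sketched for the simplest possible case, a linear classifier: find the smallest single-feature change that flips the prediction. The weights and feature values are illustrative assumptions; real counterfactual methods also enforce plausibility constraints that this sketch ignores.

```python
import numpy as np

def predict(x, w, b):
    # Toy linear classifier: class 1 if the score is positive.
    return 1 if np.dot(w, x) + b > 0 else 0

def one_feature_counterfactual(x, w, b, eps=1e-6):
    # For each feature with nonzero weight, compute the change that
    # pushes the decision score just past zero; return the feature
    # index and delta with the smallest absolute change.
    score = np.dot(w, x) + b
    flip_to = 1 - predict(x, w, b)
    target_score = eps if flip_to == 1 else -eps
    best = None
    for j, wj in enumerate(w):
        if wj == 0:
            continue  # this feature cannot move the score
        delta = (target_score - score) / wj
        if best is None or abs(delta) < abs(best[1]):
            best = (j, delta)
    return best

w = np.array([0.5, -2.0, 0.0])      # illustrative weights
b = -0.25
x = np.array([1.0, 0.5, 3.0])       # score = -0.75 -> class 0
j, delta = one_feature_counterfactual(x, w, b)
x_cf = x.copy()
x_cf[j] += delta
print(j, round(delta, 4), predict(x, w, b), predict(x_cf, w, b))
```

Which feature the search picks, and how far it has to move, is itself a useful robustness signal: if a tiny change to a feature that should be irrelevant flips the prediction, the model has likely latched onto a spurious pattern.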

Real-world projects

Work on projects that bring your learning to life.
Made to be directly applicable in your work.

Live access to experts

Sessions and Q&As with our expert instructors, along with real-world projects.

Network & community

Code reviews and study groups. Share experiences and learn alongside a global network of professionals.

Support & accountability

We have a system in place to make sure you complete the course, and to help nudge you along the way.

Course success stories

Learn together and share experiences with other industry professionals

This course is for...


Data scientists and ML practitioners with a background in Machine Learning who want to better understand deep learning models and interpret their predictions.


Domain experts in ethics, law, policy, and other regulators who would like to get a deeper understanding of how ML models work.


Ability to write Python and work with documented libraries (scikit-learn, Captum, NumPy, Transformers)

Completed a course in foundational machine learning (CoRise Deep Learning Essentials or similar)

Foundational knowledge of statistics

Frequently Asked Questions

Keep in touch for updates, discounts, and new courses.

Questions? Ask us anything at

© 2021-2022 CoRise Education