Data Centric Deep Learning
4 weeks US$ 400

Learn to build, improve, and repair deep learning models with a data-centric approach. This course puts you in the shoes of a deep learning engineer and simulates the real-world challenges of improving data quality, building and testing deep learning models, and improving performance with a human in the loop. Week by week, we will develop an understanding of the critical role of data in deep learning operations, from integration tests to deep learning tooling to iterative annotation. Learn the best practices for deep learning in the real world.

Andrew Maas
Senior Manager at Apple and Instructor at Stanford
Mike Wu
PhD Scholar at Stanford
Price: US$ 400
Course Duration: 4 weeks
Start Date: November 14
Registration By: November 13
Project Session: Wednesday @ 5:00 PM UTC
Learn alongside a small group of your professional peers
Connect with experts through live sessions and office hours
Work on real-world projects that teach you industry skills
Created and taught by

Andrew Maas

Senior Manager at Apple and Instructor at Stanford

Andrew Maas is currently at Apple working on data-centric deep learning. He completed a PhD in Computer Science at Stanford in 2015, advised by Andrew Ng and Dan Jurafsky. His dissertation focused on large-scale deep learning methods for spoken and written language. Andrew has worked as an engineer and scientific advisor to several startups, including Coursera and Semantic Machines. Prior to Apple, he built an NLP platform for precise healthcare language as cofounder of Roam Analytics. He also teaches CS224S: Spoken Language Processing as a visiting lecturer at Stanford University.


Mike Wu

PhD Scholar at Stanford

Mike Wu is currently a fifth-year PhD student at Stanford University, advised by Noah Goodman. His research spans the fields of inference algorithms, deep generative models, and unsupervised learning. Mike’s research has appeared in NeurIPS, ICLR, AISTATS, and other top ML conferences, earning two best paper awards, and his work has been featured in the New York Times. Mike previously worked as a software engineer at an AI startup called Lattice Data and as a research engineer in Meta’s applied machine learning group. Mike and Andrew designed and taught a new version of Stanford’s CS224S: Spoken Language Processing in 2022.

The course

As deep learning becomes more deeply embedded in real-world applications, fundamental questions arise around scalability, reproducibility, and quality. Unlike their predecessors, neural network systems introduce a new relationship between the practitioner and data: trained deep learning engineers take a “data-centric” approach to building, improving, and repairing models so that they are high-performing and reliable in the real world. There is now a new skill set and toolkit of best practices for ensuring the quality of data, annotations, and models together.

In this course, students are given a series of projects that showcase best practices in both natural language processing and computer vision. Students will receive a mix of practical knowledge (the best tools and frameworks for deep learning engineering) and decision-making guidelines (the different ways to use data in the modern AI workflow). The course takes students through multiple stages, from inspecting annotations to continuous testing to iterative annotation to protecting models against distribution shift and adversarial examples. By the end of the course, students will have built a web application with an embedded model and will have a thorough understanding of what it means to take a data-centric approach to AI.

Week 1
Understanding Data Quality
  • How to inspect and improve data quality and annotation quality.
  • How to identify and remove data anomalies or outliers.
  • The types of annotation errors and their effects on model performance.
  • Data analysis in NLP and computer vision.
  • Simulations of annotation errors and a model evaluation framework.
  • Annotation analysis for (1) a bounding-box task for object detection and (2) a text span task for entity recognition.
  • Train deep learning models in two different modalities: text and images.
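
To make the idea of simulating annotation errors concrete, here is a minimal sketch in plain Python. It is an illustration only, not course material: the function names `flip_labels` and `accuracy`, the uniform label-flipping noise model, and the integer class encoding (labels in `0..num_classes-1`) are all assumptions chosen for this example.

```python
import random
from typing import List, Sequence


def flip_labels(labels: Sequence[int], num_classes: int,
                noise_rate: float, seed: int = 0) -> List[int]:
    """Return a copy of `labels` in which each entry is replaced, with
    probability `noise_rate`, by a different class chosen uniformly.

    Hypothetical helper: assumes labels are integers in [0, num_classes).
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Draw from the num_classes - 1 other classes, skipping y.
            wrong = rng.randrange(num_classes - 1)
            noisy.append(wrong if wrong < y else wrong + 1)
        else:
            noisy.append(y)
    return noisy


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of positions where prediction and label agree."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    clean = [i % 3 for i in range(1000)]
    noisy = flip_labels(clean, num_classes=3, noise_rate=0.2)
    # A perfect model scored against noisy labels looks worse than it is.
    print("agreement with noisy labels:", accuracy(clean, noisy))
```

Scoring a perfect predictor against the corrupted labels shows how annotation noise alone can cap measured performance, which is one reason to inspect annotations before tuning a model.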
Week 2
Deep Learning Workflows and CI/CD
  • How to construct reproducible end-to-end machine learning workflows.
  • How to fine-tune small networks on top of foundation models in computer vision.
  • Post-training processing (such as exporting, tracking, compression) of deep learning models for deployment.
  • Best practices for continuous testing of deep learning models.
  • Hands-on practice with popular deep learning tools like Weights & Biases, ONNX, and FastAPI.
  • Integration tests, regression tests, and directionality tests for model quality assurance.
  • A Metaflow pipeline that chains together training, evaluation, and deployment on a benchmark dataset of handwritten digits.
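
As a rough illustration of a directionality test, the sketch below checks that a perturbation expected to lower a model's score actually does so. Everything here is an assumption made for the example: `toy_sentiment_score` is a hypothetical keyword counter standing in for a trained model, and `directionality_failures` is a hypothetical helper, not part of any testing library or of the course projects.

```python
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[str], float]


def toy_sentiment_score(text: str) -> float:
    """Hypothetical stand-in for a trained model:
    count of positive words minus count of negative words."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    words = text.lower().split()
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return float(pos - neg)


def directionality_failures(
    score: Scorer, pairs: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (original, perturbed) pairs that violate the expectation
    that the perturbed input scores strictly lower than the original."""
    return [(orig, pert) for orig, pert in pairs
            if not score(pert) < score(orig)]


if __name__ == "__main__":
    pairs = [
        ("the food was good",
         "the food was good but the service was terrible"),
        ("an excellent film", "an excellent film with a poor ending"),
    ]
    failures = directionality_failures(toy_sentiment_score, pairs)
    # In a CI job this assertion would fail the build on a violation.
    assert not failures, f"directionality violations: {failures}"
```

In a continuous testing setup, a check like this would typically run alongside regression tests on a fixed evaluation set, with the real model's scoring function passed in place of the toy scorer.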
Week 3
Iterative Annotation with a Human-in-the-Loop
  • The role of active learning and self-learning in a deep learning framework.
  • How to use unlabeled data and model uncertainty to improve performance.
  • Best practices for designing web applications with embedded ML models.
  • Tools to identify which examples to prioritize for labeling.
  • Tools to noisily label large batches of data quickly without a third party service.
  • A lightweight web application in Flask that supports human-in-the-loop labeling.
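
One common way to decide which examples to prioritize for labeling is uncertainty sampling: rank unlabeled examples by the entropy of the model's predicted class probabilities and send the most uncertain ones to a human annotator. The sketch below assumes that approach; the names `entropy` and `select_for_labeling` are hypothetical, and the course may use different selection criteria.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector.
    Zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(pred_probs: Sequence[Sequence[float]],
                        k: int) -> List[int]:
    """Return the indices of the k examples whose predicted class
    distributions have the highest entropy (most uncertain first)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up model outputs for three unlabeled examples.
    pred_probs = [
        [0.98, 0.01, 0.01],  # confident prediction
        [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
        [0.60, 0.30, 0.10],  # moderately uncertain
    ]
    print(select_for_labeling(pred_probs, k=2))
```

The selected indices would then be surfaced in the labeling interface, and the newly labeled examples added to the training set for the next round.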
Week 4
Model Maintenance and Repair
  • How to identify and handle distribution shift and adversarial examples.
  • The different types of distribution shift in NLP and computer vision.
  • Data augmentation techniques for model robustness.
  • Leverage the implemented workflows to quickly retrain and deploy a model.
  • Pipeline to handle the appearance of a new label class.
  • Repair models in response to adversarial examples in a visual classification task with outlier image watermarks.
  • Monitoring tools to track model performance and detect distribution shifts.
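
As a simple illustration of monitoring for distribution shift, the sketch below compares the distribution of predicted labels in a reference window against a live window using total variation distance. This is one possible approach among many, and the function names and the default threshold of 0.1 are assumptions picked for the example rather than recommended values.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def label_distribution(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Empirical distribution over the labels seen in a window."""
    if not labels:
        raise ValueError("cannot build a distribution from an empty window")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation(p: Dict[Hashable, float],
                    q: Dict[Hashable, float]) -> float:
    """Total variation distance between two discrete distributions.
    Labels missing from one side (e.g. a new class) count as probability 0."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def shift_detected(reference: Sequence[Hashable],
                   live: Sequence[Hashable],
                   threshold: float = 0.1) -> bool:
    """Flag a shift when the distance exceeds `threshold`
    (an arbitrary illustrative cutoff that would need tuning)."""
    distance = total_variation(label_distribution(reference),
                               label_distribution(live))
    return distance > threshold


if __name__ == "__main__":
    reference = ["cat"] * 50 + ["dog"] * 50
    live = ["cat"] * 20 + ["dog"] * 30 + ["bird"] * 50  # new class appears
    print("shift detected:", shift_detected(reference, live))
```

Because unseen labels are treated as having zero probability in the reference window, the same check also reacts when a new label class starts appearing in live traffic.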
This course is for...

Students who want to learn the infrastructure and operations behind practical deep learning for real world applications.

Students who have taken the first two courses in the co:rise ML foundations track.

Data scientists and research engineers looking for best practices in building and maintaining deep learning models.

And students curious about the new data-centric approach to ML and AI.

Prerequisites

Familiarity with Python and comfort with reading documentation to learn new tools. The co:rise Python for Machine Learning course or equivalent.

Experience in basic machine learning and data science. The co:rise Introduction to Applied ML: Supervised Learning course or equivalent.

Basic web development with tools like Flask. Students do not need to be experts at building web applications.

Basic experience in deep learning, including using PyTorch. The co:rise Deep Learning Essentials course, the Coursera ML course, or equivalent.

Course experience

Live Sessions with Experts

Top industry leaders teach you everything you need in only 4 weeks

Interactive Learning

Real-world projects put your learning into immediate action

Professional Communities

Grow your network by learning with an intimate cohort of peers from top companies
© 2021 - 2022 Corise Education. Terms of Service. Privacy Policy.