Data Centric Deep Learning
Learn to build, improve, and repair deep learning models with a data-centric approach. This course puts you in the shoes of a deep learning engineer and simulates the real-world challenges of improving data quality, building and testing deep learning models, and improving performance with a human in the loop. Week by week, we will develop an understanding of the critical role of data in deep learning operations – from integration tests to deep learning tooling to iterative annotation. Learn the best practices for deep learning in the real world.
Course taught by expert instructors
Senior Manager at Apple and Instructor at Stanford University
Andrew Maas is currently at Apple working on data-centric deep learning. He completed a PhD in Computer Science at Stanford in 2015, advised by Andrew Ng and Dan Jurafsky. His dissertation focused on large-scale deep learning methods for spoken and written language. Andrew has worked as an engineer and scientific advisor to several startups including Wit.ai, Coursera, and Semantic Machines. Prior to Apple, he built an NLP platform for precise healthcare language as cofounder of Roam Analytics. He also teaches CS224S: Spoken Language Processing as a visiting lecturer at Stanford University.
PhD Scholar at Stanford
Mike Wu is currently a fifth-year PhD student at Stanford University advised by Noah Goodman. His research spans the fields of inference algorithms, deep generative models, and unsupervised learning. Mike’s research has appeared in NeurIPS, ICLR, AISTATS, and other top ML conferences, with two best paper awards, and his work has been featured in the New York Times. Mike previously worked as a software engineer at an AI startup called Lattice Data, and as a research engineer at Meta’s applied machine learning group. Mike and Andrew designed and taught a new version of Stanford’s CS224S: Spoken Language Processing in 2022.
Learn and apply skills with real-world projects.
- Simulations of annotation errors and a model evaluation framework.
- Annotation analysis for (1) a bounding-box task for object detection and (2) a text span task for entity recognition.
- Train deep learning models in two different modalities: text and images.
- How to inspect and improve data quality and annotation quality.
- How to identify and remove data anomalies or outliers.
- The types of annotation errors and their effects on model performance.
- Data analysis in NLP and computer vision.
- Comfort with popular deep learning tools like Weights & Biases, ONNX, and FastAPI.
- Integration tests, regression tests, and directionality tests for model quality assurance.
- A Metaflow pipeline that chains together training, evaluation, and deployment on a benchmark dataset of handwritten digits.
- To construct reproducible end-to-end machine learning workflows.
- To finetune small networks on top of foundation models in computer vision.
- Post-training processing (such as exporting, tracking, compression) of deep learning models for deployment.
- Best practices for continuous testing of deep learning models.
- Tools to identify which examples to prioritize for labeling.
- Tools to noisily label large batches of data quickly without a third party service.
- A lightweight web application in Flask that supports human-in-the-loop labeling.
- The role of active learning and self-learning in a deep learning framework.
- How to use unlabeled data and model uncertainty to improve performance.
- Best practices for designing web applications with embedded ML models.
- A pipeline to handle the appearance of a new label class.
- Repair models in response to adversarial examples in a visual classification task with outlier image watermarks.
- Monitoring tools to track model performance and detect distribution shifts.
- How to identify and handle distribution shift and adversarial examples.
- The different types of distribution shift in NLP and computer vision.
- Data augmentation techniques for model robustness.
- Leverage the implemented workflows to quickly retrain and deploy a model.
Work on projects that bring your learning to life.
Made to be directly applicable in your work.
Live access to experts
Sessions and Q&As with our expert instructors, along with real-world projects.
Network & community
Code reviews & study groups. Share experiences and learn alongside a global network of professionals.
Support & accountability
We have a system in place to make sure you complete the course, and to help nudge you along the way.
Course success stories
Learn together and share experiences with other industry professionals
I believe this course should be required in any data-science curriculum. We gained practical skills to tackle problems that data scientists and machine learning engineers often face when dealing with real-world messy data.
DCDL has taken my experience with ML from modeling datasets in Colab notebooks to working in a full ML system in a codebase. We touched upon the full lifecycle of ML — from annotating and cleaning data, to model training, to evaluation and testing, deployment, and monitoring. What an incredibly insightful 4 weeks of learning!
This course is for...
Students who want to learn the infrastructure and operations behind practical deep learning for real world applications.
Students who have taken the first two courses in the CoRise ML foundations track.
Data scientists and research engineers looking for best practices in building and maintaining deep learning models.
Familiarity with Python and comfort reading documentation to learn new tools. CoRise Python for Machine Learning course or equivalent.
Experience in basic machine learning and data science. CoRise Introduction to Applied ML: Supervised Learning course or equivalent.
Basic web development with tools like Flask. Students do not need to be experts at building web applications.
Basic experience in deep learning, including using PyTorch. CoRise Deep learning essentials, ML Coursera course, or equivalent.