Sold out, but you can still join the waitlist!

DataPrepOps and the Practice of Data-Centric AI

Data preparation is rarely treated as a discipline that can be taught and learned, and many data scientists still improvise their approach to it. Yet a rigorous approach to this first step can yield information that data scientists can use to develop, train, and tune ML models. We developed this course to close that gap and to change how data preparation is perceived. The DataPrepOps concepts covered in this course can help machine-learning practitioners build solid, practical data preparation skills that they can apply across their ML projects.

Jennifer Prendki
Founder and CEO, Alectio
Real-world projects that teach you industry skills.
Learn alongside a small group of your professional peers.
Part-time program with 2 live events per week:
Tuesday @ 6:00 PM UTC
Project Session
Thursday @ 6:00 PM UTC
Next Cohort
February 6, 2023
4 weeks
US$ 400
or included with membership

Course taught by expert instructors

Jennifer Prendki

Founder and CEO, Alectio

Dr. Jennifer Prendki is the founder and CEO of Alectio, the first startup fully focused on DataPrepOps (the discipline concerned with automating and operationalizing data preparation). Her team is on a mission to help ML teams build models with less data, and therefore more cost-efficiently. Before Alectio, Jennifer was the VP of Machine Learning at Figure Eight; she also built an entire ML function from scratch at Atlassian and led multiple data science projects on the Search team at Walmart Labs.

The course

Learn and apply skills with real-world projects.

Who is it for?
  • Experienced data scientists who want to build a theoretical understanding of data preparation techniques

  • Data scientists interested in both the theoretical and practical aspects of Data-Centric AI

  • MLOps engineers who want to learn about the operational side of Data-Centric AI

Prerequisites / Commitment
  • Familiarity with fundamental machine learning concepts, especially supervised machine learning

  • Familiarity with software development in Python

  • Familiarity with basic data orchestration tools such as Airflow or Flyte

Syllabus

  • A brief history of AI winters, and how Big Data and ImageNet got us moving forward
  • How data-centric AI is different from model-centric AI, and why the data science field is adopting the data-centric AI model
  • The many aspects of data-centric AI, and why it isn't just another term for active learning or Human-in-the-Loop machine learning
  • What data preparation is, and what it is not
  • Operational challenges with data-centric AI
  • The economic benefits of data-centric AI

You will perform a few experiments using the same models from prior weeks, but with different subsets of data.
  • You will test different data-centric AI techniques like data labeling and data augmentation, and evaluate their impact on model performance
  • You will build intuition for why bad data preparation can lead to unrecoverable biases
  • You will begin to develop automated approaches to data preparation, specifically data curation

  • Types of data annotation for all data modalities
  • Commercial aspects of data labeling, and how to best choose a labeling partner for a particular project
  • Best practices for manual data labeling
  • Human-in-the-Loop data labeling and how to set up a Human-in-the-Loop data labeling pipeline in practice
  • Auto-labeling and when (and when not) to use it
  • The Snorkel algorithm and when (and when not) to use it
  • New concepts in data labeling

Your goal in this project is to annotate, from scratch and as accurately as possible, the data that we will use for the rest of the course.
  • You will upload the dataset
  • You will manually annotate some of the data
  • You will find a suitable model to annotate the remaining data using an auto-labeling approach
  • Advanced participants will work on an implementation of the Snorkel algorithm
  • You will have access to several open-source annotation tools throughout this project
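
As a rough illustration of the auto-labeling step above, one common pattern (an assumption here, not necessarily the approach taught in the course) is to accept a model's predicted label only when its confidence clears a threshold and to route everything else to human annotators. The function name `split_for_auto_labeling`, the 0.9 threshold, and the sample item IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def split_for_auto_labeling(
    predictions: Dict[str, Tuple[str, float]], threshold: float = 0.9
) -> Tuple[Dict[str, str], List[str]]:
    """Split model predictions into accepted auto-labels and items needing human review.

    `predictions` maps an item ID to (predicted_label, confidence).
    Hypothetical helper for illustration only.
    """
    auto_labels: Dict[str, str] = {}
    needs_review: List[str] = []
    for item_id, (label, confidence) in predictions.items():
        if confidence >= threshold:
            # Confident enough: keep the model's label as-is
            auto_labels[item_id] = label
        else:
            # Not confident: send the item to a human annotator
            needs_review.append(item_id)
    return auto_labels, needs_review


# Made-up predictions from some auto-labeling model
preds = {"img_001": ("cat", 0.97), "img_002": ("dog", 0.62), "img_003": ("dog", 0.91)}
auto, review = split_for_auto_labeling(preds, threshold=0.9)
print(auto)    # {'img_001': 'cat', 'img_003': 'dog'}
print(review)  # ['img_002']
```

Raising the threshold trades labeling throughput for label accuracy, which is one reason the choice of auto-labeling model and cut-off matters for the quality of the resulting dataset.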

  • Everything you need to know about active learning
  • The difference between pool-based and stream-based active learning
  • How active learning relates to online learning
  • Many, many querying strategies
  • Basic machine-learning and reinforcement-learning techniques for active learning

You will run your own active learning process in a provided notebook.
  • You will practice some common querying strategies
  • You will tune off-the-shelf querying strategies and measure the impact of that tuning on the learning process
  • You will code and test several of your own querying strategies
  • You will learn how to measure and track the performance of your active learning process
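
To make the idea of a querying strategy concrete, here is a minimal sketch of least-confidence sampling, a common uncertainty-based strategy for pool-based active learning. It assumes the model's predicted class probabilities for the unlabeled pool are already available as plain Python lists; the function name `least_confidence_query` and the sample probabilities are hypothetical, not taken from the course notebook.

```python
from typing import List, Sequence


def least_confidence_query(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool samples whose top class probability is lowest.

    Hypothetical helper for illustration only.
    """
    if k <= 0:
        return []
    # Confidence of each sample = probability of its most likely class
    confidences = [max(p) for p in probs]
    # Rank samples from least to most confident (sorted() is stable, so ties keep pool order)
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    # The k least confident samples are the ones sent for labeling next
    return ranked[:k]


# Made-up predicted probabilities for four unlabeled samples (binary task)
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
print(least_confidence_query(pool_probs, 2))  # [3, 1]
```

Ranking instead by the gap between the two highest class probabilities (smallest gap first) would turn this into margin sampling, and the query batch size `k` is one example of a parameter you could tune when comparing strategies.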

  • Practical challenges around building a training pipeline to support data-centric AI
  • How to build an MLOps pipeline that incorporates the iteration and feedback loops required for Human-in-the-Loop ML and data-centric AI
  • DataPrepOps MVP: a basic pipeline to get things off the ground
  • Tips for integrating popular and open-source data-labeling APIs into an iterative training pipeline
  • What continuous labeling is, and why it is necessary for ML observability and online learning
  • How to incorporate data augmentation and synthetic data generation into traditional MLOps

You will be challenged to build a mini data-centric AI MLOps pipeline with Airflow.
  • You will set up a basic data-centric AI pipeline
  • You will add to the pipeline an auto-labeling process that annotates data continuously
  • (Time permitting) You will incorporate automated data-and-labeling quality management and basic control loops into your pipeline to ensure it runs properly

Real-world projects

Work on projects that bring your learning to life.
Made to be directly applicable in your work.

Live access to experts

Sessions and Q&As with our expert instructors, along with real-world projects.

Network & community

Code reviews and study groups. Share experiences and learn alongside a global network of professionals.

Support & accountability

We have a system in place to make sure you complete the course, and to help nudge you along the way.

Get reimbursed by your company

More than half of learners get their courses and memberships reimbursed by their company.

Hundreds of companies have dedicated L&D and education budgets that have covered the costs.


Still not sure?

Get in touch and we'll help you decide.

Keep in touch for updates, discounts, and new courses.

Questions? Ask us anything at

© 2021-2022 CoRise Education