Building Computer Vision Applications
This course provides an introduction to machine learning for computer vision with a focus on practical applications relevant to industry teams. In this course, we will “reverse-engineer” a number of applications, such as traffic flow analysis, digital medicine, optical character recognition, and video analytics. We will discuss the fundamental machine learning principles required to build these applications, focusing on practical tools instead of algorithmic details. You will build these applications from scratch, using open-source tools that cover the full stack of modern machine learning, from datasets to deployment. By the end of the course, you will have built a portfolio of computer vision applications that you can reference or share with your team and colleagues.
Abubakar Abid
Machine Learning Team Lead at Hugging Face
Abubakar has been building machine learning models for over a decade. He did his PhD at Stanford in deep learning applied to medical images and videos. During his PhD, he developed Gradio (www.gradio.dev), an open-source Python library for creating GUIs for machine learning models. Since Gradio’s acquisition by Hugging Face, Abubakar has continued to lead the Gradio team and also teaches machine learning at Hugging Face and beyond!

Getting computers to “see” has been one of the defining goals of machine learning since its inception in the mid-1900s. In the last decade, we have seen significant progress towards this goal, starting with the deep learning revolution in vision that began in 2012. Nowadays, computer vision applications are woven into everyday technology, ranging from the way we unlock our phones (facial recognition) to the ways we might be driving in the future (autonomous driving).
There are a lot of courses that teach computer vision, but most focus on the details of specific algorithms. In this course, we take a different approach: we build practical applications. We will take four exciting applications of computer vision (Face ID, plant identification, self-driving, and image generation) and reverse-engineer them to understand the machine learning principles used to build and deploy such applications.
As part of the course, you will build each of these applications from scratch, using open-source tools that cover the full stack of modern machine learning, from datasets to deployment. By the end of the course, you will have built a portfolio of computer vision applications that you can reference or share with your team and colleagues.
- The steps of a machine learning project: from building datasets to deploying applications
- What image classification is
- An overview of algorithms for image classification, including recent progress in deep learning over the last decade (from AlexNet to Transformers)
- Training vs. fine-tuning machine learning models
- How to download models from the Hugging Face Hub using the transformers library
- How to fine-tune a model for image classification
- The machine learning system that powers a self-driving car
- The different kinds of image segmentation (semantic segmentation, object detection)
- An overview of algorithms for image segmentation
- How to download datasets from the Hugging Face Hub using the datasets library
- How to train an image segmentation model from scratch
- The machine learning system that powers the FaceID authentication system for Apple iPhones
- The different ways images can be converted into embeddings
- The different uses of embeddings
- How to download datasets from the Hugging Face Hub using the datasets library
- How to train an image segmentation model from scratch
- Machine learning models for generating images including GANs and diffusion models
- The different uses of image generation
- The ethical risks and biases that are part of such applications
- How to train an image generation model from scratch
- How to add class conditioning so that you can generate specific kinds of images
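To give a flavor of the embedding topics listed above, here is a minimal sketch of one common use of embeddings: deciding whether two images show the same person by comparing their embedding vectors. It is not the method used by any particular product. The function names, the four-dimensional vectors, and the 0.8 threshold are all made up for illustration; in practice the embeddings would come from a trained model and the threshold would be tuned on validation data.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_same_identity(enrolled, probe, threshold=0.8):
    """Hypothetical check: accept the probe if it is close enough to the enrolled embedding.

    The threshold is an assumed value chosen only for this example.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Made-up embeddings standing in for the output of an image model.
enrolled = [0.9, 0.1, 0.3, 0.4]
same_person = [0.85, 0.15, 0.35, 0.38]
other_person = [0.1, 0.9, 0.2, 0.05]

print(is_same_identity(enrolled, same_person))
print(is_same_identity(enrolled, other_person))
```

With these toy vectors, the first comparison is accepted and the second is rejected, because the first pair points in nearly the same direction and the second does not.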
It has been my pleasure to be Abubakar Abid's student as he has taught visual algorithms. Abubakar is an excellent educator with a great ability to explain complex concepts in a simple and intuitive manner, something that has made learning enjoyable for me. Not only is he very highly capable from the technical aspects, but he is also an outstanding communicator.
Abubakar is amongst the best teachers I’ve ever had. I was entirely new to machine learning yet he was able to distill complicated concepts clearly and effectively. I felt everywhere else was teaching me just the surface, but Abubakar was able to tie the theory, practice and intuition together.
Software engineers who want to build vision applications for prototyping or deployment without worrying too much about the underlying algorithmic details.
Machine learning engineers who may already know the algorithms but are interested in building practical computer vision applications using the best open-source tools.
Ability to write Python proficiently and work with documented libraries
Experience using Jupyter notebooks or Google Colab notebooks recommended
Basic understanding of machine learning (no experience in computer vision is required)