Data Engineering with Dagster
A well-built data platform allows for fast iteration and safe deployments. This course will teach you how to design, build, and maintain a data platform that supports a wide range of data tasks. You will start by taking a simple data workflow, common to most companies, and deconstructing it into its core components. By the end of the course, you will reimplement the pipeline using modern data frameworks, running in the cloud.
Staff Data Engineer at Dutchie
Dennis is a Staff Data Engineer at Dutchie. He has more than a decade of experience in the data space, focusing on platform engineering and data infrastructure. He has helped many companies improve their data stacks, from Wolfram Alpha and Epic to, most recently, Dutchie and Drizly.
Companies have a growing number of data applications to perform mission-critical tasks. Despite the increasingly important roles these applications are playing, many are not well understood by the engineers who are implementing them. This hinders experimentation and increases the risk involved with future changes and feature implementation.
Part of the problem is that seemingly simple data applications often rely on numerous systems. This complexity is not always appreciated when data teams stand up simple ad hoc processes. As a result, these systems often scale poorly as data volumes increase.
Over the next four weeks, we will discuss the importance of an effective data platform and the attributes that empower data analysts and data scientists to do their best work. We will design a data platform using Dagster, tailoring the environment with Docker for local development and fast iteration, and we will use AWS to manage and run the platform at scale. By the end of class, you will feel confident implementing a data stack tailored to your organization, knowing that your platform can handle the specific workflows you need to support.
- The challenges of building data applications
- How to deconstruct a data workflow into its key components
- Docker and designing a local environment
- Fundamentals of Dagster
- How to handle some of the complexities of data workflows
- How to view individual workflows as existing within a full data application
- Building both scheduled and event-based processes
- How to manage and isolate dependencies
- How to define software assets within your Dagster project
- How to deploy Dagster
Dennis Hume is an expert and mentor on how to build out data platforms and support data teams. Dennis knows the past, present, and future of the modern data stack intimately, and how to enable companies to graduate from batch SQL analytics engineering workflows to real-time, Python, and machine learning services. He's a leading voice in the data community (see his talks on Dagster, Materialize, the Modern Data Stack conference, and more). Any analytics engineer or data scientist would be lucky to work with Dennis, and anyone lucky enough to learn from him should jump at the opportunity!
Dennis brings ample industry experience in designing and delivering data platform solutions. In addition to building out the data platform at Drizly, he has been instrumental in leveling up others on topics pertaining to the data space. His clarity of thought and structured approach to delivering solutions make it very easy to engage and learn with Dennis.
Data Engineers looking to build more reliable pipelines to support their analytics team
Software engineers who want to be more involved in building reliable data applications
Ability to write Python and work with documented libraries
Comfort working with Docker basics (start, stop) and the command line