Eval engineering for AI developers is a 5-part course run as a series of live streams, hosted by Jim Bennett, Principal Developer Advocate at Galileo.

90% of AI agents don’t make it successfully to production. The biggest reason is that the AI engineers building these apps don’t have a clear way to evaluate whether their agents are doing what they should do, or to use the results of that evaluation to fix them.

In this course, you will learn all about evals for AI applications. You’ll start with some out-of-the-box metrics and the basics of evals, then move on to observability for AI apps, failure analysis, custom metrics, and finally using evals across your whole SDLC. This is hands-on, so be prepared to write some code, create some metrics, and do some homework!

Prerequisites

Lessons

Lesson 1 - Hello Evals

In this first lesson, you will
  • Learn what evals are
  • Learn how you can use simple evals to detect issues in an AI application
  • Get hands-on adding an eval to an app (a minimal sketch follows this list)
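To make that concrete, here is a minimal sketch of what a simple eval can look like in plain Python: a couple of scoring functions that check a model response against a known expectation. The function names and example data are illustrative, not part of any SDK; the lesson covers doing this properly.

```python
# A minimal sketch of a simple eval in plain Python. Names and data
# here are invented for illustration.

def exact_match(expected: str, actual: str) -> float:
    """Score 1.0 if the response matches the expected answer exactly."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def contains_keywords(keywords: list[str], actual: str) -> float:
    """Score the fraction of required keywords present in the response."""
    text = actual.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords) if keywords else 1.0

# Evaluate one response from a hypothetical app
response = "The capital of France is Paris."
print(exact_match("The capital of France is Paris.", response))  # 1.0
print(contains_keywords(["Paris", "France"], response))          # 1.0
```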

Lesson 2 - Observability in AI apps

In this second lesson, you will
  • Use observability to visualize the components of a typical multi-agent AI application
  • Learn about the different components that make up these applications
  • Apply some out-of-the-box metrics to start understanding how your application is working (see the tracing sketch after this list)
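The course uses Galileo’s tooling for this; purely to illustrate the underlying idea, the sketch below uses OpenTelemetry (an assumption, not the course’s required stack) to wrap each component of a toy agent in its own span, so retrieval and generation show up as separate steps in a trace.

```python
# A sketch of tracing a toy agent with OpenTelemetry (pip install
# opentelemetry-api). Without a configured exporter the spans are no-ops,
# but the structure shows how each component becomes its own span.
from opentelemetry import trace

tracer = trace.get_tracer("demo-agent-app")

def answer(question: str) -> str:
    with tracer.start_as_current_span("agent.answer") as span:
        span.set_attribute("input.question", question)
        with tracer.start_as_current_span("tool.retrieve"):
            docs = ["...retrieved context..."]  # placeholder retrieval step
        with tracer.start_as_current_span("llm.generate"):
            reply = f"Answer drawn from {len(docs)} document(s)"  # placeholder LLM call
        span.set_attribute("output.reply", reply)
        return reply

print(answer("What is an eval?"))
```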

Lesson 3 - Failure analysis

In this third lesson, you will
  • Learn the process for finding failures in your AI applications
  • Build out rubrics for identifying failure cases
  • Learn how to group failure cases into themes that can be used to build evals (a sketch follows this list)
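As a rough illustration of the grouping step, the sketch below tags each failure case with a theme from a small rubric and counts cases per theme. The rubric entries, themes, and trace IDs are all invented.

```python
# A sketch of grouping failure cases into themes. The rubric and the
# trace IDs are invented for illustration.
from collections import defaultdict

RUBRIC = {
    "hallucination": "Response states facts not supported by the context",
    "wrong_tool": "Agent called a tool that does not match the user intent",
    "formatting": "Response ignores the requested output format",
}

# Failure cases found during manual review, each labeled with a theme
failures = [
    {"trace_id": "t1", "theme": "hallucination"},
    {"trace_id": "t2", "theme": "formatting"},
    {"trace_id": "t3", "theme": "hallucination"},
]

by_theme = defaultdict(list)
for case in failures:
    by_theme[case["theme"]].append(case["trace_id"])

# The most common themes are the best candidates for custom evals
for theme, ids in sorted(by_theme.items(), key=lambda item: -len(item[1])):
    print(f"{theme} ({RUBRIC[theme]}): {len(ids)} case(s) -> {ids}")
```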

Lesson 4 - Build custom metrics

In this fourth lesson, you will
  • Build datasets of known inputs and outputs for cases that pass and fail
  • Learn how to build custom metrics for your failure cases
  • Determine the success of your metrics by measuring true and false positives and negatives (a worked sketch follows this list)
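To show what that measurement looks like, the sketch below runs a toy metric over a tiny labeled dataset and tallies true/false positives and negatives. The dataset and the metric are invented for illustration.

```python
# A sketch of scoring a custom metric against a labeled dataset. The metric
# predicts pass/fail for each example; comparing predictions to the known
# labels gives the confusion-matrix counts. All data here is invented.
dataset = [
    {"output": "Paris is the capital of France.", "should_fail": False},
    {"output": "Paris is the capital of Italy.",  "should_fail": True},
    {"output": "I don't know.",                   "should_fail": True},
]

def metric_flags_failure(output: str) -> bool:
    # Toy metric: flag outputs that mention "Italy" or refuse to answer.
    return "Italy" in output or "don't know" in output

tp = fp = tn = fn = 0
for example in dataset:
    predicted_fail = metric_flags_failure(example["output"])
    if predicted_fail and example["should_fail"]:
        tp += 1
    elif predicted_fail and not example["should_fail"]:
        fp += 1
    elif not predicted_fail and example["should_fail"]:
        fn += 1
    else:
        tn += 1

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"TP={tp} FP={fp} TN={tn} FN={fn} precision={precision:.2f} recall={recall:.2f}")
```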

Lesson 5 - Eval engineering in your SDLC

In this final lesson, you will
  • Learn how evals fit into the SDLC
  • Build unit tests using evals that can be run in your CI/CD pipeline (a pytest sketch follows this list)
  • Learn about using evals as guardrails at runtime
  • Add observability and alerts to detect when your application is failing
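As a preview of the CI/CD piece, here is a sketch of an eval written as a pytest test so a failing eval fails the build. The my_app function, the test case, and the dataset are stand-ins for a real application and dataset.

```python
# A sketch of running an eval as a pytest unit test so it can gate a
# CI/CD pipeline. The app under test and the test case are stand-ins.
import pytest

def my_app(question: str) -> str:
    return "Paris"  # stand-in for the real application call

CASES = [("What is the capital of France?", "Paris")]

@pytest.mark.parametrize("question,expected", CASES)
def test_eval_exact_match(question, expected):
    # Fail the build if the app's answer drifts from the expected output
    assert my_app(question).strip() == expected
```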

Course materials

All the course materials are available on the Galileo GitHub.