Prerequisites
- Basic knowledge of Python, with Python 3.10 or higher installed
- An OpenAI API key. Other LLMs are supported, but the code samples use the OpenAI SDK.
- A Galileo account. The free account is fine for these lessons.
- A clone or download of the Eval engineering GitHub repo
Lessons
Lesson 1 - Hello Evals
In this first lesson, you will:
- Learn what evals are
- Learn how you can use simple evals to detect issues in an AI application
- Get hands-on experience adding an eval to an app (see the sketch after this list)
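As a taste of what Lesson 1 covers, here is a minimal sketch of a simple deterministic eval wrapped around an LLM call. The `answer_question` helper, the required-terms check, and the model name are all illustrative stand-ins for the course's actual app, not its real code:

```python
# A minimal sketch of a simple eval. answer_question() is a hypothetical
# stand-in for the app under test; the model name is just an example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer_question(question: str) -> str:
    """Hypothetical stand-in for the application being evaluated."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


def contains_required_terms(output: str, required: list[str]) -> bool:
    """A simple deterministic eval: pass if every required term appears."""
    return all(term.lower() in output.lower() for term in required)


answer = answer_question("What is the capital of France?")
print("PASS" if contains_required_terms(answer, ["Paris"]) else "FAIL")
```

Even an eval this simple can catch regressions: run it after every prompt or model change and it will flag outputs that drop required information.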
Lesson 2 - Observability in AI apps
In this second lesson, you will:
- Use observability to visualize the components of a typical multi-agent AI application
- Learn about the different components that make up these applications
- Apply some out-of-the-box metrics to start building a picture of how your application is performing (a simplified trace structure is sketched after this list)
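To make the idea of tracing concrete, here is a hand-rolled sketch of the span structure that observability tooling captures for a multi-component request. This is an illustration of the concept, not the Galileo SDK; the component names and outputs are invented placeholders:

```python
# An illustrative hand-rolled trace, not the Galileo SDK. Each span records
# one component (retriever, LLM call, etc.) so a full request can be
# reassembled and visualized later.
import json
import time
import uuid


def make_span(trace_id: str, name: str, inputs: dict, outputs: dict, start: float) -> dict:
    """Record one component's inputs, outputs, and duration."""
    return {
        "trace_id": trace_id,
        "span_id": str(uuid.uuid4()),
        "name": name,
        "inputs": inputs,
        "outputs": outputs,
        "duration_ms": round((time.time() - start) * 1000, 2),
    }


trace_id = str(uuid.uuid4())
spans = []

start = time.time()
docs = ["Paris is the capital of France."]  # stand-in retriever output
spans.append(make_span(trace_id, "retriever", {"query": "capital of France"}, {"docs": docs}, start))

start = time.time()
answer = "The capital of France is Paris."  # stand-in LLM output
spans.append(make_span(trace_id, "llm", {"docs": docs}, {"answer": answer}, start))

print(json.dumps(spans, indent=2))
```

Because every span shares a `trace_id`, metrics can be computed per component or across the whole request, which is what makes the out-of-the-box metrics in this lesson possible.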
Lesson 3 - Failure analysis
In this third lesson, you will:
- Learn the process for finding failures in your AI applications
- Build out rubrics for identifying failure cases
- Learn how to group failure cases into themes that can be used for building evals (see the sketch after this list)
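The sketch below shows one way a rubric and theme grouping might look in code. The theme names and reviewed traces are hypothetical examples, not data from the course:

```python
# A minimal sketch of a failure rubric. Each entry pairs a failure theme
# with the question a reviewer asks when labeling a trace.
from collections import defaultdict

rubric = {
    "hallucination": "Does the response state facts not supported by the retrieved context?",
    "refusal": "Does the response refuse a request it should have answered?",
    "formatting": "Does the response violate the required output format?",
}

# Reviewed traces, each labeled with the theme(s) it exhibits.
reviewed_traces = [
    {"id": "t1", "themes": ["hallucination"]},
    {"id": "t2", "themes": ["formatting"]},
    {"id": "t3", "themes": ["hallucination", "refusal"]},
]

# Group failure cases by theme; the largest groups are the first
# candidates for a dedicated eval.
by_theme: dict[str, list[str]] = defaultdict(list)
for trace in reviewed_traces:
    for theme in trace["themes"]:
        by_theme[theme].append(trace["id"])

for theme, ids in sorted(by_theme.items(), key=lambda kv: -len(kv[1])):
    print(f"{theme}: {len(ids)} cases -> {ids}")
```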
Lesson 4 - Build custom metrics
In this fourth lesson, you will:
- Build datasets of known inputs and outputs for cases that pass and fail
- Learn how to build custom metrics for your failure cases
- Determine the success of your metrics by measuring true and false positives and negatives (see the sketch after this list)
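Here is a minimal sketch of scoring a custom metric against a labeled dataset. The `metric` function and the dataset rows are hypothetical placeholders for the detector and data you will build in the lesson:

```python
# A minimal sketch of measuring a custom metric against ground truth.
# `is_failure` is the ground-truth label (True = known failure) and
# metric() is a hypothetical stand-in for your custom failure detector.
def metric(output: str) -> bool:
    """Hypothetical custom metric: flag outputs that hedge excessively."""
    return "i'm not sure" in output.lower()


dataset = [
    {"output": "The capital of France is Paris.", "is_failure": False},
    {"output": "I'm not sure, maybe Lyon?", "is_failure": True},
    {"output": "I'm not sure I can answer that.", "is_failure": True},
    {"output": "Paris.", "is_failure": False},
]

tp = fp = tn = fn = 0
for row in dataset:
    predicted = metric(row["output"])
    actual = row["is_failure"]
    if predicted and actual:
        tp += 1
    elif predicted and not actual:
        fp += 1
    elif not predicted and actual:
        fn += 1
    else:
        tn += 1

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"TP={tp} FP={fp} TN={tn} FN={fn} precision={precision:.2f} recall={recall:.2f}")
```

High false-positive counts mean a metric is too aggressive; high false negatives mean it misses real failures. Both numbers matter before you trust a metric in production.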
Lesson 5 - Eval engineering in your SDLC
In this final lesson, you will:
- Learn how evals fit into the SDLC
- Build unit tests using evals that can run in your CI/CD pipeline (see the sketch after this list)
- Learn about using evals as guardrails at runtime
- Add observability and alerts to detect when your application is failing
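As a preview, here is a sketch of an eval packaged as a CI unit test. It assumes the `answer_question` and `contains_required_terms` helpers from the Lesson 1 sketch live in a hypothetical `app` module; adapt the import to your own project layout:

```python
# A minimal sketch of an eval as a pytest unit test for CI/CD.
# `app` is a hypothetical module containing the Lesson 1 helpers.
import pytest

from app import answer_question, contains_required_terms


@pytest.mark.parametrize(
    "question,required",
    [
        ("What is the capital of France?", ["Paris"]),
        ("Who wrote Hamlet?", ["Shakespeare"]),
    ],
)
def test_answers_contain_required_terms(question, required):
    answer = answer_question(question)
    assert contains_required_terms(answer, required)
```

Run with `pytest` locally or in your pipeline; a failing eval then blocks a deploy the same way any other failing test would.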