I design and build ML systems that hold up in production.
Focused on evaluation, boundaries, and systems that fail loudly instead of silently.
I build production ML systems, from classical ML to GenAI, with a focus on reliability, evaluation, system design, and MLOps. I've shipped reproducible evaluation pipelines in research settings and production systems that serve real users.
How I think about production ML
Evaluation-first
If you can't measure behavior, you can't ship reliably.
Boundaries & contracts
Systems should know when to answer and when to abstain.
Operability
Monitoring, failure modes, and regression tests from day one.
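The "answer or abstain" boundary can be made concrete with a small sketch: a decision wrapper that only returns a label when the model's confidence clears a threshold, and otherwise abstains explicitly instead of guessing. The names (`Decision`, `decide`, `threshold=0.8`) are illustrative, not from any specific library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    label: Optional[str]   # None means the system abstained
    confidence: float

def decide(scores: dict[str, float], threshold: float = 0.8) -> Decision:
    """Return the top-scoring label, or abstain if confidence is too low.

    `scores` maps candidate labels to confidence scores; the threshold
    is a hypothetical policy knob, tuned per use case.
    """
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        # Fail loudly: surface the abstention instead of a shaky answer.
        return Decision(label=None, confidence=confidence)
    return Decision(label=label, confidence=confidence)

# A low-confidence prediction is surfaced as an explicit abstention.
print(decide({"spam": 0.55, "ham": 0.45}))  # abstains (label=None)
print(decide({"spam": 0.95, "ham": 0.05}))  # answers "spam"
```

The point of the wrapper is that the abstention path is a first-class outcome: it can be counted, monitored, and regression-tested just like correct and incorrect answers.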
Where to next
Home is a starting point. If you want more detail, these pages go deeper.
The story behind my systems mindset: foundations, research, and production work.
Featured projects, recent experiments, and tools — curated by intent, not GitHub noise.
Short notes on evaluation, reliability, and production ML systems.
Want to talk about building reliable ML systems?
Let's discuss evaluation, system design, or production reliability. I'm always interested in learning from others working on similar problems.
Email me