I design and build ML systems that hold up in production.

Focused on evaluation, boundaries, and systems that fail loudly instead of silently.

I build production ML systems, from classical ML to GenAI, with a focus on reliability, evaluation, system design, and MLOps. In research, I've built reproducible evaluation pipelines; in industry, production systems that serve real users.

How I think about production ML

Evaluation-first

If you can't measure behavior, you can't ship reliably.
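Concretely, that means an evaluation gate in CI: a pinned eval set and pinned thresholds that block a release when a metric regresses. A minimal sketch of the idea, where the model interface, file path, and threshold are all illustrative:

```python
import json

THRESHOLDS = {"accuracy": 0.92}  # pinned in version control, changed deliberately

def evaluate(model, eval_path="evals/golden_set.jsonl"):
    """Score the model on a fixed, versioned eval set."""
    with open(eval_path) as f:
        examples = [json.loads(line) for line in f]
    preds = [model.predict(ex["input"]) for ex in examples]
    accuracy = sum(p == ex["label"] for p, ex in zip(preds, examples)) / len(examples)
    return {"accuracy": accuracy}

def gate_release(model):
    """Fail the build if the model regresses below the pinned threshold."""
    metrics = evaluate(model)
    assert metrics["accuracy"] >= THRESHOLDS["accuracy"], (
        f"accuracy {metrics['accuracy']:.3f} regressed below pinned threshold"
    )
```

The specific threshold matters less than the discipline: the gate is versioned, deterministic, and runs on every change.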

Boundaries & contracts

Systems should know when to answer and when to abstain.
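In practice that's a confidence boundary: below a calibrated cutoff, the system abstains and routes to a fallback instead of guessing. A minimal sketch, where the model API and the 0.8 cutoff are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

ABSTAIN_THRESHOLD = 0.8  # set from calibration data, not intuition

@dataclass
class Answer:
    text: Optional[str]   # None means the system abstained
    confidence: float
    abstained: bool

def answer_or_abstain(model, query: str) -> Answer:
    """Serve an answer only when confidence clears the calibrated cutoff."""
    text, confidence = model.predict_with_confidence(query)
    if confidence < ABSTAIN_THRESHOLD:
        return Answer(text=None, confidence=confidence, abstained=True)
    return Answer(text=text, confidence=confidence, abstained=False)
```

Making abstention an explicit, typed outcome keeps the contract honest: downstream callers must handle "no answer" rather than receiving a low-confidence guess dressed up as one.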

Operability

Monitoring, failure modes, and regression tests from day one.
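One concrete habit: wrap inference so failures are counted, logged, and re-raised rather than swallowed. A minimal sketch, where the metrics client is an illustrative stand-in:

```python
import logging

logger = logging.getLogger("inference")

def monitored_predict(model, metrics, features):
    """Predict with loud, observable failure modes."""
    try:
        pred = model.predict(features)
    except Exception:
        metrics.increment("inference.error")    # surfaces in dashboards and alerts
        logger.exception("prediction failed")   # full traceback, never silent
        raise                                   # no quiet default value
    if pred is None:                            # out-of-contract output
        metrics.increment("inference.null_output")
        raise ValueError("model returned null prediction")
    metrics.increment("inference.success")
    return pred
```

Counters like these are what regression tests assert against and what alerts fire on, which is why they belong in the system from day one.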

Want to talk about building reliable ML systems?

Let's discuss evaluation, system design, or production reliability. I'm always interested in learning from others working on similar problems.

Email me