What I optimize for

  • Latency: conversational and real-time workflows (STT/LLM/TTS) with pragmatic fallbacks.
  • Cost: quantization, hybrid deterministic + ML strategies, and infra-aware model selection.
  • Reliability: isolation boundaries, rate limiting, monitoring-friendly APIs, and safe failure modes.
  • Scale: async pipelines and orchestration for high-volume OCR and web crawling workloads.

Representative impact

  • OCR at scale: integrated production OCR to handle 10K+ documents/day.
  • Efficiency: VLM quantization yielding ~4× memory reduction and ~40% infra cost reduction.
  • Voice systems: end-to-end pipelines with ~sub-3s average turn latency and multi-backend TTS uptime strategy.
  • Web-scale: detection over 2M+ sites and 1M+ requests/day async crawling orchestration.

Details and context live in the full CV and the main page experience timeline.

Quick links