Performance engineer at Intel working on LLM inference, low-level runtime optimization, and AI compiler development for XPUs. I write about what I learn.
I'm a systems software engineer at Intel focused on making AI workloads run faster. My work spans telemetry-driven runtime optimization frameworks, cache/memory tuning, LLVM-based PGO, and AI compiler development targeting XPUs. I care about what happens at the hardware/software boundary, and I'm drawn to problems that live in the gap between ML research and production systems.
Before Intel, I built NLP systems and job recommendation engines at Phenom, and did AI/multimodal research at CSU Sacramento. My projects span autonomous agents (PEARL), RAG and GraphRAG pipelines (DrugGuard), local speech intelligence (VaultASR), and LLM inference benchmarking.
I'm currently working toward contributing to the open-source vLLM project and writing about the systems I build and study along the way.
Local, private speech-to-text pipeline with multi-speaker diarization, Silero VAD v5, and hardware-accelerated inference. Transcribes hours of audio in minutes with zero data leaving the device.
Autonomous AI agent with a cognitive architecture for reliable task execution. Features dynamic task decomposition, constrained decoding, and experience-based learning via persistent vector memory.
Comparative study of RAG vs GraphRAG for safety-critical medical information retrieval. In this study, graph-based knowledge representation significantly improved retrieval accuracy over flat vector search for drug interaction queries.
Full resume covering Intel, Phenom, CSU Sacramento, projects (PEARL, DrugGuard, ARIA), and publications. Updated April 2025.
Open to interesting conversations about performance optimization, LLM inference, open source, or new opportunities.