Founder

Building startups taught me things that no course or research role could: how to make tradeoffs under uncertainty, what technical debt costs at the worst possible time, and what it actually means to own a system end-to-end. The two ventures below shaped how I think about ML engineering as much as any paper or theorem.

Synthure | Care Clarity AI

Founder & ML Engineer · New York, NY · Jul 2023 to Jul 2025

Live platform → synthure.vercel.app · Code → github.com/aravinds-kannappan/Synthure

A single clinical encounter creates work for four people who each open different software: a patient trying to understand a diagnosis and a bill, a physician coding the visit and chasing prior authorization, a hospital revenue-cycle team constructing and defending the claim, and an employer benefits team watching population cost. Synthure takes one messy clinical note (a SOAP note, discharge summary, referral, ER note, or pasted physician text) and produces a single auditable, structured record. The four portals are projections of that one record rather than four separate dashboards. The whole design falls out of taking two constraints seriously: patient data cannot leave the building, and a model that invents a billing code is worse than useless.

The pipeline runs de-identification and biomedical NER on the user’s own device: OpenMed models (an int8 ONNX PII/de-identification model plus disease and drug NER, run via transformers.js in the browser) scan the note for the 18 Safe Harbor identifier classes and extract entities, so only de-identified text and a list of entities ever reach the API. OpenMed is the clinical NLP backbone; Synthure owns and trains the task-specific models on top: a note-type classifier (TF-IDF plus logistic regression), a rule-based section parser, an ICD reranker, per-field missing-information detectors, and a gradient-boosted claim-readiness predictor with isotonic calibration and an abstention layer.

Diagnosis codes are chosen from the official CDC/NCHS ICD-10-CM FY2026 index rather than generated, so an out-of-index code is impossible by construction, and every accepted code is validated against the CMS tabular. Claude is never the runtime decision-maker: it is used only for synthetic data augmentation, weak labeling, adversarial test cases, rubric-based evaluation, and plain-language narration, with four writer agents (Haiku, escalating to Sonnet when a blocking readiness check fails) grounded strictly in the extracted facts and audited by a verify-critique-revise loop.

Every figure the product shows is a published amount or visible arithmetic over visible inputs (98,186 ICD codes, roughly 168,000 index terms, 9,724 CMS-priced services, RxNorm drug matching), and there is deliberately no denial-probability score because no public adjudication dataset exists to justify one.

On a held-out synthetic test split (real clinical text will score lower, which the repo states plainly) the models report note-type accuracy 1.00, section F1 0.82, ICD top-3 accuracy 0.82 at a 0.0 hallucination rate, claim-readiness AUROC 0.84 and AUPRC 0.90 at ECE 0.054, and abstention lifting accuracy from 0.88 to 0.92.
It is a research and prototype-grade workflow system with auditable outputs and a human review step, not a production medical device.
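
The "chosen from the index rather than generated" idea, together with abstention, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the Synthure repo: the three-entry ICD_INDEX stands in for the full official index, token_overlap is a simple Jaccard scorer standing in for the trained ICD reranker, and min_score is an arbitrary abstention threshold.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy stand-in for the official index (assumption: the real index is far larger).
ICD_INDEX: Dict[str, str] = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J45.909": "Unspecified asthma, uncomplicated",
}


@dataclass
class Selection:
    code: Optional[str]  # None means the selector abstained
    score: float


def token_overlap(query: str, description: str) -> float:
    """Jaccard overlap of lowercase word sets (hypothetical scorer, not the real reranker)."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def select_code(entity_text: str,
                index: Dict[str, str] = ICD_INDEX,
                min_score: float = 0.2) -> Selection:
    """Rank only codes that exist in a non-empty index; abstain below min_score."""
    best_code, best_score = max(
        ((code, token_overlap(entity_text, desc)) for code, desc in index.items()),
        key=lambda pair: pair[1],
    )
    if best_score < min_score:
        # Abstain and leave the decision to human review instead of guessing.
        return Selection(None, best_score)
    return Selection(best_code, best_score)


if __name__ == "__main__":
    print(select_code("type 2 diabetes mellitus"))
    print(select_code("broken left ankle"))
```

Because the selector can only return a key of the index or None, an out-of-index code cannot be produced no matter how the scorer behaves; the scorer affects which code is picked, not whether it is a real one.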

Looking back, the most important thing I learned at Synthure wasn’t technical; it was about the hidden cost of building for a domain you don’t fully control. Healthcare moves slowly for reasons that have nothing to do with technology: procurement cycles, compliance reviews, trust-building with clinical staff. We built a genuinely good system faster than the market could adopt it, and that gap between product readiness and organizational readiness is something I didn’t fully appreciate until I lived it. I also learned that production ML in healthcare demands a different kind of rigor than research ML. Calibration isn’t a nice-to-have when a miscalibrated confidence score could influence a clinical decision. Drift monitoring isn’t optional when payer policies update quarterly. These experiences permanently changed how I think about what it means to build responsibly.

Replays AI

Software Engineer, AI/ML · New York, NY · Sep 2025 to Apr 2026

Live platform → replays-ai.vercel.app · Code → github.com/aravinds-kannappan/ReplaysAI

Sports media has a content-abundance problem: every game generates hours of footage, thousands of structured events, and an audience that wants personalized, contextual highlights rather than a broadcast edited for the median fan. Replays AI grew from a recap generator into a personalized, agent-driven sports desk with no signup at all: a streaming feed, narrated highlight reels you can interrupt mid-clip to ask questions, a Monte-Carlo dream-team championship simulator, and AI game recaps, all keyed to the teams and players you pick and stored anonymously in the browser. Everything a fan consumes as fact (players, games, scores, play-by-play, box scores, season stats, news, and highlight clips) is real, live ESPN public-API data; the only modeled elements, like the championship forecast, are labeled as such.

The backend is a stateless FastAPI app with no database: state is either fetched from ESPN, cached, or held client-side, which keeps it deployable as serverless functions. A single SportsDataProvider (espn_public.py) wraps every ESPN call behind typed functions, sitting behind a two-tier cache (a 60-second process-local dict plus an optional Redis that degrades silently when absent). The LLM layer is Anthropic-first (claude-opus-4-8, falling back through claude-sonnet-4-6 and claude-haiku-4-5, then gpt-4o-mini), and every LLM path has a deterministic, data-backed fallback, so recaps, reel narration, and the in-app assistant all produce a grounded response with zero API keys configured and the app never hard-fails on a missing or timed-out model. Agents are LLM-backed roles rather than microservices: a beat-writer Recap, a Reel director that ranks real ESPN clips and writes per-clip narration spoken client-side, a CoachAgent and AnalystAgent that supply a chemistry read and an X-factor for the simulator, and an assistant that answers questions about the exact paused clip. The dream-team simulator derives a per-player rating vector from real ESPN season stats, applies the CoachAgent’s chemistry multiplier, and runs 10,000 seeded, reproducible pure-Python seasons into championship odds and a playoff-round distribution.
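
The seeded, reproducible simulation step can be sketched in pure Python. This is an illustration under stated assumptions, not the app's code: the logistic win_probability model, the single-elimination bracket (the number of teams must be a power of two), the scalar rating per team, and the names simulate_bracket and championship_odds are all hypothetical. The chemistry multiplier is applied to one team's rating, mirroring the description above only loosely.

```python
import random
from collections import Counter
from typing import Dict


def win_probability(rating_a: float, rating_b: float) -> float:
    """Logistic head-to-head model (an assumption, not the app's formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 10.0))


def simulate_bracket(ratings: Dict[str, float], rng: random.Random) -> str:
    """Play one single-elimination bracket and return the champion."""
    teams = list(ratings)
    while len(teams) > 1:
        winners = []
        # Pair adjacent teams: (0 vs 1), (2 vs 3), and so on.
        for a, b in zip(teams[0::2], teams[1::2]):
            p_a = win_probability(ratings[a], ratings[b])
            winners.append(a if rng.random() < p_a else b)
        teams = winners
    return teams[0]


def championship_odds(ratings: Dict[str, float], my_team: str,
                      chemistry: float = 1.0, runs: int = 10_000,
                      seed: int = 42) -> Dict[str, float]:
    """Estimate title odds; the same seed always yields the same result."""
    adjusted = dict(ratings)
    adjusted[my_team] = ratings[my_team] * chemistry  # chemistry multiplier
    rng = random.Random(seed)  # private generator, no global state touched
    titles = Counter(simulate_bracket(adjusted, rng) for _ in range(runs))
    return {team: titles[team] / runs for team in ratings}


if __name__ == "__main__":
    league = {"Dream Team": 80.0, "Rivals": 82.0, "Upstarts": 78.0, "Veterans": 84.0}
    print(championship_odds(league, "Dream Team", chemistry=1.05))
```

Using a dedicated random.Random(seed) instance rather than the module-level generator keeps runs reproducible even when other code in the same process draws random numbers.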

What Replays taught me above all else was to treat robustness and latency as first-class product requirements, not afterthoughts. Feed tiles, reel tiers, recaps, and simulation results are separate streamable endpoints so the UI fills in progressively instead of blocking on the slowest agent, and query keys are derived from the personalization inputs so results cache per fan. Some of the hardest bugs were about trusting real-world data rather than the model: ESPN’s by-id endpoint sometimes labels a finished game as scheduled, so recaps and the reel list treat “both scores present” as played instead of trusting the status string. Working backward from a strict latency budget, and a rule that the product should degrade gracefully rather than go dark, shaped the entire system: parallel endpoints, a deterministic fallback behind every model, and deriving data live from ESPN so there is nothing to keep in sync. It also deepened my appreciation for the difference between a system that works and a system that scales: the former is a proof of concept, the latter is an engineering discipline.
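
The rule of "a deterministic fallback behind every model" can be shown with a small Python sketch. The names ModelCall, deterministic_recap, and recap_with_fallback, the shape of the game dictionary, and the prompt text are assumptions made for illustration; real model clients are represented here by plain callables so the example runs with no API keys.

```python
from typing import Callable, Dict, Sequence

# A model client is modeled as "prompt in, text out"; it may raise on failure.
ModelCall = Callable[[str], str]


def deterministic_recap(game: Dict) -> str:
    """Template recap built only from the final score; needs no model."""
    sides = [(game["home"]["name"], game["home"]["score"]),
             (game["away"]["name"], game["away"]["score"])]
    (winner, w_pts), (loser, l_pts) = sorted(
        sides, key=lambda side: side[1], reverse=True)
    return f"{winner} beat {loser} {w_pts}-{l_pts}."


def recap_with_fallback(game: Dict, models: Sequence[ModelCall]) -> str:
    """Try each model in priority order, then fall back to the template."""
    prompt = f"Write a recap using only these facts: {game}"
    for call in models:
        try:
            text = call(prompt)
            if text and text.strip():
                return text.strip()
        except Exception:
            # Broad catch on purpose: timeouts, auth errors, and rate limits
            # all mean "try the next option", never "fail the request".
            continue
    return deterministic_recap(game)


if __name__ == "__main__":
    game = {"home": {"name": "Knicks", "score": 110},
            "away": {"name": "Celtics", "score": 102}}
    # With no models configured the recap still comes back.
    print(recap_with_fallback(game, models=[]))
```

The design choice is that the model list may be empty: the fallback is the baseline behavior, and each configured model is an optional upgrade on top of it.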

Aravind Kannappan
