TeleMed Hearth
A 72-year-old in rural Dâmbovița County describes her symptoms by voice in Romanian, and by the time she joins the video call with her family doctor, a structured triage report is already in his hands. The AI runs entirely on her phone: Gemma 4 E4B via LiteRT-LM 0.11.0, ~3.5 GB on-device, no cloud inference, no third-party LLM API. The doctor reviews the triage report in his browser alongside a WebRTC video call. Clinical records dual-write to encrypted local SQLite and to the clinic's self-hosted Medplum FHIR server over a WireGuard tunnel. Built in six weeks under agent-orchestrated development: a fleet of specialized AI agents handled implementation while I architected, audited, and shipped. Co-developed with practicing physicians at Clinica Medicală Dr. Bogheanu, Brănești. Submitted to the Kaggle Gemma 4 Good Hackathon, May 2026.
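For readers curious about the dual-write step, here is a minimal Python sketch of one common way to do it: commit locally first, attempt the remote write, and leave a flag on the row so it can be retried when the tunnel comes back. This is an illustration under stated assumptions, not the app's actual code. The `records` table, the `synced` column, and the `send` callable (a stand-in for whatever posts a FHIR resource to the clinic server) are all hypothetical, and the standard-library `sqlite3` module used here is not encrypted.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical type: sends one FHIR resource to the clinic server, raises on failure.
RemoteSender = Callable[[dict], None]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local store. Plain sqlite3 is NOT encrypted; a real build
    would swap in an encrypted driver (assumption, not shown here)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "body TEXT NOT NULL, "
        "synced INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def dual_write(conn: sqlite3.Connection, resource: dict, send: RemoteSender) -> bool:
    """Commit locally first, then try the remote. True if both writes succeeded."""
    cur = conn.execute(
        "INSERT INTO records (body) VALUES (?)", (json.dumps(resource),)
    )
    conn.commit()  # the local copy is durable before any network call
    row_id = cur.lastrowid
    try:
        send(resource)
    except Exception:
        return False  # row stays synced=0 and is picked up by retry_pending
    conn.execute("UPDATE records SET synced = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return True


def retry_pending(conn: sqlite3.Connection, send: RemoteSender) -> int:
    """Resend unsynced rows in insertion order; returns how many were synced."""
    rows = conn.execute(
        "SELECT id, body FROM records WHERE synced = 0 ORDER BY id"
    ).fetchall()
    done = 0
    for row_id, body in rows:
        try:
            send(json.loads(body))
        except Exception:
            break  # remote still unreachable; stop and try again later
        conn.execute("UPDATE records SET synced = 1 WHERE id = ?", (row_id,))
        conn.commit()
        done += 1
    return done
```

Writing locally before touching the network means a dropped tunnel never loses a record; the trade-off is that the remote copy can lag until `retry_pending` runs.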
Local-first AI Factory
I can ship local-first AI prototypes fast because of the development environment I built for the job. An ASUS Ascent GX10 in my home office (NVIDIA GB10 Grace Blackwell, ARM64, 128 GB unified memory) runs a ten-layer agentic stack end to end, including inference (llama-swap orchestrating Nemotron and Qwen3-Next vLLM containers), agents (Paperclip plus Hermes, running CEO and CTO personas), automation (n8n, Crawl4AI), and data (Qdrant with 10,029 pages of stack documentation, Postgres, Redis). Eighteen containers, two compose files, zero data leaving the machine. I use it daily to build, audit, and iterate on the systems I sell to clients, with the same air-gapped discipline I'm offering them. The factory isn't a research toy; it's how I deliver client work quickly without sacrificing audit discipline.
War stories on the ARM64 toolchain, agent orchestration, and the GB10 in production are forthcoming on Medium.