Senior QA Lead, 20 years. Now building and auditing production AI agent systems, local-first by default.

Your AI features stop being a demo and start being a product. The parts of an LLM system that fail quietly (eval coverage, hidden failure modes, the gap between what the model emits and what your application routes on) are exactly the parts that 20 years of senior QA leadership trained me to find. I recently shipped an on-device medical AI in Romanian for elderly rural patients, co-developed with practicing physicians. Same discipline, tighter constraints.

See the offers ↓ rares.bogheanu@pm.me

Working with me

Fractional QA leadership for AI-native teams

For AI-native startups that are shipping LLM features and are quietly aware that their testing discipline lags the rest of their engineering. I run QA strategy: eval coverage, regression suites, defect lifecycle, release gates. I embed enough to be useful, not enough to slow you down. 20+ hours per week, remote, EU/US-friendly.

From €66/hr · 20+ hours per week · remote

Get in touch →

Local-first AI prototypes for healthcare and regulated domains

For health-tech companies, privacy-first EU companies, and clinics under regulatory pressure. I prototype on-device or self-hosted AI systems where data residency is a hard requirement, not a feature. TeleMed Hearth is the proof: Gemma 4 E4B on the patient's phone, FHIR records on the clinic's server, no cloud LLM API in the loop.

From €4,500 for a two-week local-first AI prototype sprint · €90/hr for ongoing work

Get in touch →

Recent work

TeleMed Hearth

A 72-year-old in rural Dâmbovița County describes her symptoms by voice, in Romanian, and by the time she joins her family doctor's video call, a structured triage report is already in his hands. The AI runs entirely on her phone: Gemma 4 E4B via LiteRT-LM 0.11.0, ~3.5 GB on-device, with no cloud inference and no third-party LLM API. The doctor reviews the triage report in his browser alongside a WebRTC video call with the patient. Clinical records are dual-written to encrypted local SQLite and to the clinic's self-hosted Medplum FHIR server over a WireGuard tunnel. Built in six weeks with agent-orchestrated development: a fleet of specialized AI agents handled implementation while I architected, audited, and shipped. Co-developed with practicing physicians at Clinica Medicală Dr. Bogheanu, Brănești. Submitted to the Kaggle Gemma 4 Good Hackathon, May 2026.

Repository · Demo video · LoRA adapter on Hugging Face

Local-first AI Factory

The reason I can ship local-first AI prototypes fast is the development environment I built to do it. An ASUS Ascent GX10 in my home office (NVIDIA GB10 Grace Blackwell, ARM64, 128 GB unified memory) runs a ten-layer agentic stack end to end. Its layers include inference (llama-swap orchestrating Nemotron and Qwen3-Next vLLM containers), agents (Paperclip plus Hermes, running CEO and CTO personas), automation (n8n, Crawl4AI), and data (Qdrant holding 10,029 pages of stack documentation, Postgres, Redis). Eighteen containers, two compose files, zero data leaving the machine. I use it daily to build, audit, and iterate on the systems I sell to clients, under the same air-gapped discipline I offer them. The factory isn't a research toy; it's how I ship client work fast without sacrificing audit rigor.

War stories on the ARM64 toolchain, agent orchestration, and running the GB10 in production: forthcoming on Medium

About

I build AI systems that work the way they claim to, and I audit the ones that don't. Local-first by default, ARM64-native, healthcare-grade where it matters. That work rests on 20 years as a Senior QA Lead. Ubisoft (managing 32-person multiplayer test units on AAA titles, then acting manager of a 50+ person department during crunch). IBM (enterprise SAN storage on AIX, Linux, Solaris, and HP-UX). O2 / Telefónica UK (billing migrations under a regulatory deadline). ONRC (executive technical liaison for a national CRM spanning four national databases). 4D Pipeline (sole QA lead on Intel, Apple Vision Pro spatial computing, and Unreal Engine digital twin projects).