[ Case Study · 02 / HealthTech AI ]

A safer kind of AI for when conversation matters most.

AdviceBuddy is a 24/7 AI companion for mental wellness — engineered for empathy, monitored for safety, and architected so that highly sensitive conversations never leave a controlled environment.

Client

Confidential · US

Industry

HealthTech · Mental Wellness

Region

United States

Status

Live in Production

Mental health AI is the most demanding kind of AI you can ship.

A general-purpose chatbot can ship with rough edges. A mental wellness companion cannot. Every layer (model, infrastructure, data, billing) has to be safer by default than the industry norm.

/ 01 · Safety

Empathetic, but never reckless

The AI had to be conversational and warm — and immediately step aside when a user expresses crisis, surfacing official hotlines instead of generated text.

/ 02 · Privacy

Sensitive data needs sovereignty

We couldn't route deeply personal conversations through general-purpose public LLM APIs. Inference had to happen on infrastructure we controlled.

/ 03 · Monetization

Five tiers, dynamic limits

Free, Basic, Pro, Pro Plus, and Premium — each with its own message budgets, model access, and rate-limit ceilings, all enforced server-side without leaks.

/ 04 · Reliability

Low latency, every conversation

An empathy product cannot stall. Cold starts, queue depth, and inference cost all had to be solved on a tight budget without compromising the experience.

A self-hosted LLM stack with safety wired into every layer.

We deployed Llama 3.1-8B-Instruct on serverless GPUs, paired it with a deterministic safety layer, and gave the platform a serverless backend that stays cheap until traffic genuinely demands more.

/ Frontend & Experience

Calming & Accessible

Next.js 14 (App Router) · React · TypeScript (Strict)
Mobile-first Tailwind CSS with ARIA-compliant components
Real-time typing indicators and a soothing visual system tuned for users in distress
LocalStorage chat persistence — keeping PHI off the central database where possible
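
As a rough illustration of the LocalStorage persistence mentioned above, the sketch below keeps chat history in a browser-side key-value store rather than a central database. The storage key, the message cap, and every function name here are hypothetical, not taken from the AdviceBuddy codebase. The store is passed in as a parameter so the same code works against `window.localStorage` in the browser or an in-memory stand-in elsewhere.

```typescript
// Minimal subset of the Web Storage API that this sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface ChatMessage {
  role: "user" | "assistant";
  text: string;
  sentAt: number; // Unix epoch milliseconds
}

const HISTORY_KEY = "chat-history"; // hypothetical storage key
const MAX_STORED_MESSAGES = 200; // hypothetical cap on retained messages

// Read the stored history; missing or corrupted entries yield an empty list.
function loadHistory(store: KeyValueStore): ChatMessage[] {
  const raw = store.getItem(HISTORY_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as ChatMessage[]) : [];
  } catch {
    return [];
  }
}

// Append one message, keep only the most recent entries, and persist.
function appendMessage(store: KeyValueStore, message: ChatMessage): ChatMessage[] {
  const history = [...loadHistory(store), message].slice(-MAX_STORED_MESSAGES);
  store.setItem(HISTORY_KEY, JSON.stringify(history));
  return history;
}

// In-memory stand-in for environments without window.localStorage.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

In a browser component, `appendMessage(window.localStorage, message)` would write to the user's own device, which is the property the privacy design depends on.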

/ AI & Backend

Self-Hosted & Private

Llama 3.1-8B-Instruct deployed on Modal serverless A10G GPUs
Supabase Postgres with Row Level Security & service-role isolation
Stripe webhooks + tiered subscription engine
Zod validation & truncation across every entry point

Crisis detection that overrides the model.

When a user expresses severe distress, the system halts the AI and surfaces human-verified resources before any generated response can reach them.

How have you been feeling these last few days?

Honestly, a bit overwhelmed. I don't know what to do.

That sounds heavy. Let's break it down together — what's been weighing on you most?

⚑ Crisis Override · Resources surfaced
988 Suicide & Crisis Lifeline · Crisis Text Line "HOME" → 741741

Keyword + Heuristic Detection

A deterministic safety net that runs before model inference and instantly hands off to verified crisis resources — no chance of a generated response in those moments.
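
A minimal sketch of how a pre-inference check like this could be wired, assuming a small pattern list and a synchronous `generate` callback standing in for the model call. The patterns, type names, and function names are illustrative assumptions only. They are not the production rule set and are not clinically validated. The two resources are the ones named in this case study.

```typescript
interface CrisisResource {
  name: string;
  contact: string;
}

// Resources named in this case study.
const CRISIS_RESOURCES: CrisisResource[] = [
  { name: "988 Suicide & Crisis Lifeline", contact: "Call or text 988" },
  { name: "Crisis Text Line", contact: "Text HOME to 741741" },
];

// Hypothetical, deliberately tiny pattern list for illustration.
const CRISIS_PATTERNS: RegExp[] = [
  /\bsuicid(e|al)\b/,
  /\bkill myself\b/,
  /\bend my life\b/,
  /\b(hurt|harm) myself\b/,
];

// Lowercase and collapse whitespace so matching is not defeated by formatting.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

function isCrisis(message: string): boolean {
  const text = normalize(message);
  return CRISIS_PATTERNS.some((pattern) => pattern.test(text));
}

type Reply =
  | { kind: "crisis"; resources: CrisisResource[] }
  | { kind: "generated"; text: string };

// The check runs first; generate() is never called for a flagged message.
function respond(message: string, generate: (message: string) => string): Reply {
  if (isCrisis(message)) {
    return { kind: "crisis", resources: CRISIS_RESOURCES };
  }
  return { kind: "generated", text: generate(message) };
}
```

Because the branch is taken before `generate` is invoked, a flagged message cannot produce model output by construction. Coverage then depends entirely on the quality of the pattern list and heuristics.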

RLS & Service-Role Isolation

Row Level Security at the Postgres layer means users can read only their own data. Backend service-role operations are sealed off from the client SDK.

Validated & Truncated I/O

Every message is Zod-validated and length-capped before inference, which rejects malformed payloads, blunts oversized or injection-style inputs, and keeps context windows from running away.
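
The validate-then-truncate step can be sketched as follows. To stay dependency-free, this version uses plain TypeScript checks as a stand-in for the Zod schema the project uses. The 2000-character cap, the `message` field name, and the function name are assumptions made for illustration.

```typescript
const MAX_MESSAGE_CHARS = 2000; // hypothetical length cap

type ParseResult =
  | { ok: true; message: string }
  | { ok: false; error: string };

// Validate an untrusted request body and return a cleaned, length-capped message.
function parseChatInput(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be an object" };
  }
  const message = (body as Record<string, unknown>)["message"];
  if (typeof message !== "string") {
    return { ok: false, error: "message must be a string" };
  }
  // Drop control characters (tab and newline are kept), then trim whitespace.
  const cleaned = message.replace(/[\u0000-\u0008\u000B-\u001F\u007F]/g, "").trim();
  if (cleaned.length === 0) {
    return { ok: false, error: "message must not be empty" };
  }
  // Truncate instead of rejecting, so long messages still get a reply.
  return { ok: true, message: cleaned.slice(0, MAX_MESSAGE_CHARS) };
}
```

Truncating rather than rejecting is a design choice: a user in distress who writes a long message still gets a response, while the prompt sent to the model stays bounded.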

Five tiers, enforced at the database — not the UI.

Stripe handles billing; Supabase enforces the limits. Every plan upgrade or cancellation flows through a webhook into a single source of truth that the API consults on every request.

/ Tier 01

Guest

Limited daily messages, no history.

/ Tier 02

Basic

Authenticated, raised message ceiling.

/ Tier 03 · Featured

Pro

Extended history & faster lanes.

/ Tier 04

Pro Plus

Priority routing & deeper memory.

/ Tier 05

Premium

All capabilities, highest ceilings.
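
The webhook-to-limits flow described above might look roughly like the sketch below. An in-memory `Map` stands in for the Supabase table that acts as the single source of truth. The daily ceilings, the fallback tier on cancellation, and all function names are hypothetical. The event shape is simplified and assumes the user and tier have already been resolved from the Stripe payload.

```typescript
type Tier = "guest" | "basic" | "pro" | "pro_plus" | "premium";

// Hypothetical daily message ceilings; real values are not given in the source.
const DAILY_MESSAGE_LIMIT: Record<Tier, number> = {
  guest: 5,
  basic: 25,
  pro: 100,
  pro_plus: 250,
  premium: 1000,
};

// Assumption: a cancelled subscription falls back to the Basic tier.
const FALLBACK_TIER: Tier = "basic";

interface Account {
  tier: Tier;
  usedToday: number;
}

// Stand-in for the database table the API consults on every request.
const accounts = new Map<string, Account>();

const newAccount = (tier: Tier): Account => ({ tier, usedToday: 0 });

// Simplified view of a Stripe subscription event after lookup.
interface SubscriptionEvent {
  type: "customer.subscription.updated" | "customer.subscription.deleted";
  userId: string;
  tier: Tier;
}

// Webhook handler: the only place a user's tier is ever written.
function applySubscriptionEvent(event: SubscriptionEvent): void {
  const usedToday = accounts.get(event.userId)?.usedToday ?? 0;
  const tier = event.type === "customer.subscription.deleted" ? FALLBACK_TIER : event.tier;
  accounts.set(event.userId, { tier, usedToday });
}

// Per-request check: unknown callers are treated as guests.
function tryConsumeMessage(userId: string): boolean {
  const account = accounts.get(userId) ?? newAccount("guest");
  if (account.usedToday >= DAILY_MESSAGE_LIMIT[account.tier]) return false;
  accounts.set(userId, { ...account, usedToday: account.usedToday + 1 });
  return true;
}
```

The daily counter reset is omitted here. Against a real database, the check and the increment would also need to happen atomically (for example inside a single statement or transaction) so that concurrent requests cannot exceed the ceiling.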

A live HealthTech product that doesn't compromise on safety.

/ Safety

Zero generated crisis responses

Whenever the deterministic safety layer flags a crisis moment, the model stays silent and verified resources are shown first, every time.

/ Privacy

Self-hosted inference

Deeply personal conversations stay on infrastructure under our client's control — no third-party LLM provider ever sees the raw content.

/ Performance

1–2s warm replies, <500ms API

Modal's serverless GPU lanes keep latency in conversational territory while costs stay tied to actual usage, not idle capacity.

/ Business

Five-tier monetization, live

A Stripe-driven subscription engine with rate limits enforced server-side, ready to scale to more users without limit leaks or billing inconsistencies.

Llama 3.1-8B · Modal Serverless GPU · Next.js 14 · React · TypeScript · Tailwind CSS · Supabase Postgres · Row Level Security · Stripe Subscriptions · Zod Validation · Vercel Edge · ARIA / WCAG