AdviceBuddy is a 24/7 AI companion for mental wellness — engineered for empathy, monitored for safety, and architected so that highly sensitive conversations never leave a controlled environment.
A general-purpose chatbot can ship rough edges. A mental wellness companion cannot. Every layer — model, infrastructure, data, billing — has to be safer-by-default than the industry norm.
The AI had to be conversational and warm — and immediately step aside when a user expresses crisis, surfacing official hotlines instead of generated text.
We couldn't route deeply personal conversations through general-purpose public LLM APIs. Inference had to happen on infrastructure we controlled.
Five tiers (Free, Basic, Pro, Pro Plus, and Premium), each with its own message budget, model access, and rate-limit ceiling, all enforced server-side so the client cannot bypass them.
An empathy product cannot stall. Cold-starts, queue depth, and inference cost all had to be solved on a shoestring without compromising the experience.
We deployed Llama 3.1-8B-Instruct on serverless GPUs, paired it with a deterministic safety layer, and gave the platform a serverless backend that stays cheap until traffic genuinely demands more.
When a user expresses severe distress, the system halts the AI and surfaces human-verified resources before any generated response can reach them.
A deterministic safety net that runs before model inference and instantly hands off to verified crisis resources — no chance of a generated response in those moments.
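The gate described above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the production implementation: the names `isCrisis`, `respond`, `CRISIS_PATTERNS`, and `VERIFIED_RESOURCES` are hypothetical, the pattern list is a placeholder rather than a vetted clinical lexicon, and the model call is passed in as a plain synchronous function to keep the example short.

```typescript
// Hypothetical sketch: a deterministic check that runs before any model call.
type Reply =
  | { kind: "crisis"; resources: string[] }
  | { kind: "model"; text: string };

// Placeholder patterns only; a real list would be reviewed by qualified people.
const CRISIS_PATTERNS: RegExp[] = [
  /\bkill myself\b/i,
  /\bend my life\b/i,
  /\bsuicid(e|al)\b/i,
];

// Placeholder entries; a real deployment would load human-verified,
// region-specific hotlines instead of these strings.
const VERIFIED_RESOURCES: string[] = [
  "Placeholder: local emergency number",
  "Placeholder: national crisis hotline",
];

// Pure pattern matching: the same input always yields the same decision.
function isCrisis(message: string): boolean {
  return CRISIS_PATTERNS.some((pattern) => pattern.test(message));
}

// The model function is only invoked when the crisis check does not match.
function respond(message: string, generate: (m: string) => string): Reply {
  if (isCrisis(message)) {
    return { kind: "crisis", resources: VERIFIED_RESOURCES };
  }
  return { kind: "model", text: generate(message) };
}
```

Because the check is plain pattern matching placed ahead of the model call, a matching message returns the resource list without `generate` ever being invoked.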
Row Level Security at the Postgres layer means users can read only their own data. Backend service-role operations are never exposed to the client SDK.
Every message is Zod-validated and length-capped before inference, guarding against injection attempts, oversized payloads, and runaway context windows.
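As a rough illustration of the validate-then-cap step, here is a dependency-free TypeScript stand-in. The project itself uses Zod; this hand-written check is swapped in only so the sketch runs without installing anything, and it approximates what a schema along the lines of `z.object({ message: z.string().min(1).max(n) })` would enforce. The 2000-character cap, the `message` field name, and the `validateMessage` function are all assumptions.

```typescript
// Assumed cap; the real limit is not stated in the source.
const MAX_MESSAGE_CHARS = 2000;

type Validation =
  | { ok: true; message: string }
  | { ok: false; error: string };

// Hand-written stand-in for a Zod schema: checks shape, type, and length
// before anything is forwarded to inference.
function validateMessage(input: unknown): Validation {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "body must be an object" };
  }
  const message = (input as Record<string, unknown>).message;
  if (typeof message !== "string") {
    return { ok: false, error: "message must be a string" };
  }
  const trimmed = message.trim();
  if (trimmed.length === 0) {
    return { ok: false, error: "message must not be empty" };
  }
  if (trimmed.length > MAX_MESSAGE_CHARS) {
    return { ok: false, error: "message exceeds length cap" };
  }
  return { ok: true, message: trimmed };
}
```

Rejecting oversized input at this boundary keeps the prompt sent to the model bounded regardless of what the client submits.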
Stripe handles billing; Supabase enforces the limits. Every plan upgrade or cancellation flows through a webhook into a single source of truth that the API consults on every request.
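The webhook-to-limits flow can be sketched as follows. This is a simplified model under stated assumptions, not the actual integration: an in-memory `Map` stands in for the Supabase table, `applyWebhookEvent` stands in for the handler that a verified Stripe webhook would trigger, and the daily budgets are invented numbers. No real Stripe or Supabase API is called here.

```typescript
type Plan = "free" | "basic" | "pro" | "pro_plus" | "premium";

// Assumed daily message budgets; the real numbers are not given in the source.
const DAILY_MESSAGE_BUDGET: Record<Plan, number> = {
  free: 10,
  basic: 50,
  pro: 200,
  pro_plus: 500,
  premium: 2000,
};

interface Subscription {
  plan: Plan;
  active: boolean;
}

// In-memory stand-in for the subscriptions table (the single source of truth).
const subscriptions = new Map<string, Subscription>();

// Hypothetical handler body: called after a webhook event has been verified.
function applyWebhookEvent(userId: string, plan: Plan, active: boolean): void {
  subscriptions.set(userId, { plan, active });
}

// Consulted on every request; unknown or cancelled users fall back to Free.
function canSend(userId: string, sentToday: number): boolean {
  const sub = subscriptions.get(userId);
  const plan: Plan = sub !== undefined && sub.active ? sub.plan : "free";
  return sentToday < DAILY_MESSAGE_BUDGET[plan];
}
```

Keeping the plan lookup on the server, keyed by the stored subscription rather than anything the client sends, is what lets an upgrade or cancellation take effect on the very next request.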
A calming deep teal palette, soft mint backgrounds, and warm sand tones — paired with type that reads like it's listening, not lecturing.
The deterministic safety layer ensures the model never speaks during a crisis moment — verified resources are always shown first, every time.
Deeply personal conversations stay on infrastructure under our client's control — no third-party LLM provider ever sees the raw content.
Modal's serverless GPU lanes keep latency in conversational territory while costs stay tied to actual usage, not idle capacity.
A Stripe-driven subscription engine with rate limits enforced server-side, ready to scale users without bypassed limits or billing inconsistencies.
If your product handles sensitive conversations and needs to behave responsibly, let's talk.