Should Someone Like Me Be Building a Mental Health App?
In 2026 I borrowed Judith Herman's Trauma and Recovery from the library. I've long been interested in counseling and psychoanalysis, and for personal reasons her model of recovery didn't read as foreign to me. I'll leave that part there.
What stayed with me was the structure: recovery moves through three stages — establishing safety, remembrance and mourning, reconnection. Reading it, I kept thinking the structure could be the skeleton of a writing tool. If what you should be writing differs by which stage you're in, then a tool could hand you prompts that match your stage. That's how HealFrame started.
The heaviest feature is the least visible one
In a mental health app where AI guides your writing, the technically heaviest part isn't clever prompts. It's reading crisis signals in what the user writes.
HealFrame runs a Gemini-based crisis-detection pipeline that classifies each entry as GREEN, AMBER, or RED. The most important design decision was asymmetry. Input classification fails closed — if the classification is uncertain or the system is degraded, it errs toward the safe interpretation and treats the entry as a crisis. Output fails open — a misbehaving safety check is not allowed to block the response a user needs from reaching them. The cost of missing a crisis and the cost of over-flagging are not symmetric, so the system's failure directions shouldn't be either.
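Here is a minimal sketch of that asymmetry in Python. It illustrates the idea under assumptions and is not HealFrame's actual code: the names `Risk`, `classify_input`, `deliver_output`, and `FALLBACK_MESSAGE` are hypothetical, the classifier and safety check are passed in as plain callables instead of real Gemini calls, and swapping in a fallback message when a working check flags a response is my assumption about one reasonable behavior.

```python
from enum import Enum
from typing import Callable, Optional


class Risk(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


# Hypothetical text shown when a *working* output check rejects a response.
FALLBACK_MESSAGE = "I'm here with you. Support options are listed below."


def classify_input(entry: str, classifier: Callable[[str], Optional[Risk]]) -> Risk:
    """Fail closed: an error or an unusable result is treated as a crisis."""
    try:
        result = classifier(entry)
    except Exception:
        # Degraded system (timeout, quota, parse error): assume the worst.
        return Risk.RED
    # Anything that is not a valid Risk counts as uncertainty, so it is RED.
    return result if isinstance(result, Risk) else Risk.RED


def deliver_output(response: str, safety_check: Callable[[str], bool]) -> str:
    """Fail open: a crashing check must not withhold the response."""
    try:
        approved = safety_check(response)
    except Exception:
        # The check itself misbehaved; the user still gets the response.
        return response
    return response if approved else FALLBACK_MESSAGE
```

The two functions deliberately handle the same kind of failure, an exception from a dependency, in opposite directions. On the input side an outage escalates to RED; on the output side an outage lets the response through.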
Validation follows the same logic. I built an LLM-judge eval harness that repeatedly tests the pipeline against one non-negotiable bar: zero missed crisis signals on the eval set. Every other metric is up for discussion; that tolerance isn't. I want to be precise about what that bar is, though — it's a passing criterion for my eval set, not proof of zero misses in the wild, and an LLM judging an LLM carries an obvious circularity: the judge itself can be wrong. That's exactly why the asymmetry above matters — for whatever the evals don't reach, the fail-closed input layer that treats uncertainty as crisis is the last layer underneath.
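The zero-miss bar can be sketched as a hard gate that sits apart from every other metric. This is an illustration under assumptions, not the real harness: `EvalCase`, `missed_crises`, `passes_gate`, and `over_flag_rate` are hypothetical names, the LLM judge is left out entirely (the sketch starts from cases that already carry an expected and a predicted label), and counting a miss as "labelled RED, predicted anything else" is my assumption about how a miss would be defined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalCase:
    text: str
    expected: str   # label on the eval set: "GREEN", "AMBER", or "RED"
    predicted: str  # label the pipeline produced for this entry


def missed_crises(cases: List[EvalCase]) -> List[EvalCase]:
    """Cases labelled RED that the pipeline did not classify as RED."""
    return [c for c in cases if c.expected == "RED" and c.predicted != "RED"]


def passes_gate(cases: List[EvalCase]) -> bool:
    """The non-negotiable bar: zero missed crisis signals on the eval set."""
    return len(missed_crises(cases)) == 0


def over_flag_rate(cases: List[EvalCase]) -> float:
    """Share of non-crisis entries flagged RED. Reported, but never gates."""
    non_crisis = [c for c in cases if c.expected != "RED"]
    if not non_crisis:
        return 0.0
    return sum(c.predicted == "RED" for c in non_crisis) / len(non_crisis)
```

Keeping the gate as a boolean and the over-flag rate as a separate number mirrors the split in tolerance: one value can fail the run, the other is only something to discuss.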
But does it actually work?
That was the engineering part. Here's the honest part.
I'm not the only one building crisis detection. Plenty of AI companies are building it, with far more people and far more data. And still, out in the world, people keep attempting suicide and dying by it. Between "my eval harness passes" and "this system works on the darkest night of a real person's life" there is a gap I have no way to prove I've closed.
And underneath that sits a more basic question. I don't have a PhD. I'm not a clinician. I'm a developer who read a book, cares about this, and has some lived experience. Is someone like that allowed to build an app that touches people's minds? That question never went away while I was building. It still hasn't.
Lines instead of answers
I haven't answered the question. What I have instead is a set of lines I hold while building.
First, this app is not therapy, and it never claims to be — it's a tool that supports stage-appropriate writing, and that's where it stops. Second, the most dangerous failure (a missed crisis) gets a zero-tolerance bar, enforced by an eval harness rather than by good intentions. Third, I try not to forget that I don't know. I've come to see the anxiety of not being an expert less as something to get rid of and more as a safety mechanism — the thing that keeps me building carefully in a domain that demands it.
I still have no certainty that I should be building this. But lately I think someone building without certainty might be less dangerous, in this particular domain, than someone building with it.
If you're going through a hard time right now, you don't have to carry it alone. In the US, call or text 988 (Suicide & Crisis Lifeline). Elsewhere, findahelpline.com lists free, confidential services by country.