The first trial of generative AI therapy shows it might help with depression

The first clinical trial of a therapy bot that uses generative AI suggests it was as effective as human therapy for participants with depression or anxiety, or those at high risk of developing eating disorders. Even so, it doesn’t give a go-ahead to the dozens of companies hyping such technologies while operating in a regulatory gray area. 

A team led by psychiatric researchers and psychologists at the Geisel School of Medicine at Dartmouth College built the tool, called Therabot, and the results were published on March 27 in the New England Journal of Medicine. Many tech companies have built AI tools for therapy, promising that people can talk with a bot more frequently and cheaply than they can with a trained therapist—and that this approach is safe and effective.

Many psychologists and psychiatrists have shared the vision, noting that fewer than half of people with a mental disorder receive therapy, and those who do might get only 45 minutes per week. Researchers have tried to build tech so that more people can access therapy, but they have been held back by two things. 

The first is that a therapy bot that says the wrong thing could cause real harm. That’s why many researchers have built bots using explicit programming: The software pulls from a finite bank of approved responses (as was the case with Eliza, a mock-psychotherapist computer program built in the 1960s). But this makes them less engaging to chat with, and people lose interest. The second is that the hallmarks of good therapeutic relationships—shared goals and collaboration—are hard to replicate in software. 
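To illustrate how constrained that approach is, here is a minimal, hypothetical sketch in Python of a keyword-matching bot in the spirit of Eliza. The rules and replies are invented for illustration; they are not drawn from Eliza’s actual scripts or from Therabot. The point is that every reply comes from a small, pre-approved bank, so nothing unvetted can be said, but the repertoire quickly feels repetitive.

```python
import re

# Illustrative rule-based bot: each rule maps a keyword pattern to a single
# pre-approved canned response. No text is ever generated on the fly.
RULES = [
    (re.compile(r"\b(sad|depressed|down)\b", re.I),
     "I'm sorry you're feeling that way. Can you tell me more about it?"),
    (re.compile(r"\b(anxious|worried|stressed)\b", re.I),
     "What do you think is making you feel that way?"),
    (re.compile(r"\bmother\b", re.I),
     "Tell me more about your relationship with your mother."),
]
FALLBACK = "Please, go on."  # used whenever no rule matches


def reply(message: str) -> str:
    """Return the first canned response whose pattern matches the message."""
    for pattern, canned_response in RULES:
        if pattern.search(message):
            return canned_response
    return FALLBACK


if __name__ == "__main__":
    print(reply("I've been feeling really depressed lately"))
    # -> "I'm sorry you're feeling that way. Can you tell me more about it?"
```

A generative model, by contrast, composes each reply anew, which is what makes it more engaging to talk to and also what makes its failure modes harder to bound.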

In 2019, as early large language models like OpenAI’s GPT were taking shape, the researchers at Dartmouth thought generative AI might help overcome these hurdles. They set about building an AI model trained to give evidence-based responses. They first tried building it from general mental-health conversations pulled from internet forums. Then they turned to thousands of hours of transcripts of real sessions with psychotherapists.

“We got a lot of ‘hmm-hmms,’ ‘go ons,’ and then ‘Your problems stem from your relationship with your mother,’” said Michael Heinz, a research psychiatrist at Dartmouth College and Dartmouth Health and first author of the study, in an interview. “Really tropes of what psychotherapy would be, rather than actually what we’d want.”

Dissatisfied, they set to work assembling their own custom data sets based on evidence-based practices, which is what ultimately went into the model. Many AI therapy bots on the market, in contrast, might be just slight variations of foundation models like Meta’s Llama, trained mostly on internet conversations. That poses a problem, especially for topics like disordered eating.

“If you were to say that you want to lose weight,” Heinz says, “they will readily support you in doing that, even if you will often have a low weight to start with.” A human therapist wouldn’t do that. 

To test the bot, the researchers ran an eight-week clinical trial with 210 participants who had symptoms of depression or generalized anxiety disorder or were at high risk for eating disorders. About half had access to Therabot, and a control group did not. Participants responded to prompts from the AI and initiated conversations, averaging about 10 messages per day.

Participants with depression experienced a 51% reduction in symptoms, the best result in the study. Those with anxiety experienced a 31% reduction, and those at risk for eating disorders saw a 19% reduction in concerns about body image and weight. These measurements are based on self-reporting through surveys, a method that’s not perfect but remains one of the best tools researchers have.

These results, Heinz says, are about what one finds in randomized controlled trials of psychotherapy with 16 hours of human-provided treatment, but the Therabot trial accomplished it in about half the time. “I’ve been working in digital therapeutics for a long time, and I’ve never seen levels of engagement that are prolonged and sustained at this level,” he says.

Jean-Christophe Bélisle-Pipon, an assistant professor of health ethics at Simon Fraser University who has written about AI therapy bots but was not involved in the research, says the results are impressive, but he notes that, as with any clinical trial, they don’t necessarily reflect how the treatment would perform in the real world. 

“We remain far from a ‘greenlight’ for widespread clinical deployment,” he wrote in an email.

One issue is the supervision that wider deployment might require. Early in the trial, Heinz says, he personally oversaw all the messages coming in from participants (who consented to the arrangement) to watch out for problematic responses from the bot. If therapy bots needed this kind of oversight, they wouldn’t be able to reach as many people. 

I asked Heinz if he thinks the results validate the burgeoning industry of AI therapy sites.

“Quite the opposite,” he says, cautioning that most don’t appear to train their models on evidence-based practices like cognitive behavioral therapy, and they likely don’t employ a team of trained researchers to monitor interactions. “I have a lot of concerns about the industry and how fast we’re moving without really kind of evaluating this,” he adds.

When AI sites advertise themselves as offering therapy in a legitimate, clinical context, Heinz says, it means they fall under the regulatory purview of the Food and Drug Administration. Thus far, the FDA has not gone after many of the sites. If it did, Heinz says, “my suspicion is almost none of them—probably none of them—that are operating in this space would have the ability to actually get a claim clearance”—that is, a ruling backing up their claims about the benefits provided. 

Bélisle-Pipon points out that if these types of digital therapies are not approved and integrated into health-care and insurance systems, it will severely limit their reach. Instead, the people who would benefit from using them might seek emotional bonds and therapy from types of AI not designed for those purposes (indeed, new research from OpenAI suggests that interactions with its AI models have a very real impact on emotional well-being). 

“It is highly likely that many individuals will continue to rely on more affordable, nontherapeutic chatbots—such as ChatGPT or Character.AI—for everyday needs, ranging from generating recipe ideas to managing their mental health,” he wrote.