The Vision: Identifying the Potential of Text-to-Speech in Healthcare

This project began as an effort to explore meaningful ways AI could enhance healthcare. During a brainstorming session at the Innovation Lab, our team discussed various possibilities of using AI to address health challenges. Among the many ideas, Text-to-Speech (TTS) technology stood out for its versatility and potential to improve accessibility, engagement, and personalization in a variety of healthcare applications. Its capacity to transform written content into lifelike, expressive audio opened the door to creating more inclusive and interactive user experiences. Thus, our initial aim was to gain expertise in TTS and explore its integration into a healthcare application. 

The Idea: Creating a Personalized Meditation Experience with AI

Our journey began with a straightforward yet impactful idea: using TTS technology to create guided meditations tailored to each individual. Beyond simply converting text into audio, we saw TTS as a tool capable of delivering deeply customized and meaningful meditation experiences, designed to make mental health resources more engaging and accessible. This aligns with the broader vision of leveraging AI for mental health progress, using technology to address key challenges in emotional well-being.

This idea arose from an understanding of the growing mental health challenges in today’s high-pressure world. While meditation is widely acknowledged for its benefits to emotional and mental well-being, traditional approaches often feel impersonal or disconnected from users’ specific needs. By applying AI to mental health, we aimed to overcome this limitation by creating an experience that adapts to each person’s preferences and emotional state, moving beyond one-size-fits-all solutions.

Our vision focused on tailoring every element of the meditation experience, including the techniques, content, and accompanying background music, to resonate with the user's circumstances. This level of personalization makes meditation feel relevant and supportive, helping users build stronger connections with the practice and integrate it into their daily lives.

By incorporating individual preferences, meditation can become more immersive and effective, encouraging people who may feel alienated by traditional methods to give it another chance. This approach helps users develop a sustainable practice that supports their emotional well-being in a way that fits their own journey.

To bring this idea to life, we researched both meditation practices and TTS technologies. This investigation laid the groundwork for developing a personalized meditation app.

The Research: Building the Foundation

A. Understanding Meditation

Our initial focus was on studying the broad scope of meditation practices. While there are countless styles and techniques, we identified and selected five key techniques that stood out as both engaging and comprehensive. These selections served as the foundation for defining the categories through which users can customize their meditation experiences in our app, tailoring sessions to their unique needs and preferences.

Below, we provide an overview of the meditation techniques we explored.

Meditation Techniques

We explored five impactful focuses or techniques that enhance the meditative experience:

  1. Breathing Techniques: Controlled patterns, such as diaphragmatic or box breathing, that calm the nervous system, reduce anxiety, and promote relaxation.
  2. Body Scan: A progressive practice of scanning the body from head to toe to increase awareness, release tension, and promote physical and mental relaxation.
  3. Guided Visualization: Using imaginative scenarios to inspire motivation, evoke calm, or guide relaxation through creative mental imagery.
  4. Affirmations: Repetition of positive statements to focus the mind, foster self-belief, and encourage positive thinking.
  5. Movement: Incorporating physical movement to strengthen the connection between body and mind, creating a holistic meditative experience.

By understanding and selecting these five focuses, we identified meaningful options for users to personalize their meditation experiences. This comprehensive research became the foundation for our app's personalization features.

B. Exploring TTS Models

Alongside our work on meditation, we evaluated TTS technologies to identify tools capable of delivering high-quality audio experiences. Our analysis included paid options like OpenAI’s and ElevenLabs’ services, as well as open-source models hosted on Hugging Face. We also tested Suno AI, a platform designed for music creation, to explore its suitability for generating meditation content. Below is a summary of our findings:

OpenAI TTS

OpenAI’s TTS models are straightforward to use and generate audio efficiently, with options for standard and high-definition (HD) output. The HD model offers clear, high-quality audio suitable for professional applications, while the standard model often has a more robotic tone. We observed that reducing the playback speed below normal made this robotic quality more pronounced. OpenAI provides six voice types with adjustable speed but lacks advanced controls for emotional expression. Pricing is $15 or $30 per 1 million characters, depending on the model.
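To make the options above concrete, here is a minimal sketch of how a request to OpenAI's speech endpoint could be assembled and how the cost of a script could be estimated. It is not taken from Lumia's code base. The helper names `build_speech_request` and `estimate_cost_usd` are hypothetical, and the cost helper assumes the quoted rates are charged per 1 million input characters, with `tts-1` as the standard model and `tts-1-hd` as the HD model. The request is only built, not sent.

```python
import json
import urllib.request

# Assumed rates in USD per 1 million input characters (standard vs. HD model).
PRICE_PER_MILLION_CHARS = {"tts-1": 15.0, "tts-1-hd": 30.0}


def build_speech_request(text, api_key, model="tts-1-hd", voice="alloy", speed=1.0):
    """Build (but do not send) a POST request for OpenAI's speech endpoint."""
    # The API accepts playback speeds between 0.25 and 4.0.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {"model": model, "voice": voice, "input": text, "speed": speed}
    return urllib.request.Request(
        "https://api.openai.com/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def estimate_cost_usd(text, model="tts-1-hd"):
    """Estimate synthesis cost from the character count of the script."""
    return len(text) * PRICE_PER_MILLION_CHARS[model] / 1_000_000
```

Passing the returned request to `urllib.request.urlopen` with a valid key should return the audio bytes. Under the assumed rates, a 1,000-character script costs about $0.015 on the standard model and $0.03 on the HD model.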

Audio example: OpenAI TTS HD model, Alloy Voice

ElevenLabs 

ElevenLabs provides robust TTS solutions with two main models: the Standard Model, optimized for precision and quality, and the Turbo Model, designed for low-latency applications. Its standout features include an extensive voice library with detailed tags (language, gender, age, category) and professional voice cloning capabilities with adjustable stability and similarity settings for fine-tuning voice expressiveness. ElevenLabs supports multilingual synthesis and personalization, offering four pricing tiers ranging from Free (10 minutes of text-to-speech) to Pro ($99 per month for 500 minutes), with advanced features like voice cloning and higher-quality audio available in premium plans.
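The stability and similarity settings mentioned above are passed per request. The sketch below shows one way such a request could be assembled for ElevenLabs' text-to-speech endpoint. The helper names, the default values, and the choice of `eleven_multilingual_v2` as the model are assumptions made for illustration, and `voice_id` stands for whichever voice is picked from the library. As before, the request is built but not sent.

```python
import json
import urllib.request


def clamp01(value):
    """Keep a voice setting inside the 0.0-1.0 range."""
    return max(0.0, min(1.0, float(value)))


def build_elevenlabs_request(text, voice_id, api_key, stability=0.5,
                             similarity=0.75, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request for one voice."""
    payload = {
        "text": text,
        "model_id": model_id,  # assumed default; a low-latency model is also available
        "voice_settings": {
            "stability": clamp01(stability),
            "similarity_boost": clamp01(similarity),
        },
    }
    return urllib.request.Request(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Lower stability values generally allow more expressive, variable delivery, while higher values keep the voice more consistent, so a calm meditation narrator would likely sit toward the stable end.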

Audio example: ElevenLabs, Archer Voice

Open-Source Alternatives on Hugging Face

We also assessed several open-source models for their potential as cost-effective alternatives:

  1. Parler-TTS: This model shows improvement in replicating the timbre of the human voice, producing natural-sounding speech with various voice options. However, it struggles with punctuation, tone, and melodic variation, and occasionally commits errors such as mispronounced or incomprehensible words.
     Audio example: Parler TTS, Laura Voice
  2. Microsoft SpeechT5: While the voice remains somewhat robotic and fails to replicate the natural timbre of a human voice, it produces clear and accurate pronunciation, avoiding errors in word diction. However, the audio quality lacks refinement.
     Audio example: Microsoft SpeechT5
  3. Suno Bark: This model offers unique features like nonverbal communication elements (e.g., laughing, sighing), which expand creative possibilities. Despite these capabilities, its audio quality is low, with robotic output resembling a stretched audio file, failing to imitate a human voice convincingly.
     Audio example: Suno Bark, Woman Voice

Suno AI

Suno AI, primarily developed for generating music with lyrics, was evaluated for its ability to produce meditation content. This tool specializes in creating instrumental songs and custom music tailored to specific genres, including meditation. While the tool generates professional-quality music with smooth vocals, adapting it to spoken meditations proved challenging. It consistently produced meditative scripts in a singing voice rather than standard speech, and occasionally repeated verses. Moreover, the lack of an official API limits its integration potential. Despite these challenges, the tool showcased creative potential and expressive range, making it intriguing for experimental use in content creation.

Audio example: Suno AI, Duet

For our personalized meditation project, we selected the OpenAI TTS model for its balance between quality and ease of integration. As the TTS field evolves, we remain committed to exploring new technologies that enhance the user experience.

Lumia: A Personalized Meditation Experience

What started as an initial concept evolved through extensive research and exploration, eventually shaping Lumia, the personalized meditation app we’ve developed. Designed with the user’s well-being in mind, Lumia offers a tailored experience that adapts to individual preferences, making meditation a more meaningful practice for mental wellness. 

How Lumia Works

1. Begin Your Journey

Lumia starts by offering users two pathways to begin their meditation experience:

  1. Personalized Meditation: Users can freely describe their current mood, needs, and goals, such as relaxation, focus, or an energy boost. The more context users provide, the more personalized the meditation becomes. Users who are unsure what to write can use the “Suggest me a meditation” option, which offers inspiration based on common emotional states or goals, such as feeling anxious or needing motivation.
  2. Quick Meditations: For those looking for an immediate option, Lumia includes a library of pre-designed meditations, categorized as:
    • Stress Management
    • Better Sleep
    • Increase Focus
    • Decrease Anxiety
    • Emotion Handling
    • Feel Happier

These quick meditations provide instant access to relaxation tools without additional customization.

Mockup of Lumia app - Homepage

2. Set Your Preferences

Before starting a meditation, Lumia allows users to fine-tune their experience by setting preferences. The customizable options include:

  • Voice Gender: Choose between a male or female narrator.
  • Duration: Select from various lengths (e.g., 1, 5, 10, 15, or 20 minutes).
  • Background Sound: Pick from natural sounds (like River Sounds and Ocean Waves) or more abstract tones (like Dream Serenity and Eternal Echo), or opt for no background sound at all.
  • Meditation Technique: Select from a range of guided approaches, including:
    • Breathing Techniques
    • Body Scan
    • Guided Visualization
    • Affirmations
    • Movement-based Meditations

This level of personalization helps each session align closely with the user’s needs and preferences.
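As an illustration of how these preferences could be represented and turned into instructions for a script-writing model, here is a minimal sketch. It does not describe Lumia's actual implementation. The `Preferences` class, the `build_prompt` function, the prompt wording, and the narration pace of 100 words per minute are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Technique names as listed above; the pace below is an assumed value.
TECHNIQUES = (
    "Breathing Techniques",
    "Body Scan",
    "Guided Visualization",
    "Affirmations",
    "Movement-based Meditations",
)
WORDS_PER_MINUTE = 100  # assumption: slow narration with room for pauses


@dataclass(frozen=True)
class Preferences:
    voice_gender: str                       # "male" or "female"
    duration_min: int                       # e.g. 1, 5, 10, 15, or 20
    technique: str
    background_sound: Optional[str] = None  # None means no background sound

    def __post_init__(self):
        # Reject values the preference screen would not offer.
        if self.voice_gender not in ("male", "female"):
            raise ValueError(f"unknown voice gender: {self.voice_gender}")
        if self.duration_min <= 0:
            raise ValueError("duration must be positive")
        if self.technique not in TECHNIQUES:
            raise ValueError(f"unknown technique: {self.technique}")


def build_prompt(prefs, user_context):
    """Turn preferences and the user's own description into a script prompt."""
    target_words = prefs.duration_min * WORDS_PER_MINUTE
    return (
        f"Write a guided meditation of about {target_words} words "
        f"using the technique '{prefs.technique}'. "
        f"The listener describes their current state as: {user_context}"
    )
```

Validating the preferences up front keeps invalid combinations out of the generation step, and deriving a word budget from the duration is one simple way to keep the narrated audio close to the requested length.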

Mockup of Lumia app - Setting preferences

3. AI-Powered Meditation Creation

Once preferences are set, Lumia generates a customized meditation using AI models. The script is created, transformed into audio, and enhanced with background sounds for an immersive experience. While the meditation is being generated, users can explore meditation tips and benefits.

Mockup of Lumia app
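The three stages described in this step (script, audio, background sound) can be sketched as a small pipeline. This is an illustration under stated assumptions rather than Lumia's actual code. The script-writing and speech-synthesis stages are passed in as hypothetical callables, and the mixing stage assumes both tracks are already decoded to mono 16-bit PCM samples at the same sample rate.

```python
def mix_with_background(voice, background, background_gain=0.25):
    """Mix 16-bit PCM samples: loop the background under the voice and clip."""
    if not background:
        return list(voice)
    mixed = []
    for i, sample in enumerate(voice):
        # Repeat the background track for as long as the narration lasts.
        bg = background[i % len(background)]
        value = int(round(sample + background_gain * bg))
        # Clip to the valid range of signed 16-bit audio.
        mixed.append(max(-32768, min(32767, value)))
    return mixed


def create_meditation(request, write_script, synthesize, background, gain=0.25):
    """Run the three stages: script, narration audio, background mix."""
    script = write_script(request)   # hypothetical text-generation callable
    voice = synthesize(script)       # hypothetical TTS callable returning PCM samples
    return script, mix_with_background(voice, background, gain)
```

Keeping the background gain well below 1.0 leaves the narration clearly audible, and looping the background means a short ambient clip can cover a meditation of any length.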

4. Reflect and Improve

After completing a meditation, users are encouraged to:

  • Rate the Session: Provide feedback on their experience.
  • Share Thoughts: Leave comments to refine future sessions.
  • Download the Audio: Save the meditation for future use.

This feedback loop helps Lumia continually adapt to individual preferences, making each meditation session better than the last.
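One simple way such a feedback loop could work is to average the ratings per meditation technique and favor the best-rated one in future suggestions. The sketch below is hypothetical: the 1-to-5 rating scale, the function names, and the fallback technique are assumptions, not a description of how Lumia processes feedback.

```python
from collections import defaultdict


def average_ratings(sessions):
    """Average rating per technique from (technique, rating) pairs."""
    totals = defaultdict(lambda: [0, 0])  # technique -> [rating sum, session count]
    for technique, rating in sessions:
        totals[technique][0] += rating
        totals[technique][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


def suggest_technique(sessions, default="Breathing Techniques"):
    """Suggest the best-rated technique, or a default for new users."""
    averages = average_ratings(sessions)
    if not averages:
        return default
    return max(averages, key=averages.get)
```

For example, a user who rated two Body Scan sessions 5 and 4 and one Affirmations session 3 would be steered toward Body Scan next time.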

The Conclusion: Unlocking the Potential of TTS for Healthcare

Our work in developing Lumia, our personalized meditation application, has revealed the significant value of TTS technology in healthcare. Far from just a technical tool, it serves as a means to create more accessible, individualized, and meaningful user experiences.

Perspectives on TTS Technology

TTS technology holds great promise for reshaping healthcare, particularly in terms of accessibility and personalization. The open-source models we tested still fall short in speech quality and naturalness, but the field is evolving quickly. Commercial solutions, such as those from OpenAI and ElevenLabs, already show considerable potential despite remaining limits such as OpenAI’s restricted control over emotional expression.

While our project began with the vision of creating personalized meditation experiences, we quickly realized that TTS has much broader applications. It can enhance patient engagement and provide more tailored support in various healthcare services.

Personalization: Key to Enhancing User Experience in Healthcare

One of the main findings of our project is the importance of personalization. In healthcare, particularly in mental health, adapting an experience to the user’s needs makes them feel acknowledged and valued. Our meditation application demonstrates how technology can create experiences that connect with individuals on a personal level.

Personalization is more than just adjusting names or voices. It’s about addressing the emotional needs of each user and crafting an experience that resonates with them. This approach not only boosts user engagement but can significantly improve their well-being.

Looking Ahead

We are thrilled to announce that Lumia is nearing its final stages of development. As we refine the app’s features and personalization tools, we are preparing for its public launch in Q1 2025. Our journey has been driven by a passion for making mental health support accessible and meaningful, and we can’t wait to share Lumia with the world.

Stay tuned for updates as we bring this vision to life! The future of AI-powered mental health is promising, and Lumia is just the beginning of what’s possible.

Various Mockups of Lumia app