Anyone who has tried to learn a second language knows the shape of the problem. You memorize verb conjugations, build a decent reading vocabulary, pass quizzes with flying colors, and then freeze the moment a native speaker asks you a question at normal speed. The gap between knowing a language and speaking it has always been the hardest thing to close, because closing it requires the one resource that traditional study cannot simulate: a patient, available, endlessly forgiving conversation partner.

For decades, that meant either paying a tutor, finding a willing friend, or moving to another country. All three are expensive in time, money, or both. The last two years have compressed that gap in ways that would have seemed absurd in 2023. A constellation of apps, voice-enabled chatbots, and AI-native platforms now offer something that genuinely did not exist before: unlimited spoken conversation practice, on demand, with real-time feedback, for less than the cost of a single tutoring session per month.

The Three Tiers

The space has stratified into three tiers, each with a different theory about what learners actually need.

At the top sit purpose-built AI conversation platforms. Speak, which reached a $1 billion valuation in late 2024 and surpassed $100 million in annualized revenue last year, was built from the ground up around the premise that speaking should come first, not after months of grammar drills. Praktika, which claims over 20 million learners and generated $20 million in revenue by early 2025 with a team of fewer than 40 people, uses AI-powered avatars that adapt to your level, your accent, and your weak spots across sessions. Langua, a newer entrant, has attracted attention for voices cloned from real native speakers, producing conversations that sound less synthetic than most competitors.

In the middle tier, the incumbents are retrofitting. Duolingo’s Video Call feature, which puts learners in live AI conversations with its character Lily, is now available across nine language courses for Max subscribers. Babbel has added AI-driven speaking practice alongside its structured grammar lessons. Even Memrise, long a flashcard-first platform, now integrates conversational AI.

And at the base, the general-purpose models are surprisingly capable. ChatGPT’s voice mode, which OpenAI explicitly markets as a language practice tool, lets you role-play a café order in Portuguese or debate philosophy in Japanese with no setup required. It is free, immediate, and, for the first conversation a learner ever has in their target language, often transformative. The psychological barrier of “I have never actually spoken this language out loud” can crumble in five minutes with an AI that neither judges nor tires.

Latency, Feedback, and Memory

The shift is not that software can handle foreign languages; machine translation has existed for years. The shift is that three specific capabilities have arrived at the same time.

The first is latency. Modern voice-to-voice AI responds in fractions of a second, fast enough that a conversation feels like a conversation rather than a series of prompts and waits. When Speak integrated OpenAI’s Realtime API, the result was feedback loops that approach the rhythm of human speech. That rhythm matters more than most learners realize: conversation is a timing exercise as much as a vocabulary exercise, and you cannot practice timing against a system that pauses for three seconds between turns.

The second is pronunciation feedback. Earlier speech recognition systems could tell you whether you said the right word but not whether you said it well. Newer tools, such as ELSA Speak, score individual phonemes and flag where your stress patterns diverge from native norms, providing a level of real-time analysis that a human tutor would struggle to match. The machine has a structural advantage here: it can listen to your vowel placement on every syllable of every sentence without ever losing focus.

The third is memory. The best purpose-built platforms now track your performance across sessions, surfacing words you consistently stumble over and adjusting difficulty without being asked. Praktika’s avatars remember what you discussed last Tuesday. Speak builds personalized review sessions around your specific error patterns. This is where the gap between a dedicated language AI and a general chatbot becomes most visible: ChatGPT is brilliant in the moment, but its general-purpose memory is not built to track your recurring mistakes, so any structured sense of progress depends on you reconstructing that context yourself.

What Patience Cannot Teach

None of this means human tutors are obsolete, and anyone claiming otherwise is selling something (probably an app subscription). A meta-analysis published in late 2025, covering 46 empirical studies, found that AI-based language instruction produces a “medium-to-large” effect on learning outcomes across all major skills, but the same study noted that AI works best as a supplement to human teaching rather than a replacement for it. The tools can build fluency mechanics, but they cannot replicate the cultural negotiation of a real conversation, the social stakes of being misunderstood by someone who matters to you, or the improvisational chaos of a dinner party in a language you barely speak.

There are also practical gaps that the marketing glosses over. ChatGPT’s voice mode, for all its versatility, tends to understand you even when your pronunciation is rough, which is generous in the moment and counterproductive over time. Several reviewers have noted that Speak’s speech recognition can be overly lenient, giving learners a false sense of mastery by not catching word-order mistakes or subtle mispronunciations. And every conversation with an AI, no matter how sophisticated, shares a common trait: the AI is always, constitutionally, patient and encouraging. Real human conversation is not always patient and encouraging. Learning to speak under social pressure, when your interlocutor is bored or confused or in a hurry, is a skill that no chatbot currently teaches.

The research on anxiety reduction is genuinely promising: a 2025 mixed-methods study of 60 university students found that six weeks of AI chatbot practice produced measurable improvements in speaking skills while reducing foreign language speaking anxiety. But reducing anxiety and building resilience are not the same thing, and the learner who practices exclusively with an infinitely kind robot may find the transition to imperfect humans jarring.

The Practical Approach

Unsurprisingly, the most effective strategy combines these options rather than relying on any one of them. A general-purpose model like ChatGPT is a low-friction entry point: free, available on your phone, and good for breaking the ice with a language you have studied but never spoken. A purpose-built app like Speak or Praktika adds the structure, memory, and pronunciation feedback that a general model lacks. And a human conversation partner, whether a tutor on italki, a language exchange on Tandem, or a friend who tolerates your stumbling, provides what no AI can: the unpredictable, socially charged, occasionally embarrassing experience of being a beginner in front of another person.

The price comparison is worth noting. A human tutor on most platforms runs $15 to $50 per hour. Speak charges roughly $20 per month, or $99 per year. Praktika costs around $8 per month. Duolingo Max, which includes Video Call, runs $168 per year, or about $14 per month. ChatGPT’s voice mode is free on the basic tier. For a learner willing to spend $20 per month in total, which at those rates covers at most about one hour with a human tutor, the amount of spoken practice available in 2026 is orders of magnitude greater than what the same budget bought even three years ago.

The technology has not solved language learning. It has solved the access problem that made the hardest part of language learning, sustained spoken practice, available only to those with the money or luck to find a patient human. The vocabulary gap still requires study. Grammar still requires attention. Cultural fluency still requires exposure to actual culture. But the bottleneck has moved, and for anyone who has spent years understanding a language they cannot speak, the tools to change that are now sitting on their phone, waiting to talk whenever they are ready.