The Children’s AI That Predates ChatGPT – Now Beating Google on Kids’ Speech Recognition

The language learning app market for children is getting crowded and increasingly scrutinized. Last week, nonprofit Common Sense Media launched the Youth AI Safety Institute, a first-of-its-kind independent lab backed by $20 million USD a year to stress-test AI products for kids – pressure that is arriving just as AI already accounts for 33.5% of revenue across digital language learning products.

At the same time, the kids’ segment of that market alone is projected to grow from $2.38 billion USD in 2026 to $5.17 billion by 2035. Clearly, the money is moving in parallel with AI integration – but so is oversight.

Research from the University of Colorado’s AI Institute has documented the gap plainly: speech recognition accuracy is substantially lower for children than for adults because most commercial systems are trained on clean audio from grown-up speakers – often reading from scripts.

A 2024 paper reinforces why: children’s voices involve rapid, unpredictable changes in pitch, articulation, and vocabulary development that cause adult-trained models to stumble.

In practice, this means a six-year-old learning English gets a system that mishears them, fails to respond, or worse: responds to the wrong thing entirely. The child loses interest in learning; the parent loses money.

This is the problem Ivan Crewkov started working on in 2018, four years before large language models became a mainstream conversation.

A Familiar Starting Point

Crewkov’s entry into the edtech space was personal. When his family moved from Siberia to California in 2014, his daughter Sofia was four years old and struggling to learn English at preschool. He tried every available tool and found the same gaps across all of them.

“They often involved reading, which she couldn’t do yet, they felt too much like work, and they didn’t offer speaking practice – critical for kids learning a new language,” he told Unite AI.

He then turned to live online tutors, started recording the sessions, and noticed something he hadn’t expected. “Teachers were literally reading from their screens,” he said, and many weren’t certified educators; each session cost $15 to $20.

Coming from an AI background, Crewkov’s read of the situation was straightforward: “It was clear to me that probably 80% of this work could be done using a virtual, talking AI character and speech recognition. This would also dramatically democratize the market because it was extremely expensive.”

“The idea was to provide a month of learning at the cost of one tutoring session,” he added. That idea became Buddy.ai.

Close to 10.8 million young children – 32% of all U.S. children under the age of nine – are dual language learners (DLLs), with at least one parent who speaks a language other than English at home, as per the Migration Policy Institute. Since 2000, in fact, that population has grown by 24%, and DLLs now make up more than 20% of the young child population in 24 states and Washington D.C.

These are probably the families most likely to search for what platforms like Buddy.ai offer – and least likely to have consistent access to high-quality early childhood programs, with 43% living in low-income households compared to 33% of non-DLL children.

The commercial logic, then, is clear. The harder question is whether the technology actually delivers – and here, Buddy.ai’s approach diverges meaningfully from most of the competition.

Purpose-Built, Not Retrofitted

The distinction Crewkov draws between building for children versus adapting for them is architectural, not cosmetic. “Building AI from the ground up for children means we constrain and re-architect every layer of the stack around kids’ voices, behavior, regulation, and attention span – instead of trying to ‘kid-mode’ an adult chatbot.”

The core problem is acoustic. “Adult LLM stacks assume adult text and speech: long grammatical sentences, stable pronunciation, and quiet environments. A six-year-old gives you the opposite: fragmentary utterances, mispronunciations, shouting, strong accents, and background noise.”

Buddy’s speech recognizer has been trained and continuously retrained on tens of thousands of hours of real children’s voices across ages, accents, and noisy home environments. The startup claims it now outperforms Google’s general-purpose speech recognition on children’s voices, a claim that is more credible than it might initially seem: models trained specifically on the target population tend to outperform models trained on adult data.
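The article does not say which metric sits behind that comparison. One common way to compare speech recognizers is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the system's transcript into a human reference transcript, divided by the length of the reference. The sketch below is illustrative only; the function name and the sample transcripts are made up and are not Buddy.ai or Google data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what a child said versus what a recognizer heard.
said = "i like red apples"
heard = "i like bread apples"
print(word_error_rate(said, heard))  # 0.25 (one substitution in four words)
```

A lower WER on the same set of child recordings would support a claim like the one above; the comparison is only meaningful if both systems are scored against the same reference transcripts.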

The conversational layer is similarly purpose-built. Every session operates inside what Crewkov describes as a “governing layer” – a curriculum-aware system with a session plan, defined topics and goal tracking. The language model operates as one component within that structure, not the product itself.

If a child veers off course, the system redirects; there’s no open-ended drift into whatever a general-purpose LLM might produce. “There’s a session plan, allowed topics, goal tracking, and code that steers, redirects, or ends the conversation when needed – so a six-year-old never just free-falls into open-ended internet chat.”
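To make the idea concrete, here is a minimal sketch of a governing layer of this kind. It is not Buddy.ai's implementation, which the article does not detail: the names `SessionPlan`, `Action`, and `govern` are hypothetical, and topic detection is reduced to simple keyword matching as a stand-in for whatever the real system uses.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"  # on topic: let the conversation proceed
    REDIRECT = "redirect"  # off topic: steer back to the lesson
    END = "end"            # goals met, or a budget is exhausted


@dataclass
class SessionPlan:
    """Hypothetical plan: allowed topics, target words, and budgets."""
    allowed_topics: dict[str, set[str]]  # topic name -> keywords signalling it
    goal_words: set[str]                 # words the child should say
    max_turns: int = 10
    max_redirects: int = 3
    turns: int = 0
    redirects: int = 0
    achieved: set[str] = field(default_factory=set)


def govern(plan: SessionPlan, utterance: str) -> Action:
    """Decide what happens next, before any language model is consulted."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    plan.turns += 1
    plan.achieved |= words & plan.goal_words  # goal tracking

    # End once every goal word has been said or the turn budget is spent.
    if plan.goal_words <= plan.achieved or plan.turns >= plan.max_turns:
        return Action.END

    on_topic = any(words & keywords for keywords in plan.allowed_topics.values())
    if on_topic:
        return Action.CONTINUE

    # Off topic: redirect, but give up after too many redirects.
    plan.redirects += 1
    if plan.redirects > plan.max_redirects:
        return Action.END
    return Action.REDIRECT
```

With a plan whose only topic is animals and whose goal words are "cat" and "dog", the utterance "I see a cat!" yields `CONTINUE`, "tell me about volcanoes" yields `REDIRECT`, and a later "dog" yields `END` because both goal words have been said. In a design like this, the language model would only be asked to generate a reply on `CONTINUE`, which is one way to keep it a component rather than the product.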

Privacy, too, is treated as an architectural constraint as well as a policy – and it has quietly become a competitive advantage. Because the Children’s Online Privacy Protection Rule (COPPA) prohibits collecting children’s voice data without parental consent, the largest publicly available dataset of children’s voices amounts to only a few hundred hours.

Buddy.ai, by contrast, has accumulated tens of thousands of hours of real children’s voices across ages, accents and noisy home conditions. Most competitors took the easier path of adapting adult LLMs for children rather than building from scratch, but that shortcut is now a liability.

In Buddy’s case, all inference runs on the platform’s own infrastructure, children’s voice audio is processed in real time and discarded before parents even have to think about it, and no conversations are routed through commercial AI APIs, according to Crewkov.

“We’re very explicit with parents about what we collect, why, and how they can delete it – transparency is as important as the tech itself,” Crewkov stressed.

Since launching ChitChat, its most recent major update in the U.S. market, which introduces more advanced free-flowing conversation, the company reports that American students completed 7.7 times more exercises than in the same period the previous year.

Globally, average exercises completed per month have grown 3.5 times year-on-year.

Crewkov is candid about the limits of what those numbers prove. “Designing research studies that could quantifiably measure student learning is something we hope to do in the near future,” he said. For now, however, the evidence base relies on user feedback, product engagement data, and industry award recognition.

The Use Case Nobody Planned For

Perhaps the most telling moment in Buddy.ai’s story is the one that arrived uninvited: without any targeted marketing, parents of children with autism and speech delays began appearing in survey data as among the platform’s most engaged users.

“We were initially quite surprised. It definitely highlighted that there were other ways in which Buddy could affordably be serving students around the world,” Crewkov recalled.

For the founder, it wasn’t just an interesting anomaly – it pointed to a gap the product wasn’t designed to fill but apparently could. The company responded by building a dedicated Applied Behavior Analysis (ABA) course.

The gap is wide and well-documented. A 2025 market report found autism prevalence in the U.S. is as high as 1 in 31 children today, up from 1 in 150 in 2000 – against a persistent backdrop of certified therapist shortages. For families without adequate insurance coverage, annual ABA therapy costs can reach $250,000.

National Institutes of Health (NIH) research on why families weren’t receiving ABA found the barriers to be strikingly practical: the most common reasons cited included long waitlist times (34%), lack of ABA providers in their area (10%), and insurance not covering the therapy (10%). Survey data also showed that more than half of all U.S. counties have no board-certified behavior analysts at all.

Whether an AI character constitutes a meaningful intervention at the clinical level remains an open question – but as a consistent, low-cost, on-demand practice tool for children on long waiting lists, its value is easier to defend.

“This remains a work in progress, but we’re really proud of where the initiative is going,” Crewkov said.

The Bigger Question

The broader edtech market is now full of companies making adjacent claims: purpose-built AI, robust architecture, products that produce real learning. In fact, the global AI in education market is estimated to reach $32.27 billion by 2030, and every major player has a children’s strategy.

What distinguishes Buddy.ai’s position is not the claims themselves, but the timeline. Crewkov started working on children’s speech recognition long before the current wave of investment made it fashionable, and before LLMs made it easy to build a convincing-sounding language tutor in an afternoon.

The technical infrastructure – the acoustic models trained on children’s voices, the curriculum-aware architecture, the proprietary avatar system – reflects nearly a decade of iteration on the specific problem, not a quick pivot toward a hot market.

“In two years, Buddy will be able to give English learners the practice they need to advance from the very first English words to fluency. We are not in a rush, because we prioritize safety and privacy,” Crewkov explained.

And underpinning all of it is a design philosophy that cuts against the grain of most AI product thinking. “Six-year-olds don’t come back for AI. They come back for a character and a world they feel part of.”

That’s a harder thing to replicate than a chatbot with a friendly name. And, for now, it is Buddy.ai’s most defensible edge.

Salomé is a Medellín-born journalist and Senior Reporter at Espacio Media Incubator. With a background in History and Politics, Salomé’s work emphasizes the social relevance of emerging technologies. She has been featured in Al Jazeera, Latin America Reports, and The Sociable, among others.