Tyler Weitzman is the Co-Founder, Head of Artificial Intelligence & President at Speechify, the #1 text-to-speech app in the world, totaling over 100,000 5-star reviews. Weitzman is a graduate of Stanford University, where he received a BS in mathematics and an MS in Computer Science in the Artificial Intelligence track. He has been selected by Inc. Magazine as a Top 50 Entrepreneur, and he has been featured in Business Insider, TechCrunch, LifeHacker, and CBS, among other outlets. Weitzman’s master’s degree research focused on artificial intelligence and text-to-speech, and his final paper was titled “CloneBot: Personalized Dialogue-Response Predictions.”
You began coding when you were only 9 years old. What initially attracted you to computer science?
As a kid I was pretty obsessed with Dragon Ball Z, and I wanted to learn to animate it myself. I learned Adobe Flash and Photoshop and put my own animations of Goku on a fan webpage I built. Soon after, I began learning about systems and algorithms, and when I learned I could actually program for a living, that was pretty exciting. I had thought it was just a hobby, like playing games.
You then began building iPhone apps when you were only 12 years old. What were some of these apps?
One app, Black SMS, allows people to send encrypted text messages to each other. Another, Frontback, enables users to take a selfie and a photo of what’s in front of them at the exact same time.
Could you discuss your research at Stanford University and how it was centered around natural language processing and speech synthesis?
My research spanned multiple uses for transformer networks, including language generation models for chat, part-of-speech tagging, punctuation prediction, and text-to-speech. Optimizing neural network inference for mobile CPUs was a primary focus, and that work translated directly to the offline voices available on Speechify, which work even in airplane mode.
Could you share the genesis story behind Speechify?
I’m blind in one eye and my brother Cliff is dyslexic. For as long as we can remember, we’ve used audiobooks and text-to-speech technology to get through school and, when we were young, to read books like Harry Potter. As we got older and started using more technology products, we realized there was an opportunity to build better text-to-speech apps on web and mobile, with better voices thanks to advancements in AI and with a better user experience. So we decided to go for it with Speechify.
What are some of the different machine learning technologies that are used at Speechify?
We’ve adopted cutting-edge techniques for advanced generative architectures: transformers/conformers, large-scale pretraining, distributed training, gradient accumulation, auto-encoded latent spaces, diffusion, adversarial networks, and language modeling. We also employ supporting feature-processing techniques around phonemization, pitch, and emotion to better model speech specifically.
What are some of the challenges behind building a text-to-speech app?
One key challenge is building high-quality voices that sound like real humans rather than robots. Our goal is for people to be unable to tell the difference between our voices and human voices, so that our users are comfortable listening to content on Speechify for long periods of time. A second challenge is distributing our AI models to millions of users. It’s one thing to build high-quality AI voices and another to make sure millions of users across the world actually find out about them and use them.
Speechify is the #1 app in its category in the app store. What do you attribute this success to?
We believe we’ve built the best products in the market for people who want to listen to the reading they need to consume – whether it’s students with homework, professionals who are reading for work, or leisure readers who just want to be entertained. We have the best selection of voices, including celebrities like Snoop Dogg, and the best user interface for people to easily upload and access the content that they want to consume. And our user experience is seamless across the Speechify ecosystem – you can start listening to an article on your computer and then easily zap it to keep listening on your phone.
What are some of the biggest use cases for this app?
Speechify’s generative AI solves real problems for students who want to get through lots of homework faster, people with dyslexia and ADHD who have trouble reading, seniors with low vision, professionals who want to read more and be more productive, writers who want to listen to their work, auditory learners, and countless others.
What is your vision for the future of AI?
We want AI, and specifically AI text-to-speech voices, to eliminate barriers to learning regardless of income level, learning differences, geography, or language. We see AI as a tool for social good, one that elevates people’s quality of life by improving their education.
Thank you for the great interview. Readers who wish to learn more should visit Speechify.