Patricia Thaine, CEO at Private AI – Interview Series
Patricia Thaine is the Co-Founder and CEO of Private AI, a Computer Science PhD Candidate at the University of Toronto, and a Postgraduate Affiliate at the Vector Institute doing research on privacy-preserving natural language processing, with a focus on applied cryptography. She also does research on computational methods for lost language decipherment.
Patricia is a recipient of the NSERC Postgraduate Scholarship, the RBC Graduate Fellowship, the Beatrice “Trixie” Worsley Graduate Scholarship in Computer Science, and the Ontario Graduate Scholarship. She has eight years of research and software development experience, including at the McGill Language Development Lab, the University of Toronto’s Computational Linguistics Lab, the University of Toronto’s Department of Linguistics, and the Public Health Agency of Canada.
What initially attracted you to computer science?
The ability to problem-solve and be creative at the same time. It’s like a craft. You get to see your product ideas come to life, much like a carpenter builds furniture. As I overheard someone say once: programming is the ultimate creative tool. The fact that the products you build can scale and be used by people anywhere in the world is such a cherry on top.
Could you discuss the genesis story behind Private AI and how it originated from your observation that there is a lack of tools that are easy to integrate for preserving privacy?
Through speech and writing, some of our most sensitive information is produced and transferred over to the companies whose services we use. When we were considering which NLP products to build, there was a layer of privacy that we’d have to integrate which simply did not exist in the market. To use privacy solutions, companies needed to either transfer their users’ data to a third party, use sub-par open-source solutions which just don’t cut it for properly protecting user privacy, or build a solution in-house with very little expertise in privacy. So, we decided to focus on creating the best products possible for developers and AI teams who need to have the outputs of privacy-enhancing technologies easily work for their needs.
Why is privacy-preserving AI important?
Roughly 80 percent of information produced is unstructured and AI is the only way to make sense of all of that data. It can be used for good, like helping detect falls for an elderly population, or for bad, like profiling and tracking individuals of underrepresented populations. Ensuring that privacy is built into the software we create makes it much more difficult for AI to be used in a detrimental way.
How is privacy a competitive advantage?
There are many reasons, but here are just a few:
- More and more users care about privacy and, as consumers become more educated, this concern is growing: 70 percent of consumers are concerned about the privacy of their data.
- It’s much easier to do business with other businesses if you have proper data protection and data privacy protocols and technologies in place.
- When you have built your products in a privacy-preserving way, you are keeping better track of where the points of vulnerability are in your service and, especially through data minimization, you’re getting rid of the data that you don’t need and that would get you into trouble when a cyberattack happens.
Could you discuss the importance of training data privacy and why it is susceptible to reverse engineering?
This is a great question and there needs to be so much more education on this. Simplistically, machine learning models memorize information. The bigger the models, the more they memorize corner cases. What this means is that the information those models were trained on can be spewed out in production. This has been shown in several research papers, including The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks and Extracting Training Data from Large Language Models.
It has also been shown that personal information can be extracted from word embeddings and, for those with any doubts about this being a real problem, there was also a scandal this year when a Korean love bot was writing out user details in chats with other users.
What are your views on federated learning and user privacy?
Federated learning is a great step when the use case allows. However, it’s still possible to extract information about a user’s inputs from the weight updates sent over to the cloud from a particular user’s device, so it’s important to combine federated learning with other privacy-enhancing technologies (differential privacy and homomorphic encryption/secure multiparty computation). Each privacy-enhancing technology has to be chosen according to the use case – none can be used as a hammer to solve all problems. We go over the decision tree here. One big gain is that you never send your raw data outside of your device. One big drawback is that if you need data in order to debug a system or see if it’s being trained properly, it becomes much more difficult to obtain. Federated learning is a great start with plenty of unsolved problems that research and industry are both working on.
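As an editorial illustration of combining federated learning with differential privacy, here is a minimal sketch in which each device's weight update is clipped to a maximum norm and Gaussian noise is added to the average. The function names, the clipping bound, and the noise level are assumptions chosen for the example, not details from the interview, and choosing the noise level to meet a formal privacy guarantee is not shown.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale a weight update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def private_average(updates, max_norm, noise_std, rng):
    """Average clipped client updates, adding Gaussian noise per coordinate."""
    clipped = [clip_update(u, max_norm) for u in updates]
    n = len(clipped)
    dim = len(clipped[0])
    return [
        sum(u[i] for u in clipped) / n + rng.gauss(0.0, noise_std)
        for i in range(dim)
    ]


# Illustrative values only: three devices, each sending a 2-dimensional update.
rng = random.Random(0)
device_updates = [[3.0, 4.0], [0.1, -0.2], [-1.0, 0.5]]
noisy_mean = private_average(device_updates, max_norm=1.0, noise_std=0.05, rng=rng)
```

Clipping bounds how much any single device can move the average, and the noise makes it harder to attribute the result to one user's inputs.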
Private AI enables developers to integrate privacy analysis with several lines of code to ensure privacy. How does this work?
Our tech runs as a REST API which our users send POST requests to with the text they want to redact, de-identify, or pseudonymize/augment with realistic data. Some of our customers send through call transcripts that need to be redacted in order to be PCI compliant, while others send through entire chats so they can then use the information to train chatbots, sentiment analyzers, or other NLP models. Our users can also choose which entities they need to keep or even use as metadata to track where personal data are stored. We take away the pain of having to train up an accurate system to detect and replace personal information in really messy data.
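As an editorial illustration of what such a POST request might look like, here is a minimal sketch that builds, but does not send, a JSON request to a de-identification service. The URL, the `text` and `keep_entities` field names, and the `NAME` entity label are hypothetical placeholders for this example and are not taken from Private AI's documentation.

```python
import json
import urllib.request

# Hypothetical endpoint of a service running in your own environment.
API_URL = "http://localhost:8080/deidentify"


def build_redaction_request(text, keep_entities=()):
    """Build (without sending) a POST request asking a service to redact `text`."""
    payload = {
        "text": text,
        "keep_entities": list(keep_entities),  # assumed option name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_redaction_request(
    "Hi, this is Jane Doe, my card number is 4111 1111 1111 1111.",
    keep_entities=["NAME"],
)
# To send it, one would call urllib.request.urlopen(request) against a real deployment.
```

The real request and response formats depend on the service's own API reference.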
Why is privacy for IoT devices a current issue and what are your views on solving it?
Ultimately, the best way to solve a privacy problem is very use-case dependent, and IoT devices are no different. While several use cases might rely on edge deployment, edge inference, and privacy-preserving federated learning (e.g., crowd sensing in smart cities), other use cases might need to rely on data aggregation and anonymization (e.g., energy usage information). With that said, IoT devices are a prime example of how privacy and security must go hand in hand. These devices are notoriously vulnerable to cyberattacks, so there’s only so much privacy-enhancing technologies can do without fixing core device vulnerabilities. On the other hand, without thinking of ways to enhance user privacy, information collected from within our homes can be shared, unchecked, with unknown parties, making it exceedingly difficult to guarantee the security of the information. We have two fronts to improve upon here, and the draft legislation being written by the European Commission on IoT device security might end up being what shakes device manufacturers into taking their responsibility towards the security and privacy of consumers seriously.
Is there anything else that you would like to share about Private AI?
We’re a group of experts in privacy, natural language, spoken language, image processing, and machine learning model deployment in low-resource environments, backed by M12, Microsoft’s venture fund.
We make sure the products we create, on top of being highly accurate, are also computationally efficient so you don’t have a massive cloud bill on your hands at the end of the month. Also, our customers’ data never ever gets transferred to us – everything is processed in their own environment.
Thank you for the great interview. To learn more, visit Private AI.