Dr. Judith Bishop, Senior Director of AI Specialists at Appen – Interview Series

Updated on May 20, 2021

Dr. Judith Bishop is Senior Director of AI Specialists for the APAC/US region at Appen. She is leading and growing a top-notch team of highly qualified and experienced linguists, computational linguists, and experts in all modes of human communication (speech, writing and gesture) to deliver AI training data with an unrivaled combination of quality and speed.

What initially attracted you to linguistics?

I first heard about linguistics from a favorite English teacher in high school. I was one of those kids who are equally drawn to foreign languages and humanities, and math and science subjects. Linguistics is the science of how language works, so it brought those interests together for me. Like so many people, once I learned about it, I was completely hooked. What could be more fascinating than how we communicate our thoughts and feelings to each other? Linguistics explores the language structures that, for all the differences in sounds and writing systems, are often similar under the surface, since all are a product, ultimately, of our common human existence.

Could you share the genesis story of how you found yourself working in AI?

I’ve worked at Appen since 2004 supporting the development of language technology products and services. Over this time, AI has emerged as a comprehensive framework, mission and vision for technology to mimic and extend human capabilities of communication, reasoning and perception. In 2019 my team rebranded itself as AI Specialists, recognizing that our linguistic and language knowledge is critical to the AI enterprise. Our annotated data provides essential support for the success of human interactions with AI products and services.

You’ve been working in AI for over 16 years. What are some of the biggest changes that you have seen?

The major shift has been a diversification of focus from core technology development to the long tail of use cases and applications. For most of my career, the focus of language-based AI has been to develop and refine a core set of models that mimic human speech perception and production, namely, speech recognition, speech synthesis, and natural language processing. Datasets typically conformed to common labelling and data sampling standards and conventions, such as those developed by the Speecon consortium (Speech-Driven Interfaces for Consumer Devices). These standards have allowed core technology developers to benchmark their performance on common data structures and have supported the rapid evolution of AI.

The pervasive expansion of AI use cases in more recent years, however, has brought with it the recognition that the core, generic AI models built with this data do not work adequately on more specialized data types without further tuning. Moreover, having been developed on data that was deliberately clean and ‘standard’, these models must now be trained or updated to understand and respond to all the diversity of human inputs: all dialects, all accents, all ethnicities, all genders, and all other dimensions of human difference.

Could you discuss the importance of unbiased data in machine learning?

Machine learning models, whether supervised, unsupervised or reinforcement learning models, will reflect biases that are present in the data they are trained on. Alyssa Simpson Rochwerger and Wilson Pang provide several excellent examples of this issue in their recent book, Real World AI. If there is insufficient training data for a segment of the population, the AI model will be less accurate for that segment.

In another common case, the representation of the population may suffice, but if the training data contains correlations between data points that reflect actual, but undesirable, conditions in the world (such as a lower rate of full employment for women, or a higher rate of incarceration for African Americans), the resulting AI applications can reinforce and perpetuate those conditions.

Associations present in language at large can create biases in NLP applications, which often rely on word embeddings, vector representations of words learned from statistical patterns of co-occurrence. If ‘she’ and ‘nurse’ are more frequently associated in the chosen training data than ‘they’ or ‘he’ and ‘nurse,’ then the resulting application will use ‘she’ when forced to choose a singular pronoun to refer to a nurse. To address this specific issue, researchers have recently developed a gender-neutral variant of a commonly used word embedding algorithm, GN-GloVe.
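As a rough illustration of the ‘she’ and ‘nurse’ association described above, the sketch below counts how often each pronoun shares a sentence with a target word. The corpus, the pronoun list and the function name `pronoun_cooccurrence` are all hypothetical and chosen only for illustration. Real embedding algorithms learn from far larger corpora and richer statistics than raw sentence-level counts.

```python
from collections import Counter

# Hypothetical toy corpus; real training data would be far larger.
corpus = [
    "she is a nurse at the clinic",
    "the nurse said she would help",
    "he is a nurse in the ward",
    "she worked as a nurse for years",
    "they trained as a nurse last year",
]

PRONOUNS = ("she", "he", "they")


def pronoun_cooccurrence(sentences, target):
    """Count the sentences in which each pronoun appears alongside `target`."""
    counts = Counter({p: 0 for p in PRONOUNS})
    for sentence in sentences:
        tokens = sentence.lower().split()
        if target in tokens:
            for p in PRONOUNS:
                if p in tokens:
                    counts[p] += 1
    return counts


counts = pronoun_cooccurrence(corpus, "nurse")

# A system forced to pick one pronoun purely from these counts
# would choose the most frequent one.
most_likely = max(PRONOUNS, key=lambda p: counts[p])
print(dict(counts), most_likely)
```

In this toy data the skew toward one pronoun comes entirely from how the sentences were sampled, which is the point being made: the model inherits whatever imbalance the data contains.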

In sensitive applications, bias issues such as these can have a devastating impact on users and can wipe out the business investment. The good news is that, in addition to the development of new, more transparent and inclusive datasets, a growing number of data science applications are being developed to check for the presence of bias in existing training datasets and AI applications.

Appen recently launched new diverse training datasets for natural language processing (NLP) initiatives. Could you share some details on how these datasets will enable end users to receive the same experience regardless of language variety, dialect, ethnolect, accent, race or gender?

For the reasons mentioned above, datasets are needed to correct existing biases in AI production systems, in addition to more inclusive datasets for training future systems. The Appen datasets you mention will support the correction of biases related to ethnicity and associated ethnolects, such as African American Vernacular English. They will provide supplemental training data to boost the representation of this population in AI language models.

Ethnicity is emerging as a critical demographic dimension for explicit labelling in AI data. Linguists refer to the language varieties associated with particular ethnicities as ‘ethnolects.’ AI data providers such as Appen now recognize that unless key diverse and minority populations are represented explicitly in AI training datasets, we cannot ensure that resulting systems perform equally well for these populations.

Equal performance means the system recognises with equal accuracy the user’s words and intents (their meanings, or the actions they want to accomplish) and in some cases, sentiment; and that it responds in ways that equally satisfy the user’s needs, and does not produce a more negative impact on a particular population of users, either practically or psychologically.
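One simple way to make ‘equal accuracy’ measurable is to compute accuracy separately for each group of users and compare the results. The sketch below assumes a hypothetical list of evaluation records with made-up group labels. It is not a description of Appen’s evaluation process, and a fuller check would also cover intent, sentiment and user impact.

```python
from collections import defaultdict

# Hypothetical evaluation records:
# (speaker group, was the utterance recognised correctly?)
results = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]


def accuracy_by_group(records):
    """Return the fraction of correctly recognised utterances per group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        correct[group] += int(ok)
    return {group: correct[group] / totals[group] for group in totals}


per_group = accuracy_by_group(results)

# The gap between the best- and worst-served groups is one
# simple signal of unequal performance.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A gap near zero on a sufficiently large and representative test set would be consistent with equal recognition accuracy; a large gap suggests the worse-served group needs more or better training data.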

A longstanding data collection approach has been to focus on geographically and dialectally representative sampling in databases – assuming this would ensure the technology will generalize to the whole population of language speakers. The relatively poorer performance of language technologies recently documented for African American Vernacular English speakers has shown this isn’t so. Populations that are diverse in ethnicity, race, gender and accent, among other dimensions, need to be proactively included in training data sets to ensure their voices are heard and understood by AI products and services. Appen’s diverse AI training datasets address this need.

Outside of AI, you are also a poet with several of your poems winning different industry awards. What are your views on future AI exhibiting this type of creativity, including writing poetry?

That’s a fascinating question. Poetry and other forms of human creativity draw on all our human resources of memory, perception, sensation and emotion, as well as the structures and nuances of language and image, to produce insights that resonate with contemporary concerns. Emily Dickinson wrote, “If I read a book and it makes my whole body so cold no fire can warm me, I know that is poetry. If I feel physically as if the top of my head were taken off, I know that is poetry.” There must be an element of perceptual, sensory, or emotional recognition, but also genuine surprise.

Advanced AI models such as GPT-3 statistically model the likelihood of words appearing together in different genres, including poetry. This means they can produce something we recognise as “poetic” language, such as the use of heightened diction, rhyme, and unexpected or surreal combinations of words. But these generative language models lack most of the resources, mentioned above, which are needed to produce a work of art that illuminates what it means to be human in the present time.

What I do find compelling about AI in a creative context is its potential to produce entirely new insights – insights that are different in kind and beyond the reach of any single human mind, even the most polymathic or deeply read and experienced human mind. Once AI has consistent access to sensory and perceptual data for analysis across a broad range of human domains (visual, tactile, auditory, physiological, emotional) there is no knowing what we will learn about ourselves and the world. AI’s analytical capabilities may produce fertile new grounds for creative human exploration.

You’ve had a phenomenal career so far. In your opinion, what is holding more women back from joining STEM, and specifically AI?

The lack of role models can be a powerful factor (and a vicious circle). There is a genuine difficulty – culturally, socially and practically – in breaking into areas where women, and people of other diverse genders, don’t yet have a deeply established presence, and where the respect for what we can contribute is too often lacking. My own experience as a leader has shown me time and again how resilient, creative and successful teams can be when they are inclusive of diverse experiences and orientations. Leaders need to be adventurous in their hiring and brave in their confidence that they can handle the challenges to their way of thinking that diverse perspectives bring, knowing that this bravery has also been shown to be strongly correlated with financial and corporate success.

Is there anything else that you would like to share about Appen or AI in general?

Data providers such as Appen have a powerful potential to influence AI outcomes for the better by providing inclusive training data.

However, reaching the goal of inclusive AI will require everyone to participate. Data buyers must also recognize their responsibility to explicitly ask – and pay – for the inclusive data that will ensure the optimal performance of their systems for all users in the real world. And those from diverse communities who supply their data for AI development must be able to trust the uses to which it will be put. Building that trust will require strong transparency and ethical practices on the part of all who handle sensitive data.

Thank you for the great interview. I enjoyed learning more about your views on AI and linguistics. Readers who wish to learn more should visit Appen.