Across the world, the number of English language learners continues to rise. Educational institutions and employers need to be able to assess the English proficiency of language learners – in particular, their speaking ability, since spoken language remains among the most essential language abilities. The challenge, for both assessment developers and end users, is finding a way to do so that is accurate, fast and financially viable. As part of this challenge, scoring these assessments comes with its own set of factors, especially when we consider the different areas (speech, writing, etc.) one is being tested on. With the demand for English-language skills across the globe only expected to increase, what would the future of speech scoring need to look like in order to meet these needs?
The answer to that question, in part, is found in the evolution of speech scoring to date. Rating constructed spoken responses has historically been done using human raters. This process, however, tends to be expensive and slow, and has additional challenges including scalability and various shortcomings of human raters themselves (e.g., rater subjectivity or bias). As discussed in our book Automated Speaking Assessment: Using Language Technologies to Score Spontaneous Speech, in order to address these challenges, an increasing number of assessments now make use of automated speech scoring technology as the sole source of scoring or in combination with human raters. Before deploying automated scoring engines, however, their performance needs to be thoroughly evaluated, particularly in relation to the score reliability, validity (does the system measure what it is supposed to?) and fairness (i.e., the system should not introduce bias related to population subgroups such as gender or native language).
Since 2006, ETS’s own speech scoring engine, SpeechRater®, has been operationalized in the TOEFL® Practice Online (TPO) assessment (used by prospective test takers to prepare for the TOEFL iBT® assessment), and since 2019, SpeechRater has also been used, along with human raters, for scoring the speaking section of the TOEFL iBT® assessment. The engine evaluates a wide range of features of speaking proficiency in spontaneous non-native speech, including pronunciation and fluency, vocabulary range and grammar, and higher-level speaking abilities related to coherence and progression of ideas. These features are computed using natural language processing (NLP) and speech processing algorithms. A statistical model is then applied to these features in order to assign a final score to a test taker’s response.
While this model is trained on previously observed data scored by human raters, it is also reviewed by content experts to maximize its validity. If a response is found to be non-scorable due to audio quality or other issues, the engine can flag it for further review to avoid generating a potentially unreliable or invalid score. Human raters are always involved in the scoring of spoken responses in the high-stakes TOEFL iBT speaking assessment.
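The pipeline described above (features computed from the response, a statistical model applied to them, and a flag for non-scorable responses) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names, the weights, the 0 to 4 score scale, the audio-quality threshold, and the simple linear form of the model are hypothetical and are not taken from SpeechRater.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical weights for illustration only; in practice such weights would be
# estimated from responses previously scored by human raters.
WEIGHTS: Dict[str, float] = {
    "fluency": 0.4,
    "pronunciation": 0.3,
    "vocabulary": 0.2,
    "coherence": 0.1,
}
INTERCEPT = 0.0
MIN_AUDIO_QUALITY = 0.5        # assumed threshold below which a response is non-scorable
SCORE_MIN, SCORE_MAX = 0.0, 4.0  # assumed score scale


@dataclass
class ScoringResult:
    score: Optional[float]  # None when the response was not scored
    flagged: bool           # True when the response should go to further review


def score_response(features: Dict[str, float], audio_quality: float) -> ScoringResult:
    """Apply a linear model to the features, or flag the response instead."""
    if audio_quality < MIN_AUDIO_QUALITY:
        # Do not produce a potentially unreliable score; route to review.
        return ScoringResult(score=None, flagged=True)
    raw = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    clipped = max(SCORE_MIN, min(SCORE_MAX, raw))
    return ScoringResult(score=clipped, flagged=False)
```

In this sketch a response with poor audio never receives a numeric score, which mirrors the idea of flagging rather than guessing.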
As human raters and SpeechRater are currently used together to score test takers’ responses in high-stakes speaking assessments, both play a part in what the future of scoring English language proficiency can be. Human raters have the ability to understand the content and discourse organization of a spoken response in a deep way. In contrast, automated speech scoring engines can more precisely measure certain detailed aspects of speech, such as fluency or pronunciation, exhibit perfect consistency over time, can reduce overall scoring time and cost, and are more easily scaled to support large testing volumes. When human raters and automated speech scoring systems are combined, the resulting system can benefit from the strengths of each scoring approach.
In order to continuously evolve automated speech scoring engines, research and development needs to focus on the following aspects, among others:
- Building automatic speech recognition systems with higher accuracy: Since most features of a speech scoring system rely directly or indirectly on this component of the system that converts the test taker’s speech to a text transcription, highly accurate automatic speech recognition is essential for obtaining valid features;
- Exploration of new ways to combine human and automated scores: In order to take full advantage of the respective strengths of human rater scores and automated engine scores, more ways of combining this evidence need to be explored;
- Accounting for abnormalities in responses, both technical and behavioral: High-performing filters capable of flagging such responses and excluding them from automated scoring are necessary to help ensure the validity and reliability of the resulting assessment scores;
- Assessment of spontaneous or conversational speech that occurs most often in day-to-day life: While automated scoring of such interactive speech is an important goal, these items present numerous scoring challenges, including overall evaluation and scoring;
- Exploring deep learning technologies for automated speech scoring: This relatively recent paradigm within machine learning has produced substantial performance increases on many artificial intelligence (AI) tasks in recent years (e.g., automatic speech recognition, image recognition), and therefore it is likely that automated scoring also may benefit from using this technology. However, since most of these systems can be considered “black-box” approaches, attention to the interpretability of the resulting score will be important to maintain some level of transparency.
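Two of the items above lend themselves to a small illustration. The first function computes word error rate, a standard way to quantify speech recognition accuracy, using word-level edit distance. The second shows one possible way to combine a human score with an automated score. The combination rule, its default weight, and its disagreement threshold are hypothetical choices for this sketch, not a method described by ETS.

```python
from typing import Optional, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def combine_scores(
    human: float,
    machine: float,
    human_weight: float = 0.5,  # assumed weight, for illustration
    max_gap: float = 1.0,       # assumed disagreement threshold
) -> Tuple[Optional[float], bool]:
    """Return (combined_score, needs_review) for one response."""
    if abs(human - machine) > max_gap:
        # Large disagreement: send to an additional human rater instead of averaging.
        return None, True
    return human_weight * human + (1.0 - human_weight) * machine, False
```

For example, `word_error_rate("a b c d", "a c d")` counts one deleted word out of four reference words, and `combine_scores(1.0, 4.0)` asks for review rather than averaging two scores that disagree strongly.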
To accommodate a growing and changing English-language learner population, next-generation speech scoring systems must expand automation and the range of what they are able to measure, enabling consistency and scalability. That is not to say the human element will be removed, especially for high-stakes assessments. Human raters will likely remain essential for capturing certain aspects of speech that will remain hard for automated scoring systems to evaluate accurately for some time to come, including the detailed aspects of spoken content and discourse. Using automated speech scoring systems in isolation for consequential assessments also runs the risk of failing to identify problematic responses by test takers (for instance, responses that are off-topic or plagiarized), which can reduce validity and reliability. Using human raters and automated scoring systems in combination may be the best way to score speech in high-stakes assessments for the foreseeable future, particularly if spontaneous or conversational speech is evaluated.
ETS works with education institutions, businesses and governments to conduct research and develop assessment programs that provide meaningful information they can count on to evaluate people and programs. ETS develops, administers and scores more than 50 million tests annually in more than 180 countries at more than 9,000 locations worldwide. We design our assessments with industry-leading insight, rigorous research and an uncompromising commitment to quality so that we can help education and workplace communities make informed decisions. To learn more visit ETS.
Huma Abidi, Senior Director of AI Software Products at Intel – Interview Series
Huma Abidi is a Senior Director of AI Software Products at Intel, responsible for strategy, roadmaps, requirements, machine learning and analytics software products. She leads a globally diverse team of engineers and technologists responsible for delivering world-class products that enable customers to create AI solutions. Huma joined Intel as a software engineer and has since worked in a variety of engineering, validation and management roles in the area of compilers, binary translation, and AI and deep learning. She is passionate about women’s education, supporting several organizations around the world for this cause, and was a finalist for VentureBeat’s 2019 Women in AI award in the mentorship category.
What initially sparked your interest in AI?
I’ve always found it interesting to imagine what could happen if machines could speak, or see, or interact intelligently with humans. Because of some big technical breakthroughs in the last decade, including deep learning gaining popularity because of the availability of data, compute power, and algorithms, AI has now moved from science fiction to real world applications. Solutions we had imagined previously are now within reach. It is truly an exciting time!
In my previous job, I was leading a Binary Translation engineering team, focused on optimizing software for Intel hardware platforms. At Intel, we recognized that the developments in AI would lead to huge industry transformations, demanding tremendous growth in compute power from devices to Edge to cloud and we sharpened our focus to become a data-centric company.
Realizing the need for powerful software to make AI a reality, the first challenge I took on was to lead the team in creating AI software to run efficiently on Intel Xeon CPUs by optimizing deep learning frameworks like Caffe and TensorFlow. We were able to demonstrate more than 200-fold performance increases due to a combination of Intel hardware and software innovations.
We are working to make all of our customer workloads in various domains run faster and better on Intel technology.
What can we do as a society to attract women to AI?
It’s a priority for me and for Intel to get more women in STEM and computer science in general, because diverse groups will build better products for a diverse population. It’s especially important to get more women and underrepresented minorities in AI, because of the potential biases that a lack of representation can cause when creating AI solutions.
In order to attract women, we need to do a better job explaining to girls and young women how AI is relevant in the world, and how they can be part of creating exciting and impactful solutions. We need to show them that AI spans so many different areas of life, and they can use AI technology in their domain of interest, whether it’s art or robotics or data journalism or television. There are exciting applications of AI that they can easily see making an impact, e.g. virtual assistants like Alexa, self-driving cars, social media, and how Netflix knows which movies they want to watch.
Another key part of attracting women is representation. Fortunately, there are many women leaders in AI who can serve as excellent role models, including Fei-Fei Li, who is leading human-centered AI at Stanford, and Meredith Whittaker, who is working on social implications through the AI Now Institute at NYU.
We need to work together to adopt inclusive business practices and expand access of technology skills to women and underrepresented minorities. At Intel, our 2030 goal is to increase women in technical roles to 40% and we can only achieve that by working with other companies, institutes, and communities.
How can women best break into the industry?
There are a few options if you want to break into AI specifically. There are numerous online courses in AI, including Udacity’s free Intel Edge AI Fundamentals course. Or you could go back to school, for example at one of Maricopa County’s community colleges for an AI associate degree, and study for a career in AI, e.g. as a data scientist, data engineer, ML/DL developer or software engineer.
If you already work at a tech company, there are likely already AI teams. You could check out the option to spend part of your time on an AI team that you’re interested in.
You can also work on AI if you don’t work at a tech company. AI is extremely interdisciplinary, so you can apply AI to almost any domain you’re involved in. As AI frameworks and tools evolve and become more user-friendly, it becomes easier to use AI in different settings. Joining online events like Kaggle competitions is a great way to work on real-world machine learning problems that involve data sets you find interesting.
The tech industry also needs to put in time, effort, and money to reach out to and support women, including women who are also underrepresented ethnic minorities. On a personal note, I’m involved in organizations like Girls Who Code and Girl Geek X, which connect and inspire young women.
With deep learning and reinforcement learning recently gaining the most traction, what other forms of machine learning should women pay attention to?
AI and machine learning are still evolving, and exciting new research papers are being published regularly. Some areas to focus on right now include:
- Classical ML techniques that continue to be important and are widely used.
- Responsible/Explainable AI, which has become a critical part of the AI lifecycle and matters for the deployability of deep learning and reinforcement learning.
- Graph neural networks and multi-modal learning, which gain insights by learning from rich relational information in graph data.
AI bias is a huge societal issue when it comes to bias towards women and minorities. What are some ways of solving these issues?
When it comes to AI, biases in training samples, human labelers and teams can be compounded to discriminate against diverse individuals, with serious consequences.
It is critical that diversity is prioritized at every step of the process. If women and other minorities from the community are part of the teams developing these tools, they will be more aware of what can go wrong.
It is also important to make sure to include leaders across multiple disciplines such as social scientists, doctors, philosophers and human rights experts to help define what is ethical and what is not.
Can you explain the AI blackbox problem, and why AI explainability is important?
In AI, models are trained on massive amounts of data before they make decisions. In most AI systems, we don’t know how these decisions were made — the decision-making process is a black box, even to its creators. And it may not be possible to really understand how a trained AI program is arriving at its specific decision. A problem arises when we suspect that the system isn’t working. If we suspect the system of algorithmic biases, it’s difficult to check and correct for them if the system is unable to explain its decision making.
There is currently a major research focus on eXplainable AI (XAI) that intends to equip AI models with transparency, explainability and accountability, which will hopefully lead to Responsible AI.
In your keynote address during the MITEF Arab Startup Competition final award ceremony and conference, you discussed Intel’s AI for Social Good initiatives. Which of these Social Good projects has caught your attention and why is it so important?
I continue to be very excited about all of Intel’s AI for Social Good initiatives, because breakthroughs in AI can lead to transformative changes in the way we tackle problem solving.
One that I especially care about is the Wheelie, an AI-powered wheelchair built in partnership with HOOBOX Robotics. The Wheelie allows extreme paraplegics to regain mobility by using facial expressions to drive. Another amazing initiative is TrailGuard AI, which uses Intel AI technology to fight illegal poaching and protect animals from extinction and species loss.
As part of Intel’s Pandemic Response Initiative, we have many on-going projects with our partners using AI. One key initiative is contactless fever detection or COVID-19 detection via chest radiography with Darwin AI. We’re also working on bots that can answer queries to increase awareness using natural language processing in regional languages.
For women who are interested in getting involved, are there books, websites, or other resources that you would recommend?
There are many great resources online, for all experience levels and areas of interest. Coursera and Udacity offer excellent online courses on machine learning and deep learning, most of which can be audited for free. MIT’s OpenCourseWare is another great, free way to learn from some of the world’s best professors.
Companies such as Intel have AI portals that contain a lot of information about AI, including the solutions they offer. There are many great books on AI: foundational computer science texts like Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, and modern, philosophical books like Homo Deus by historian Yuval Noah Harari. I’d also recommend Lex Fridman’s AI podcast for great conversations with experts from different fields and a wide range of perspectives.
Do you have any last words for women who are curious about AI but are not yet ready to leap in?
AI is the future, and will change our society — in fact, it already has. It’s essential that we have honest, ethical people working on it. Whether in a technical role, or at a broader social level, now is a perfect time to get involved!
Thank you for the interview, you are certainly an inspiration for women the world over. Readers who wish to learn more about the software solutions at Intel should visit AI Software Products at Intel.
AI Education Startup Riiid Seeks Worldwide Expansion After New Funding Round
The South Korea-based AI education startup Riiid has announced that the company raised $41.8 million in a pre-Series D funding round. The new investment, which includes the state-run Korea Development Bank (KDB), NVESTOR, Intervest, and existing investor IMM Investments, brings the company’s total funding up to $70.2 million.
According to the company, the funding is another indicator of its success, with over 200 percent annual sales growth and more than a million users since 2017.
Mobile Test Prep
One of Riiid’s biggest contributions to the field of education is a mobile test prep application called Santa. The application focuses on the Test of English for International Communication (TOEIC), and it has been used by more than one million students in Korea and Japan.
The company’s proprietary AI technology has helped launch it to No. 1 in sales among education apps in both Korea and Japan. The AI can analyze student data and content, predict user behavior and scores, and, in what may be its most impressive feature, recommend personalized study plans in real time. The use of personalized lessons has been regarded by many as one of the most effective approaches to education.
With the company’s success in the Santa application, it will now look to provide back-end solutions all across the globe for companies, school districts, and education ministries.
Y J Jang is Riiid’s CEO.
“Riiid successfully completed domestic funding amid a slower investment environment due to the unprecedented COVID-19 pandemic and has made significant progress in negotiating with overseas financial investors to accelerate global expansion,” said Jang. “Riiid is already in the process of forming various global partnerships based on its verified AI technology in both academic and commercial markets, and will soon unveil new products and services. We are committed to creating a future for education beyond our imagination through in-depth R&D and commercialization of technology.”
The company will use the secured funding to improve the company’s deep learning technology even further. One of its goals is to provide solutions that help students achieve learning objectives throughout the entire education process, not just for specific tests or tasks. This would be done through constant evaluation and feedback.
The company will also look to continue its expansion outside of South Korea, moving into the United States, South America, the Middle East, and other areas of the world. The company has recently opened up Riiid Labs in Silicon Valley, which acts as the global headquarters of the company.
“Riiid is establishing a global standard while defining valid technologies and leading researches in the field of AI EdTech,” said Intervest Director, Jay Jeon. “At a time when the need for effective remote learning solutions is expanding not only in the education market but also in various industries, the investment was made highly valuing the marketability of Riiid’s proven business model in Santa, excellent talent pool, and various global partnerships that are underway based on a scalable technology structure.”
Riiid also contributes to AI research and publishes papers at top AI conferences such as Neural Information Processing Systems (NeurIPS), the International Conference on Computer Supported Education (CSEDU), and others.
The company also launched EdNet in early 2020, which is the world’s largest open database for AI education.
Researchers Develop Tool Able to Turn Equations Into Illustrations
Researchers at Carnegie Mellon University have created a tool that is able to turn the abstractions of mathematics into illustrations and diagrams through software.
Users type ordinary mathematical expressions, which the software then turns into illustrations. One of the major developments in this project is that the expressions are not required to be basic functions, as in the case of a graphing calculator. Instead, they can be complex relationships coming from different fields within mathematics.
The tool was named Penrose by the researchers, inspired by the mathematician and physicist Roger Penrose, who is known for conveying complex mathematical and scientific ideas through diagrams and drawings.
Penrose will be presented by researchers at the SIGGRAPH 2020 Conference on Computer Graphics and Interactive Techniques. The conference will take place virtually this year due to the COVID-19 pandemic.
Keenan Crane is an assistant professor of computer science and robotics.
“Some mathematicians have a talent for drawing beautiful diagrams by hand, but they vanish as soon as the chalkboard is erased,” Crane said. “We want to make this expressive power available to anyone.”
Diagrams are not used as much as they could be in technical communication, because producing them requires a large amount of highly skilled and tedious work. To get around this, the Penrose tool allows experts to encode the steps in the system, and other users are then able to access this by using mathematical language. All of this means that the computer is doing most of the work.
Katherine Ye is a Ph.D. student in the Computer Science Department.
“We started off by asking: ‘How do people translate mathematical ideas into pictures in their head?'” said Ye. “The secret sauce of our system is to empower people to easily ‘explain’ this translation process to the computer, so the computer can do all the hard work of actually making the picture.”
The computer first learns how the user wants the mathematical objects visualized, such as an arrow or a dot, and it then draws up multiple diagrams. The user selects and edits one of those diagrams.
According to Crane, mathematicians should have no problem learning the special programming language that the team developed.
“Mathematicians can get very picky about notation,” he said. “We let them define whatever notation they want, so they can express themselves naturally.”
Penrose is seen as a step towards something even bigger.
“Our vision is to be able to dust off an old math textbook from the library, drop it into the computer and get a beautifully illustrated book — that way more people understand,” Crane said.
The team that developed Penrose also included Nimo Ni and Jenna Wise, who are Ph.D. students in CMU’s Institute for Software Research (ISR); Jonathan Aldrich, professor in ISR; Joshua Sunshine, ISR senior research fellow; Max Krieger, cognitive science undergraduate; and Dor Ma’ayan, former master’s student at the Technion-Israel Institute of Technology.
The research was supported by the National Science Foundation, Defense Advanced Research Projects Agency, the Sloan Foundation, Microsoft Research, and the Packard Foundation.