OpenAI’s ChatGPT Tackles University Accounting Exams

Published May 1, 2023

OpenAI recently launched its latest AI model, GPT-4, which has been making waves in various fields. It scored in the 90th percentile on the bar exam, passed 13 of 15 AP exams, and earned a near-perfect score on the GRE Verbal test.

Researchers at Brigham Young University (BYU) and 186 other universities wanted to know how OpenAI's technology would perform on accounting exams. They tested the original version, ChatGPT, and concluded that although it still has room to improve in the accounting domain, the technology is a game changer that will change how education is delivered and received for the better.

Since its debut in November 2022, ChatGPT has become the fastest-growing technology platform ever, reaching 100 million users in under two months. In light of the ongoing debate about the role of AI models like ChatGPT in education, lead study author David Wood, a BYU professor of accounting, decided to recruit as many professors as possible to assess the AI's performance against actual university accounting students.

ChatGPT vs. Students on Accounting Exams

The research involved 327 co-authors from 186 educational institutions across 14 countries, who contributed 25,181 classroom accounting exam questions. BYU undergraduates also provided 2,268 textbook test bank questions. The questions covered accounting subfields including accounting information systems (AIS), auditing, financial accounting, managerial accounting, and tax, and they varied in both difficulty and type.

Although ChatGPT's performance was impressive, students outperformed the AI overall, averaging 76.7% to ChatGPT's 47.4%, a gap of 29.3 percentage points. ChatGPT scored higher than the student average on 11.3% of questions, doing particularly well in AIS and auditing. It struggled with tax, financial, and managerial assessments, possibly because of its difficulty with mathematical processes.

By question type, ChatGPT did best on true/false questions (68.7% correct) and multiple-choice questions (59.5%) and worst on short-answer questions (28.7% to 39.1%). It generally struggled with higher-order questions, sometimes providing authoritative-sounding written explanations for incorrect answers or giving different answers to the same question.

The Future of ChatGPT in Education

Despite its limitations, researchers anticipate that GPT-4 will improve on accounting questions and address the issues they discovered. The most promising aspect is the chatbot's potential to enhance teaching and learning, such as helping design and test assignments or draft portions of a project.

“This is a disruption, and we need to assess where we go from here,” said study coauthor and fellow BYU accounting professor Melissa Larson. “Of course, I'm still going to have TAs, but this is going to force us to use them in different ways.”

As AI continues to advance, educators must adapt and find new ways to incorporate these technologies into their teaching methods.