
Anderson's Angle

AI Struggles to Emulate Historical Language

ChatGPT-4o and Adobe Firefly.

A collaboration between researchers in the United States and Canada has found that large language models (LLMs) such as ChatGPT struggle to reproduce historical idioms without extensive pretraining – a costly and labor-intensive process that lies beyond the means of most academic or entertainment initiatives. This makes projects such as using AI to complete Charles Dickens's final, unfinished novel an unlikely proposition.

The researchers explored a range of methods for generating text that sounded historically accurate, starting with simple prompting using early twentieth-century prose, and moving to fine-tuning a commercial model on a small collection of books from that period.

They also compared the results to a separate model that had been trained entirely on books published between 1880 and 1914.

In the first of the tests, instructing ChatGPT-4o to mimic fin-de-siècle language produced quite different results from those of the smaller GPT-2-based model that had been fine-tuned on literature from the period:

Asked to complete a real historical text (top-center), even a well-primed ChatGPT-4o (lower left) cannot help lapsing back into ‘blog’ mode, failing to represent the requested idiom. By contrast, the fine-tuned GPT2 model (lower right) captures the language style well, but is not as accurate in other ways. Source: https://arxiv.org/pdf/2505.00030

Though fine-tuning brings the output closer to the original style, human readers were still frequently able to detect traces of modern language or ideas, suggesting that even carefully adjusted models continue to reflect the influence of their contemporary training data.

The researchers arrive at the frustrating conclusion that there are no economical shortcuts to generating idiomatically correct historical text or dialogue by machine. They also conjecture that the challenge itself might be ill-posed:

‘[We] should also consider the possibility that anachronism may be in some sense unavoidable. Whether we represent the past by instruction-tuning historical models so they can hold conversations, or by teaching contemporary models to ventriloquize an older period, some compromise may be necessary between the goals of authenticity and conversational fluency.

‘There are, after all, no “authentic” examples of a conversation between a twenty-first-century questioner and a respondent from 1914. Researchers attempting to create such a conversation will need to reflect on the [premise] that interpretation always involves a negotiation between present and [past].’

The new study is titled Can Language Models Represent the Past without Anachronism?, and comes from three researchers at the University of Illinois, the University of British Columbia, and Cornell University.

Complete Disaster

Initially, in a three-part research approach, the authors tested whether modern language models could be nudged into mimicking historical language through simple prompting. Using real excerpts from books published between 1905 and 1914, they asked ChatGPT‑4o to continue these passages in the same idiom.
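The paper does not publish its prompt templates, so the following is a minimal, hypothetical sketch of how such a continuation prompt might be assembled for a chat-style model such as ChatGPT-4o. The instruction wording, the sample excerpt, and the `build_continuation_messages` helper are illustrative assumptions, not the authors' actual setup:

```python
# Hypothetical sketch of the idiom-continuation prompting described above.
# The system instruction and excerpt are illustrative, not the paper's own.

def build_continuation_messages(excerpt: str) -> list[dict]:
    """Assemble a chat-style prompt asking a model to continue
    a pre-1914 passage in the same historical idiom."""
    system = (
        "You are completing a passage from a book published between "
        "1905 and 1914. Continue the text in the same period idiom, "
        "avoiding any vocabulary or concepts that postdate 1914."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": excerpt},
    ]

messages = build_continuation_messages(
    "The lamps were lit along the esplanade, and the bathing-machines "
    "stood in a mournful row upon the shingle."
)

# An actual experiment would then pass these messages to the model,
# e.g. via the OpenAI client:
#   client.chat.completions.create(model="gpt-4o", messages=messages)
print(messages[0]["role"], "->", messages[1]["content"][:40])
```

The study's finding is that even with instructions of this kind, the model's continuations tend to drift back toward contemporary phrasing.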

A writer on machine learning and a specialist in human image synthesis. Former head of research content at Metaphysic.ai.