Anderson's Angle
What AI Can Tell Us About Hidden Agendas in the News

ChatGPT-style models are being trained to detect what a news article really thinks about an issue – even when that stance is buried under quotes, framing, or (sometimes disingenuous) ‘neutrality’. By breaking articles into segments such as headlines, leads, and quotations, a new system learns to spot bias even in long-form professional journalism.
The ability to understand the true viewpoint of a writer or speaker – a pursuit known in the literature as stance detection – addresses one of the most difficult interpretive problems in language: gleaning the intent from content that may be designed to hide or obscure it.
From Jonathan Swift’s A Modest Proposal, to recent performances by political actors borrowing the polemics of their ideological opponents, the surface of a statement is no longer a reliable indicator of its intent; the rise of irony, trolling, disinformation and strategic ambiguity has made it harder than ever to pinpoint what side a text actually lands on, or whether it lands at all.
Often, what goes unsaid carries as much weight as what is stated, and simply choosing to cover a topic can signal the author’s position.
That makes the task of automatic stance detection unusually challenging, since an effective detection system needs to do more than tag isolated sentences as ‘supportive’ or ‘oppositional’: instead it must work through layers of meaning, weighing small cues against the shape and drift of the whole article; and this is harder still in long-form journalism, where tone may shift and opinion is rarely stated outright.
Agents For Change
To address some of these issues, researchers in South Korea have developed a new system called JoA-ICL (Journalism-guided Agentic In-Context Learning) for detecting the stance of long-form news articles.

The core idea behind JoA-ICL is that article-level stance is inferred by aggregating segment-level predictions produced by a separate language model agent. Source: https://arxiv.org/pdf/2507.11049
Instead of judging an article as a whole, JoA-ICL breaks it into structural parts (headline, lead, quotations, and conclusion) and assigns a smaller model to label each one. These local predictions are then passed to a larger model, which uses them to determine the article’s overall stance.
The method was tested on a newly compiled Korean dataset containing 2,000 news articles annotated for both article-level and segment-level stance. Each article was labeled with input from a journalism expert, reflecting how stance is distributed across the structure of professional news-writing.
According to the paper, JoA-ICL outperforms both prompting-based and fine-tuned baselines, demonstrating particular strength in detecting supportive stances (which models with a similar ambit tend to miss). The method also proved effective when applied to a German dataset under matched conditions, suggesting that its principles may generalize across languages.
The authors state:
‘Experiments show that JoA-ICL outperforms existing stance detection methods, highlighting the benefits of segment-level agency in capturing the overall position of long-form news articles.’
The new paper is titled Journalism-Guided Agentic In-Context Learning for News Stance Detection, and comes from various faculties at Seoul’s Soongsil University, as well as KAIST’s Graduate School of Future Strategy.
Method
Part of the challenge of AI-augmented stance detection is logistical, relating to how much signal a machine learning system can retain and collate at one time, given the current state of the art.
News articles tend to avoid direct statements of opinion, relying instead on an implicit or assumed stance, signaled through choices about which sources to quote, how the narrative is framed, and what details are left out, among many other considerations.
Even when an article does take a clear position, the signal is often scattered across the text, with different segments pointing in different directions. Since language models (LMs) still contend with limited context windows, it can be difficult for them to assess stance in the way they do with shorter content (such as tweets and other short-form social media posts), where the relationship between the text and the target is more explicit.
Therefore, standard approaches often fall short when applied to full-length journalism, a case where ambiguity is a feature rather than a flaw.
The paper states:
‘To address these challenges, we propose a hierarchical modeling approach that first infers the stance at the level of smaller discourse units (e.g., paragraphs or sections), and subsequently integrates these local predictions to determine the overall stance of the article.
‘This framework is designed to retain local context and capture dispersed stance cues in assessing how different parts of a news story contribute to its overall position on an issue.’
To this end, the authors compiled a novel dataset titled K-NEWS-STANCE, drawn from Korean news coverage between June 2022 and June 2024. Articles were first identified through BigKinds, a government-backed metadata service operated by the Korea Press Foundation, and full texts were retrieved using the Naver News aggregator API. The final dataset comprised 2,000 articles from 31 outlets, covering 47 nationally relevant issues.
Each article was annotated twice: once for its overall stance toward a given issue, and again for individual segments; specifically the headline, lead, conclusion, and direct quotations.
The annotation was led by journalism expert Jiyoung Han, also the paper’s third author, who guided the process through the use of established cues from media studies, such as source selection, lexical framing, and patterns of quotation. By these means a total of 19,650 segment-level stance labels were obtained.
To ensure the articles contained meaningful viewpoint signals, each was first classified by genre, and only those labeled as analysis or opinion (where subjective framing is more likely to be found) were used for stance annotation.
Two trained annotators labeled all articles, and were instructed to consult related articles in case the stance was unclear, with disagreements resolved through discussion and additional review.
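To make the dual annotation scheme concrete, a record in such a dataset might be structured along the lines of the sketch below; the field names here are my own assumptions for illustration, not the dataset’s published schema:

```python
# An illustrative record layout for a doubly-annotated K-NEWS-STANCE article.
# Field names are hypothetical; the dataset's actual schema may differ.
from dataclasses import dataclass, field

@dataclass
class SegmentLabel:
    segment_type: str  # "headline", "lead", "quotation", or "conclusion"
    text: str
    stance: str        # "supportive", "oppositional", or "neutral"

@dataclass
class AnnotatedArticle:
    outlet: str
    issue: str                       # one of the 47 national issues
    article_stance: str              # overall stance toward the issue
    segments: list[SegmentLabel] = field(default_factory=list)
```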

Sample entries from the K-NEWS-STANCE dataset, translated into English. Only the headline, lead, and quotations are shown; full body text is omitted. Highlighting indicates stance labels for quotations, with blue for supportive and red for oppositional. Please refer to the cited source PDF for a clearer rendition.
JoA-ICL
Rather than treating an article as a single block of text, the authors’ proposed system divides it into key structural parts: headline, lead, quotations, and conclusion, assigning each of these to a language model agent, which labels the segment as supportive, oppositional, or neutral.
These local predictions are passed to a second agent that decides the article’s overall stance, with the two agents coordinated by a controller that prepares the prompts and gathers the results.
Thus JoA-ICL adapts in-context learning (where the model learns from examples in the prompt) to match the way that professional news stories are written, using segment-aware prompts instead of a single generic input.
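To make that flow concrete, the sketch below traces the two-stage process in outline; the prompt wording and helper names are illustrative assumptions rather than the authors’ actual implementation:

```python
# A minimal sketch of the two-stage agentic flow: segment agents label the
# structural parts, and an aggregator agent decides the article-level stance.
# `llm_call` stands in for whichever backbone API is used; the prompt wording
# and helper names are illustrative assumptions, not the authors' code.
from dataclasses import dataclass

STANCES = ("supportive", "oppositional", "neutral")

@dataclass
class Article:
    issue: str
    headline: str
    lead: str
    quotations: list[str]
    conclusion: str

def llm_call(prompt: str) -> str:
    """Placeholder for a call to the chosen LLM backbone."""
    raise NotImplementedError

def label_segment(issue: str, segment_type: str, text: str) -> str:
    # Segment-level agent: classifies one structural unit of the article.
    prompt = (f"Issue: {issue}\n"
              f"Segment ({segment_type}): {text}\n"
              f"Label this segment's stance as one of {STANCES}.")
    return llm_call(prompt)

def article_stance(article: Article) -> str:
    # Controller: collect segment-level predictions, then pass them to the
    # aggregator agent that determines the overall stance.
    segments = ([("headline", article.headline), ("lead", article.lead)]
                + [("quotation", q) for q in article.quotations]
                + [("conclusion", article.conclusion)])
    local = [(kind, label_segment(article.issue, kind, text))
             for kind, text in segments]
    summary = "\n".join(f"{kind}: {label}" for kind, label in local)
    prompt = (f"Issue: {article.issue}\n"
              f"Segment-level stance predictions:\n{summary}\n"
              f"Given these local predictions, label the article's overall "
              f"stance as one of {STANCES}.")
    return llm_call(prompt)
```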
(Please note that most of the examples and illustrations in the paper are lengthy and difficult to reproduce legibly in an online article. We therefore implore the reader to examine the original source PDF.)
Data and Tests
In tests, the researchers used macro F1 and accuracy to evaluate performance, averaging results over ten runs with random seeds from 42 to 51 and reporting standard error. Training data was used to fine-tune baseline models and segment-level agents, with few-shot samples selected through similarity search using KLUE-RoBERTa-large.
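The similarity-based few-shot selection might be sketched as follows, using the klue/roberta-large checkpoint named in the paper; the mean pooling and the number of retrieved examples are my assumptions, not the authors’ exact recipe:

```python
# A sketch of few-shot example selection by embedding similarity, using the
# klue/roberta-large checkpoint named in the paper. The mean pooling and the
# number of retrieved examples (k) are assumptions, not the authors' recipe.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")
encoder = AutoModel.from_pretrained("klue/roberta-large")

@torch.no_grad()
def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)             # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling
    return torch.nn.functional.normalize(pooled, dim=-1)

def select_few_shot(query: str, pool: list[str], k: int = 4) -> list[str]:
    # Cosine similarity (dot product of normalized vectors) between the
    # query article and each labeled candidate; the k nearest become the
    # in-context demonstrations.
    sims = (embed(pool) @ embed([query]).T).squeeze(-1)      # (N,)
    top = torch.topk(sims, k=min(k, len(pool))).indices
    return [pool[int(i)] for i in top]
```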
Tests were run over three RTX A6000 GPUs (each with 48GB of VRAM), using Python 3.9.19, PyTorch 2.5.1, Transformers 4.52.0, and vLLM 0.8.5.
GPT-4o-mini, Claude 3 Haiku, and Gemini 2 Flash were utilized via API, at a temperature of 1.0 and with max tokens set to 1000 for chain-of-thought prompts, and 100 for others.
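For illustration, those settings map onto an OpenAI SDK call along the following lines, shown for GPT-4o-mini; the prompt content is a placeholder, and the equivalent Anthropic and Google calls would mirror the same parameters:

```python
# The reported API settings, shown for GPT-4o-mini via the OpenAI SDK; the
# prompt content is a placeholder, and the equivalent Anthropic and Google
# calls would mirror the same temperature and token limits.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query_backbone(prompt: str, chain_of_thought: bool = False) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
        # 1000 tokens for chain-of-thought prompts, 100 for others,
        # per the settings reported in the paper.
        max_tokens=1000 if chain_of_thought else 100,
    )
    return response.choices[0].message.content
```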
For full fine-tuning of Exaone-3.5-2.4B, the AdamW optimizer was used at a 5e-5 learning rate, with 0.01 weight decay, 100 warmup steps, and with the data trained for 10 epochs at a batch size of 6.
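Expressed as Hugging Face TrainingArguments, that configuration would look roughly like the sketch below; the output directory is arbitrary, and the dataset wiring is omitted:

```python
# The reported fine-tuning hyperparameters expressed as Hugging Face
# TrainingArguments; the output directory is arbitrary and the dataset
# wiring is omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="exaone-stance-ft",   # arbitrary output path
    optim="adamw_torch",             # AdamW optimizer
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_steps=100,
    num_train_epochs=10,
    per_device_train_batch_size=6,
)
```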
For baselines, the authors used RoBERTa, fine-tuned for article-level stance detection; Chain-of-Thought (CoT) Embeddings, a RoBERTa variant that incorporates LLM-generated reasoning; LKI-BART, an encoder-decoder model that adds contextual knowledge from a large language model by prompting it with both the input text and the intended stance label; and PT-HCL, a method that uses contrastive learning to separate general features from those specific to the target issue:

Performance of each model on the K-NEWS-STANCE test set for overall stance prediction. Results are shown as macro F1 and accuracy, with the top score in each group in bold.
JoA-ICL achieved the best overall performance across both accuracy and macro F1, an advantage evident across all three model backbones tested: GPT-4o-mini, Claude 3 Haiku, and Gemini 2 Flash.
The segment-based method consistently outperformed all other approaches, with what the authors observe to be a notable edge in detecting supportive stances, a common weakness in comparable models.
Baseline models performed worse overall. RoBERTa and Chain-of-Thought variants struggled with nuanced cases, while PT-HCL and LKI-BART fared better, though still trailing JoA-ICL across most categories. The most accurate single result came from JoA-ICL (Claude), with 64.8% macro F1 and 66.1% accuracy.
The image below shows how often the models got each label right or wrong:

Confusion matrices comparing the baseline and JoA-ICL, showing that both methods struggle most with detecting ‘supportive’ stances.
JoA-ICL did better overall than the baseline, getting more labels correct in every category. However, both models struggled most with supportive articles, and the baseline misclassified nearly half, often mistaking them for neutral.
JoA-ICL made fewer mistakes but showed the same pattern, reinforcing that supportive stances are harder for models to spot.
To test whether JoA-ICL works beyond the confines of the Korean language, the authors ran it on CheeSE, a German dataset for article-level stance detection. Since CheeSE lacks segment-level labels, the researchers used distant supervision, wherein every segment was assigned the same stance label as the full article.
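In code, this distant supervision step amounts to little more than label propagation, as in the hedged sketch below:

```python
# Distant supervision as described above: lacking segment-level annotations,
# every segment simply inherits the article-level stance label. The labels
# are noisy by construction, since individual segments may diverge from the
# article's overall position.
def distant_segment_labels(article_stance: str,
                           segments: list[str]) -> list[tuple[str, str]]:
    return [(segment, article_stance) for segment in segments]
```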

Stance detection results on the German-language CheeSE dataset. JoA-ICL consistently improves over zero-shot prompting across all three LLMs and outperforms fine-tuned baselines, with Gemini-2.0-flash yielding the strongest overall performance.
Even under these ‘noisy’ conditions, JoA-ICL outperformed both fine-tuned models and zero-shot prompting. Of the three backbones tested, Gemini-2.0-flash gave the best results.
Conclusion
Few tasks in machine learning are more politically charged than stance prediction; yet it is often handled in cold, mechanical terms, while more attention is given to less complex issues in generative AI, such as video and image creation, which trigger far louder headlines.
The most encouraging development in the new Korean work is that it offers a significant contribution to the analysis of full-length content, rather than tweets and other short-form social media, whose incendiary effects are more quickly forgotten than a treatise, essay, or other substantial work.
One notable omission in the new work, and (as far as I can tell) in the stance detection literature in general, is the lack of consideration given to hyperlinks, which frequently stand in for quotes as optional resources for readers to learn more about a subject; yet the choice of such URLs is potentially very subjective, and even political.
That said, the more prestigious the publication, the less likely it is to include any links at all that guide the reader away from the host domain; this, together with diverse other SEO uses and abuses of hyperlinks, makes them more difficult to quantify than explicit quotes, titles, or other parts of an article that may seek, consciously or not, to influence the reader’s opinion.
First published Wednesday, July 16, 2025