Why AI Models Don’t Solve Radiology’s Core Problems

Most AI work in radiology today centers on diagnostic models: the focus is on improving accuracy by training on more data and refining architectures. This approach treats radiology mainly as a visual recognition task, where better detection is expected to improve the whole system. At first glance, this seems reasonable: if detection improves, diagnostic quality should improve as well. But this view misses the main issue.
The core problem in radiology is how diagnosis itself is structured. It is not a limitation of pattern recognition, but of how work is organized and decisions are produced. And improving models does not fix that. Below, we look at why this happens and what changes are needed beyond model accuracy.
The Problem Is Structural
Computer vision (CV) models help with recognizing patterns in images. In controlled studies, they can reach performance comparable to radiologists, with AUC often above 0.90 and, in some screening settings, matching or exceeding specialist performance. In many domains, the ability to detect findings is no longer the main constraint.
At the same time, these models work only with image data. They highlight possible findings but do not take part in the diagnostic process itself: they do not combine different inputs into a single diagnosis or guide how decisions are made. They are designed to recognize patterns, not to participate in diagnosis as a process.
In practice, this means the structure does not change. Radiologists remain responsible for reviewing results and making final decisions, and AI outputs often require additional verification. Responsibility stays with the physician, the workflow stays the same, and the bottleneck remains: each study is still assigned to a single radiologist who is expected to review, interpret, validate, and report. Studies show that clinician performance still depends heavily on how AI errors are handled, and incorrect outputs can negatively affect decision-making.
A model cannot revisit its own decision. If it makes a mistake, there is no way to reconsider the case with additional context. Its output is static, while clinical reasoning is inherently iterative: real diagnosis requires continuous adjustment.
As a result, bottlenecks remain, variability persists, and the process continues to depend on a single radiologist. As imaging volumes grow, a system built around one specialist per case inevitably reaches its limits.
A Crisis of Trust
AI in radiology also raises a problem of trust.
A radiologist does not just question a model’s output. They question the system behind it, especially when AI is presented as a replacement rather than a tool they can control. Trust in the model is unstable. Less experienced doctors may rely on it too much and miss errors. More experienced ones, after a single mistake, may stop using it. These two patterns — overreliance and disengagement — both introduce new risks instead of reducing them.
After an error, the model often drops out of the decision-making process: it is no longer used as a tool, but simply ignored. This breaks continuity and limits its practical value, leading to inconsistent usage rather than stable integration into the workflow.
This reflects a deeper conflict with how medicine works. A radiologist cannot pass responsibility for a diagnosis to a system. Clinical and legal responsibility stays with the physician. Treating AI as a substitute goes against the structure of medical practice.
This creates a structural conflict where the model suggests but the doctor must verify. Recognition can be automated. Responsibility cannot. And as long as responsibility cannot be transferred, every model output must be checked, limiting its ability to reduce workload.
The Illusion of Efficiency
AI does not remove work from the radiologist; it adds a layer of control. The physician still interprets the study and also has to verify the model's output, creating double work instead of saving time. Instead of eliminating effort, AI redistributes it into additional steps of validation and correction.
The system may appear faster, but in practice time shifts into additional checks rather than being reduced. Over time, this accumulation of tools increases operational overhead without fundamentally improving how diagnosis is produced.
AI also introduces fragmentation. Most models are built for a narrow task: a single anatomical region, modality, or pathology. To cover real clinical needs, a clinic has to use multiple models. Instead of a unified system, this leads to a fragmented set of tools.
Each model brings its own interface, integration, and maintenance requirements. This increases system complexity and adds more points of interaction.
As a result, the system becomes harder to manage, while the core diagnostic process remains unchanged.
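To make that integration burden concrete, here is a minimal sketch of the adapter layer a clinic effectively ends up maintaining, one per narrow model. This is purely illustrative: the class names, model names, and registry are hypothetical assumptions, not any vendor's actual API.

```python
from abc import ABC, abstractmethod

class ModelAdapter(ABC):
    """One adapter per vendor model: each has its own I/O contract to maintain."""
    @abstractmethod
    def analyze(self, dicom_path: str) -> dict: ...

class ChestCTNoduleAdapter(ModelAdapter):
    def analyze(self, dicom_path: str) -> dict:
        # Hypothetical vendor call: convert the DICOM, send it to the vendor's
        # endpoint, then map its proprietary response into a common schema.
        raise NotImplementedError

class MammoScreeningAdapter(ModelAdapter):
    def analyze(self, dicom_path: str) -> dict:
        # A second vendor means a second contract, a second auth flow,
        # and a second update cadence to track.
        raise NotImplementedError

# Every new narrow model adds another entry here, plus its own monitoring,
# versioning, and validation burden.
REGISTRY: dict[str, ModelAdapter] = {
    "chest_ct_nodule": ChestCTNoduleAdapter(),
    "mammo_screening": MammoScreeningAdapter(),
}
```

Each adapter is small on its own; the overhead comes from the fact that the set only ever grows, while none of it touches how the diagnosis itself is produced.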
Limits of Scaling AI
With each new version of a model, trust has to be rebuilt. The workflow starts to adjust to the model, instead of supporting the physician. This further shifts focus away from improving the system itself toward adapting to individual tools.
The narrow focus of most models makes this harder. Each one is designed for a specific task or anatomical region, so in practice a clinic needs several models at once, each with its own constraints, integrations, and costs. The system becomes harder to operate.
At the same time, it becomes clear where AI has the most impact today: not in interpreting images, but in improving the system around them, through routing studies, managing queues, assigning cases to the right specialist, and forecasting workload. These areas benefit from coordination rather than isolated prediction.
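As an illustration of what coordination-level support can look like, the following is a minimal sketch of urgency-aware routing of studies into subspecialty queues. The StudyRouter class, its fields, and the urgency tiers are assumptions made for the example, not a description of any deployed system.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical urgency tiers; real triage rules would come from clinical policy.
URGENCY = {"stat": 0, "urgent": 1, "routine": 2}

@dataclass(order=True)
class Study:
    priority: int
    accession: str = field(compare=False)
    modality: str = field(compare=False)   # e.g. "CT", "MRI"
    body_part: str = field(compare=False)  # e.g. "neuro", "msk"

class StudyRouter:
    """Routes incoming studies into per-subspecialty queues, ordered by urgency."""

    def __init__(self) -> None:
        self.queues: dict[str, list[Study]] = {}

    def submit(self, accession: str, modality: str, body_part: str,
               urgency: str = "routine") -> None:
        study = Study(URGENCY[urgency], accession, modality, body_part)
        heapq.heappush(self.queues.setdefault(body_part, []), study)

    def next_for(self, subspecialty: str) -> Study | None:
        """Hand the highest-priority pending study to a matching specialist."""
        queue = self.queues.get(subspecialty)
        return heapq.heappop(queue) if queue else None

router = StudyRouter()
router.submit("ACC-1001", "CT", "neuro", urgency="stat")
router.submit("ACC-1002", "MRI", "neuro")
print(router.next_for("neuro").accession)  # "ACC-1001": the stat study comes first
```

Nothing here interprets an image; the value comes from getting the right case to the right reader at the right time.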
What Happened in Practice
In our network of clinics, with more than one million MRI and CT studies performed annually across over 40 clinics in three countries, we implemented several AI models, including FDA-approved solutions. The setup was simple: the model analysed imaging studies, generated a report, and the radiologist reviewed and confirmed it. The expected outcome was faster turnaround and fewer errors.
In practice, experienced radiologists continued to interpret each study themselves before looking at the model's output. Working with generated text often took more effort than writing from scratch, as it did not match their reporting style. The process did not speed up and in many cases became slower; interacting with the model often added more friction than value.
User behaviour also created risk. When the model made an error, experienced radiologists tended to lose trust and stop using it. Less experienced doctors were more likely to rely on the system, increasing the chance of missed errors and related clinical and legal exposure.
These observations point to a deeper issue. Improving individual tools does not change how the system works. The bottleneck lies in a workflow built around a single radiologist handling a study from start to finish. These constraints make it difficult to improve the system by adding more tools alone.
At DICO, we approached this differently. Instead of adding AI to the existing workflow, we focused on the structure itself. Cases can be broken down into parts, with each handled by the relevant specialist working directly within the imaging data. The final report is assembled from these inputs, with AI supporting coordination. Diagnosis, in this model, is no longer written by one person — it is assembled from multiple contributions.
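A minimal sketch of that structure follows, assuming a simple segment-per-specialist decomposition. The Case and Segment classes and their fields are illustrative assumptions, not DICO's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One anatomically scoped part of a study, read by one specialist."""
    region: str                  # e.g. "spine", "abdomen"
    specialist: str
    findings: str | None = None  # filled in when the specialist signs off

@dataclass
class Case:
    accession: str
    segments: list[Segment] = field(default_factory=list)

    def record(self, region: str, findings: str) -> None:
        """Attach one specialist's findings to their segment."""
        for seg in self.segments:
            if seg.region == region:
                seg.findings = findings

    def assemble_report(self) -> str:
        """Merge per-segment findings once every contributor has signed off.
        A coordination layer (human or AI) would enforce ordering, flag
        conflicts between segments, and route unresolved parts back."""
        pending = [s.region for s in self.segments if s.findings is None]
        if pending:
            raise ValueError(f"awaiting findings for: {pending}")
        return "\n".join(f"{s.region.upper()}: {s.findings}" for s in self.segments)

case = Case("ACC-2001", [Segment("spine", "dr_a"), Segment("abdomen", "dr_b")])
case.record("spine", "No acute osseous abnormality.")
case.record("abdomen", "Simple hepatic cyst, unchanged.")
print(case.assemble_report())
```

The point of the structure is that no single reader owns the whole report; the assembly step is where coordination, conflict detection, and re-routing live.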
The difference was not in model quality; it was in structure.
Rethinking the Diagnostic System
Modern radiology is built on a single assumption: one study, one radiologist. This creates bottlenecks, introduces variability, and limits scalability. The main issue is not that models make errors; it is that the system is not built to work with those errors. It is built on a non-scalable foundation.
A non-iterative tool is placed into a process that depends on iteration. In clinical practice, radiology is not a single-pass decision. Studies show that disagreement between initial and second interpretations occurs in ~30% of cases, and clinically significant discrepancies can affect management in ~18-20% of cases. Second opinions are not an exception, but a routine part of diagnostic work.
More broadly, across medicine, second opinions lead to a change in diagnosis in 10-62% of cases, depending on the setting.
Without the ability to revise, coordinate, and re-evaluate with context, model outputs remain static, while real diagnostic work requires continuous adjustment.
The next step is not more accurate models. It is a redesign of the diagnostic system.
The shift is from separate AI tools to a unified way of managing diagnostics — where diagnosis is not a single decision, but a process that is routed and interpreted across multiple steps. The limitation is not the lack of intelligent models, but the absence of a system that can coordinate diagnostic work as a process.