When AI’s ‘Knowledge’ Is 50 Years Old: The Compliance Risk You Can’t Ignore

The issue of false AI insights is an urgent challenge as enterprises increase their use of generative tools. Despite widespread enthusiasm for AI adoption, there’s also a strong current of criticism. Critics often point to seemingly random, unpredictable inaccuracies in AI’s output, which undermine its value – and can even cause real harm, particularly in sectors like healthcare and transportation, where false outputs could lead to anything from a wrong prescription to trains on a collision course.
Often, these inaccuracies have been put down to AI ‘hallucinations’ – instances in which the AI generates a ‘best guess’ answer, conveyed with the same confidence as a ‘genuine’ answer, rather than informing the user of a gap in its knowledge or ability. Hallucinations can be hard to spot at first glance – but there is a quieter, equally serious problem that’s even tougher to detect.
Data quality debt: AI’s Achilles heel
When AI systems pull from outdated, incomplete, or inaccurate data, they also produce false outputs – but these are far less immediately discernible. For example, you might ask an AI to identify the symptoms of a medical condition and receive an answer based on a 50-year-old paper instead of current research. The result is unlikely to appear obviously, laughably wrong – but that veneer of plausibility poses a real risk to both the patient in question and the healthcare provider.
The same is true across industries – if the data being fed to the AI model includes old, outdated, or partial information, there’s a high risk of false outputs. And as more companies integrate AI into business-critical processes, the risk of drawing false conclusions from poorly governed data grows.
Accuracy for the regulator
This isn’t just a problem for day-to-day operations – it’s also a significant compliance challenge. Regulatory requirements are evolving fast to address concerns about inaccurate AI, and early enforcement actions have already taken place: Italy temporarily banned ChatGPT over privacy concerns, and the European Data Protection Board launched a dedicated taskforce to coordinate potential enforcement actions against ChatGPT.
One of the most telling regulatory changes has been the passage of the EU AI Act, the world’s first comprehensive legal framework for AI. The Act sets out obligations based on the risk level of AI systems, from ‘unacceptable risk’ systems, which are banned, to ‘high-risk’ systems, which face strict requirements around transparency, data quality, governance and human oversight.
The significance of the EU AI Act lies not only in its ambitious scope but in the precedent it sets. Regulators are making it clear that AI will be subject to binding, enforceable rules, and that organisations must treat compliance and transparency around where and how AI is used as integral to adoption rather than an afterthought.
The Act has a wide remit, with the potential to affect a large proportion of AI developments. At its heart is the aim of making AI safe whilst respecting fundamental rights and values. Within this principles-based framework, the Act points to the potential sources of AI inaccuracy: the data and datasets feeding the models, model opacity and access, and system design and use. AI solutions are a construct of all three, and issues with any one of them can produce negative outcomes. Not only that, but the data that goes into the design, model development, deployment, and operation of AI is likely to consist primarily of business records, which are themselves subject to various compliance requirements.
In other words, the regulatory environment surrounding AI is becoming increasingly stringent – and that’s just as true for data input as it is for data output, even though the latter gets a lot more headlines.
Five steps to feeding AI compliant, current, relevant data
To solve this dual challenge – ensuring both compliant data handling and high-quality input that enables high-quality output – businesses need control over training and inference data. Unfortunately, this is something many enterprises still lack.
At the very least, organisations should be applying their broader compliance and governance programs to AI initiatives. They need to start capturing and maintaining appropriate records of the data they feed AI models, of how models and systems are designed, and of the decisions and content generated via AI.
However, it’s also becoming critically important for organisations to go a step beyond that and ensure they have full control over all data that could feasibly be used in AI deployments, whether for initial training or ‘live’ inference. This requires a high-quality data management and storage strategy, ensuring all relevant data is intelligently gathered, cleaned, stored, classified, and entitled. To achieve this, organisations need to consider five key steps, each illustrated with a brief sketch below:
1. Data lineage and provenance
This includes maintaining a record of the source of the data, its origin, ownership, and any changes in metadata (if permitted) throughout its life cycle. It also means maintaining rich metadata and all the underlying documents or artifacts from which it is derived.
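As a minimal sketch of what such a record might capture – in Python, with hypothetical names such as LineageRecord and record_metadata_change – lineage can be preserved by appending each metadata change rather than overwriting it:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """Provenance entry for one data object (hypothetical schema)."""
    object_id: str
    source: str       # system or feed the object arrived from
    origin: str       # original producer / system of record
    owner: str        # accountable business owner
    metadata_history: list = field(default_factory=list)

    def record_metadata_change(self, name: str, old, new, actor: str) -> None:
        """Append a timestamped change instead of overwriting history."""
        self.metadata_history.append({
            "field": name, "old": old, "new": new,
            "actor": actor, "at": datetime.now(timezone.utc).isoformat(),
        })

# Usage: track where a document came from and who amended its metadata.
rec = LineageRecord("doc-001", source="email-archive", origin="legal-dept", owner="compliance")
rec.record_metadata_change("retention_class", "standard", "litigation-hold", actor="jsmith")
```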
2. Data authenticity
This requires maintaining a clear chain of custody for all data, storing objects in their native forms, and hashing objects received to demonstrate data remains unchanged. In addition, organisations must maintain a full audit history for each object, and for all actions and events with respect to any changes.
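As an illustration, the sketch below hashes objects on ingest and re-hashes them on read; the choice of SHA-256, the helper names (ingest, verify), and the in-memory list standing in for an append-only audit store are all assumptions:

```python
import hashlib
from datetime import datetime, timezone

def ingest(raw: bytes, received_from: str, audit_log: list) -> dict:
    """Store the object in its native form plus a content hash, and log
    the ingest event to what would in practice be an append-only store."""
    digest = hashlib.sha256(raw).hexdigest()
    audit_log.append({
        "event": "ingest", "sha256": digest,
        "received_from": received_from,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return {"payload": raw, "sha256": digest}

def verify(stored: dict) -> bool:
    """Re-hash the payload; a mismatch means the object has changed."""
    return hashlib.sha256(stored["payload"]).hexdigest() == stored["sha256"]

audit: list = []
obj = ingest(b"original contract bytes", received_from="sftp-drop", audit_log=audit)
assert verify(obj)  # holds for as long as the payload is untouched
```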
3. Data classification
Establishing the nature of a set or type of data is important. Organisations need to be able to govern structured, semi-structured, and unstructured data alike. Giving each class a unique schema allows organisations to manage diverse sets of data without a one-size-fits-all fixed ontology – avoiding data being unnecessarily manipulated to force it into an inflexible structure.
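The sketch below illustrates the per-class schema idea; the class names, fields, and the SCHEMAS and validate identifiers are hypothetical:

```python
# One schema per data class (field name -> expected type), rather than
# forcing every object into a single fixed ontology. Names are illustrative.
SCHEMAS = {
    "trade_record":  {"trade_id": str, "notional": float, "currency": str},
    "support_email": {"message_id": str, "subject": str, "body": str},
}

def validate(data_class: str, obj: dict) -> bool:
    """Check an object against the schema declared for its class."""
    schema = SCHEMAS.get(data_class)
    if schema is None:
        return False  # unknown class: classify before storing
    return all(
        name in obj and isinstance(obj[name], expected)
        for name, expected in schema.items()
    )

print(validate("trade_record", {"trade_id": "T1", "notional": 5e6, "currency": "EUR"}))  # True
```

Because each class owns its schema, adding a new data type means adding a new entry rather than reshaping every existing record.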
4. Data normalisation
Establishing common definitions and formats of metadata is important for use in analytics and AI solutions. Clearly defined schemas are an important element, along with tools that can transform or map data to maintain consistent, normalised views of related data.
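One common pattern is a per-source field map onto a canonical vocabulary; in the sketch below, the source systems, field names, and the FIELD_MAPS and normalise identifiers are assumptions:

```python
# Per-source field maps onto one canonical metadata vocabulary, so
# analytics and AI pipelines see consistent keys across systems.
FIELD_MAPS = {
    "crm_export": {"CustName": "customer_name", "CREATED_DT": "created_at"},
    "billing_db": {"customer": "customer_name", "created": "created_at"},
}

def normalise(source: str, record: dict) -> dict:
    """Rename known source fields to canonical names; pass the rest through."""
    mapping = FIELD_MAPS.get(source, {})
    return {mapping.get(key, key): value for key, value in record.items()}

print(normalise("crm_export", {"CustName": "Acme Ltd", "CREATED_DT": "2024-01-31"}))
# {'customer_name': 'Acme Ltd', 'created_at': '2024-01-31'}
```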
5. Data entitlements
Enterprises need granular entitlement controls, including at an object or field level, based on user or system profiles. This means the right data is available to the users and systems entitled to access it, while access is restricted for those who are not.
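As a minimal sketch of field-level entitlements with a deny-by-default policy – the profiles, fields, and the ENTITLEMENTS and view_for names are illustrative:

```python
# Field-level entitlements keyed by user or system profile; any field
# not explicitly granted is withheld. Profiles and fields are illustrative.
ENTITLEMENTS = {
    "analyst":      {"customer_name", "created_at"},
    "fraud_system": {"customer_name", "created_at", "tax_id"},
}

def view_for(profile: str, record: dict) -> dict:
    """Return only the fields this profile is entitled to see."""
    allowed = ENTITLEMENTS.get(profile, set())  # unknown profile: deny all
    return {k: v for k, v in record.items() if k in allowed}

record = {"customer_name": "Acme Ltd", "created_at": "2024-01-31", "tax_id": "GB123"}
print(view_for("analyst", record))  # tax_id is withheld from this profile
```

Deny-by-default keeps newly added fields restricted until someone explicitly grants access to them.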
With these elements in place, businesses will be well positioned to ensure the data provided to AI models is both high-quality and compliant. AI will drive improvements and efficiencies across industries – but for that to happen, a rock-solid data foundation is essential.