
A Good Spend Taxonomy Has Two Customers


A good spend taxonomy has two customers: the people who need to use it, and the models that need to classify against it.

Most leaders understand taxonomy as a category structure: a way to organize what they’re spending into meaningful buckets. It is much more than a reporting framework. It shapes how people interpret that spend, how that spend is structured as data, and, increasingly, how AI systems categorize, analyze, and generate insights from that data.

That is the part of spend visibility implementations that often gets underestimated. The taxonomy is usually treated like a setup step: define the hierarchy, load it into the platform, map the spend, and move on. But AI adoption in procurement is accelerating faster than execution. In 2025, 80% of CPOs planned to deploy generative AI within three years, yet only 36% had meaningful implementations.

In reality, taxonomy is usually where that gap starts. It becomes the language the business uses to understand spend and one of the most important inputs into AI-driven categorization. If it fails either audience, the downstream impact shows up quickly: poor adoption, lower trust, and models that are harder to tune than they need to be.

The Adoption Problem

For users, taxonomy design is a change-management issue. Category managers, sourcing teams, finance users, and executives need to look at spend buckets and understand what they mean without a translation layer.

Messy labels make that harder. So do internal acronyms, ambiguous category names, redundant categories, and inconsistent levels of detail across the hierarchy. A spend cube can classify transactions correctly and still create a poor user experience if users cannot interpret the categories. Gartner found that 63% of organizations either don’t have or aren’t sure they have the right data management practices for AI, and predicts that through 2026, 60% of AI projects unsupported by AI-ready data will be abandoned.

This is where implementations need category-team input. The people managing the categories understand how spend is sourced, negotiated, and acted on. They know whether a bucket is useful, whether a distinction matters, and whether a label reflects how the business actually talks about spend.

But that input needs guardrails. Category teams cannot each design their part of the taxonomy in isolation.

A Facilities team may want deep detail for every service type: labor, materials, asset type, repair type, and service frequency. An IT team may prefer broad categories such as Hardware, Software, and Services. Both views may make sense inside their own function. Neither should become the default design principle for the full enterprise taxonomy.

A centralized team has to create the framework. How many levels should the taxonomy have? Where does more granularity create better sourcing insight? Where does it create noise? Which labels will be clear to non-specialists? Which categories need to be separated, and which should stay consolidated?

A good taxonomy is not the most detailed version of every category team’s preferences. It is the shared language the enterprise uses to understand spend consistently.

The AI Problem

The same taxonomy also has to work for AI.

In AI-driven categorization, labels and definitions are not just documentation. They become part of the signal used to classify transactions. If two categories have vague or overlapping labels, the model has less basis for choosing one over the other. If a definition is too generic, it may over-match. If it uses language that never appears in the data, it may not match at all.

This is not simply a model maturity issue. It is a taxonomy design issue.

Good taxonomy design gives the model cleaner targets. Categories should be distinct, describable, recognizable in the underlying data, and clear about what belongs and what does not belong. That last point matters. Inclusion language tells the model what to look for. Exclusion language helps separate adjacent categories that may share similar vocabulary.

Consider areas like facilities maintenance, MRO, building services, equipment repair, and general industrial supplies. These categories can easily overlap. A human reviewer may understand the intended distinction from context. A model needs a clearer signal. If multiple categories all describe similar maintenance activity without specific boundaries, categorization confidence will suffer.
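One rough way to spot categories that lack clear boundaries is to measure how much vocabulary their definitions share. The sketch below is a minimal illustration under stated assumptions: the category definitions, the stopword list, and the 0.5 threshold are all invented for the example, and Jaccard overlap is a simple stand-in for whatever similarity measure a real platform uses.

```python
import re
from itertools import combinations

# Hypothetical category definitions, written for illustration only.
DEFINITIONS = {
    "Facilities Maintenance": "maintenance and repair services for buildings and equipment",
    "Equipment Repair": "repair and maintenance services for equipment and machinery",
    "Safety Supplies": "safety glasses gloves ppe first aid supplies",
}

# Assumed filler words that carry no category signal.
STOPWORDS = {"and", "for", "the", "of", "a"}


def tokens(text):
    """Return the lowercase word set of a definition, minus filler words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Shared vocabulary as a fraction of combined vocabulary (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(definitions, threshold=0.5):
    """Return category pairs whose definitions share too much vocabulary."""
    vocab = {name: tokens(text) for name, text in definitions.items()}
    flagged = []
    for left, right in combinations(sorted(vocab), 2):
        score = jaccard(vocab[left], vocab[right])
        if score >= threshold:
            flagged.append((left, right, round(score, 2)))
    return flagged


print(overlapping_pairs(DEFINITIONS))
```

With these sample definitions, the two maintenance-style categories are flagged because they describe the same activity in nearly the same words, while Safety Supplies stays separate. A flagged pair is a prompt for a human to add boundary language, not a verdict that the categories should merge.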

The same issue appears with fallback categories. A broad bucket, like MRO / General Industrial Supplies, can be useful when the data is truly vague. But it should not become a catch-all for spend that could be classified more precisely. If the data clearly indicates safety glasses, gloves, PPE, or first aid supplies, the taxonomy should provide enough signal to classify that spend as Safety Supplies instead of leaving it in a generic bucket.
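As a sketch of how inclusion and exclusion language can keep specific spend out of a fallback bucket, the example below uses simple keyword rules. The rule contents and the `classify` helper are hypothetical, and real AI-driven categorization would rely on richer signals than keyword hits, but the boundary logic follows the same idea.

```python
import re

# Hypothetical rules for illustration; real definitions would come from
# the governed taxonomy, not from hard-coded lists.
RULES = [
    {
        "category": "Safety Supplies",
        "include": ["safety glasses", "gloves", "ppe", "first aid"],
        "exclude": ["training"],
    },
    {
        "category": "Equipment Repair",
        "include": ["repair", "overhaul"],
        "exclude": ["building", "roof"],
    },
    {
        "category": "Building Services",
        "include": ["roof", "hvac", "janitorial"],
        "exclude": [],
    },
]
FALLBACK = "MRO / General Industrial Supplies"


def mentions(text, terms):
    """True if any term appears as a whole word or phrase in the text."""
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)


def classify(description):
    """Return the first category whose include terms hit and exclude terms do not."""
    text = description.lower()
    for rule in RULES:
        if mentions(text, rule["include"]) and not mentions(text, rule["exclude"]):
            return rule["category"]
    # Only descriptions with no usable signal should land in the fallback.
    return FALLBACK


for line in ["Nitrile gloves, box of 100", "Roof repair, building 4", "Copper fittings"]:
    print(line, "->", classify(line))
```

The exclusion terms on Equipment Repair are what push "Roof repair, building 4" toward Building Services even though the word "repair" appears in it. Whole-word matching is a deliberate choice here, since a plain substring test would find "ppe" inside "copper".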

What Better Taxonomy Design Looks Like

The best taxonomy work is not purely manual, and it is not fully automated. It is a hybrid approach.

Start with a centralized framework. Define naming conventions, hierarchy depth, fallback categories, and the level of granularity required for decision-making. Then bring in category teams to pressure-test the structure against how spend is actually managed.

From there, write practical definitions, not academic ones. A useful category definition should say what belongs, what does not belong, and what language is likely to appear in the data. Vendor names, product terms, service descriptions, and common abbreviations can all matter when they are used carefully.
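The three parts of a practical definition can be captured in a small record, which also makes it easy to check drafts for missing pieces. This is an illustrative sketch only: the `CategoryDefinition` class, its field names, and the sample content are assumptions made for the example, not a format any particular platform requires.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryDefinition:
    """One taxonomy category with the three parts a practical definition needs."""
    label: str
    belongs: list[str] = field(default_factory=list)          # what is in scope
    does_not_belong: list[str] = field(default_factory=list)  # adjacent spend to keep out
    data_terms: list[str] = field(default_factory=list)       # vendor names, product terms, abbreviations


def definition_gaps(definition):
    """List the parts of a definition that are still empty."""
    checks = [
        ("inclusion language", definition.belongs),
        ("exclusion language", definition.does_not_belong),
        ("data vocabulary", definition.data_terms),
    ]
    return [name for name, value in checks if not value]


# Hypothetical examples: one complete definition and one unfinished draft.
safety = CategoryDefinition(
    label="Safety Supplies",
    belongs=["personal protective equipment", "first aid supplies"],
    does_not_belong=["safety training services"],
    data_terms=["ppe", "gloves", "safety glasses", "first aid"],
)
draft = CategoryDefinition(
    label="Equipment Repair",
    belongs=["repair of production equipment"],
)

print(definition_gaps(safety))
print(definition_gaps(draft))
```

A check like this only confirms that each part exists. Whether the wording is right is still a question for the category teams.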

Then test the taxonomy against real transactions. Review high-spend examples. Review low-confidence matches. Look for categories that are over-grabbing spend because their definitions are too broad. Look for categories that are under-matching because their definitions do not use the vocabulary found in the source data.
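A review pass like this can start from two simple views of the classified data: how much spend each category is absorbing, and which low-confidence matches carry the most money. The sketch below assumes each transaction already has a category and a confidence score; the field names, the sample rows, and the 0.6 cutoff are invented for illustration.

```python
from collections import defaultdict

# Hypothetical classified transactions; amounts and confidences are made up.
TRANSACTIONS = [
    {"description": "hvac filter replacement", "amount": 12000.0,
     "category": "MRO / General Industrial Supplies", "confidence": 0.41},
    {"description": "nitrile gloves", "amount": 800.0,
     "category": "Safety Supplies", "confidence": 0.93},
    {"description": "pump seal kit", "amount": 5200.0,
     "category": "MRO / General Industrial Supplies", "confidence": 0.55},
    {"description": "laptop docking station", "amount": 2000.0,
     "category": "Hardware", "confidence": 0.88},
]


def spend_share(transactions):
    """Fraction of total spend landing in each category."""
    totals = defaultdict(float)
    for t in transactions:
        totals[t["category"]] += t["amount"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: amount / grand_total for category, amount in totals.items()}


def review_queue(transactions, min_confidence=0.6):
    """Low-confidence matches, largest spend first, for human review."""
    low = [t for t in transactions if t["confidence"] < min_confidence]
    return sorted(low, key=lambda t: t["amount"], reverse=True)


for category, share in sorted(spend_share(TRANSACTIONS).items(), key=lambda kv: -kv[1]):
    print(f"{share:.0%}  {category}")
for t in review_queue(TRANSACTIONS):
    print(t["confidence"], t["amount"], t["description"])
```

In this sample, the generic MRO bucket holds most of the spend and both of its transactions sit in the review queue, which is the pattern that suggests a fallback category is over-grabbing. Sorting the queue by amount is one way to make sure reviewers see high-spend examples first.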

This is where AI is valuable. It can surface patterns, measure confidence, identify ambiguous matches, and help teams prioritize where refinement is needed. But the human-in-the-loop step still matters because the model cannot decide the business meaning of a category on its own.

Taxonomy design should be treated as both an implementation workstream and a model-quality input. Labels and definitions influence categorization. The broader shift toward AI-native procurement is making that foundation harder to ignore: data readiness is treated as a competitive differentiator rather than a technical requirement. Technical approaches like TF-IDF matching, semantic similarity, confidence thresholds, score margins, abbreviation expansion, and feedback loops work better when the taxonomy itself is clear and separable.
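To make a few of these terms concrete, the sketch below combines abbreviation expansion, TF-IDF matching, a confidence threshold, and a score margin in plain Python. It is a toy under stated assumptions, not a description of any vendor's pipeline: the category texts, the abbreviation map, the `categorize` helper, and the 0.2 and 0.1 cutoffs are all hypothetical, and the IDF formula is one common smoothed variant.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation map and category definitions for illustration.
ABBREVIATIONS = {"ppe": "personal protective equipment", "maint": "maintenance"}
CATEGORIES = {
    "Safety Supplies": "personal protective equipment safety glasses gloves first aid",
    "Equipment Repair": "repair overhaul production equipment machinery",
    "Building Services": "janitorial hvac roof grounds maintenance buildings",
}


def tokenize(text):
    """Lowercase words, with known abbreviations expanded in place."""
    expanded = []
    for word in re.findall(r"[a-z]+", text.lower()):
        expanded.extend(ABBREVIATIONS.get(word, word).split())
    return expanded


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the definitions."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(terms, idf):
    """TF-IDF weights; terms unseen in any definition are dropped."""
    counts = Counter(t for t in terms if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def categorize(description, min_score=0.2, min_margin=0.1):
    """Return the best category, or None if the match is weak or too close to call."""
    docs = {name: tokenize(text) for name, text in CATEGORIES.items()}
    idf = build_idf(list(docs.values()))
    query = vectorize(tokenize(description), idf)
    ranked = sorted(((cosine(query, vectorize(doc, idf)), name) for name, doc in docs.items()), reverse=True)
    (best, label), (runner_up, _) = ranked[0], ranked[1]
    if best < min_score or best - runner_up < min_margin:
        return None  # route to human review instead of guessing
    return label


for line in ["PPE gloves", "hvac maint contract", "equipment", "office chairs"]:
    print(line, "->", categorize(line))
```

The word "equipment" alone scores above the threshold for two categories but fails the margin check, because both definitions use it. That is the taxonomy-side problem showing up as a model-side symptom: the fix is clearer definitions, not a lower margin.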

The point is not to overwhelm procurement teams with model terminology. The point is that taxonomy quality becomes model quality. Better labels and definitions create better signals. Better signals create stronger categorization. Stronger categorization creates more trust in the spend cube.

The Implementation Lesson

Taxonomy buildout deserves more time than it usually gets in the project plan.

Rushing this step creates two predictable problems. The first is poor adoption. Users do not trust a spend cube when the categories do not match how they think about spend or when the hierarchy feels inconsistent across teams.

The second is poor model performance. Categorization becomes harder when the target categories are vague, redundant, or disconnected from the language in the data.

Neither problem is solved by simply applying more AI. The foundation has to be right. That is the same pattern showing up across enterprise AI broadly: most AI project failures trace back to a data foundation that was not ready, not to the models themselves.

A strong taxonomy is governed centrally, informed by category experts, tested against real data, refined through model feedback, and maintained over time. It is not a one-time setup file. It is a core part of the spend visibility operating model.

The taxonomy is not an administrative cleanup. It is the foundation for trust in the spend cube. Increasingly, it is also the foundation for how well AI can classify, explain, and improve procurement data over time.

Mitch Couper is Vice President of Data and Analytics at SpendHQ, where he leads the team responsible for transforming complex procurement data into reliable, actionable business intelligence. With a decade at SpendHQ and a background in procurement consulting, he brings deep expertise to how enterprise organizations structure and use their spend data.