How I Transferred My Knowledge into AI Systems That Can Actually Make Decisions Like Human Experts

When I left Microsoft and continued working with enterprises on their AI deployments, I kept seeing the same thing: most of the AI systems people were excited about could not actually make decisions with real human judgment. Sure, they could write, summarize, and produce remarkably fluent text that sounded like a decision, but drop these systems into a real operational environment, with trade-offs, uncertainty, incomplete instructions, and actual consequences, and they fall apart quickly. This matches data from MIT Project NANDA showing that while 60% of organizations evaluated AI tools, only 20% reached the pilot stage and just 5% reached production. In other words, the industry is struggling to build systems that can actually hold up inside real workflows.
In enterprise settings, especially in areas like supply chain, manufacturing, and operations, getting an answer isn't hard; the hard part is knowing which answer to trust, which variables matter most, and what's likely to break downstream if you get it wrong. In my eyes, this is both an expertise problem and a judgment problem.
To be clear, AI has made extraordinary strides in producing better outputs. But better output is not the same as better decisions. These are two distinct milestones, and I think the industry has spent a lot of time treating them as interchangeable.
The lack of expertise and judgment is why I became interested in building AI that human experts can teach to make complex decisions the way they do. AI shouldn't only be about automating tasks; it should be about effectively and safely transferring human judgment into systems that hold up.
Large language models (LLMs) speak like decision makers, but they aren’t
There’s no question that LLMs are useful, but they are not, by default, decision-making systems. They are prediction systems wrapped in language. And language is persuasive, which is part of the problem: if a system can explain itself fluently, we easily overestimate what it understands. Ask it a business question and it gives you a structured answer with trade-offs, caveats, and a neat little summary at the end, which makes it feel smarter than it is. Sounding coherent and being operationally competent are not the same thing, and this is where a lot of enterprise AI breaks. Models can tell you what a good decision sounds like without any understanding of what makes a decision good under pressure, over time, or in context. This is one reason many organizations struggle to move beyond experimentation. Gartner has predicted that at least 30% of generative AI projects will be abandoned after proof of concept, long before they deliver real operational impact, often due to unclear business value or inadequate risk controls.
Information isn’t the same as expertise
One of the easiest traps to fall into with AI is assuming that if a system has enough information, it should be able to perform like an expert. It sounds reasonable, but think about everyday life: having more information about something doesn’t automatically make us experts. You can read every aviation manual and still not be ready to land a plane. You can memorize every supply chain best practice and still freeze when three things go wrong at once.
I could go on, but the point is that information doesn’t equate to capability. Capability comes from experience, specifically, repeated exposure to messy situations where the answer is not obvious.
Most of the AI systems I see today are trained on static examples. That is helpful for prediction, but prediction is only a small part of decision-making. Enterprises are not lacking data per se; they need structured environments for practice, places where systems can repeatedly:
- Encounter realistic scenarios
- Make choices
- See what happens
- Receive feedback
- Improve over time
AI can be trained using predictive algorithms, but that approach has limits. What is needed next is AI that can be trained in a simulated environment with human oversight. I call this machine teaching: a methodology that breaks complex decisions down into scenarios and skills, giving human experts a guide for teaching AI through simulation. The resulting feedback and trial-and-error let agents learn directly from the people who built those processes and, ultimately, act with real-world autonomy.
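To make this concrete, here is a minimal sketch of what such a practice loop might look like. Everything in it is illustrative, not a real machine teaching API: the `Scenario` shape, the toy agent, and the backlog example are all assumptions standing in for expert-authored content.

```python
import random
from dataclasses import dataclass

# Hypothetical scenario an expert might author. The expert encodes judgment
# as a success check: did the agent's choice hold up in this situation?
@dataclass
class Scenario:
    name: str
    initial_state: dict
    success: callable  # (state, action) -> bool, defined by the expert

class Agent:
    """Toy agent that learns action preferences through repeated practice."""
    def __init__(self, actions):
        self.actions = actions
        self.values = {}  # (state_key, action) -> estimated value

    def choose(self, state):
        key = state["backlog_level"]
        # Explore occasionally; otherwise exploit what practice has taught.
        if random.random() < 0.2:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.values.get((key, a), 0.0))

    def update(self, state, action, reward):
        key = (state["backlog_level"], action)
        old = self.values.get(key, 0.0)
        self.values[key] = old + 0.1 * (reward - old)  # incremental average

def run_episode(agent, scenario):
    """One round of practice: act in a simulated scenario, get feedback."""
    state = dict(scenario.initial_state)
    action = agent.choose(state)
    reward = 1.0 if scenario.success(state, action) else -1.0
    agent.update(state, action, reward)  # learn from the outcome
    return reward

# Expert judgment, encoded: expedite only when the backlog is high.
scenario = Scenario(
    name="supply_disruption",
    initial_state={"backlog_level": "high"},
    success=lambda s, a: a == ("expedite" if s["backlog_level"] == "high" else "hold"),
)

agent = Agent(actions=["expedite", "hold"])
for _ in range(200):  # repeated exposure is what builds capability
    run_episode(agent, scenario)
```

The point of the sketch is the loop itself: scenario, choice, outcome, feedback, improvement, with a human expert defining what "good" means.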
Stop treating AI like a monolith
Another mistake I see a lot is the assumption that one large model should somehow do everything. No basketball team consists of just one person. No factory is run by an individual. Complex systems work because different components do different jobs, and there’s a structure holding them together.
AI should be built the same way. I don’t think the long-term future of enterprise decision-making is one giant model sitting in the middle of the company pretending to be universally competent. It’s much more likely to look like teams of specialized agents.
One agent might be an expert in data retrieval. Another evaluates scenarios. Another handles planning. One checks compliance or catches contradictions. Another acts as a supervisor, deciding when to escalate or when confidence is too low to proceed. This team architecture makes more sense to me because it maps to how real organizations actually work, and it aligns with broader market trends: McKinsey’s findings reinforce that organizations get the most value from AI when they redesign workflows and operating structures around it.
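A rough sketch of that supervisor pattern, with hypothetical placeholder functions standing in for the specialist agents (the names and the confidence floor are mine, for illustration only):

```python
# Specialized "agents" as stub functions; in practice each would be its own
# model or service. The supervisor escalates rather than guessing.
CONFIDENCE_FLOOR = 0.7  # assumed threshold below which a human decides

def retrieve(question: str) -> dict:
    # Placeholder data-retrieval specialist.
    return {"facts": f"data relevant to: {question}"}

def evaluate(facts: dict) -> tuple[str, float]:
    # Placeholder scenario evaluator: recommendation plus confidence.
    return ("reroute shipment via backup supplier", 0.65)

def check_compliance(recommendation: str) -> bool:
    # Placeholder compliance checker.
    return "backup supplier" in recommendation

def supervise(question: str) -> str:
    facts = retrieve(question)
    recommendation, confidence = evaluate(facts)
    if confidence < CONFIDENCE_FLOOR:
        return f"ESCALATE to human: confidence {confidence:.2f} below floor"
    if not check_compliance(recommendation):
        return "ESCALATE to human: compliance check failed"
    return recommendation

print(supervise("Port strike delays inbound parts. What should we do?"))
```

The structure, not any single agent, is what carries the competence: each component does one job, and the supervisor knows when the team is out of its depth.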
Not all decisions are made the same way, and too often we assume the same model, the same data, and the same type of reasoning can handle them all. In reality, different decisions require different mechanisms.
The four ways decisions actually happen
In my experience, most decisions tend to fall into a few buckets:
- Control systems (rules and formulas): Decisions are made by applying predefined equations or rules to known inputs. If X happens, do Y.
- Search and optimization: Decisions are made by evaluating many possible options and selecting the best one based on a defined objective.
- Reinforcement learning (trial and error): Decisions are learned over time by taking actions, observing outcomes, and adjusting based on reward or penalty.
- Practice and experience (human-style learning): Decisions are shaped through repeated exposure, guided feedback, and accumulated judgment in real-world scenarios.
Most enterprise AI does well in the first two categories. The third and fourth categories are more challenging for AI, because that is where human-like judgment lives.
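To ground the list, here is a toy contrast of the first three mechanisms applied to one decision, a reorder quantity. The fourth, human-style practice, is exactly the part that resists a compact formula, which is the point. All numbers and names below are invented for illustration.

```python
# 1) Control system: a predefined rule applied to known inputs.
def rule_based(stock: int, reorder_point: int = 100) -> int:
    return 500 if stock < reorder_point else 0  # if X happens, do Y

# 2) Search/optimization: evaluate candidates against a defined objective.
def optimized(stock: int, demand: int) -> int:
    candidates = range(0, 1001, 50)
    cost = lambda q: abs(stock + q - demand) + 0.1 * q  # shortage + holding
    return min(candidates, key=cost)

# 3) Reinforcement learning: adjust an estimate from observed reward.
q_value, lr = 0.0, 0.1
def rl_update(reward: float) -> float:
    global q_value
    q_value += lr * (reward - q_value)  # learn from trial and error
    return q_value

# 4) Practice and experience has no one-liner: it is the accumulated,
# expert-guided version of (3), which is why it has to be taught.
print(rule_based(stock=80), optimized(stock=80, demand=400))
```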
Autonomy without structure is risk
Whenever people talk about autonomous AI, the conversation tends to split into two extremes. One side thinks the systems are basically magic and ready to run everything. The other side acts as if they should never be trusted with anything meaningful.
I don’t think either view is useful. We should focus on autonomy within structure, because autonomy without supervision, escalation logic, boundaries, or accountability is the main source of risk. These concerns are showing up more broadly now, too, including in conversations shaped by efforts such as the National Institute of Standards and Technology’s AI Risk Management Framework, which reflects how seriously organizations are taking questions of oversight, accountability, and operational trust.
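Here is one way "autonomy within structure" might look in code: explicit boundaries, a confidence floor, escalation, and an audit trail for accountability. The limits, action names, and threshold are hypothetical.

```python
import datetime

# Assumed, explicitly declared boundaries; the agent acts freely inside
# them and everything outside escalates to a person.
BOUNDARIES = {"max_order_value": 50_000, "allowed_actions": {"reorder", "hold"}}
audit_log = []  # accountability: every decision leaves a record

def bounded_execute(action: str, order_value: float, confidence: float) -> str:
    entry = {"time": datetime.datetime.now().isoformat(), "action": action,
             "value": order_value, "confidence": confidence}
    if (action not in BOUNDARIES["allowed_actions"]
            or order_value > BOUNDARIES["max_order_value"]
            or confidence < 0.8):  # assumed confidence floor
        entry["outcome"] = "escalated"
        audit_log.append(entry)
        return "escalated to human reviewer"
    entry["outcome"] = "executed"
    audit_log.append(entry)
    return f"executed {action} for ${order_value:,.0f}"

print(bounded_execute("reorder", 12_000, confidence=0.91))  # within bounds
print(bounded_execute("reorder", 80_000, confidence=0.95))  # outside bounds
```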
The future of enterprise AI lies in teams of agents. The organizations that get the most value from AI won’t be the ones that automate the most words; they will be the ones that figure out how to transfer real expertise into systems that hold up when the environment gets messy. That, in my view, is the difference between AI that looks impressive and AI that becomes genuinely useful and produces real ROI.