The Next AI Breakthrough Is Buried in Your Backups

Imagine an insurance company that could isolate every fire-related damage claim in the Pacific Northwest over the last decade within moments, or a sales department that could gather user feedback with a particular sentiment and improve features proactively before losing leads. The potential payoff from feeding historical data to AI is enormous, but only if backups stop operating like backups.
For years, enterprises across every vertical, from insurance to entertainment, have treated their old data like a dusty insurance policy: something you store away and hope you never have to use. These companies are sitting on mountains of records, files, and videos that barely see the light of day and provide little to no value beyond serving as a fallback or satisfying compliance and regulatory requirements.
So, what’s the problem if this data remains piled somewhere at the back of the digital closet? Much of it was shelved for a reason, right?
This “so what?” mindset overlooks the reality of the AI age, which has drastically shifted consumer expectations for all manner of services and experiences. In a world where businesses are expected to act on real-time insights powered by AI and deliver personalized, context-rich experiences, all that “dormant” data is now one of the most undervalued strategic assets in the enterprise.
Backups Are Stuck in the Past
In today’s fast-moving, cloud-first world, backups are too often treated as static insurance policies: something companies can set, forget, and hope never to touch.
The reality is far messier, and far more costly.
Visibility is the first weakness. In fragmented backup ecosystems, resource sprawl, shadow IT, and misconfigured tags make it hard to prove what is actually protected or to discover what is not. Cloud-native backup tools add to the challenge: they are easy to switch on, but they often lack critical capabilities like true searchability and single-item restore. Third-party tools attempt to fill these gaps but introduce complexity of their own. They require agents and additional machines deployed in the customer environment, involve complicated backup policy configuration, and come with hidden pricing models in which companies pay not only for licenses but also for every unit of data stored or transferred.
When data is needed for compliance, legal, or operational reasons, the restore processes of these traditional models fall short. Most tools require full snapshot restores, triggering full instance recovery even when only a tiny piece of data is needed. In other words, teams are forced to recover an entire database when only a table or even a single row is relevant to them. The result is enormous overhead in time, compute power, and cost. Most companies’ backup systems lack the granular restore capabilities needed to avoid this wasteful process.
Compliance demands reveal yet another pain point. Few teams can prove real-time backup success during an audit or show that sensitive data retention policies, encryption, and access controls were properly applied. In a dynamic, multi-cloud world, this leads at best to blanket retention and massive storage bloat, and at worst to gaps where sensitive data is left unscrutinized and unsecured.
Organizations that treat backups the way they once treated passive archives like LTO tape or Glacier face a growing gap between cloud speed and backup readiness. Without automated discovery or classification, data can slip through the cracks, especially in highly dynamic environments. Backups remain incomplete or inconsistent, while spending spirals as teams scramble to put out the resulting fires.
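To make the contrast with full snapshot restores concrete, here is a minimal sketch of a single-row restore. It assumes the backup is available as a queryable SQLite copy; the `claims` table, its columns, and the `restore_row` helper are hypothetical names chosen for illustration, not any vendor's API.

```python
import sqlite3

# Hypothetical schema shared by the backup copy and the live database.
SCHEMA = (
    "CREATE TABLE claims ("
    "id INTEGER PRIMARY KEY, region TEXT, cause TEXT, amount REAL)"
)


def restore_row(backup: sqlite3.Connection, live: sqlite3.Connection,
                claim_id: int) -> bool:
    """Copy one row from the backup into the live database.

    Returns False if the row does not exist in the backup.
    """
    row = backup.execute(
        "SELECT id, region, cause, amount FROM claims WHERE id = ?",
        (claim_id,),
    ).fetchone()
    if row is None:
        return False
    with live:  # commits on success, rolls back on error
        live.execute(
            "INSERT OR REPLACE INTO claims (id, region, cause, amount) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
    return True


def demo():
    """Restore a single claim without touching the rest of the backup."""
    backup = sqlite3.connect(":memory:")
    live = sqlite3.connect(":memory:")
    backup.execute(SCHEMA)
    live.execute(SCHEMA)
    backup.executemany(
        "INSERT INTO claims VALUES (?, ?, ?, ?)",
        [
            (1, "Pacific Northwest", "fire", 12000.0),
            (2, "Southwest", "flood", 8000.0),
            (3, "Pacific Northwest", "fire", 4500.0),
        ],
    )
    backup.commit()
    restored = restore_row(backup, live, 3)
    rows = live.execute("SELECT id, cause FROM claims").fetchall()
    return restored, rows
```

In this sketch only the requested row is read from the backup and written to the live database, so the work scales with the size of the item being restored rather than with the size of the snapshot.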
From Backups to Data Lakes: Unlocking AI’s Next Frontier
Simply “modernizing” storage won’t bring about the next era of data strategy. Rather, enterprises must transform their backups into fully searchable, analytics-ready data lakes – not only to meet compliance and recovery needs, but also to feed the vast, high-quality datasets that today’s AI models require to learn and operate effectively at scale.
In a data lake model, backups don’t live as static snapshots. They become dynamic repositories enriched with contextual metadata, indexed for granular search, and connected to analytical tools. Instead of merely meeting disaster recovery and compliance obligations, they actively contribute to business intelligence, product innovation, and customer engagement.
Key enablers of this shift include:
- Automated, contextual data extraction: With AI-driven tagging and natural language processing, historical records, documents, images, and videos can be annotated with rich, searchable descriptors.
- Granular restore capabilities: Instead of restoring an entire dataset, companies can surgically retrieve individual files, transactions, tables, or media clips in seconds, without disrupting wider datasets.
- Seamless integration into analytics pipelines: Once backups are searchable and queryable, they can feed directly into AI training datasets, real-time dashboards, and trend analysis workflows.
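The enablers above can be sketched in a few lines of code. The example below is illustrative only: a simple keyword lookup stands in for AI-driven tagging, and `BackupItem`, `TAG_RULES`, `tag_item`, `build_index`, and `search` are hypothetical names rather than part of any real backup product.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for an AI/NLP tagging model.
TAG_RULES = {
    "fire": "cause:fire",
    "flood": "cause:flood",
    "oregon": "region:pacific-northwest",
    "washington": "region:pacific-northwest",
}


@dataclass
class BackupItem:
    item_id: str
    text: str
    tags: set[str] = field(default_factory=set)


def tag_item(item: BackupItem) -> BackupItem:
    """Annotate an item with descriptors derived from its text."""
    lowered = item.text.lower()
    item.tags = {tag for keyword, tag in TAG_RULES.items() if keyword in lowered}
    return item


def build_index(items: list[BackupItem]) -> dict[str, set[str]]:
    """Build an inverted index mapping each tag to the ids that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for item in items:
        for tag in item.tags:
            index[tag].add(item.item_id)
    return index


def search(index: dict[str, set[str]], *tags: str) -> set[str]:
    """Return the ids of items that carry every requested tag."""
    if not tags:
        return set()
    matches = [index.get(tag, set()) for tag in tags]
    return set.intersection(*matches)


items = [
    tag_item(BackupItem("claim-001", "Kitchen fire damage, Portland, Oregon")),
    tag_item(BackupItem("claim-002", "Basement flood, Phoenix, Arizona")),
    tag_item(BackupItem("claim-003", "Wildfire smoke damage, Spokane, Washington")),
]
index = build_index(items)
fire_claims_pnw = search(index, "cause:fire", "region:pacific-northwest")
```

Here `fire_claims_pnw` holds only the ids of the matching items, which could then be retrieved individually or passed on to a training or analytics pipeline without restoring anything else.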
The impact is transformative. A bank, for example, could train fraud detection algorithms on a decade’s worth of long-static transactional data, spotting anomalies invisible in smaller samples. A healthcare provider could similarly retrieve all patient cases matching a specific genetic marker to support research, or an entertainment company could surface historical audience sentiment data to guide content production.
What was once “dead data” becomes an ever-growing strategic asset. Instead of a cost center, backups evolve into a competitive advantage – fueling innovation across industries.
Mining “Dead Data” for Business Potential
Fortunately, the status quo is shifting. Modern storage systems can already incorporate object- and topic-based storage, automated indexing, and contextual metadata extraction to make archives instantly searchable and business-ready.
For example, Google Cloud has been working with major manufacturers and automotive companies like Ford and Kyocera to connect historically siloed assets, process and standardize data, and improve visibility from the factory floor to the cloud. Financial institutions, which accumulate petabytes of transactional and client interaction data, are eager to tap this gold mine to train finance-specific AI models, a sign of how valuable deep historical data has become.
Even in media and entertainment, the use cases hold astounding potential. Take Netflix, for instance, whose spending on both original and licensed content is set to hit $18 billion this year. That means Netflix is sitting on an Everest of backed-up data, media, metadata, video tagging info, and more, all of which must be refracted through a slew of regional compliance regulations, numerous accessibility standards, and a variety of cloud providers. Scouring such a staggering amount of content through a single backup recovery snapshot is simply unfeasible. Now, imagine instead how much easier it would be to sift through the data with granular restore capabilities and instant searchability.
That’s exactly what the data lake shift enables.
The proof is in the outputting: with the right tools and the right strategic mindset, backup storage becomes a creative, value-add engine, not just an insurance policy.
Better Backups Mean Better Business Outcomes
Enterprises today are defined by data and speed. Legacy backup systems are holding teams back on both fronts.
Backups shouldn’t be treated as vaults or as a worst-case survival mechanism. Instead, they should be engines for growth, creativity, and competitive advantage, and new tech solutions are poised to enable this transition. The companies that modernize their backup architecture today will be the ones fueling tomorrow’s breakthroughs, in finance, healthcare, media, and beyond.