The European Commission’s New GPAI Template – What Does This Mean for AI Training?

In July, the European Commission (EC) released a new general-purpose artificial intelligence (GPAI) template. Under it, AI providers must disclose the content used to train their models. This comes after months of headlines about creators alleging their content was used without consent to train AI.
With this new template, the EU has made its position clear: transparency is now non-negotiable. Black box training, where something is created without revealing its inner workings, will not be an option for AI developers. This marks a significant shift as operating in Europe will now require full visibility into model inputs and training data provenance, forcing a re-evaluation of data gathering and use.
Many have pointed out the stark difference between this and the recently released US AI Action Plan, which focuses heavily on deregulation. As with any new law or regulation, businesses now have to take stock and assess exactly how the GPAI template will impact their operations.
Those operating across regions will be doing the same with the US AI Action Plan, which complicates matters further. Given the complexity of both frameworks, and the fact that regulating AI development in this way is uncharted territory, developers' responses are likely to differ greatly.
Dissecting the General-Purpose AI Model Template
In July this year, the European Commission published a mandatory template for GPAI providers so they can publish a public summary of the data used to train their models. As part of the EU AI Act, providers must disclose data categories such as publicly available datasets, private licensed data, scraped web content, user data, and synthetic data. The aim is to enable copyright holders, users, and downstream developers to exercise their legal rights under EU law.
GPAI models are trained on large quantities of data; however, in the current market, there is limited information available about where this data comes from. The public summary set out by this template will provide a comprehensive overview of the data used to train a model, list the main data collections, and explain the other sources used.
Compare and Contrast: The US AI Action Plan
In comparison, the US is adamant it will win the AI race and maintain its competitive edge over China, as the Trump administration announced its AI Action Plan earlier this summer. This new AI framework aims to accelerate the construction of energy-intensive data centres that power AI systems by easing environmental regulations. At the same time, it seeks to ramp up the global export of American AI technologies. Featuring 90 recommendations, the plan reflects growing efforts by the US to stay ahead of its global competitors.
The plan is built around three core pillars – accelerating innovation, building America’s AI infrastructure, and fostering leadership in international AI diplomacy and security.
A key takeaway from the plan is America’s ‘open-source’ push to fuel both innovation and accessibility. Similarly, the plan sets out how the US government will ‘lead by example’ on AI growth – through training, talent exchanges, and expanding adoption across industries.
With this plan, the US aims to streamline all its current technology regulations, particularly environmental ones, to ensure legislation isn’t slowing growth, while encouraging wider international distribution of US AI software and hardware. This ‘anti-regulatory’ approach marks a clear shift from earlier frameworks centred on ethics, transparency, and responsible innovation – instead moving towards a more aggressive ‘innovation first’ action plan.
The Missing Piece
It’s worth taking a step back at this stage and considering whether these frameworks, although different, suffer from the same flaws – flaws that may lead developers to see little value in adhering to them. Both the EU and US approaches leave a critical gap around intellectual property in AI training datasets. The EU AI Act mandates training data summaries and a copyright compliance policy, but it does not establish a scalable framework for identifying or licensing copyrighted works.
In the US, no specific rules exist at all – leaving AI companies to navigate an evolving legal framework shaped by court rulings and ongoing disputes with rights holders. Beyond the legal text, what’s missing is the practical side; neither approach sets out workable, industry-wide methods for detecting protected content at scale, verifying lawful use, or streamlining licensing. Until such solutions are defined, uncertainty around copyright in AI training will remain a significant challenge for the industry.
The Hidden Cost of Businesses Skipping AI Traceability
Despite the flaws in these regulations, one might assume they will push AI developers to focus intently on legal compliance – but this isn’t always the case. In fact, the real divide in AI right now isn’t between EU and US regulation, but between companies investing in traceability today and those gambling they won’t have to. This echoes what we saw years ago with the General Data Protection Regulation (GDPR): companies that built privacy-by-design early not only avoided fines but gained consumer trust and smoother access to other markets that later mirrored GDPR standards.
The same pattern may be emerging with AI. Traceability of training data and model decisions is likely to become a global baseline, and companies that delay will have to redesign their systems later. Retrofitting documentation, provenance tracking, and audit features onto an existing system is far more expensive and complex than building them in from the start – and it pulls focus away from the ROI-focused build-outs the company would rather complete.
In other words, traceability and transparency aren’t optional add-ons; they must be embedded into AI systems from day one. Businesses that treat them as afterthoughts risk stalling innovation, facing regulatory backlash, and losing the race indefinitely.
Ethical AI Needs Global Unity
From a macro perspective, these polarised approaches create a real problem for global businesses. Companies in lighter-touch markets like the US can scale faster in the short term, but when they decide to enter the EU, they face a compliance wall: the AI Act’s traceability and documentation rules require capabilities they never built.
Retrofitting provenance tracking, documentation, and audit features into an existing system is costly, slow, and disruptive, especially because traceability is one of the most resource-intensive parts of compliance. It’s the same pattern we saw with GDPR, where latecomers to privacy-by-design struggled with expensive overhauls and delayed market access, while early movers gained a lasting advantage.