AI Central

What Counts as AI-Generated?

The EU’s proposed AI transparency code still cannot define its central terms.

Jordamøn
Mar 12, 2026

Last week, the European Commission published the second draft of its Code of Practice on marking and labelling AI-generated content, one of the final implementation steps before Article 50 of the EU AI Act takes effect on August 2. The code lays out how providers and deployers of generative AI systems should disclose that content was made or altered by a machine: watermarks, metadata, a proposed EU icon, and detection tools offered free of charge.

The document is more streamlined than its predecessor, published in December. It also reveals, more clearly than the first draft did, the two problems that the regulatory framework cannot resolve on its current timeline. The first is definitional: at what point does a piece of content become “AI-generated” in a way that triggers the obligation to say so? The second is infrastructural: the marking technology that the code requires does not exist at scale. Both problems were present in the first draft. The second draft makes them harder to ignore.

The definition that disappeared

The structural notes for Section 2 of the second draft contain a disclosure that received almost no coverage: the Commission has removed the first draft's taxonomy distinguishing “AI-generated” from “AI-assisted” content. The first draft treated these as separate categories with different disclosure requirements. The second draft abandons the distinction entirely.

The Commission’s language frames this removal as simplification. The second draft “has been streamlined and simplified,” providing “greater flexibility” and “reducing the compliance burden.” A less charitable reading is that the Commission attempted to define the most important boundary in the entire transparency regime, failed, and removed the definition rather than shipping one that would not hold.

The underlying difficulty is genuine. A photographer who uses an AI model to remove a power line from a landscape image and a propagandist who generates a fabricated image of a political figure from a text prompt are both using generative AI systems. The same is true of a journalist who asks a language model to restructure a paragraph and a marketing agency that has a model write the entire press release. The first draft tried to draw a line between these cases. The second draft replaced the taxonomy with design and placement requirements for labels, icons, and disclaimers, with specific regimes for artistic and creative works and an exception for text that has undergone genuine human editorial review. The question of what counts as AI-generated was left to deployers, guided by the legal text of Article 50 and whatever supplementary guidelines the Commission publishes later this year.

The result is that the central transparency question in the AI Act (when does the label appear?) remains substantially unanswered five months before enforcement begins. The Commission has delegated the determination to the entities that the regulation is meant to govern, and has provided no criteria for how the judgment should be made.

The infrastructure that does not exist

Even setting aside the definitional gap, the marking requirements themselves face a readiness problem that the compliance timeline does little to address.

The code calls for a multilayered approach: secured metadata, interwoven watermarking, and optional fingerprinting, all designed to ensure that provenance information survives the processing, compression, and redistribution that content undergoes as it moves across the internet. The technical standard that most closely matches this architecture is C2PA, the Content Credentials specification developed by a coalition that includes Adobe, Microsoft, and the BBC, now housed under the Linux Foundation. C2PA attaches cryptographically signed metadata to a file, recording who created it, what tools were used, and what modifications were made. If anyone tampers with the metadata, the digital signature breaks.
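The tamper-evidence property is the key idea: the signature covers the manifest, so any change to the recorded provenance invalidates it. The toy sketch below illustrates that property with Python's standard library, using an HMAC as a simplified stand-in for the X.509-backed asymmetric signatures C2PA actually specifies; the key, field names, and manifest contents are illustrative, not drawn from the C2PA spec.

```python
import hashlib
import hmac
import json

# Illustrative shared key; real C2PA manifests are signed with
# certificate-backed asymmetric keys, not an HMAC secret.
SIGNING_KEY = b"demo-signing-key"

def sign_manifest(manifest: dict) -> dict:
    """Attach a signature covering a canonical serialization of the manifest."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": signature}

def verify(signed: dict) -> bool:
    """Recompute the signature; any edit to the manifest makes this fail."""
    payload = json.dumps(signed["manifest"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])

signed = sign_manifest({"creator": "camera-01", "tool": "editor-x", "edits": ["crop"]})
assert verify(signed)                 # untouched provenance record verifies
signed["manifest"]["edits"] = []      # tamper with the edit history...
assert not verify(signed)             # ...and verification fails
```

Note what the sketch also demonstrates by omission: the signature only protects the metadata it covers. Strip the manifest from the file entirely, or screenshot the image, and there is nothing left to verify, which is exactly the failure mode the Microsoft report describes.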

In controlled conditions, the approach works. In the actual ecosystem, it barely functions. A Microsoft research report published in February (notable because Microsoft co-founded C2PA in 2021) concluded that no single authentication method can reliably distinguish AI-generated content from authentic media at scale. The C2PA ecosystem now counts over 6,000 members, but deployment in newsroom workflows and consumer platforms remains limited. Most social media platforms still strip embedded metadata during upload, removing C2PA manifests before audiences ever see them. A screenshot defeats the entire chain. High-confidence authentication, the report found, requires secure hardware enclaves built into cameras and capture devices, technology that most consumer equipment does not yet include.

The report also introduced the concept of “sociotechnical provenance attacks,” in which bad actors do not merely strip or forge metadata but invert the signal, making authentic content appear synthetic and synthetic content appear authentic. Visible watermarks without cryptographic backing, the researchers warned, can make such attacks easier by training audiences to trust signals that can be forged.

A separate academic study that audited 50 generative AI systems found that invisible watermarking had been implemented in only eight. Of those providers that had adopted C2PA, several showed inconsistent implementation across their product lines. Open-source models present an additional challenge. The code states that such models should implement structural marking techniques encoded in the weights during training, but once the weights are released, the provider has limited ability to enforce downstream marking.

What arrives in August

These two gaps compound each other. The marking infrastructure is incomplete, but at least its requirements are concrete: implement watermarking, offer detection tools, preserve metadata through the distribution chain. A provider with sufficient engineering resources can comply, even if the broader ecosystem does not yet support full verification. The definitional gap is worse, because it leaves compliance indeterminate. A deployer operating an AI-assisted workflow in which some content is substantially human-authored and some is substantially machine-generated has no regulatory guidance on where the label attaches. The code provides the technical scaffolding for marking without resolving what the scaffolding is meant to mark.
