
Google Transparency Report: 14.5 billion takedowns and what's actually in them

The largest single takedown dataset ever published runs at over 100 million URL requests per week. What's in the data, who is filing, and why the dataset froze for five months in 2025.

Adrià Pérez · 10 min read

Google's Transparency Report has been publishing DMCA takedown statistics since 2011. As of September 2025, the cumulative count crossed 14.5 billion URL takedown requests. About 5 billion arrived in 2025 alone — roughly 14 million per day, 100 million per week.

This is the single largest publicly accessible takedown dataset on the internet. It is also the one that gets the least serious analytical attention, partly because Google does not offer an API and partly because the bulk download tooling is fragile.

This post is a reference for what is actually in the dataset, who is doing the filing, what the headline numbers tell us, and why the dataset went silent for five months in 2025.

What is in the data

The Google Transparency Report DMCA section publishes:

  • Total URL requests received per day, week, and month.
  • URLs delisted (where Google acted on the request).
  • URLs not delisted (where Google declined, typically because the URL was already removed, was duplicative, or did not match the targeted material).
  • Per-sender breakdowns: company name, total requests submitted, success rate.
  • Per-principal (copyright owner) breakdowns.
  • Per-domain breakdowns of the most-targeted domains.
  • Bulk CSV download of the underlying URL-level data (when available).

The granularity is URL-level. For each request, you can see the sender, the principal they claim to represent, the targeted URL, the domain, and the disposition.

Who is doing the filing

The concentration is striking. Link-Busters BV, a Dutch anti-piracy firm, alone is responsible for roughly 30% of all DMCA notices Google has received since 2012, per TorrentFreak's reporting on the September 2025 data update. Link-Busters represents major book publishers — Penguin Random House, HarperCollins, Hachette — and the bulk of recent volume is driven by book-piracy enforcement.

Other heavy senders include:
  • MUSO (anti-piracy intelligence and takedowns)
  • Topple Track
  • Web Sheriff
  • Rivendell
  • MarkMonitor (corporate brand protection at scale)
  • Various individual authors and small agencies representing photography, music, and video clients

The top 10 senders account for roughly 60% of total volume. The long tail is enormous: hundreds of thousands of distinct senders have filed at least one request.

What the headline numbers actually tell us

A few patterns worth noting:

Volume is dominated by automation

Of the ~5 billion requests Google received in 2025, the vast majority were generated by automated crawlers operated by the heavy-volume senders. Link-Busters and similar firms run continuous crawlers against piracy sites, file-hosting directories, and discussion forums; matches generate notices, and notices are submitted in bulk. This is industrial-scale enforcement.
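To make that loop concrete, here is a minimal Python sketch of the crawl-match-notice cycle. Everything in it is hypothetical (the catalog, the matcher, the payload shape), since real systems like Link-Busters' are proprietary:

# Hypothetical crawl -> match -> notice cycle. All names are invented
# for illustration; no real enforcement system is shown here.

CATALOG = {"9780143127741", "9780062316097"}  # identifiers a sender enforces

def find_matches(page_text: str) -> set[str]:
    # Naive matching: look for catalog identifiers in crawled page text.
    return {ident for ident in CATALOG if ident in page_text}

def enforcement_pass(pages: dict[str, str]) -> list[dict]:
    # One crawl pass: pages maps URL -> extracted text; output is
    # notice payloads ready for batched submission.
    return [
        {"target_url": url, "work": ident}
        for url, text in pages.items()
        for ident in find_matches(text)
    ]

print(enforcement_pass({"https://piracy.example/thread/1": "... 9780143127741 ..."}))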

The not-delisted rate is informative

Google delists in roughly 95% of cases. The 5% not-delisted often reflects: URLs already removed before the request arrived, duplicate requests, requests that failed Google's basic validity check, or requests where Google's reviewers determined the underlying claim was insufficient. The 5% is not a quality-control filter; it is closer to a triage artifact.

Sender concentration is a feature, not a bug

The fact that one company (Link-Busters) generates 30% of volume reflects the economics of anti-piracy enforcement: scale matters, automation matters, and a small number of well-resourced firms can operate the entire enforcement machinery for a content-industry segment.

Per-domain concentration is also high

A handful of pirate-content domains absorb most of the takedown volume. Google's per-domain reporting page surfaces these. Domains in the top 100 are typically large file-sharing or streaming sites; the long tail is small mirrors and individual user pages.

The 2025 freeze

From mid-April to mid-September 2025, the Google Transparency Report stopped updating. No public explanation. The bulk download archives stopped refreshing. The web UI continued to work but showed stale data.

TorrentFreak covered this in real time. The freeze ended around September 18, 2025 with a backfilled update. Anyone who built a workflow that depended on continuous updates from Google Transparency was either rebuilding their pipeline manually for five months or had a redundant source.

The freeze is the canonical reason Counterspine (and any serious takedown-monitoring product) ingests from at least three sources, not one. The other two — Lumen and the EU DSA Database — kept updating throughout. A workflow that depends on a single Google pipeline is a workflow that breaks every time Google has an unannounced production incident.

How to query Google Transparency Report

Three methods, each with limits:

1. The web UI

At transparencyreport.google.com/copyright/ you can search by sender, principal, and domain. Pagination is slow, faceting is limited, and the data is summary-level.

2. Bulk CSV download

Google publishes a compressed archive (~9.6 GB at last check) of URL-level data. Updated multiple times per week when the pipeline is healthy. Updated... not at all during the 2025 freeze.

The schema includes:

request_id, sender_name, principal_name, recipient_name,
date_received, target_url, target_domain, action_taken
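If you have a local copy of the archive, a few lines of pandas reproduce the headline aggregates. The file name here is a placeholder and the action_taken labels are assumptions; the column names follow the schema above:

import pandas as pd

# Placeholder file name; pandas decompresses .gz transparently.
df = pd.read_csv(
    "google-transparency-urls.csv.gz",
    usecols=["sender_name", "target_domain", "action_taken", "date_received"],
    parse_dates=["date_received"],
)

# Top senders by URL-request volume.
print(df["sender_name"].value_counts().head(10))

# Overall delisting rate ("delisted" is an assumed label for the action).
print((df["action_taken"] == "delisted").mean())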

3. Scraping

There is no official API. The web UI is JavaScript-rendered, so basic curl-based scraping does not work. You need a headless browser stack (Playwright or Puppeteer) plus a robust proxy strategy. The multi-provider failover stack we use in Counterspine — Cloudflare Worker → BrightData → Oxylabs → DataForSEO — handles the throttling, IP rotation, and JavaScript rendering reliably enough to ingest the bulk dataset on a regular cadence.

Google does not block Transparency Report scraping aggressively, but it does throttle. Polite scraping (1 request per second per IP, exponential backoff on 429s, distributed across proxies) is the only sustainable approach.
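Here is a sketch of what polite looks like in practice, using Playwright's sync API (pip install playwright, plus browser binaries). The retry ceiling and delays are illustrative, not tuned values:

import time
from playwright.sync_api import sync_playwright

def fetch(page, url, max_retries=5):
    # Exponential backoff on 429s; give up after max_retries attempts.
    delay = 1.0
    for _ in range(max_retries):
        resp = page.goto(url, wait_until="networkidle")
        if resp and resp.status == 429:
            time.sleep(delay)
            delay *= 2
            continue
        return page.content()  # rendered HTML, post-JavaScript
    raise RuntimeError(f"still throttled after {max_retries} tries: {url}")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    html = fetch(page, "https://transparencyreport.google.com/copyright/")
    time.sleep(1.0)  # hold to ~1 request per second per IP
    browser.close()

In production you would distribute this across a proxy pool rather than a single IP; the backoff logic stays the same per worker.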

What you can actually do with the data

Audit your own domains

Search for any domain you operate. You will see every DMCA request Google received targeting it, the senders behind them, and the action taken. This is invaluable for SEO operators, publishers, and anyone doing reputation management.

Track competitors

If you operate in iGaming, affiliate marketing, or any space where DMCA-as-negative-SEO is documented, you can watch competitors' DMCA exposure as a proxy for their content strategy and operational stability.

Investigate senders

The per-sender pages on Google Transparency are the most accessible window into a sender's enforcement profile. Volume, success rate, principal portfolio. Cross-reference with Lumen for body text and detailed metadata.

Time-series analysis

The bulk download lets you do per-domain, per-sender, per-day analysis. Spikes in filing activity often correlate with content publication, competitor launches, or coordinated abuse campaigns.
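Reusing the df loaded in the bulk-export sketch above, a crude spike detector takes a few lines: count requests per day for one domain, then flag days that run well above a trailing baseline. The 3x-median threshold is arbitrary illustration:

# Daily request counts for one domain (df from the bulk-export sketch).
sub = df[df["target_domain"] == "example.com"]
daily = sub.groupby(sub["date_received"].dt.date).size().rename("requests")

# Flag days at more than 3x the trailing 7-day median.
baseline = daily.rolling(7, min_periods=3).median()
print(daily[daily > 3 * baseline])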

What you cannot do with Google Transparency

  • Cannot get notice body text. Google publishes the URL-level data; the body of each notice is not in the export. For body text you need Lumen.
  • Cannot get a stable API contract. Google has rebuilt the pipeline at least three times since 2011; expect more rebuilds. Your scraping code will break.
  • Cannot get real-time updates. The publication cadence is roughly weekly when healthy, with occasional multi-month freezes.
  • Cannot rely on completeness. Some submissions never appear in Transparency Report, particularly those from internal Google enforcement processes (Content ID, YouTube Trusted Copyright Removal Tool).

Counterspine and Google Transparency

We ingest the bulk CSV when available, scrape the web UI when not, dedupe against our Lumen and DSA Database ingest via URL fingerprinting, and surface the results in dashboards, alerts, and the public lookup tool.
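The fingerprinting idea is simple to sketch: normalize a URL so trivial variants collapse, then hash it so the same target reported via Google, Lumen, and the DSA Database dedupes to one key. The normalization rules below are illustrative, not Counterspine's actual implementation:

import hashlib
from urllib.parse import urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")

def fingerprint(url: str) -> str:
    # Normalize scheme, host, trailing slash, and tracking params,
    # then hash the result into a stable dedupe key.
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower().removeprefix("www.")
    query = "&".join(
        kv for kv in parts.query.split("&")
        if kv and not kv.startswith(TRACKING_PREFIXES)
    )
    normalized = urlunsplit(("https", netloc, parts.path.rstrip("/"), query, ""))
    return hashlib.sha256(normalized.encode()).hexdigest()

assert fingerprint("https://www.Example.com/a/?utm_source=x") == fingerprint("http://example.com/a")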

The 5 billion 2025 requests are searchable inside Counterspine alongside the Lumen and DSA data. You can ask "what did Google receive about my domain in the last 30 days?" and get an answer that combines all three sources, deduped, with sender clustering and abuse-pattern detection applied.

The pricing model that makes this sustainable: customers pay subscriptions, we pay for the proxy infrastructure, the data stays public-domain CC0, the scraping stays polite. Nobody pays Google; nobody pays Lumen; nobody pays the EU.

TL;DR

Google Transparency Report is the largest single window into the modern takedown record — 14.5 billion URLs since 2011, 5 billion in 2025 alone. It is concentrated in a few heavy senders, heavily automated, occasionally unreliable (the 2025 freeze), and requires headless-browser scraping if you want bulk access without depending on the unstable CSV export.

It is also one of three sources that, combined with Lumen and the EU DSA Database, gives you a defensible monitoring foundation. We built Counterspine on that combination. If you'd rather not build the pipelines yourself, start your free trial.
