
Lumen Database explained: what's in it, how to query it, and why it matters

67 million notices. 10 billion URLs. Two hundred thousand new entries every week. Lumen is the largest public record of internet takedown notices ever assembled — and most people don't know it exists. Here's how it works.


Adrià Pérez

· 11 min read

If you have ever filed a DMCA notice against a Google Search result, a Reddit post, a Wikipedia article, or a Vimeo video, your notice is probably in Lumen Database. So is every back-dated abuse notice we caught last year, every reputation-management takedown against critical journalism, every MFA platform's Article 17 statement of reasons, and a remarkable share of the modern internet's unofficial paper trail.

Lumen is the most important dataset in the takedown space. It is also one of the least well-known. This post explains what is in it, how to query it, what it is good for, what its limits are, and what is changing in 2026.

What is Lumen

Lumen is a research database originally launched in 2002 as Chilling Effects. It is now hosted at Harvard Law School Library (the Berkman Klein Center handed off the day-to-day operation in late 2025). The mission is simple and important:

To collect and analyze legal complaints and requests for removal of online materials, helping internet users to know their rights and understand the law.

Notice senders agree, as part of submitting takedowns to participating recipients, that the notice will be forwarded to Lumen. The result is a continuously growing public corpus.

Scale

  • 67M+ notices total as of early 2026
  • 10B+ URLs referenced
  • 200,000+ new notices added per week
  • Recipients include Google, Meta, Reddit, Wikipedia, Vimeo, Medium, GitHub, Twitter/X, YouTube, and dozens of smaller platforms

Data

Each notice includes:
  • Sender name, organization, country
  • Principal (rights-holder) name
  • Recipient (platform)
  • Notice type (DMCA, defamation, trademark, court order, other)
  • Date received
  • Body of the notice (with PII redacted in the public view)
  • List of targeted URLs
  • Action taken by the recipient (removed, not removed, partially removed)
  • Language and jurisdiction metadata
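
As a rough mental model, the fields listed here map onto a record like the one below. This is only a sketch: the class and attribute names are illustrative assumptions, not Lumen's actual JSON keys, and the example values are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Notice:
    """Illustrative notice record; attribute names are assumptions, not Lumen's schema."""
    sender_name: str
    recipient_name: str
    notice_type: str                      # e.g. "DMCA", "Defamation", "Trademark"
    date_received: date
    principal_name: Optional[str] = None
    sender_country: Optional[str] = None
    body: str = ""                        # PII is redacted in the public view
    targeted_urls: List[str] = field(default_factory=list)
    action_taken: Optional[str] = None    # "removed", "not removed", "partially removed"
    language: Optional[str] = None
    jurisdiction: Optional[str] = None


# Made-up values, purely for illustration.
example = Notice(
    sender_name="Example Sender",
    recipient_name="Example Platform",
    notice_type="DMCA",
    date_received=date(2026, 1, 15),
    targeted_urls=["https://example.com/post/1"],
)
```

Keeping the optional fields optional matters in practice, since not every notice carries a principal, an action, or jurisdiction metadata.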

Licensing

Lumen data is CC0 (public domain). Anyone can use it. The catch is access: bulk download is not offered, and the API is gated to credentialed researchers.

How to query Lumen

Three options are worth knowing about, though only the first two are currently available:

1. The web UI

At lumendatabase.org you can search by keyword, sender, principal, recipient, or date. The UI is designed for occasional research, not workflows. Pagination is slow, faceting is limited, and the public view masks PII.

2. The researcher API

A REST API at /notices/search returns JSON with notice objects. Authentication is via an X-Authentication-Token header. Rate limits apply.

To get a token, email the Lumen team with a description of your use case. Important: Lumen explicitly grants credentials to journalists, academics, and policy researchers. Commercial uses are evaluated case by case, and Lumen reserves the right to revoke access. As of 2025, there is at least one documented denial, involving a journalist investigating a prolific takedown sender (covered by hackingbutlegal.com).

The API is documented at github.com/berkmancenter/lumendatabase. A simple R client by Thomas Leeper exists at github.com/leeper/lumendb. The endpoints you'll use most:

GET /notices/search?per_page=50&page=1&sort_by=date_received+desc
GET /notices/search?date_received_facet=>=2026-01-01
GET /notices/:id
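
For readers who prefer code to raw endpoints, here is a minimal Python sketch that builds the first search request with the X-Authentication-Token header. The helper name `build_search_request` and the `https://` base URL are assumptions for illustration; check the API documentation for the authoritative parameters and response shape.

```python
from urllib.parse import urlencode

BASE_URL = "https://lumendatabase.org"  # host from this post; confirm against the API docs


def build_search_request(token, page=1, per_page=50, sort_by="date_received desc"):
    """Return (url, headers) for a /notices/search call; does not touch the network."""
    # urlencode turns the space in sort_by into "+", matching the endpoint shown above.
    query = urlencode({"per_page": per_page, "page": page, "sort_by": sort_by})
    url = f"{BASE_URL}/notices/search?{query}"
    headers = {
        "X-Authentication-Token": token,
        "Accept": "application/json",
    }
    return url, headers


url, headers = build_search_request("YOUR_TOKEN", page=2)
print(url)
```

The returned pair can be passed to `urllib.request.Request(url, headers=headers)` and opened with `urllib.request.urlopen`. Separating request construction from the network call keeps the logic easy to test without spending rate-limit budget.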

3. Periodic bulk dumps

Lumen does not currently offer bulk downloads. There used to be quarterly dumps from the Chilling Effects era; those have not been continued.

What Lumen is good for

Forensic investigation of a single sender. Pull every notice ever filed by Link-Busters BV and you have the basis for understanding their filing patterns, their principal portfolio, and their abuse profile.

Auditing your own domain. Search for your domain in targeted URLs and you have the public record of every takedown ever filed against you, the senders behind them, and the recipients that acted on them.
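
Once you have a list of targeted URLs in hand, a keyword match on your domain can pick up lookalike hosts. A small sketch of a stricter host check follows; the function names are hypothetical and the input is assumed to be a plain list of URL strings.

```python
from urllib.parse import urlsplit


def targets_domain(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def urls_against(domain, targeted_urls):
    """Filter a notice's targeted URLs down to those pointing at `domain`."""
    return [u for u in targeted_urls if targets_domain(u, domain)]


sample = [
    "https://example.com/a",
    "https://blog.example.com/b",
    "https://notexample.com/c",   # substring match, but a different domain
]
print(urls_against("example.com", sample))
```

Matching on the parsed hostname rather than on the raw string is what keeps `notexample.com` out of the results.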

Pattern detection across senders. Cluster notices by body fingerprint, sender email patterns, and timing, and you can surface mass-filing operations like the Nguyen/Pham operation that Google sued over in 2023.
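
To make "body fingerprint" concrete, here is one simple way it could be done: strip the parts of a notice that vary between filings (URLs, numbers, punctuation) and hash what remains. This is an illustrative sketch, not a description of any particular production detector. It only catches near-verbatim templates, and the ASCII-only normalization would need adjusting for non-English notices.

```python
import hashlib
import re
from collections import defaultdict


def body_fingerprint(body):
    """Hash a notice body after stripping URLs, digits, and punctuation noise."""
    text = body.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs, which vary per notice
    text = re.sub(r"[^a-z\s]", " ", text)       # drop digits and punctuation
    text = " ".join(text.split())               # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def cluster_by_body(notices):
    """Group (notice_id, body) pairs by fingerprint; keep clusters of 2 or more."""
    groups = defaultdict(list)
    for notice_id, body in notices:
        groups[body_fingerprint(body)].append(notice_id)
    return [ids for ids in groups.values() if len(ids) > 1]


sample = [
    (1, "I own the work at https://a.example/1. Remove it within 24 hours."),
    (2, "I own the work at https://b.example/9.  Remove it within 48 hours!"),
    (3, "This article defames my client."),
]
print(cluster_by_body(sample))
```

Notices 1 and 2 differ only in URL, deadline, and punctuation, so they land in the same cluster. Combining this signal with sender email patterns and timing is what separates a shared template from a coordinated operation.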

Legal research. Lumen data has been cited in dozens of academic papers and court briefs, including Lenz v. Universal Music (9th Cir. 2016) and the U.S. Copyright Office's 2020 §512 study.

Compliance auditing. EU-based platforms can use Lumen as a cross-reference against their own DSA Database submissions.

What Lumen is not good for

Real-time alerting. Submission lag varies by recipient. Google forwards within hours; smaller recipients sometimes take days. If you need real-time monitoring, you also need to ingest from Google Transparency Report and the EU DSA Database.

Comprehensive coverage. Many platforms do not forward notices to Lumen at all. You will not find Cloudflare's full notice list, AWS's full counter-notice activity, or YouTube's Content ID claims. Lumen is large but not exhaustive.

Building a commercial workflow. Lumen's Terms of Use explicitly state: "the database is not intended to be... part of the work-flow of any particular business model." Take that seriously. Build redundant pipelines (DSA Database, Google Transparency Report, scraped USCO directory) so your business is not at the mercy of a Lumen API key.

PII access. The public view redacts names, emails, and addresses. The full unredacted view is restricted to credentialed researchers and is governed by an additional data-use agreement.

What's changing in 2026

A few moving pieces worth tracking:

  • Tighter API access. Effective April 26, 2026, Lumen no longer accepts non-expiring API keys or keys without special characters. Existing keys will be rotated. This is part of a broader trend of API hardening; expect rate limits and use-case scrutiny to increase.
  • Migration to Harvard Law School Library. The operational handoff from Berkman Klein in late 2025 has been smooth, but governance questions remain. Subscribe to the Lumen blog for updates.
  • Increased commercial-use scrutiny. The denial of researcher credentials to certain investigations (per hackingbutlegal.com) suggests Lumen is becoming more selective. If you are building on top of Lumen for a commercial product, plan for the possibility that your access changes.

How Counterspine uses Lumen

We ingest from the Lumen researcher API on a continuous 30-minute cycle, normalize the JSON into our canonical Notice schema, dedupe across sources via URL fingerprinting, cluster sender aliases via our Senders::IdentityResolver, and run the abuse-pattern detectors before serving the data to dashboards and alerts.
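
For readers building something similar, URL fingerprinting for cross-source dedupe can be sketched as below. This is a generic illustration rather than our production code, and the normalization rules (ignoring scheme, "www.", fragment, trailing slash, and query order) are assumptions you would tune for your own data.

```python
import hashlib
from urllib.parse import urlsplit, parse_qsl, urlencode


def url_fingerprint(url):
    """Hash a normalized form of the URL so trivially different spellings collide."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = urlencode(sorted(parse_qsl(parts.query)))  # order-insensitive query
    canonical = f"{host}{path}?{query}"                # scheme and fragment ignored
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(urls):
    """Keep the first URL seen for each fingerprint, preserving input order."""
    seen, unique = set(), []
    for u in urls:
        fp = url_fingerprint(u)
        if fp not in seen:
            seen.add(fp)
            unique.append(u)
    return unique


print(dedupe([
    "https://Example.com/post/1/?b=2&a=1",
    "http://www.example.com/post/1?a=1&b=2#top",
    "https://example.com/post/2",
]))
```

The first two inputs normalize to the same canonical string, so only the first and third survive. How aggressive to be is a judgment call: stripping tracking parameters would merge more duplicates but risks collapsing URLs that are genuinely distinct.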

We also pull from Google Transparency Report (bulk CSV when available, scraped multi-provider when not) and the EU DSA Transparency Database (daily JSONL dumps). The combination produces a denser, more reliable dataset than any single source. If Lumen revokes our credentials tomorrow, the product still works.

We are credentialed under journalistic-research framing, and our use of the data is consistent with Lumen's stated mission: making the takedown record legible to people who need to defend themselves against it.

How to get the most out of Lumen yourself

  • Start with the web UI for one-off questions. It is good enough for "has this sender ever filed against me?"
  • Apply for API credentials if you are doing real research. Be specific about your use case in the application; vague applications are increasingly denied.
  • Always have a backup pipeline. Lumen + Google Transparency Report + DSA Database is the trio. None alone is sufficient.
  • Respect the rate limits. Lumen is run on a small budget; thrashing the API is uncool and gets keys revoked.
  • Read the Terms of Use. Section 1.4 ("Limitations on Use") matters more than people think.
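
On the rate-limit point, a client-side throttle is cheap insurance. The sketch below enforces a minimum gap between requests. The class name and the interval are placeholders, and Lumen's actual limits are not stated here, so pick a value that fits the terms of your credentials.

```python
import time


class Throttle:
    """Enforce a minimum gap between calls; clock and sleep are injectable for tests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_pages(fetch, pages, throttle):
    """Call `fetch(page)` for each page, pausing between calls as the throttle requires."""
    results = []
    for page in pages:
        throttle.wait()
        results.append(fetch(page))
    return results
```

In use, something like `fetch_pages(my_fetch, range(1, 11), Throttle(min_interval=2.0))` would space ten page requests at least two seconds apart, where `my_fetch` is whatever function performs the actual API call.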

TL;DR

Lumen is the foundation of every serious DMCA intelligence workflow. It is huge, it is public, it is CC0, and it is increasingly gated. If you are a journalist or academic doing one-off research, apply for credentials. If you are building a workflow, ingest from Lumen plus at least two other public sources and keep your pipelines redundant.

If you want a unified, queryable, alert-driven view across all six public sources without building the pipelines yourself, start your free trial.
