Banned Index

Methodology

Last updated: April 20, 2026

This page is our credibility anchor. It documents exactly how Banned Index collects, verifies, stores, and publishes data. Journalists, researchers, and grant reviewers should be able to read this page and understand every step between a book being challenged in a school district and that information appearing on our website.

Standing rules

A handful of rules apply to every analysis, page, and published figure on this site. They are enforced at the data-model level where possible and documented here so external users can apply the same constraints to their own work with our CC BY 4.0 data.

  1. Always segment statistics by ban category. Every ban belongs to exactly one of five categories: school, prison, public library, platform, or higher education. We do not publish an aggregate number without the breakdown beside it. Example: the overall reinstatement rate of 3.0% is misleading — prison is 0%, school is 0.9%, public library is 97.6%. Each category has different enforcement mechanisms and different outcomes.
  2. Classifications use public self-identification or public catalog data only. Author demographics (is the author a person of color? is the author LGBTQ?) and book genres are never inferred. When a classification is absent, it means “not publicly identified” — it does NOT mean “assumed white,” “assumed straight,” or “assumed ungenred.”
  3. Access restrictions are tracked separately from content bans. Defunding, library budget cuts, librarian eliminations, library closures, and protective policies are tracked as access restrictions, not as content bans. Published analyses never mix these two counts.
  4. Corrections go to an admin edit log, never to public timelines. Real historical events (a challenge filed, a board vote, a reinstatement) live on the public ban-event or law-event timelines. Typo fixes, metadata backfills, and admin edits are logged separately and are not shown on public detail pages.
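Rule 1 can be enforced mechanically as well as editorially. The sketch below shows one way an analysis script might refuse to emit an aggregate without its category breakdown; the record shape and function names are illustrative, not the project's actual code.

```python
from collections import defaultdict

# The five ban categories defined by the standing rules.
CATEGORIES = {"school", "prison", "public library", "platform", "higher education"}

def reinstatement_rates(bans):
    """Return per-category reinstatement rates, never a bare aggregate.

    Each ban is a dict with at least 'category' and 'status'. Returning a
    {category: rate} mapping means any total must carry its breakdown.
    """
    totals, reinstated = defaultdict(int), defaultdict(int)
    for ban in bans:
        cat = ban["category"]
        if cat not in CATEGORIES:
            raise ValueError(f"unknown ban category: {cat!r}")
        totals[cat] += 1
        if ban["status"] == "Reinstated":
            reinstated[cat] += 1
    return {cat: reinstated[cat] / totals[cat] for cat in totals}

# Tiny illustrative sample, not real data.
sample = [
    {"category": "school", "status": "Removed"},
    {"category": "school", "status": "Reinstated"},
    {"category": "public library", "status": "Reinstated"},
]
rates = reinstatement_rates(sample)
```

Categories with no records simply do not appear in the result, so a 0% and an absent category are never conflated.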

Three principles

  1. Every record must be sourceable. No inferred bans. No aggregated counts without provenance. If a book is marked banned in a district, we have a source URL (article, PDF, meeting minutes, court filing) that proves it, and that URL is preserved in the Internet Archive’s Wayback Machine so it survives link rot.
  2. Methodology is public and current. This page is kept in sync with the actual pipeline code. When we change how data is collected, normalized, or represented, this page is updated in the same release.
  3. The project should outlive any vendor. Data is licensed CC BY 4.0 so anyone can republish, cite, and build on it. Weekly database snapshots preserve the full dataset independently of any single service provider, and the project’s architecture is documented so a future maintainer can take over if needed.

What counts as a “ban”

Banned Index uses the term “ban” as a structural label covering the full range of actions a jurisdiction can take against a book. Every record has a status that distinguishes among them:

  • Reported — A concern was raised but no formal action taken yet
  • Under consideration — Formally under review by a committee or board
  • Challenged — Formal challenge filed; decision pending
  • Temporarily removed — Pulled pending review; may return to shelves or move to restricted circulation
  • Removed — Fully withdrawn from the collection by the jurisdiction
  • Upheld — Challenge rejected; book remains available
  • Reinstated — Previously removed; now returned to shelves
  • Withdrawn — The challenge itself was withdrawn before resolution
  • Under legal challenge — Subject to active litigation (injunction, appeal)
  • Moved to Adult Section — The book remains in the collection but has been relocated from a juvenile/children’s section to the adult section as a restriction. It is no longer accessible to younger patrons browsing the kids’ area.
  • Moved to Teen Section — Relocated from juvenile/children’s to the teen section. A milder age-gate than the adult section but still out of the children’s area.
  • Moved to Young Adult Section — Relocated from juvenile/children’s to the young-adult section. The mildest of the three relocation restrictions.

The three “Moved to” statuses are restrictions short of removal. The book stays in the collection but has been relocated to a section outside the area where its publisher intended it to be discovered. We track this as a distinct class of ban because it is a real reduction in access — a picture book intended for ages 4-8, relocated to the adult section, will no longer reach its intended audience even though it has not been removed. Aggregated counts treat these as bans.

Aggregated counts on the site (homepage, state pages, search) include every status above. If you need a narrower count — for example “only fully removed books” — use the search and filter tools on the site. Broader programmatic data access (a public API, downloadable snapshots under the CC BY 4.0 license) is on the roadmap.
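Until that programmatic access ships, the same narrowing can be applied to any exported record set. A minimal sketch, assuming records carry the status labels defined above (the variable and function names are illustrative):

```python
# The full status vocabulary from the table above.
ALL_STATUSES = {
    "Reported", "Under consideration", "Challenged", "Temporarily removed",
    "Removed", "Upheld", "Reinstated", "Withdrawn", "Under legal challenge",
    "Moved to Adult Section", "Moved to Teen Section",
    "Moved to Young Adult Section",
}
# Example narrower slices.
REMOVED_ONLY = {"Removed"}
RELOCATIONS = {s for s in ALL_STATUSES if s.startswith("Moved to")}

def count_bans(records, statuses=ALL_STATUSES):
    """Count records whose status is in the given set (default: all)."""
    return sum(1 for r in records if r["status"] in statuses)

# Illustrative records, not real data.
records = [
    {"status": "Removed"},
    {"status": "Moved to Teen Section"},
    {"status": "Upheld"},
]
```

With the default set this reproduces the site's aggregate behavior; passing `REMOVED_ONLY` yields the stricter "fully removed" count.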

Ban types

Every ban record is assigned to one of five categories. The enforcement mechanism, the actor with authority, and the typical outcome differ substantially across categories, so analyses on this site are always segmented by this field (see Standing Rules).

  • School — K–12 school district decisions: library removals, classroom-set removals, and curriculum changes explicitly framed as removals
  • Public library — City, county, or regional public library system decisions
  • Prison — State or federal Department of Corrections mail-room and library restrictions
  • Platform — Confirmed, content-based removals from retail, audio, or lending platforms (see Platform monitoring)
  • Higher education — University course restrictions, faculty teaching bans, and syllabus censorship. First entry is the Texas A&M Plato case. Higher-ed is a distinct category because the actors (boards of regents, state legislatures acting on universities, administrators) and the academic-freedom context differ from K–12.

Challenger group “flags” vs. bans

A book being flagged by a challenger group is NOT the same as a ban. We track challenger-group flags as a distinct signal and never roll them into ban counts, status breakdowns, or any aggregate that represents jurisdictional enforcement.

Several organizations publish lists of books they oppose or want removed from libraries — for example, BookLooks.org, chapters of Moms for Liberty, Brave Books, No Left Turn in Education, and Pavement Education Project. These lists are valuable context because:

  • They reveal which books are likely to be challenged next in districts where these groups are active
  • They help journalists and researchers trace the lineage of a specific challenge back to its origin list
  • They document the broader ecosystem of book-challenge activism that exists outside of any one district’s library

But being on such a list is fundamentally different from being removed by a school board or library system. Many books on challenger lists have never been formally challenged in any district. Conversely, many books that have been banned in specific districts were not on any public challenger list beforehand — they were pulled via independent parent complaints, automated reviews, or direct board action.

In the database we store challenger flags separately from ban records. The two are never combined in ban counts. On book detail pages, challenger flags appear in a clearly-labeled “Also flagged by N challenger groups” section with an amber accent distinct from the red accents we use for confirmed bans. The distinction is enforced at the data model level, not just in the UI.
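One way such a separation can be enforced at the model level rather than in presentation code is to give flags and bans distinct types that cannot be mixed in a count. The sketch below uses hypothetical type and field names, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BanRecord:
    """A jurisdictional enforcement action (counts toward ban totals)."""
    book_id: int
    jurisdiction: str
    status: str

@dataclass(frozen=True)
class ChallengerFlag:
    """An advocacy-list signal; context only, never counted as a ban."""
    book_id: int
    group: str

def ban_count(rows):
    """Count only BanRecord rows; a ChallengerFlag in the input is a bug."""
    if not all(isinstance(r, BanRecord) for r in rows):
        raise TypeError("challenger flags must never enter a ban count")
    return len(rows)

# Illustrative data only.
bans = [BanRecord(1, "Clay County School District, FL", "Removed")]
flags = [ChallengerFlag(1, "BookLooks.org"), ChallengerFlag(2, "Moms for Liberty")]
```

Because the two record types are structurally distinct, a flag can never silently inflate a ban total even if a query joins the wrong tables.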

See the challenger groups index for the list of tracked organizations and their ideology tags. The database currently tracks 14 challenger groups.

Terminology note. We chose “challenger” deliberately because it ties to the standard librarian term “book challenge” used by PEN America and the American Library Association. It is plain-language clear about which side of the challenge these groups are on.

Individual ban records can also be linked to the specific challenger group that filed or organized the challenge. This is distinct from the general flag relationship (“this group has flagged this book somewhere”): a challenger-group link on a ban says “this group was the challenger on this specific ban in this specific jurisdiction.” Currently 122 ban records are linked to a challenger group this way (all from CDF-organized challenges at this time).

Data sources

We collect ban data from five categories of sources, in roughly this order of reliability:

  1. Official government reports. The most authoritative source. Example: the Florida Department of Education’s annual “School District Reporting” required by FL Statute 1006.28(2). We have over 1,100 ban records from the 2022-2023 and 2023-2024 FL DoE reports, each linked to the original PDF and archived in the Wayback Machine.
  2. Court filings and official legal documents. Lawsuits against school districts (e.g. PEN America vs. Escambia County), injunctions, rulings. When available, these are authoritative and well-cited.
  3. News articles from established publishers. Local newspapers, regional public radio, national outlets. Each article is archived in the Wayback Machine on ingest.
  4. School board meeting minutes and official statements. Primary-source documentation of challenges and decisions.
  5. Advocacy organization trackers. PEN America, ALA Office for Intellectual Freedom, EveryLibrary Institute, Florida Freedom to Read Project, state-level tracking projects. These are listed separately and treated as secondary sources we triangulate against primary documentation where possible.

We do not treat anonymous social media posts, opinion columns without factual claims, or unverified rumors as sources for ban records.

The pipeline

Data moves through the system in five stages:

Stage 1 — Discovery

Three parallel mechanisms surface potential ban events:

  • Automated news monitoring runs daily via a GitHub Action. It searches Google News RSS, GDELT, and direct RSS feeds for a keyword set covering book bans, library challenges, legislation, and relevant organizations. New articles land in an internal review queue where they await human confirmation before publication.
  • Bulk data imports run on-demand when a major dataset is released. Example workflows: the annual FL DoE reports, PEN America’s quarterly reports, state legislature websites.
  • Manual research tasks are queued into the same admin review queue with a special source label. These represent data we know exists but need human help to acquire (a specific district’s meeting archive, a hard-to-scrape PDF, etc.).

Stage 2 — Extraction and dedup

For news articles, Claude (Anthropic’s AI assistant) reads each article and extracts structured data: book title, author, jurisdiction, status, stated reason, reason categories, challenged by, relevant law. The extraction prompt requires Claude to quote exact phrases from the article that support each field.

Duplicate detection runs at three levels:

  • URL fingerprint — the same article discovered via multiple search queries is deduped before it reaches the admin queue
  • Title-hash canonical URL — the same story covered by multiple publishers (e.g. AP wire reprinted in 40 local papers) is grouped via a Claude-powered event grouping step
  • Book + jurisdiction fingerprint — when a ban record already exists for the same book in the same jurisdiction, the importer updates the existing record instead of creating a duplicate
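The first and third levels are simple hashing; only the middle level needs a language model. A sketch of the two hash-based fingerprints, where the exact normalization choices (lowercasing, stripping query strings) are assumptions rather than the pipeline's documented behavior:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def url_fingerprint(url: str) -> str:
    """Level 1: the same article discovered via multiple queries.

    Assumes dropping the query string and fragment is enough to
    canonicalize; a real feed may need a tracking-parameter allowlist.
    """
    parts = urlsplit(url.lower())
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path.rstrip("/"), "", "")
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

def ban_fingerprint(book_title: str, jurisdiction: str) -> str:
    """Level 3: same book + same jurisdiction updates the existing record."""
    key = f"{book_title.strip().lower()}|{jurisdiction.strip().lower()}"
    return hashlib.sha256(key.encode()).hexdigest()

# The same story reached via an RSS tracking URL and a bare URL collide:
a = url_fingerprint("https://example.com/story?utm_source=rss")
b = url_fingerprint("https://example.com/story/")
```

Matching fingerprints route the importer to an update rather than an insert, which is what keeps wire-service reprints from inflating counts.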

Stage 3 — Admin review

Every automatically-surfaced article goes through a human review step before it becomes a published ban record. The admin can accept Claude’s extraction, edit it, add a new ban record, update an existing one, add a law event, or dismiss the article as irrelevant. Bulk data imports (e.g. the FL DoE PDFs) bypass individual article review because the source itself is already an authoritative primary document — we archive the PDF and cite it as the source for every record in one step.

Stage 4 — Source preservation

Every source URL we cite is automatically saved to the Internet Archive’s Wayback Machine via the Save Page Now API at the moment we import the record. The archived URL is stored alongside the original source URL. When a source article disappears, moves behind a paywall, or is edited after publication, the Wayback snapshot preserves what we cited.

Google News RSS stub URLs are decoded to their real destination URLs before archiving. Some domains (certain news sites, state legislature websites) block the Save Page Now endpoint — approximately 96% of our ban sources and 25% of our law sources are successfully archived today.
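In its simplest form, Save Page Now is triggered by requesting `https://web.archive.org/save/` followed by the target URL; the sketch below builds that request URL and the stored source/archive pair without performing the network call. Retry logic, the authenticated SPN2 API, and the Google News stub decoding mentioned above are omitted, and the record layout is an assumption:

```python
from urllib.parse import quote

SAVE_ENDPOINT = "https://web.archive.org/save/"

def save_request_url(source_url: str) -> str:
    """Build the Save Page Now URL for a source article."""
    # Percent-encode the source URL but keep its own structure readable.
    return SAVE_ENDPOINT + quote(source_url, safe=":/?&=")

def archived_pair(source_url: str, timestamp: str) -> dict:
    """Store the Wayback snapshot URL alongside the original (assumed layout).

    `timestamp` is the 14-digit YYYYMMDDhhmmss capture time Wayback
    embeds in snapshot URLs.
    """
    return {
        "source_url": source_url,
        "archived_url": f"https://web.archive.org/web/{timestamp}/{source_url}",
    }
```

Keeping both URLs on the record is what lets a detail page fall back to the snapshot the moment the original link rots.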

Stage 5 — Publication

Reviewed records appear immediately on the public website (homepage counts, book pages, law pages, state pages, search results). A public REST API is planned for external consumers. Weekly database snapshots are committed to the GitHub repository as JSON files so the complete database history is auditable and recoverable.

Laws tracking

Laws are tracked separately from bans because a law can exist and still not produce confirmed book removals in every affected jurisdiction. Legislation data is sourced from LegiScan under their CC BY 4.0 license, supplemented by official state legislature websites and news coverage.

When a ban is linked to a law, it may be linked to multiple laws. For example, every book on the Florida DoE’s “removed or discontinued” list is linked to both HB 1467 (the 2022 law that established the reporting requirement under FL Statute 1006.28(2)(e)) and HB 1069 (the 2023 law that expanded the objection framework). Both laws apply to those records, so both are shown.
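The ban-to-law relationship is therefore many-to-many. A minimal sketch of that linkage as a plain association list, using hypothetical identifiers rather than real database keys:

```python
# Hypothetical ban/law identifiers for illustration only.
ban_laws = [
    ("ban:fl-clay-0001", "FL HB 1467"),
    ("ban:fl-clay-0001", "FL HB 1069"),
    ("ban:ut-davis-0002", "UT HB 29"),
]

def laws_for_ban(ban_id, links=ban_laws):
    """Return every law linked to one ban record, in link order."""
    return [law for b, law in links if b == ban_id]
```

A relational schema would express the same thing as a join table; the point is only that one removal can cite several enabling statutes.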

Law detail pages distinguish between jurisdictions with confirmed removals (at least one documented ban in our database) and jurisdictions that are subject to the law but have no confirmed removals on record. Absence of evidence is not evidence of absence — many districts may be enforcing the law without generating the kind of public documentation we can find.

The database currently contains 142 laws with full legislative timelines.

Administrative rules

Administrative rules are tracked separately from bills: state-level rulemaking that implements book-ban legislation. These are NOT bills passed by the legislature; they are rules promulgated by state education boards, library commissions, or similar agencies under authority delegated by an enabling statute.

We separate administrative rules from laws because the procedural posture, public-comment window, and enforcement pathway are different. A law is what the legislature passed; an administrative rule is how an agency chose to operationalize it. Both can independently produce or constrain book removals.

The first entry is Utah Admin Code R277-628, adopted by the Utah State Board of Education to implement HB 29 and HB 374. Administrative rule records link back to the enabling laws they implement, so a book removal in Utah can be traced from the removal event to the rule to both underlying statutes.

Lawsuits and litigation

Lawsuits track active and resolved litigation related to book bans: constitutional challenges to state laws, First Amendment cases against school boards, and appeals in the federal circuit courts. Each case is linked to the laws it challenges or defends (many cases contest multiple statutes at once).

The database currently contains 6 lawsuits:

  • ACLU v. Mat-Su Borough School District (Alaska)
  • ACLU v. Elizabeth School District (Colorado)
  • Book People v. Wong (Texas; decided at the 5th Circuit)
  • Penguin Random House v. Escambia County School District (Florida)
  • PEN America v. Utah State Board of Education
  • Iowa Safe Schools v. Reynolds (Iowa)

Ban records with status “Under legal challenge” are eligible for linkage to lawsuit records when the ban is named in or directly affected by the litigation.

Access restrictions (soft bans and protective policies)

Access restrictions are tracked separately from bans and never mixed into ban counts. In these records there is no specific title being challenged — the mechanism is structural (defunding, closures, staff cuts, or policy protection), not content-based.

Access restrictions capture actions that affect whether library materials are available to a community without targeting any specific title. Examples include:

  • State-level defunding of library programs (e.g. the Dolly Parton Imagination Library defunding in four states)
  • Library budget cuts large enough to force collection reductions (e.g. Grossmont Union HSD)
  • Librarian eliminations that leave a district without a certified professional to manage the collection
  • Library closures, either outright or through consolidation
  • Protective policies — affirmative anti-ban resolutions, local ordinances defending library access, etc. (e.g. Central York SD’s anti-ban policy)

Each record is tagged with a direction:

  • Restrictive — the action reduces access (defunding, cuts, closures)
  • Protective — the action defends or expands access (resolutions against bans, ordinances mandating intellectual-freedom training, etc.)

Protective policies share a dataset with restrictive actions because they operate through the same structural lever — institutional policy affecting library access as a whole — and it is useful for researchers to see them side by side. The direction tag keeps the two cleanly separable for any analysis. The database currently contains 6 access restriction records.

State data architecture

Every state in the database follows the same canonical structure to ensure consistency and cross-state comparability:

  1. Bans are district-level only. There are no “State of Florida” aggregate ban rows. The laws dataset and its book links carry the state-level story; the bans dataset is the district-level ledger. The same book banned in five districts equals five ban rows.
  2. Every banned book links to its authorizing law(s). A Florida book banned under the 2022–2023 framework is linked to both HB 1467 and HB 1069. A Utah book is linked to both HB 29 and HB 374. This allows users to trace from any individual removal to the legislation that enabled it.
  3. Law jurisdiction coverage distinguishes between districts with confirmed removals and districts that are subject to a law but have no documented bans yet. Absence of evidence is not evidence of absence.
  4. District naming is consistent within each state. Florida districts follow the pattern “Clay County School District, FL”; Utah follows “Davis School District, UT”; Texas follows “Katy ISD, TX”. The two-letter state suffix is always included for disambiguation.
  5. Original source files are archived. Every PDF, spreadsheet, or CSV from an authoritative publisher is saved to the repository, alongside the parsed JSON intermediate and the import script that produced the database rows.
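The naming convention in point 4 is easy to validate mechanically at import time. A sketch, assuming only the two-letter-suffix rule described above:

```python
import re

# Matches "Clay County School District, FL", "Katy ISD, TX",
# "Davis School District, UT": any name, comma, space, two capitals.
DISTRICT_PATTERN = re.compile(r"^.+, [A-Z]{2}$")

def valid_district_name(name: str) -> bool:
    """True if the district name ends in the required ', XX' state suffix."""
    return bool(DISTRICT_PATTERN.fullmatch(name))
```

An importer can reject or flag any row failing this check before it reaches the database, which is cheaper than fixing inconsistent names after the fact.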

We currently track legislation from 42 jurisdictions, including 38 states and the federal government, via LegiScan and manual research, covering both restrictive laws (enabling book bans) and protective laws (defending library access). The database contains 142 laws with full legislative timelines sourced from LegiScan’s CC BY 4.0-licensed data. Database totals as of the last update: 15,005 books, 15,608 bans, 14 challenger groups, 142 laws, 6 lawsuits, and 6 access restrictions.

Challenger group ratings

Some challenger groups assign ratings to the books they flag. These ratings reflect the group’s own assessment, not ours. We store and display them for research context.

  • BookLooks.org — Scale: 1 (All Ages) through 5 (Adults Only), based on content analysis of text excerpts and illustrations. Sourcing: PDF book reports retrieved from the Wayback Machine (the site shut down around 2024); each report is individually archived with its original URL preserved.
  • No Left Turn in Education — Scale: 1 (Appropriate) through 4 (Adults Only), based on the ratedbooks.org rubric. Sourcing: scraped from ratedbooks.org via Playwright; each flag links to the source page.

Ratings are stored alongside the reason categories (sexual content, LGBTQ+ themes, racial themes, etc.) extracted from the group’s own classification. We never editorialize these — they are recorded verbatim from the source.

Platform availability monitoring

In addition to school district and library bans, Banned Index monitors whether frequently-banned books remain available on major retail and lending platforms:

  • Retail: Amazon, Barnes & Noble, Bookshop.org, Apple Books, Google Play Books, Kobo
  • Audio: Audible, Kindle Unlimited
  • Libraries: OverDrive/Libby, Hoopla, Scribd

The monitoring script checks each platform for ISBN-based availability. A platform check only flags a potential removal when a book transitions from available to unavailable — first-time checks that find a book absent are not flagged, since the book may never have been listed there.
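The transition rule can be sketched as a comparison of two availability snapshots; the key and data shapes are illustrative, not the monitoring script's actual format:

```python
def flag_transitions(previous: dict, current: dict) -> list:
    """Flag only available -> unavailable transitions.

    `previous` and `current` map (isbn, platform) -> bool availability.
    A book absent on its first-ever check is NOT flagged, since it may
    never have been listed on that platform at all.
    """
    flagged = []
    for key, available in current.items():
        if key in previous and previous[key] and not available:
            flagged.append(key)
    return flagged

# Placeholder ISBN for illustration only.
previous = {("9990000000001", "amazon"): True}
current = {
    ("9990000000001", "amazon"): False,  # was available, now gone: flag
    ("9990000000001", "hoopla"): False,  # first check, absent: ignore
}
```

Everything the function flags still goes to the human review queue described below; the code only decides what is worth a reviewer's attention.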

Human review is required before any platform availability change becomes a ban record. Platform unavailability has many benign causes (temporary inventory issues, regional licensing, publisher decisions). Only confirmed, deliberate removals linked to content objections are recorded as platform bans. The admin review queue surfaces flagged transitions with Confirm / False Positive / Temporary actions.

Book classifications

Book records carry optional classification fields that let us answer questions like “what share of sexual-content bans target young-adult fiction vs. adult romance?” or “are LGBTQ authors over-represented in specific ban waves?” Every classification field is governed by the Standing Rule on classifications: public self-identification or public catalog data only, and NULL never means “assumed default.”

Genres

Each book can be tagged with one or more genres drawn from a controlled 18-category vocabulary:

YA fiction · adult romance · fantasy · literary fiction · classic literature · graphic novel · picture book · middle grade · memoir · nonfiction · sex education · poetry · science fiction · horror · historical fiction · LGBTQ fiction · LGBTQ nonfiction · YA verse novel

Genre tags are populated from OpenLibrary subject data, filtered through a title-match validation step so a subject tag from a different book with a similar title is not carried over incorrectly. Books can carry multiple tags (e.g. a book might be both YA fiction and LGBTQ fiction). A book with no tags has no genre value recorded — the absence reflects gaps in OpenLibrary coverage, not a claim about the book.
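The title-match validation step can be approximated with a fuzzy string comparison. A sketch using Python's standard-library `difflib`; the threshold and normalization are assumptions, not the pipeline's actual parameters:

```python
from difflib import SequenceMatcher

def titles_match(local_title: str, openlibrary_title: str,
                 threshold: float = 0.9) -> bool:
    """Reject subject tags pulled from a different book with a similar title.

    The 0.9 similarity threshold is illustrative only.
    """
    a = local_title.strip().lower()
    b = openlibrary_title.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

Subject data from an OpenLibrary edition whose title fails this check is discarded rather than carried onto the local book record.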

Author demographics

Two fields record whether an author has publicly self-identified as a person of color or as LGBTQ.

“Not publicly identified” does NOT mean “white” or “straight.”

Any analysis using these fields must present three buckets — yes, no, and not publicly identified — and must not collapse the third bucket into either of the first two.

A “yes” classification is recorded only when the author has publicly self-identified in a verifiable source: their own author website, a published interview, their Wikipedia article citing such a source, or a publisher bio. Every classification has a citation. A “no” classification is recorded only when the author has publicly identified as not belonging to the category (rare and usually explicit). Otherwise the classification is left blank as “not publicly identified.”

We do not infer demographics from names, photos, assumed biography, or any other indirect signal. This is an ethical constraint, not just a data-quality one.
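In code, the three-bucket rule amounts to treating the missing value as its own category instead of a default. A sketch with a hypothetical field name:

```python
from collections import Counter

def demographic_buckets(books, field):
    """Count yes / no / not publicly identified, never collapsing buckets.

    `field` holds True, False, or None, where None (or an absent field)
    means "not publicly identified", NOT an assumed default.
    """
    labels = {True: "yes", False: "no", None: "not publicly identified"}
    return Counter(labels[book.get(field)] for book in books)

# Illustrative records only; "author_lgbtq" is a hypothetical field name.
books = [
    {"author_lgbtq": True},
    {"author_lgbtq": None},
    {},  # field absent: still "not publicly identified"
]
buckets = demographic_buckets(books, "author_lgbtq")
```

Any chart or table built from this output shows all three buckets, so a reader can never mistake missing data for a "no".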

Published analyses

Formal analyses based on the database are written up as dated Markdown documents. Each one specifies the query, the ban category segmentation used, and the caveats that apply. Current analyses:

  • Reinstatement rate — segmented by category: school 0.9%, prison 0%, public library 97.6%. The public-library figure is driven by a small number of high-profile reversals in well-publicized cases and should not be read as predictive for prospective challenges.
  • Cross-district co-occurrence — 83 books have been banned in 5 or more districts, with a peak of concurrent activity in Q2 2024.
  • Genre analysis — 45% of bans citing “sexual content” target YA fiction. Adult romance is a small share of the sexual-content category despite its prominence in public debate.
  • Author demographics — supports a two-mechanism model of book bans: mass removals driven by state legislation that sweep up broad title lists, and targeted local challenges that concentrate on authors from marginalized communities.

What we DON’T include

  • Private school book decisions. Private schools are not subject to public records laws and can restrict any materials for any reason. We focus on public institutions.
  • Publisher decisions to discontinue a title, unless it’s in response to a specific challenge we can document.
  • Parental decisions about what their own children can read. A parent opting their own child out of a book is not a ban.
  • Curriculum changes that remove a book from a required reading list but leave it available in the library. These are tracked as curriculum changes, not bans, unless they are explicitly framed as removals.
  • Bans enforced outside the United States. Scope is currently US-only. International tracking may be added later.

Known limitations and biases

Being explicit about where our data might fall short:

  • Geographic skew toward well-covered states. States with active journalism, strong FOIA traditions, and aggressive advocacy organizations (Florida, Texas, Iowa) are over-represented. States where bans happen quietly and receive little coverage are under-represented.
  • Language bias. Our news monitoring currently covers English-language sources only. Bans covered in Spanish, Vietnamese, or other US-prevalent languages are less likely to be captured.
  • Recency bias. Older bans (pre-2020) have weaker coverage because our systematic news monitoring started in 2025 and backfilled historical data is limited.
  • LLM extraction errors. Claude occasionally misreads a news article — for example, attributing a ban to the wrong jurisdiction or assigning the wrong status. Human review catches most of these, but not all. If you spot one, please report it via our Corrections Policy.
  • Overreliance on the FL DoE reports. Because Florida is one of the few states that mandates statewide reporting, our Florida coverage is exceptionally detailed while coverage of peer states (Texas, Tennessee) depends on journalism, advocacy trackers, or research task acquisitions. This is not because Florida has more bans per capita; it’s because Florida has better reporting infrastructure.

Verification

Every ban record carries a verification marker noting whether it has been reviewed, who reviewed it, and when. For bulk imports (e.g. FL DoE), verification is credited to the original publishing body. For admin-reviewed news articles, verification is credited to the human reviewer.

We do not currently run automated fact-checking against external databases. Accuracy audits against other book-ban trackers are planned for future releases.

Contact and corrections

If you find an error, believe a record is out of date, or want to contribute data for a jurisdiction we’re missing, see our Corrections Policy or email hello@bookhoardapp.com.