When government data disappears: How to cross-check, preserve, and investigate
- Elizabeth Clemons
- Oct 7
Updated: Oct 9

IRE 2025 and AccessFest 2025 Tip Sheet

Anna Massoglia, anna@sunlightresearch.net, X, LinkedIn, BlueSky
Jay Hunter, jay@hunterindex.org, X, LinkedIn, Substack
Michael Nolan, michael@sunlightsearch.net, LinkedIn
Jason Leopold, jleopold15@bloomberg.net, LinkedIn
Elizabeth Clemons, elizabeth@sunlightresearch.net, LinkedIn
This tip sheet can be used to find information on the disappearance of government data and explore what you can do to keep track of data relevant to your research and reporting.
Examples of Government Data and Information Archives
DRP Bluesky: https://bsky.app/profile/datarescueproject.org
Data Rescue Tracker - datasets: https://baserow.datarescueproject.org/public/grid/Nt_M6errAkVRIc3NZmdM8wcl74n9tFKaDLrr831kIn4
Data Rescue Tracker - maintainers: https://baserow.datarescueproject.org/public/gallery/kIH2BAiLD6PyrEoDkekgDkpRy0U6knh8HTyIkB3Qu5o
DRP Open Collective: https://opencollective.com/datashelter/projects/datarescueproject
Data Lumos: https://archive.icpsr.umich.edu/datalumos/home
Data Rescue Event Toolkit: https://osf.io/zbdxt/
Questions about the Data Rescue Project? Reach out to Mikala Narlock at mnarlock@iu.edu.
GW’s National Security Archive
Hunter Index (politicians’ personal financial information)
Requests (FOIA)
Want to monitor a specific webpage for new updates? It’s now easy to get alerts when a page, or even just a specific part of a page, has changed, thanks to Klaxon Cloud. The Add-On builds on the Marshall Project’s original Klaxon site monitoring tool. You specify a page to watch, and you get email alerts when the part of the page you care about changes. That part might be a list of documents, a key official’s biography or a daily inmate count.
It also integrates with the Internet Archive’s Wayback Machine for page snapshots. This builds a history of how tracked pages update, change and even disappear over time, and it gives you a copy of each version of the page along the way.
To use it, just log in to DocumentCloud and pull up the Klaxon Add-On. You can pin it by clicking the thumbtack icon to make it easier to access down the line — pinned Add-Ons appear on the left-hand sidebar.
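This tip sheet doesn't describe how Klaxon works internally. The basic idea of watching only part of a page can be sketched in a few lines, though. The Python below is a minimal illustration, not Klaxon's implementation. It assumes the section you care about is an element with an HTML `id` attribute (the `docs` id in the comments is hypothetical) and that the page's markup is reasonably well-formed. Only that element's text is fingerprinted, so changes elsewhere on the page, such as ads or timestamps, don't count as a change.

```python
import hashlib
import urllib.request
from html.parser import HTMLParser
from typing import Optional

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SectionText(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while the parser is inside the target element
        self.found = False    # True once the target element has been seen
        self.done = False     # True once the target element has closed
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.done or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def fetch(url: str) -> str:
    """Download a page and decode it with the charset the server reports."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fingerprint(html: str, target_id: str) -> Optional[str]:
    """SHA-256 of the watched element's whitespace-normalized text, or None if it is missing."""
    parser = SectionText(target_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(previous: Optional[str], html: str, target_id: str) -> bool:
    """True if the watched element's text differs from the stored fingerprint or has vanished."""
    return fingerprint(html, target_id) != previous
```

In practice you would save the fingerprint from each run, for example in a small text file. You would then alert yourself whenever `has_changed(saved, fetch(url), "docs")` returns True. Because a missing element yields `None`, a watched section that disappears entirely also counts as a change.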
Klaxon is great if you just want to keep tabs on when a web page updates, but DocumentCloud is most useful if you have documents to actually analyze. Fortunately, the Scraper Add-On will fetch all the linked documents on a given page and drop them into your DocumentCloud account for safekeeping. You can optionally specify a project to put them in.
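The Scraper Add-On does this work inside DocumentCloud. For a sense of what the step involves, here is a rough Python sketch (not the Add-On's code) that collects links to document files on a page and saves them locally. The list of file extensions treated as "documents" is an assumption you would adjust for the site you're covering.

```python
import pathlib
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: which file types count as "documents" worth saving.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".csv")


class LinkCollector(HTMLParser):
    """Record the href of every <a> tag on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def document_links(html: str, page_url: str) -> list[str]:
    """Absolute, de-duplicated URLs of links whose path ends in a document extension."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    found: list[str] = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page URL
        if urlparse(absolute).path.lower().endswith(DOC_EXTENSIONS) and absolute not in found:
            found.append(absolute)
    return found


def download_all(urls: list[str], dest_dir: str) -> list[pathlib.Path]:
    """Save each document under dest_dir, named after the last part of its URL path."""
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        name = pathlib.PurePosixPath(urlparse(url).path).name or "download"
        target = dest / name
        with urllib.request.urlopen(url, timeout=60) as resp:
            target.write_bytes(resp.read())
        saved.append(target)
    return saved
```

This naming scheme is deliberately simple. Two documents with the same filename in different folders on the site would overwrite each other, so a real archive should keep more of the URL path.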
Questions? Contact MuckRock's Dillon Bergin at dillon@muckrock.com.
527 Explorer (IRS)
Nonprofit Explorer (IRS Form 990s)
Data Store (static, historical)
OpenSecrets.org (campaign finances, lobbying)
Alternative source for health data (non-government)
Association of Health Care Journalists' Health Journalism Data
KFF (for health policy research, polling and news)
Government data is disappearing before our eyes, The Hill, Anna Massoglia, March 19, 2025
Tools for Archiving Content
Perma.cc (free for journalists)
Conifer Webrecorder by Rhizome (more complex websites)
ArchiveBox (self-hosted)
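Before capturing a page yourself, it can help to check whether the Internet Archive's Wayback Machine already holds a copy. The Wayback Machine offers a public availability API at `https://archive.org/wayback/available`. It returns the archived snapshot closest to an optional timestamp. The Python below is a minimal sketch of querying it; the helper names are illustrative.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is YYYYMMDDhhmmss and may be shortened, e.g. "2025"."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response_json: str) -> Optional[dict]:
    """Pull the closest snapshot's URL and timestamp out of the API response, if any."""
    closest = json.loads(response_json).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return {"url": closest["url"], "timestamp": closest["timestamp"]}
    return None


def lookup(page_url: str, timestamp: Optional[str] = None) -> Optional[dict]:
    """Ask the Wayback Machine whether it has a snapshot of page_url."""
    with urllib.request.urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

A `None` result means the Wayback Machine reported no snapshot. That is a good cue to capture the page yourself with one of the tools above.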
Tools for Archiving Data
Scraping code: https://github.com/m-nolan/doge-scrape
BLN Updating code: https://github.com/biglocalnews/sync-doge-scrape/
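The repositories above contain the actual scraping and updating code, which isn't reproduced here. As a hypothetical illustration of what an update step can surface, this Python sketch compares two CSV snapshots of the same scraped table and reports rows that were added, removed or changed between scrapes. Rows are matched on a unique ID column; the `id` column name is an assumption. Removed rows are one way to spot records that have quietly disappeared.

```python
import csv
import io


def index_rows(csv_text: str, key: str) -> dict[str, dict[str, str]]:
    """Map each row's key value to the full row (a later duplicate key overwrites an earlier one)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_snapshots(old_csv: str, new_csv: str, key: str = "id") -> dict[str, list[str]]:
    """Compare two scrapes of the same table by key column."""
    old, new = index_rows(old_csv, key), index_rows(new_csv, key)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Same key in both scrapes, but at least one field differs.
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

To try it, read each saved file's text and pass both strings in. Note that IDs are sorted as strings, so "10" sorts before "9".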
Creating Web Scrapers
Finding undocumented APIs: https://inspectelement.org/apis.html
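Many sites load their tables from a JSON endpoint that you can spot in the browser's developer tools, and the guide linked above covers how to find one. Once you have one, a loop like this Python sketch can pull every page of results. The `page` query parameter and the `results` key are hypothetical placeholders. Real APIs name and paginate things differently, so check what the site actually sends.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator
from urllib.parse import urlencode


def fetch_json(url: str) -> dict:
    """GET a URL and parse the JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_records(
    base_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    page_param: str = "page",      # hypothetical pagination parameter
    results_key: str = "results",  # hypothetical key holding each page's records
    delay: float = 1.0,            # pause between requests to go easy on the server
) -> Iterator[dict]:
    """Yield records page by page until the endpoint returns an empty page."""
    page = 1
    while True:
        data = fetch(f"{base_url}?{urlencode({page_param: page})}")
        records = data.get(results_key) or []
        if not records:
            return
        yield from records
        page += 1
        time.sleep(delay)
```

The `fetch` argument is a seam for testing. You can pass a function that returns saved responses instead of hitting the live site.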
Tools for Change Detection
Visualping – Visual/text change tracking with alerts, Chrome extension (freemium)
WebSite-Watcher – Advanced local tool for Windows ($$)
Distill.io – Content tracking with local app, browser, Chrome extension (freemium)
PageCrawl.io – Team-friendly archiving and alerts (freemium)
Wachete – Tracks private/password-protected pages (freemium)
ChangeTower – Keyword and content change alerts (freemium)
Fluxguard – HTML, visual and text tracking, translation (freemium)
Follow That Page – Basic email alerts for text changes
SiteDelta – Firefox-only in-browser tracker
KeyCDN Tools – Check a page's Last-Modified HTTP header to see when it was last changed
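You can run a similar header check yourself. This Python sketch sends a HEAD request with the standard library and parses the `Last-Modified` response header. Treat the result as a hint rather than proof. Many servers omit the header or set it to the current time for dynamically generated pages, and some reject HEAD requests entirely.

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_last_modified(value: Optional[str]) -> Optional[datetime]:
    """Turn an HTTP date such as 'Wed, 21 Oct 2015 07:28:00 GMT' into a datetime, or None."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ones ValueError
        return None


def last_modified(url: str) -> Optional[datetime]:
    """Ask the server for headers only and report its Last-Modified value, if any."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_last_modified(resp.headers.get("Last-Modified"))
```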
Tips for Archiving
CURATE(D) Steps: z.umn.edu/curate
Data Primers: https://datacuration.network/outputs/data-curation-primers/
Asynchronous learning modules: https://datacurationnetwork.github.io/CURATED/
Curating for Data Rescue: https://datacuration.network/2025/02/05/curating-for-data-rescue/
DCN Slides: Becoming a Data Preserver via Curating Data
Questions about the Data Curation Network? Contact Sophia Lafferty-Hess at sophia.lafferty.hess@duke.edu.
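One routine preservation step when you save a dataset is to record a checksum for every file. The checksums let you confirm later that your copy hasn't been altered or corrupted. This Python sketch writes and verifies a SHA-256 manifest for a folder. The manifest filename is an arbitrary choice. Each line uses the two-space `checksum  path` layout that `sha256sum` prints, so `sha256sum -c` run from inside the folder should also be able to check it.

```python
import hashlib
import pathlib

MANIFEST = "manifest-sha256.txt"  # arbitrary filename choice


def sha256_file(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(data_dir: str) -> pathlib.Path:
    """Record a checksum for every file under data_dir (except the manifest itself)."""
    root = pathlib.Path(data_dir)
    lines = [
        f"{sha256_file(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != MANIFEST
    ]
    manifest = root / MANIFEST
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(data_dir: str) -> list[str]:
    """Return a description of every file that is missing or no longer matches its checksum."""
    root = pathlib.Path(data_dir)
    problems = []
    for line in (root / MANIFEST).read_text(encoding="utf-8").splitlines():
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(target) != digest:
            problems.append(f"changed: {rel}")
    return problems
```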
Creating Web Scrapers
Finding undocumented APIs: https://inspectelement.org/apis.html
DOGE Scraper GitHub Links
Scraping code: https://github.com/m-nolan/doge-scrape
BLN Updating code: https://github.com/biglocalnews/sync-doge-scrape/
Articles written with BLN DOGE tables
University of Minnesota: Cuts, halted grant reviews could be ‘absolutely crippling’ to research, MinnPost, March 03, 2025
Unpacking DOGE’s plan to cut 14 federal office leases across Minnesota, MinnPost, March 19, 2025
How much federal money flows into Minnesota for health care, education, agriculture?, MinnPost, March 31, 2025
Evanston woman says her contract is among DOGE website errors, NBC Chicago, March 07, 2025
'We got un-DOGE'd': Inaccurate info discovered in claimed taxpayer savings, NBC Boston, March 14, 2025
From DOGE cuts to tariffs, see Trump's first 100 days by the numbers, NBC10 Philadelphia, April 29, 2025
DEI, Project 2025 and the Constitution: Tracking Trump's impact in his first 100 days, NBC4 Washington, April 30, 2025
DOGE claims at least $117 million in Bay Area contract cuts, spurring layoffs and uncertainty, NBC Bay Area, May 08, 2025
EV uncertainty: How Trump's charging station funding freeze is affecting the DC area, NBC4 Washington, May 15, 2025
Other Sunlight Research Center Resources
Questions or comments about Sunlight's workshops and resources? Contact Elizabeth at elizabeth@sunlightresearch.net.

