The Internet is Forever: How to uncover hidden information
- Elizabeth Clemons
- Oct 10

AccessFest 2025 Tip Sheet

Sadie Brown, sbrown@schnepsmedia.com
Michael Nolan, michael@sunlightsearch.net
Audrey Nielsen, audrey@sunlightresearch.net
Dillon Bergin, dillon@muckrock.com
Elizabeth Clemons, elizabeth@sunlightresearch.net
Individuals and organizations are increasingly scrubbing their digital footprints, but journalists can often still track down what has been removed. This tip sheet provides you with tools to uncover deleted records, scrubbed public data, and hidden connections using investigative techniques and open-source intelligence.
Removed and Changed Sites
Tools for change detection
Visualping – Visual/text change tracking with alerts, Chrome extension (freemium)
WebSite-Watcher – Advanced local tool for Windows ($$)
Distill.io – Content tracking with local app, browser, Chrome extension (freemium)
PageCrawl.io – Team-friendly archiving and alerts (freemium)
Wachete – Tracks private/password-protected pages (freemium)
ChangeTower – Keyword and content change alerts (freemium)
Fluxguard – HTML, visual and text tracking, translation (freemium)
Follow That Page – Basic email alerts for text changes
SiteDelta – Firefox-only in-browser tracker
KeyCDN Tools – check HTTP header for when page was last modified
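The header check in the last item can also be done without a web tool. The following is a minimal Python sketch using only the standard library. The function names fetch_headers and last_modified are illustrative, not part of any tool listed above, and example.com is a placeholder. Many servers omit the Last-Modified header or set it to the time the response was generated, so treat the result as a lead rather than proof of an edit.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def fetch_headers(url, timeout=10):
    """Send a HEAD request (no page body) and return the response headers as a dict."""
    request = Request(url, method="HEAD", headers={"User-Agent": "tip-sheet-example"})
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or malformed."""
    for name, value in headers.items():
        if name.lower() == "last-modified":
            try:
                return parsedate_to_datetime(value)
            except (TypeError, ValueError):
                # Older Python versions raise TypeError, newer ones ValueError.
                return None
    return None


# Usage (requires network access; substitute the page you are tracking):
# print(last_modified(fetch_headers("https://example.com/")))
```

Keeping the network call separate from the parsing step makes it easy to run the parser on headers you have already saved from an earlier check.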
Requests (FOIA)
Want to monitor a specific webpage for new updates? It’s now easy to get alerts when a page, or even just a specific part of a page, has changed, thanks to Klaxon Cloud. The Add-On builds on the Marshall Project’s original Klaxon site monitoring tool to let you specify a page to watch and then get email alerts when the part of the page you care about (maybe a list of documents, a key official’s biography, or a daily count of inmates) changes.
It also integrates with the Internet Archive’s Wayback Machine for page snapshots, creating a history of how tracked pages update, change and even disappear over time, and giving you a copy of each version of the page along the way.
To use it, just log in to DocumentCloud and pull up the Klaxon Add-On. You can pin it by clicking the thumbtack icon to make it easier to access down the line — pinned Add-Ons appear on the left-hand sidebar.
Klaxon is great if you just want to keep tabs on when a web page updates, but DocumentCloud is most useful if you have documents to actually analyze. Fortunately, the Scraper Add-On will fetch all the linked documents on a given page and drop them into your DocumentCloud account for safekeeping. You can optionally specify a project to put them in.
Questions? Contact MuckRock's Dillon Bergin at dillon@muckrock.com.
Tracking social media and search engine records
DMCA takedowns, legal notices, de-indexing demands, cease-and-desist letters.
Government and court-ordered takedown requests to Search, YouTube, and Blogger.
Right to Be Forgotten (EU/UK only)
Personal de-indexing requests to delist under GDPR/UK data protection laws.
Video deletions, copyright strikes, policy violations, government takedowns.
Meta Transparency Center (Facebook & Instagram)
Government takedown requests, coordinated inauthentic behavior, and IP-related removals.
Wikimedia Transparency Reports
Government takedown requests, DMCA notices, and account takedowns on Wikipedia.
X Transparency Center: Removal Requests
Government and non-government requests for removal and account information
Removals/Removal Requests
RSS/Feed aggregators
Create unique or filtered RSS feeds
Location data
Scrubbed Public Data
Tools for change detection
DRP Bluesky: https://bsky.app/profile/datarescueproject.org
Data Rescue Tracker - datasets: https://baserow.datarescueproject.org/public/grid/Nt_M6errAkVRIc3NZmdM8wcl74n9tFKaDLrr831kIn4
Data Rescue Tracker - maintainers: https://baserow.datarescueproject.org/public/gallery/kIH2BAiLD6PyrEoDkekgDkpRy0U6knh8HTyIkB3Qu5o
DRP Open Collective: https://opencollective.com/datashelter/projects/datarescueproject
Data Lumos: https://archive.icpsr.umich.edu/datalumos/home
Data Rescue Event Toolkit: https://osf.io/zbdxt/
Questions about the Data Rescue Project? Reach out to Mikala Narlock at mnarlock@iu.edu.
Hunter Index (politicians’ personal financial information)
527 Explorer (IRS
Nonprofit Explorer (IRS Form 990s)
Data Store (static, historical)
OpenSecrets.org (campaign finances, lobbying)
Tips for finding missing content
The Internet Archive’s Wayback Machine can find older versions of web pages.
OSINT Framework is a good place to start if you don’t know where to look.
Use Boolean search operators to find web pages that may still exist but have been removed from indexes.
site: search a specific domain
Example: site:ire.org
AND/OR – search multiple terms
Example: “New Orleans” AND (IRE OR “Investigative Reporters”)
HYPHEN/MINUS SYMBOL – exclude terms or sites
Example: “investigations” -site:ire.org
FILETYPE – search for a specific type of file
Example: filetype:pdf
ASTERISK – placeholder for word
Example: “Investigative Reporters * Editors”
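Two of the tips above lend themselves to a short script: composing operator-based queries and checking whether the Wayback Machine holds a snapshot of a page. The Python sketch below is illustrative only. The function names are hypothetical, operator support varies by search engine, and the endpoint and response fields reflect the Internet Archive's public availability API as commonly documented, so confirm them against the current documentation before relying on them.

```python
from urllib.parse import urlencode


def build_query(phrases=(), site=None, exclude_site=None, filetype=None):
    """Combine quoted phrases and search operators into one query string."""
    parts = [f'"{phrase}"' for phrase in phrases]
    if site:
        parts.append(f"site:{site}")
    if exclude_site:
        parts.append(f"-site:{exclude_site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def wayback_availability_url(page_url, timestamp=None):
    """Build a request URL for the Wayback Machine availability endpoint."""
    params = {"url": page_url}
    if timestamp:
        # Timestamp is a date string such as "20200101" (YYYYMMDD).
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_json):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = response_json.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Usage: paste the query into a search engine, and open the availability URL
# in a browser or fetch it with any HTTP client to get the JSON response.
print(build_query(phrases=["investigations"], exclude_site="ire.org"))
print(wayback_availability_url("ire.org", "20200101"))
```

Passing a timestamp asks for the snapshot closest to that date, which helps when you suspect a page changed around a specific event.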
Tools for archiving content
Perma.cc (free for journalists)
Conifer Webrecorder by Rhizome (more complex websites)
ArchiveBox (self-hosted)
Picuki - download copies of TikTok videos
Scraping code: https://github.com/m-nolan/doge-scrape
BLN Updating code: https://github.com/biglocalnews/sync-doge-scrape/
Creating web scrapers
Finding undocumented APIs: https://inspectelement.org/apis.html
CURATE(D) Steps: z.umn.edu/curate
Data Primers: https://datacuration.network/outputs/data-curation-primers/
Asynchronous learning modules: https://datacurationnetwork.github.io/CURATED/
Curating for Data Rescue: https://datacuration.network/2025/02/05/curating-for-data-rescue/
DCN Slides: Becoming a Data Preserver via Curating Data
Questions about the Data Curation Network? Contact Sophia Lafferty-Hess at sophia.lafferty.hess@duke.edu.
Hidden Connections
Secretary of State Records
Court Records
How-to: Search the SEC EDGAR for nonprofits, private companies and individuals.
How-to: Where to start? SEC Filings
Check out this demo to learn the six SEC forms to start with.
Court Basics
Legal Information Institute dictionary: legal glossary
Court Express: for having someone get the records for you
Federal Court Resources
Use the RECAP Chrome Extension while using PACER to contribute to the RECAP archive, and see if documents you are searching for are already available for free.
CourtListener database
US Tax Court Case Search: If a company petitions the tax court to reduce its taxes, the filings here can reveal details about its finances.
United States Merit Systems Protection Board: quasi-judicial; works to protect federal merit systems and to ensure protection for federal employees against abuses by agency management
State Court Resources
Nat’l Center for State Courts (NCSC): links to state court websites
Court Reference: links to online court records by state
Personal Privacy
Campaign finance records don’t capture total spending on Amarillo abortion “travel ban” election, The Texas Tribune, Jayme Lozano Carver, Dec. 9, 2024
Sunlight Research Center contributed research to a Texas Tribune investigation of campaign spending during Amarillo's contentious “Sanctuary City for the Unborn” ballot measure. The proposition would have allowed lawsuits against people accused of helping others travel through Amarillo for out-of-state abortions. Sunlight researched campaign spending on advertisements by both proponents and opponents of the ballot measure. We found that only two groups disclosed billboard-related expenses totaling $7,300. Yet, we identified at least 21 billboard ads supporting and opposing the ballot measure that would have cost at least $20,650 to $24,300.
Political campaigns often use billboards, but campaign finance reports don't always fully disclose spending. This guide will help reporters compare reported campaign spending with actual billboard advertising to identify potential discrepancies.
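The comparison behind the Amarillo finding is simple arithmetic, sketched here in Python with the figures reported above. The function name undisclosed_range is illustrative, and this is not Sunlight's actual workflow; the cost estimates for each billboard would come from your own reporting, such as rate cards or vendor quotes.

```python
def undisclosed_range(disclosed, estimated_low, estimated_high):
    """Return the (low, high) gap between estimated ad costs and disclosed spending."""
    gap_low = max(estimated_low - disclosed, 0)
    gap_high = max(estimated_high - disclosed, 0)
    return gap_low, gap_high


# Figures from the Texas Tribune story described above.
disclosed_billboard_spending = 7_300
estimated_cost_low, estimated_cost_high = 20_650, 24_300

low, high = undisclosed_range(
    disclosed_billboard_spending, estimated_cost_low, estimated_cost_high
)
print(f"Potentially unreported billboard spending: ${low:,} to ${high:,}")
```

With these inputs, the gap between disclosed and estimated billboard spending is $13,350 to $17,000.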
Questions or comments about Sunlight's workshops and resources? Contact Elizabeth at elizabeth@sunlightresearch.net.

