gitmyhub

kronikier

Python · ★ 52 · updated 6d ago

🗄️ Get historical contacts for a website from web.archive.org snapshots

CLI tool that extracts historical email addresses and phone numbers from archived website snapshots via the Wayback Machine, with timeline data and CSV export.

Python · Wayback Machine API · CSV export · Command-line interface · setup: easy · complexity 3/5

kronikier is a Python command-line tool for recovering historical contact information from archived versions of websites. It queries the Wayback Machine (web.archive.org), which preserves snapshots of web pages over time, and extracts email addresses and phone numbers that appeared on a given domain across all available snapshots. The main use case is research or investigation where a website has since removed its contact details, been taken down, or changed ownership.
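To make the snapshot lookup concrete, here is a minimal sketch of how a tool like this could list captures through the Wayback Machine's public CDX endpoint. It is not taken from kronikier's source: the function names `build_cdx_url` and `parse_cdx_json` are hypothetical, and the choice of query parameters (successful captures only, collapsed by content digest) is an assumption about what a polite scan would ask for.

```python
import json
from urllib.parse import urlencode

# Public CDX index of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target, match_type="domain"):
    """Build a CDX query URL listing successful captures for `target`.

    Hypothetical helper: the parameter choices are assumptions,
    not kronikier's actual settings.
    """
    params = {
        "url": target,
        "matchType": match_type,     # exact, prefix, host or domain
        "output": "json",
        "fl": "timestamp,original",  # only the fields used below
        "filter": "statuscode:200",
        "collapse": "digest",        # drop adjacent captures with identical content
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body):
    """Turn a CDX JSON body (header row, then data rows) into snapshot records."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *data = rows
    snapshots = []
    for row in data:
        rec = dict(zip(header, row))
        snapshots.append({
            "timestamp": rec["timestamp"],
            "original": rec["original"],
            # The id_ modifier asks for the archived page without Wayback rewriting.
            "snapshot_url": (
                f"https://web.archive.org/web/{rec['timestamp']}id_/{rec['original']}"
            ),
        })
    return snapshots
```

Passing `match_type="exact"` would restrict the listing to one URL, while `"host"` or `"domain"` widens it to everything archived under the name, which is roughly the difference between a single-page scan and an exhaustive one.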

You give it a domain name and it works through the available snapshots, pulling out any email address or phone number it finds. Results include when each contact first appeared and when it was last seen, so you get a timeline rather than just a list. Output goes to a summary table in the terminal and a CSV file saved to the current directory. The CSV has nine columns and stores each contact twice: once normalized and once as the raw text that appeared on the page. Keeping the raw text matters most for phone numbers, where reformatting can introduce errors.
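The timeline idea can be sketched in a few lines. Everything here is illustrative rather than kronikier's actual code: the regular expressions are deliberately loose (the phone pattern will also catch other long digit runs), the function names are hypothetical, and the five columns are a stand-in, since the nine real column names are not listed here.

```python
import csv
import io
import re

# Loose illustrative patterns; a real extractor would need stricter validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical column set, not the tool's real nine columns.
FIELDS = ["kind", "normalized", "raw", "first_seen", "last_seen"]


def extract_contacts(text):
    """Yield (kind, normalized, raw) for each email or phone number in `text`."""
    for match in EMAIL_RE.finditer(text):
        raw = match.group(0)
        yield "email", raw.lower(), raw
    for match in PHONE_RE.finditer(text):
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        yield "phone", ("+" if raw.startswith("+") else "") + digits, raw


def build_timeline(snapshots):
    """Aggregate (timestamp, text) pairs into one row per distinct contact."""
    timeline = {}
    for timestamp, text in snapshots:
        for kind, normalized, raw in extract_contacts(text):
            entry = timeline.setdefault((kind, normalized), {
                "kind": kind,
                "normalized": normalized,
                "raw": raw,  # raw text from the first snapshot processed
                "first_seen": timestamp,
                "last_seen": timestamp,
            })
            # 14-digit Wayback timestamps sort correctly as strings.
            entry["first_seen"] = min(entry["first_seen"], timestamp)
            entry["last_seen"] = max(entry["last_seen"], timestamp)
    return sorted(timeline.values(), key=lambda r: (r["kind"], r["first_seen"]))


def to_csv(rows):
    """Serialize timeline rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Grouping on the normalized value is what lets `Info@Example.com` and `info@example.com` collapse into one timeline entry, while the `raw` column preserves what the page actually said.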

The tool tries to behave politely toward the Internet Archive. It defaults to four requests per second and caches downloaded snapshots on disk so that rerunning a scan does not re-fetch content you already have. The cache lives in a folder in your home directory and can be disabled or cleared through flags.
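A throttle combined with a disk cache might look like the sketch below. This is an assumption about the general technique, not a copy of kronikier's implementation: the `PoliteFetcher` class, its parameters, and the SHA-256 file naming are all hypothetical, and the actual download function is injected so the example stays independent of any HTTP library.

```python
import hashlib
import time
from pathlib import Path


class PoliteFetcher:
    """Hypothetical wrapper: caches bodies on disk and spaces out real requests."""

    def __init__(self, fetch, cache_dir, max_per_second=4.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                      # callable: url -> str
        self._cache_dir = Path(cache_dir)
        self._cache_dir.mkdir(parents=True, exist_ok=True)
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_request = None                # time of the last real request

    def _cache_path(self, url):
        # Hash the URL so any snapshot address maps to a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self._cache_dir / f"{digest}.html"

    def get(self, url):
        path = self._cache_path(url)
        if path.exists():
            # Cache hit: no network traffic and no throttling delay.
            return path.read_text(encoding="utf-8")
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        body = self._fetch(url)
        path.write_text(body, encoding="utf-8")
        return body
```

With `max_per_second=4.0`, consecutive uncached requests are spaced at least 0.25 seconds apart, while cached snapshots return immediately. Pointing `cache_dir` at a folder under `Path.home()` would match the behavior described above; deleting that folder is the simplest way to clear it.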

For investigations, you can scan a single URL to see how one page changed over time, run an exhaustive scan that covers every URL the host ever had, or feed in a file of multiple targets for batch processing.

The README notes that all extracted data is already public, that contacts found in old snapshots may no longer be associated with the domain, and that the tool is intended for use within legal investigations. The test suite uses the Theranos and Enron domains as live regression tests, since their archived contacts are part of the documented historical record. The license is MIT.

Where it fits