ArchiveBox
ArchiveBox is an open-source, self-hosted web archiving tool that saves websites offline in multiple formats from URLs, bookmarks, and other sources.
Description
ArchiveBox is an open-source, self-hosted web archiving tool for collecting, saving, and viewing websites offline. It takes URLs from sources such as browser history, bookmarks, Pocket, and Pinboard, and saves each page in several redundant formats, including HTML, JavaScript, PDFs, media files, and WARC archives. By keeping local copies of both public and private content, it guards against link rot and content degradation while leaving users in control of their data. It can be installed via Docker, pip, or other package managers, and archives can be managed through a CLI, a web UI, a Python API, or direct SQLite3 access. ArchiveBox relies on standard tools such as Chrome, wget, and yt-dlp, and stores its data in ordinary files and folders rather than proprietary formats, so archives remain durable and accessible over the long term.
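As a rough illustration of the SQLite3 access mentioned above, the sketch below reads snapshot rows with Python's standard sqlite3 module. The table name core_snapshot and the columns url, title, and timestamp are assumptions about the index layout and may differ between ArchiveBox versions, so inspect the schema of your own collection before relying on them. The helper names list_snapshots and open_read_only are hypothetical.

```python
import sqlite3
from contextlib import closing


def list_snapshots(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return (url, title) pairs for the most recent snapshots.

    Assumes a table named core_snapshot with url, title, and timestamp
    columns. This layout is an assumption, not a documented guarantee.
    """
    query = (
        "SELECT url, COALESCE(title, '') "
        "FROM core_snapshot ORDER BY timestamp DESC LIMIT ?"
    )
    with closing(conn.cursor()) as cur:
        cur.execute(query, (limit,))
        return cur.fetchall()


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open the index file read-only so a query cannot modify the archive."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

Pointing open_read_only at the SQLite index file inside a collection's data folder and passing the connection to list_snapshots would return the newest entries, provided the assumed schema matches.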
Features
- Free and open-source with self-hosting capabilities for data privacy and control.
- Accepts input from many sources: URLs, browser history, bookmarks, Pocket, Pinboard, RSS feeds, a browser extension, and more.
- Saves content in multiple redundant formats: HTML, PDF, PNG, WARC, MP3/MP4, git clones, and article text extraction.
- Usable as a CLI tool, self-hosted web app, Python library, or one-off command with Docker, pip, apt, and other installation methods.
- Includes features like scheduled importing, tagging, configurable archiving behavior, and support for private or authenticated content.
- Uses standard, durable formats and tools (e.g., Chrome, wget, yt-dlp) for long-term preservation, with no proprietary formats.
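The CLI usage listed above can be scripted from Python. The sketch below builds an `archivebox add` command line and runs it inside a collection folder. The helper names archive_url_command and run_in_collection are hypothetical, and limiting the depth option to 0 or 1 is an assumption about the CLI that may not hold for every release.

```python
import shutil
import subprocess
from typing import List, Optional


def archive_url_command(url: str, depth: int = 0) -> List[str]:
    """Build the argv for `archivebox add`.

    depth=1 is meant to also archive pages linked from the URL; accepting
    only 0 or 1 here is an assumption about the CLI.
    """
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    return ["archivebox", "add", f"--depth={depth}", url]


def run_in_collection(argv: List[str], data_dir: str) -> Optional[int]:
    """Run a command with the collection folder as working directory.

    Returns the exit code, or None when the executable is not on PATH.
    """
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, cwd=data_dir, check=False).returncode
```

Keeping command construction separate from execution makes the command easy to inspect or log before anything is archived.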
Benefits
- Preserves internet content against link rot and degradation, ensuring access to important web pages over time.
- Enables archiving of both public and private content, offering more control than centralized services like Archive.org.
- Provides data ownership and privacy through self-hosting, reducing reliance on third-party platforms.
- Supports a wide range of content types, including social media, videos, articles, and source code, for comprehensive archiving.
- Facilitates legal evidence preservation, research data collection, and personal bookmark backup with customizable configurations.
- Offers flexibility with multiple usage modes (CLI, web UI, API) and installation options for different environments and skill levels.
Links
- Home: https://archivebox.io
- Source code: https://github.com/ArchiveBox/ArchiveBox
- Open Source: ✅
- European: ❌