@internetarchive

Internet Archive

The Internet Archive is "the library of the Internet", and a big supporter of Free Software.

Pinned

  1. openlibrary Public

    One webpage for every book ever published!

    Python · 6.1k stars · 1.7k forks

  2. bookreader Public

    The Internet Archive BookReader

    JavaScript · 1.1k stars · 457 forks

  3. heritrix3 Public

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Java · 3.1k stars · 780 forks

  4. cicd Public

    build & test using github registry; deploy to nomad clusters

    19 stars · 1 fork

Repositories

Showing 10 of 266 repositories
  • bookreader Public

    The Internet Archive BookReader

    JavaScript · 1,097 stars · AGPL-3.0 · 457 forks · 137 issues (3 need help) · 107 pull requests · Updated Dec 22, 2025
  • openlibrary Public

    One webpage for every book ever published!

    Python · 6,062 stars · AGPL-3.0 · 1,710 forks · 800 issues (17 need help) · 185 pull requests · Updated Dec 22, 2025
  • Sparkling Public

    Internet Archive's Sparkling Data Processing Library

    Scala · 15 stars · MIT · 2 forks · 1 issue · 0 pull requests · Updated Dec 22, 2025
  • internetarchivebot
    PHP · 148 stars · AGPL-3.0 · 37 forks · 0 issues · 3 pull requests · Updated Dec 22, 2025
  • iaux-dropdown Public

    `<ia-dropdown>` web component that displays dropdown items given any button

    TypeScript · 1 star · AGPL-3.0 · 0 forks · 0 issues · 3 pull requests · Updated Dec 22, 2025
  • iaux-collection-browser
    TypeScript · 8 stars · AGPL-3.0 · 1 fork · 2 issues · 21 pull requests · Updated Dec 22, 2025
  • Zeno Public

    State-of-the-art web crawler 🔱

    Go · 356 stars · AGPL-3.0 · 51 forks · 34 issues (2 need help) · 11 pull requests · Updated Dec 22, 2025
  • tvnews_socialmedia_mentions Public

    Google Summer of Code (GSoC) 2025 TV News Archive Social Media Mentions project

    Python · 0 stars · 1 fork · 0 issues · 0 pull requests · Updated Dec 21, 2025
  • nomad Public

    CI/CD code to manage and deploy to Nomad clusters. CI/CD uses a GitHub Actions reusable workflow; the deploy phase sends just-built containers to a Nomad cluster. Contains helpful aliases for devs, including "hot sync" of code into deploys.

    Shell · 9 stars · 3 forks · 0 issues · 0 pull requests · Updated Dec 20, 2025
  • brozzler Public

    brozzler - distributed browser-based web crawler

    Python · 765 stars · Apache-2.0 · 109 forks · 36 issues · 20 pull requests · Updated Dec 19, 2025