PeterM45/email-scrape

email-scrape

Toolkit for extracting email addresses from HTML content and remote websites.

Installation

pnpm add email-scrape

Usage

    import {
      scrapeEmailsFromWebsite,
      scrapeEmailFromWebsite,
      extractEmails,
    } from "email-scrape";

    // Extract emails from a string
    const emails = extractEmails("Contact us at hello@example.com");

    // Fetch a webpage and return a ranked list of emails
    // Automatically checks contact/about pages for more emails
    const ranked = await scrapeEmailsFromWebsite("https://example.com");

    // Skip contact page discovery for faster scraping
    const main = await scrapeEmailsFromWebsite("https://example.com", {
      followContactPages: false,
    });

    // Convenience helper returning the single highest-ranked email
    const top = await scrapeEmailFromWebsite("https://example.com");

Features

  • Smart email validation: Rejects malformed addresses and email-like text that is not a properly formatted address
  • Contact page discovery: Automatically finds and scrapes /contact, /about, and similar pages for additional email addresses
  • Ranked results: Returns emails sorted by source quality (mailto links ranked highest, then structured data, then plain text)
  • Keyword boosting: Emails containing keywords like "support", "contact", "info" get higher rankings
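
The ranking and keyword boosting described above can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: the type names, the numeric weights, the boost value, and the choice to match keywords against the part before the `@` are all assumptions.

```typescript
// Hypothetical types and weights for illustration; not email-scrape internals.
type EmailSource = "mailto" | "structured-data" | "text";

interface FoundEmail {
  address: string;
  source: EmailSource;
}

// Higher score = more trustworthy source. The numbers are assumptions.
const SOURCE_SCORE: Record<EmailSource, number> = {
  mailto: 30,
  "structured-data": 20,
  text: 10,
};

// Keywords come from the feature list; the boost value is an assumption.
const BOOST_KEYWORDS = ["support", "contact", "info"];
const KEYWORD_BOOST = 5;

function scoreEmail(email: FoundEmail): number {
  // Match keywords against the local part (before "@") only.
  const local = email.address.toLowerCase().split("@")[0] ?? "";
  const boosted = BOOST_KEYWORDS.some((keyword) => local.includes(keyword));
  return SOURCE_SCORE[email.source] + (boosted ? KEYWORD_BOOST : 0);
}

function rankEmails(found: FoundEmail[]): string[] {
  // Keep the best score seen for each address (case-insensitive).
  const best = new Map<string, number>();
  for (const email of found) {
    const key = email.address.toLowerCase();
    const score = scoreEmail(email);
    if (score > (best.get(key) ?? -Infinity)) best.set(key, score);
  }
  // Sort by score, highest first; ties keep first-seen order.
  return Array.from(best.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([address]) => address);
}

const exampleRanking = rankEmails([
  { address: "jane@example.com", source: "text" },
  { address: "support@example.com", source: "text" },
  { address: "sales@example.com", source: "mailto" },
]);
console.log(exampleRanking);
// → ["sales@example.com", "support@example.com", "jane@example.com"]
```

With these assumed weights, a mailto link outranks a keyword-boosted plain-text match, which mirrors the ordering in the feature list: source quality first, keywords as a smaller adjustment.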

Options

scrapeEmailsFromWebsite(url, options)

  • fetch: custom fetch implementation (defaults to global fetch).
  • signal: abort signal to cancel the request.
  • userAgent: override the default user-agent string.
  • headers: additional headers to merge with defaults.
  • followContactPages: if true (default), automatically discovers and scrapes contact/about pages for additional emails.
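
As one way to use the fetch option, the sketch below wraps a fetch implementation with simple retries. The helper name `withRetry`, the `ResponseLike` and `FetchLike` types, and the retry policy are hypothetical and not part of email-scrape; the types are deliberately minimal so the example stays self-contained, and they may need adjusting to the global fetch signature in real code.

```typescript
// Minimal structural types so the sketch is self-contained; real code
// would more likely use the global `fetch` and `Response` types.
interface ResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

type FetchLike = (url: string, init?: object) => Promise<ResponseLike>;

// Hypothetical helper: retries on thrown (network) errors and 5xx responses.
function withRetry(fetchImpl: FetchLike, maxAttempts = 3): FetchLike {
  return async (url, init) => {
    let lastError: unknown = new Error("no attempts made");
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        const response = await fetchImpl(url, init);
        // Anything below 500 is returned as-is; 4xx responses are not retried.
        if (response.status < 500) return response;
        lastError = new Error(`HTTP ${response.status} from ${url}`);
      } catch (error) {
        lastError = error;
      }
    }
    // Every attempt failed: surface the most recent error.
    throw lastError;
  };
}

// Assumed usage, based on the `fetch` option listed above:
//
//   const emails = await scrapeEmailsFromWebsite("https://example.com", {
//     fetch: withRetry(fetch),
//   });
```

The same wrapping pattern works for other cross-cutting concerns, such as logging or serving canned HTML in unit tests, without touching the scraper itself.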

Scripts

    pnpm clean             # remove dist/coverage artifacts
    pnpm lint              # run Biome linting
    pnpm format            # format code with Biome
    pnpm check             # lint + format + auto-fix
    pnpm test              # run unit tests
    pnpm test:integration  # run integration tests (hits live websites)
    pnpm test:all          # run all tests
    pnpm changeset         # create a changeset for version bump
    pnpm release           # publish using changesets

Publishing

The project uses Changesets for version management and npm provenance for secure, transparent publishing.

Automated Publishing (Recommended)

  1. One-time setup (if you haven't already):

    • Go to npmjs.com → Account Settings → Access Tokens
    • Create a new Automation token (granular access token with publish permission)
    • In your GitHub repo: Settings → Secrets and variables → Actions → New repository secret
    • Name it NPM_TOKEN and paste your token
    • The CI workflow reads this secret and publishes with npm provenance enabled
  2. To publish a new version:

    pnpm changeset   # Describe your changes and choose semver bump (patch/minor/major)
    git add .changeset/*
    git commit -m "Add changeset for new feature"
    git push
  3. The CI workflow automatically:

    • Detects the changeset
    • Bumps the version in package.json
    • Publishes to npm with cryptographic provenance
    • Pushes version commits and tags back to the repo

Manual Publishing (if needed)

    pnpm changeset version   # Bump version
    pnpm install             # Update lockfile
    pnpm test                # Run tests
    pnpm release             # Publish to npm

Development

    pnpm install
    pnpm lint
    pnpm test
