Extract business contact emails from live websites with a Chrome Extension that performs deep DOM scanning, de-obfuscation, and background contact-page discovery.
HTML Email Parser is a lightweight browser automation utility for lead generation, sales operations, and business development workflows. The extension scans the active tab for contact information, normalizes obfuscated email patterns, and falls back to common contact-related routes such as /contact, /about, and /support when the landing page does not expose a visible mailbox.
Important
This repository contains a Chrome Extension, not a general-purpose logging library. The documentation below reflects the actual implementation shipped in this codebase.
- Features
- Tech Stack & Architecture
- Getting Started
- Testing
- Deployment
- Usage
- Configuration
- License
- Contacts & Community Support
- Deep client-side email extraction from visible text nodes and rendered DOM content.
- Attribute-aware scanning for
href,title,placeholder,alt,aria-label,data-email,data-contact,src, andsrcsetvalues. mailto:URL cleanup, query-string stripping, and result normalization before presentation.- De-obfuscation of common anti-scraping formats such as
name [at] domain [dot] com,name (at) domain (dot) com, and Unicode full-width separators. - Cloudflare
data-cfemaildecoding to recover addresses protected by email obfuscation. - Reverse-string detection for mailboxes embedded as reversed text blobs.
- Concatenated string recovery for email addresses split across JavaScript-style quoted fragments.
- False-positive suppression for image assets, hashed strings, noisy telemetry artifacts, and bundled resource names.
- Business-oriented result prioritization that promotes high-value inboxes such as
ads@,marketing@,sales@,contact@,info@, andsupport@. - Background augmentation that fetches common contact-oriented routes from the current origin without opening extra tabs.
- Root-page aware fallback behavior that expands scanning when the current page is likely a homepage or no email is discovered locally.
- Time-bounded execution model with a hard stop to reduce the risk of runaway scans on large or hostile pages.
- Popup-driven UX with email listing, clipboard copy,
mailto:launch, and manual refresh controls. - Manifest V3-compatible packaging with a service worker background script.
- Zero build-step local development workflow: clone, load unpacked, and test immediately in Chrome.
- JavaScript (plain ES2019+ style browser scripting).
- Chrome Extension Manifest V3 APIs.
- DOM traversal via
TreeWalker,querySelectorAll, andDOMParser. - Browser
fetch()for same-origin background HTML retrieval. - Static HTML and CSS for popup rendering.
- Apache License 2.0 for open-source distribution.
HTML-Email-parser/
├── LICENSE
├── README.md
├── background.js
├── content.js
├── manifest.json
├── popup.html
├── popup.js
├── styles.css
└── icons/
└── icon128.png
The extension injects content.js into the active tab, keeping extraction logic close to the live DOM. This makes it possible to inspect text nodes, attributes, and rendered page content without requiring external services.
If the page yields too few results, the scanner requests additional HTML from common contact routes on the same origin. This improves recall for sites that keep inboxes off the homepage.
Note
Contact-page augmentation only targets same-origin candidate paths already defined in the extension source. It does not crawl arbitrary domains.
The extraction logic uses layered normalization steps to maximize signal quality:
- Decode HTML entities.
- Normalize Unicode separators and invisible characters.
- Translate obfuscated
atanddotpatterns. - Remove
mailto:prefixes and trailing query strings. - Filter out image-like and garbage-like false positives.
- Score and sort business-relevant addresses.
A runtime ceiling prevents the scanner from consuming unbounded page resources. This is especially useful on very large documents, SPAs, or pages with deeply nested markup.
Warning
The scanner intentionally caps results and runtime. If you need exhaustive bulk extraction, you may need to tune the constants in content.js and validate performance on representative sites.
background.js is intentionally minimal because the extension primarily relies on popup and content-script messaging. This keeps the service worker easy to audit and compatible with Chrome's MV3 lifecycle model.
flowchart TD
A[User opens extension popup] --> B[popup.js queries active tab]
B --> C[popup.js injects content.js]
C --> D[Scan current DOM and attributes]
D --> E{Enough emails found?}
E -- Yes --> F[Normalize sort and send results]
E -- No --> G[Fetch same-origin contact paths]
G --> H[Parse fetched HTML with DOMParser]
H --> F
F --> I[chrome.runtime.sendMessage]
I --> J[popup.js renders prioritized list]
J --> K[User copies email or opens mailto]
| Component | Responsibility |
|---|---|
manifest.json |
Declares permissions, popup entrypoint, icons, and background worker. |
content.js |
Performs extraction, de-obfuscation, fallback fetching, sorting, and messaging. |
popup.js |
Injects the scanner, renders results, and exposes copy / mail actions. |
popup.html |
Defines popup layout and static UI shell. |
styles.css |
Styles the popup list, buttons, metadata, and responsive behavior. |
background.js |
Maintains a minimal MV3 service worker message responder. |
Before loading the extension, ensure you have the following:
- Google Chrome or another Chromium-based browser with Manifest V3 support.
- Access to
chrome://extensionsin developer mode. - Git, if you plan to clone the repository instead of downloading a ZIP archive.
- No Node.js, Python, Docker, or package manager installation is required for the current codebase.
git clone https://github.com/<your-account>/HTML-Email-parser.git
cd HTML-Email-parser- Open Chrome and navigate to
chrome://extensions. - Enable
Developer mode. - Click
Load unpacked. - Select the repository root containing
manifest.json. - Confirm that the extension appears in the extension list with version
1.0.11.
Tip
Pin the extension to the browser toolbar while testing. It makes iterative scans and manual refreshes much faster.
This repository does not currently include an automated unit test suite, integration harness, or linter configuration. Validation is primarily manual and browser-based.
- Load the unpacked extension in Chrome.
- Visit a page that exposes a visible email address.
- Open the popup and confirm the address appears in the list.
- Visit a site whose homepage does not show email addresses but whose
/contactor/aboutroute does. - Use the popup refresh button to verify fallback retrieval still returns results.
- Test an obfuscated address format such as
hello [at] company [dot] com. - Validate that
Copywrites to the clipboard andMailopens the system mail client.
Use the following commands to validate the static project files locally:
python -m json.tool manifest.jsongit diff -- README.md manifest.json popup.html popup.js content.js background.js styles.cssNote
If you plan to expand this project, adding an end-to-end Playwright or Puppeteer test harness would provide stronger regression protection for popup messaging and content-script behavior.
Because this project is a browser extension, deployment typically means packaging and distributing the extension rather than deploying a long-running server process.
- Verify the extension version in
manifest.json. - Confirm icons, popup assets, and scripts are present at the repository root.
- Run a final manual smoke test in Chrome.
- Create a ZIP archive of the extension contents for release distribution.
Example packaging command:
zip -r html-email-parser.zip manifest.json background.js content.js popup.html popup.js styles.css icons LICENSE README.md- Increment the version before each published release.
- Review requested permissions and remove anything unnecessary.
- Confirm all strings shown to end users are production-ready.
- Re-test on representative target sites, especially sites that use obfuscated or dynamically rendered contact information.
The repository does not currently include a CI workflow. If you introduce one, a pragmatic pipeline would include:
- JSON validation for
manifest.json. - Static syntax checks for JavaScript files.
- Optional browser-based smoke tests for popup injection and result rendering.
- Artifact packaging for release candidates.
Caution
Chrome Extension releases are sensitive to permission changes. Any additional host permissions or API access may require renewed user trust and store review scrutiny.
- Open a target website in Chrome.
- Click the extension icon.
- Wait for the scan to complete.
- Review the discovered emails in the popup.
- Click
Copyto copy an address orMailto open amailto:link. - Click
Обновитиif you want to force a fresh scan.
// popup.js
// Query the active tab, inject the content script, and render results
const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
await chrome.scripting.executeScript({
target: { tabId: tab.id },
files: ["content.js"]
});
chrome.runtime.onMessage.addListener((msg, sender) => {
if (msg?.action === "saveEmailsForTab") {
renderList(msg.emails || []);
}
});The popup acts as the UI orchestrator. It resolves the active tab, injects the scanner, listens for saveEmailsForTab messages, and refreshes the rendered email list when new results arrive.
// content.js
// Extract emails from the current document, then optionally fetch contact pages
let merged = new Set(scanCurrentDocument() || []);
if (merged.size < MAX_EMAILS) {
const contactEmails = await fetchAndScanContactPages(window.location.origin);
contactEmails.forEach(email => merged.add(email));
}
sendResults(Array.from(merged), true);This two-phase model keeps the local page scan fast while still giving the extension a recovery path when the homepage does not expose contact details directly.
sales@example.com
hello [at] company [dot] io
marketing (at) startup . co
moc.elpmaxe@ofni
Tip
For the highest-quality results, test the extension on root pages first. The fallback contact-path strategy is most effective when the current page shares the same origin as the site's public contact pages.
The project does not currently expose a user-editable .env file, CLI startup flags, or a packaged configuration module. Configuration is implemented directly in the source files.
The extension requests the following capabilities:
| Permission | Why it is needed |
|---|---|
activeTab |
Allows the extension to work against the currently focused browser tab. |
scripting |
Enables runtime injection of content.js. |
clipboardWrite |
Supports the popup copy-to-clipboard action. |
tabs |
Allows popup logic to inspect the active tab title and URL. |
host_permissions: <all_urls> |
Permits scanning and same-origin contact-page fetching on arbitrary websites opened by the user. |
The main runtime tuning values live in content.js:
| Constant | Default | Purpose |
|---|---|---|
MAX_EMAILS |
4 |
Limits how many emails are needed before fallback behavior can stop. |
MAX_RUN_MS |
60000 |
Caps total scanner runtime to approximately 60 seconds. |
VERBOSE |
window.__EMAIL_FINDER_VERBOSE |
Enables diagnostic console logging when truthy. |
The scanner combines multiple passes:
- Text-node traversal with
TreeWalker. - Element-level inspection across common text-bearing tags.
- Attribute extraction for likely contact-bearing properties.
- HTML entity decoding and Unicode normalization.
- Style-block scanning for email-like fragments embedded in CSS or inline payloads.
- Same-origin fetches against a curated contact-path allowlist.
The current implementation attempts these paths when additional discovery is needed:
/contact
/contact/
/contact-us
/contact-us/
/contacts
/contacts/
/about
/about/
/support
/support/
/help
/help/
The popup currently includes a Ukrainian refresh label and helper tip:
- Button label:
Обновити - Footer hint:
Tip: Якщо пошта одразу не знайшлась, натисніть "Обновити"
If you want a fully English UX, update the static strings in popup.html and any corresponding text logic in popup.js.
This project is distributed under the Apache License 2.0. See LICENSE for the full legal text.
If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.