html-extractor

Here are 15 public repositories matching this topic...

miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

python nlp pagerank-algorithm text-extraction reduction summarization html-page summary lsa sumy textteaser summarizer html-extraction html-extractor

Updated Mar 31, 2026
Python

bookieio / breadability

Reworked parsing library from https://www.readability.com/ (https://mercury.postlight.com/ is now the living alternative)

python text-mining text-extraction html-parsing html-extraction html-extractor

Updated May 9, 2024
HTML

cdimascio / essence

Automatically extract the main text content (and more) from an HTML document

scraper extractor hacktoberfest webpage-extractor web-content-extractor website-extractor html-extractor

Updated Sep 1, 2022
Kotlin

zezhix / html-extractor

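An optimized version of the general-purpose web page main-text extraction algorithm based on the line-block distribution function, implemented in Python.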

python html-extractor

Updated Feb 17, 2020
Python

kwaziidev / textractor

Extracts the main text from HTML, intended for news web pages.

go extractor extraction news-extractor article-extractor html-extractor

Updated Feb 24, 2023
Go

JanDC / css-from-html-extractor

PHP library that determines which CSS is used by HTML snippets.

css php-library html-extractor

Updated Nov 7, 2019
PHP

Whomrx666 / Xtract-htmlV2

Xtract-htmlV2 is a tool for fetching the HTML code of a website of your choice, and is the successor to the previous version.

linux extract termux kali-linux html-extraction html-extractor termux-tool xtract-htmlv2

Updated Oct 16, 2025
Python

Whomrx666 / Xtract-html

Xtract-html is a tool for extracting HTML display code from a website, which you can also use for your website.

linux html termux kali-linux html-extraction html-extractor termux-tool xtract-html

Updated Oct 16, 2025
Python

importcjj / go-readability

Go package that cleans an HTML page for better readability.

go html golang text extractor text-extraction readability html2text html-extractor

Updated Aug 1, 2023
HTML

davidmillerpak / Media-Graper

Media Graper is an open source tool for Linux that extracts all the images, links, and videos from a web page.

website scrapper linux-tools web-hacking hacking-tools image-extractor html-extractor

Updated Mar 17, 2023
Shell

chaelzvaethz / scrapeunblocker

Anti-bot bypass HTML extractor

python requests web-scraping anti-bot datadome html-extractor cloudflare-scraping scrapeunblocker scrapeunblocker-scraper bypass-protections unblocker-tool

Updated Dec 11, 2025

the-real-yey / Simple-HTML-Extractor-

A simple extractor based on BeautifulSoup. You can use it to iterate through all the HTML files in the website root directory and get the text, placeholders, and other text.

extractor beautifulsoup html-extractor

Updated Dec 16, 2019
Python

RayenMalouche / MCP-PDF-Extractor-server

A Java-based server leveraging Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. Supports HTML (with CSS styling) and text extraction, file listing, and metadata retrieval via MCP-compliant tools and REST APIs. Built with Spring Boot, Jetty, and MCP SDK.

java html pdf parser mcp extractor pdf-extractor html-extraction html-extractor pdf-extraction mcp-server modelcontextprotocol extractor-to-html

Updated Aug 30, 2025
Java

MorrisGlr / clinical-anki-generator

Python CLI: Question bank vignettes → AI-enhanced Anki flashcards. Clerkship, shelf exam, and board exam prep for medical students and residents.

python html education flashcards anki medical-education active-learning anki-flashcards learning-resources html-extractor study-tool uworld medical-students openai-api board-exams llm clinical-reasoning shelf-exam clerkship

Updated May 6, 2026
Python

savedpixel / savedpixel-html-css-js-extractor

Public repository for the SavedPixel HTML CSS JS Extractor Chrome extension: pick webpage elements and export clean HTML, CSS, fonts, images, optional scripts, and Markdown locally.

chrome-extension devtools developer-tools browser-extension web-archiving frontend-tools html-export html-extractor manifest-v3 markdown-export javascript-extractor css-extractor css-export dom-extraction savedpixel

Updated May 21, 2026
JavaScript

