
html-extractor

Here are 15 public repositories matching this topic...

A Java-based server that uses Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. It supports HTML (with CSS styling) and text extraction, file listing, and metadata retrieval, exposed through Model Context Protocol (MCP) tools and REST APIs. Built with Spring Boot, Jetty, and the MCP SDK.

  • Updated Aug 30, 2025
  • Java
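
The file-listing and metadata-retrieval part of the first repository can be sketched with the Java standard library alone. This is a minimal sketch, not the project's code: the class `FilesToExtractListing`, the `FileInfo` record, and the methods `listFiles` and `readText` are hypothetical names, and the metadata shown is filesystem-level (name, size, last-modified time) rather than the document metadata Apache Tika extracts. Extracting content from PDF or DOCX files would go through Tika's parsers, which this sketch does not use. It assumes Java 16 or later (records, `Stream.toList`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class FilesToExtractListing {

    // Hypothetical holder for filesystem-level metadata (not Tika document metadata).
    record FileInfo(String name, long sizeBytes, String lastModified) {}

    // Lists the regular files directly inside dir, sorted by path; subdirectories are skipped.
    static List<FileInfo> listFiles(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> entries = Files.list(dir)) {
            paths = entries.filter(Files::isRegularFile).sorted().toList();
        }
        List<FileInfo> result = new ArrayList<>();
        for (Path p : paths) {
            BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
            result.add(new FileInfo(
                    p.getFileName().toString(),
                    attrs.size(),
                    attrs.lastModifiedTime().toString()));
        }
        return result;
    }

    // Reads a plain-text file as UTF-8; binary formats such as PDF or DOCX
    // would need a parser like Apache Tika instead.
    static String readText(Path file) throws IOException {
        return Files.readString(file);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample directory created in the system temp location.
        Path dir = Files.createTempDirectory("files-to-extract");
        Files.writeString(dir.resolve("b.txt"), "hello");
        Files.writeString(dir.resolve("a.txt"), "hi there");
        for (FileInfo info : listFiles(dir)) {
            System.out.println(info.name() + " " + info.sizeBytes() + " bytes");
        }
    }
}
```

Running `main` prints the two sample files in name order with their sizes in bytes. How the actual server exposes listing and extraction as MCP tools or REST endpoints is not shown here.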

Public repository for the SavedPixel HTML CSS JS Extractor Chrome extension: pick webpage elements and export clean HTML, CSS, fonts, images, optional scripts, and Markdown locally.

  • Updated May 21, 2026
  • JavaScript
