ContextGem: Effortless LLM extraction from documents
Updated Mar 16, 2026 · Python
Use LLMs to robustly extract web data
Simplifies the retrieval, extraction, and training of structured data from various unstructured sources.
🕵️‍♂️ Privacy-focused AI job scraper with local storage and an interactive dashboard. Auto-scrapes AI/ML roles from top companies using ScrapeGraph-AI, an LLM, and LangGraph agents, filters them for relevance, and provides a Streamlit UI for tracking applications. Built for developers seeking AI careers.
HTML to Markdown with CSS selector and XPath annotations
Lightfeed SDK to search and filter web data
Schema-first LLM extraction framework with entity grounding, multi-pass extraction, and deterministic post-processing
CORSA is a Python tool for scraping, cleaning, and analyzing AUA course data from SONIS and GenEd sources.
Automated pipeline for finding accelerators in Europe and analyzing their startup portfolios.
Generates FAQs for any website using Firecrawl
AI-agent-driven venue governance database. Extracts editorial boards and program committees from journal websites using local LLMs, with entity resolution against OpenAlex.