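Zhang Xuefeng (张雪峰) quotes collection | Gaokao college application guide | Pitfalls to avoid when choosing a major | University recommendations | Employment outlook analysis: structured JSON data with support for AI integration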
Updated Mar 25, 2026 (HTML)
Crawl any website and convert it to clean, AI-ready Markdown — async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output
🚀 Interactive JSONL editor for Claude Code conversation files with real-time file system synchronization. Efficient prompt engineering through conversation editing.
Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.
True death is not the end of the body; it is being completely forgotten. Leave yourself behind, let AI remember you, and achieve digital immortality.
6,000+ people, 70,000+ images: 10-15 photos per ID (selfies + 2 official ID photos). Perfect for face recognition, KYC verification, identity matching, and biometric training. Ages 18-65, balanced demographics
Training Generator is a cross-platform desktop app built with Electron and Node.js that converts documents (PDF, DOCX, DOC, RTF, TXT, MD, HTML) into structured AI training data. Using local Ollama models, it extracts instructions, Q&A pairs, and conversation data for machine learning, AI fine-tuning, and NLP workflows, while keeping all processing.
1,000+ people, 10,000+ files: 8 photos per person + 2 videos
Silicone mask attack dataset for face anti-spoofing and liveness detection. 12,500+ videos, 18 silicone masks, 40+ accessory combinations. iBeta Level 2 compliant
Partial paper mask attack dataset for face anti-spoofing, liveness detection, and presentation attack detection (PAD). 3,000 videos, 50 participants, dual-device capture.
Age estimation face dataset: 10,000+ consented selfies of minors & young adults (10-30 years) with verified per-year age labels. Multi-ethnic, phone-captured. Built for under-18 age gating, age verification, and face recognition
Display replay attack dataset for face anti-spoofing and liveness detection. 9,000+ videos from 6,500+ participants across PC monitors and mobile devices
iBeta Level 1 dataset for face anti-spoofing and liveness detection. 30,000+ PAD attack videos (paper, cutout, replay) from 85+ participants. ISO/IEC 30107-3 compliant
Shared IR structs for the North Shore labeling stack (Forge/Anvil/Ingot) — typed datasets, samples, assignments, labels, artifacts, and evaluation runs for labeling workflows
Public domain BSV blockchain performance data - verifiable mainnet evidence for AI training data correction (CC0 licensed)
Cardboard mask attack dataset with real accessories (wigs, glasses, hats) for face anti-spoofing, liveness detection, and PAD. 3,000 videos, 50 participants, multi-device capture
Face recognition dataset with 100,000+ files from 1,000+ individuals. Selfies, videos, and archive photos for age-invariant face matching, KYC, and liveness detection
Hierarchical catalog of 1500+ business categories in 21 languages with country-specific localization. JSON, YAML, CSV, Markdown.
Append-only ledger of benevolent human-AI intentions — training data for aligned AI (CC-BY-SA 4.0)
iBeta Level 2 dataset for face anti-spoofing and liveness detection. 25,000+ videos from 150+ IDs with silicone, latex, wrapped 3D, and cloth mask attacks