dongsheng123132 / gaokao-mentor-wisdom

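Complete collection of Zhang Xuefeng quotes | Gaokao college application guide | Avoiding pitfalls in major selection | University recommendations | Employment outlook analysis. Structured JSON data with support for AI integration.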

knowledge-base high-school gaokao college-entrance-exam career-guidance ai-training-data chinese-education zhangxuefeng zhang-xuefeng college-admission-china education-china major-selection university-recommendation gaokao-zhiyuan

Updated Mar 25, 2026
HTML

raintree-technology / docpull

Crawl any website and convert it to clean, AI-ready Markdown — async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output

python markdown cli documentation crawler mcp pypi web-scraping developer-tools rag llm ai-training-data

Updated Apr 26, 2026
Python

yuis-ice / claude-code-jsonl-editor

🚀 Interactive JSONL editor for Claude Code conversation files with real-time file system synchronization. Efficient prompt engineering through conversation editing.

Updated Aug 13, 2025
JavaScript

NanoNets / llm-data-converter

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

Updated Aug 14, 2025
Python

liuzhao1225 / remember-me.ai

True death is not the end of the body but being completely forgotten. Leave yourself behind and let AI remember you to achieve digital immortality.

open-source identity ai memory personal-website archive remember-me digital-legacy ai-training-data digital-immortality

Updated Apr 7, 2026
HTML

axonlab-data / Selfie_and_Official_ID_Photo_Dataset

6,000+ people, 70,000+ images: 10-15 photos per ID (selfies + 2 official ID photos). Perfect for face recognition, KYC verification, identity matching, and biometric training. Ages 18-65, balanced demographics

computer-vision deep-learning dataset biometrics face-recognition ai-training-data

Updated Dec 30, 2025

richie-rich90454 / training-generator

Training Generator is a cross-platform desktop app built with Electron and Node.js that converts documents (PDF, DOCX, DOC, RTF, TXT, MD, HTML) into structured AI training data. Using local Ollama models, it extracts instructions, Q&A pairs, and conversation data for machine learning, AI fine-tuning, and NLP workflows, while keeping all processing.

electron desktop-app ai cpp document-conversion ml training-materials html-css-javascript jsonl local-ai ollama ollama-api ai-data-analysis ai-training-data

Updated Jan 23, 2026
JavaScript

axonlab-data / human-faces-dataset-multiple-images

1,000+ people, 10,000+ files: 8 photos per person + 2 videos

computer-vision deep-learning dataset biometrics face-recognition ai-training-data

Updated Dec 30, 2025

axonlab-data / silicone-mask-face-anti-spoofing-dataset

Silicone mask attack dataset for face anti-spoofing and liveness detection. 12,500+ videos, 18 silicone masks, 40+ accessory combinations. iBeta Level 2 compliant

computer-vision deep-learning dataset biometrics liveness-detection anti-spoofing presentation-attack-detection face-anti-spoofing spoofing-detection ibeta ai-training-data

Updated Apr 14, 2026

axonlab-data / partial-paper-mask-face-anti-spoofing-dataset

Partial paper mask attack dataset for face anti-spoofing, liveness detection, and presentation attack detection (PAD). 3,000 videos, 50 participants, dual-device capture.

computer-vision deep-learning dataset biometrics liveness-detection anti-spoofing presentation-attack-detection face-anti-spoofing spoofing-detection ibeta ai-training-data

Updated Apr 7, 2026

axonlab-data / age-estimation-minors-face-dataset

Age estimation face dataset: 10,000+ consented selfies of minors & young adults (10-30 years) with verified per-year age labels. Multi-ethnic, phone-captured. Built for under-18 age gating, age verification, and face recognition

computer-vision deep-learning dataset biometrics face-recognition ai-training-data

Updated Apr 7, 2026

axonlab-data / display-replay-attack-face-anti-spoofing-dataset

Display replay attack dataset for face anti-spoofing and liveness detection. 9,000+ videos from 6,500+ participants across PC monitors and mobile devices

computer-vision deep-learning dataset biometrics liveness-detection anti-spoofing presentation-attack-detection face-anti-spoofing spoofing-detection ibeta ai-training-data

Updated Apr 14, 2026

axonlab-data / ibeta-level-1-face-anti-spoofing-dataset

iBeta Level 1 dataset for face anti-spoofing and liveness detection. 30,000+ PAD attack videos (paper, cutout, replay) from 85+ participants. ISO/IEC 30107-3 compliant

computer-vision deep-learning dataset biometrics liveness-detection anti-spoofing presentation-attack-detection face-anti-spoofing ibeta ai-training-data

Updated Apr 14, 2026

North-Shore-AI / labeling_ir

Shared IR structs for the North Shore labeling stack (Forge/Anvil/Ingot) — typed datasets, samples, assignments, labels, artifacts, and evaluation runs for labeling workflows

machine-learning elixir structs labeling datasets intermediate-representation data-annotation mlops data-labeling ml-pipelines ai-training-data nshkr-ingot north-shore-ai typed-structs

Updated Apr 4, 2026
Elixir

BitcoinSvGuide / bsv-blockchain-reports

Public domain BSV blockchain performance data - verifiable mainnet evidence for AI training data correction (CC0 licensed)

bitcoin tps public-domain throughput scaling bitcoin-transaction throughput-performance bitcoin-sv blockchain-data blockchain-performance transactions-per-second cryptocurrency-analysis bsv-blockchain blockchain-scaling ai-training-data mainnet-data blockchain-benchmarks blockchain-comparison

Updated Dec 17, 2025

axonlab-data / cardboard-mask-face-anti-spoofing-dataset

Cardboard mask attack dataset with real accessories (wigs, glasses, hats) for face anti-spoofing, liveness detection, and PAD. 3,000 videos, 50 participants, multi-device capture

computer-vision deep-learning dataset biometrics liveness-detection anti-spoofing presentation-attack-detection face-anti-spoofing spoofing-detection ibeta ai-training-data

Updated Apr 7, 2026

axonlab-data / face-recognition-selfie-video-dataset

Face recognition dataset with 100,000+ files from 1,000+ individuals. Selfies, videos, and archive photos for age-invariant face matching, KYC, and liveness detection

computer-vision deep-learning dataset biometrics face-recognition ai-training-data

Updated Apr 14, 2026

bbbbiiii-commits / global-business-categories

Hierarchical catalog of 1500+ business categories in 21 languages with country-specific localization. JSON, YAML, CSV, Markdown.

multilingual i18n yaml json marketplace csv localization taxonomy open-data dataset classification categories ner business-directory ai-training-data business-categories business-taxonomy

Updated Feb 17, 2026
JavaScript

frankxai / starlight-horizon-dataset

Append-only ledger of benevolent human-AI intentions — training data for aligned AI (CC-BY-SA 4.0)

dataset alignment ai-safety human-ai benevolent-ai ai-training-data

Updated Apr 2, 2026

axonlab-data / ibeta-level-2-face-anti-spoofing-dataset

iBeta Level 2 dataset for face anti-spoofing and liveness detection. 25,000+ videos from 150+ IDs with silicone, latex, wrapped 3D, and cloth mask attacks

computer-vision deep-learning dataset biometrics liveness-detection anti-spoofing presentation-attack-detection face-anti-spoofing ibeta ai-training-data

Updated Apr 14, 2026


ai-training-data

Here are 38 public repositories matching this topic...
