Skip to content

0xSweet/awesome-llm-security-alignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 

Repository files navigation

LLM Security, Alignment & Governance Resources Awesome

A curated list of research papers, experiments, and resources related to LLM security and alignment — including prompt injection, jailbreaks, hallucinations, defenses, governance, and ethical frameworks.
Organized for reference and study.

Last Updated: 2026-03-26


Legend

  • ⭐️ Foundational — Classic / seminal papers
  • 🛡️ Practical — Standards, guides, applied resources
  • 🧪 Experimental — New methods, ongoing research
  • 📊 Dataset/Benchmark — Data resources, benchmarks
  • 🧾 Survey — Reviews, surveys, taxonomies

Citation Style Guide

  • arXiv preprints[arXiv:XXXX.XXXXX]
  • Conference papers[VENUE YEAR]
  • Journal articles[Journal Name, Year]
  • Regulations & policy docs[Official Document ID]
  • GitHub repos[GitHub]
  • Blogs / Reports[Blog] / [Report]

Table of Contents


Prompt Injection & Jailbreaks

  1. Prompt Injection
  1. Jailbreaking / Adversarial Prompts

Hallucinations & Reliability


Defense Strategies


Alignment & Safety

⭐️ Foundational

🛡️ Practical

🧾 Survey

🧪 Experimental


Mechanistic Interpretability

⭐ Foundational

🧪 Experimental

🛡️ Practical

  • TransformerLens [GitHub] — Primary library for mechanistic interpretability research.

Governance & Policy

  • 🛡️ EU AI Act [EU 2024] — Core EU regulation; includes requirements for General-Purpose AI (GPAI), risk classifications, transparency, and safety conditions for “high risk” systems.
  • 🛡️ NIST AI RMF (2023) — US voluntary framework for identifying, assessing, and managing AI risks over the lifecycle; includes a Generative AI Profile published in mid-2024.
  • 🛡️ OWASP Top 10 for LLMs — Industry standard list of major security threats specific to large language models.

Note: Regulatory / policy docs evolve fast — always check latest versions or drafts from official sources.


Surveys & Overviews


Tools & Datasets


Privacy & Data Security


Multimodal Security


Model Cards (Major AI Labs)

Anthropic

  • 🧪 Claude 3 Family [Report] — Safety evaluations and model details.
  • 🧪 Claude 4 (Opus/Sonnet) — Available via Claude.ai interface.

OpenAI

Google DeepMind

Meta

Mistral AI

xAI


Other References


Maintainer: 0xSweet
License: CC BY 4.0

About

A curated list of research papers, experiments, and resources related to LLM security and alignment.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors